Reading a large file using Node.js

Published on 04 April 2020

I was given the task of analysing a massive dataset of log files. Opening one of these files in Excel simply froze my laptop. Given the limited tools available, I tried parsing the file with a Node.js script instead.

Problem: To read a small file, you may use the script below:

var fs = require('fs');

fs.readFile('path/mySmallFile.txt', 'utf-8', (err, data) => {
  if (err) {
    throw err;
  }
  console.log(data);
})

You should now see the contents of the small file. However, readFile loads the whole file into memory at once, so with a large file you will hit a buffer error such as RangeError: Attempt to allocate Buffer larger than maximum size. Execution stops with an error like this:

Error: "toString" failed
  at stringSlice (buffer.js)
  at Buffer.toString(buffer.js)
  at FSReqWrap.readFileAfterClose [as oncomplete]
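
If you want to know in advance whether a file is likely to hit this limit, one option is to compare its size with the limits Node exposes in buffer.constants. The sketch below is only a rough check: fitsInOneString is a hypothetical helper name, not part of Node, and it uses the size in bytes as a conservative stand-in for the length of the decoded string.

```javascript
const fs = require('fs');
const { constants } = require('buffer');

// Hypothetical helper: returns true when the file is small enough
// to be read into a single Buffer and converted to a single string.
function fitsInOneString(filePath) {
  // statSync reads only the file metadata, not its content
  const { size } = fs.statSync(filePath);
  // A UTF-8 file never decodes to more characters than it has bytes,
  // so comparing the byte size against both limits is a safe upper bound.
  return size <= constants.MAX_LENGTH && size <= constants.MAX_STRING_LENGTH;
}
```

For example, fitsInOneString('path/mySmallFile.txt') should return true for a small text file. The exact limits depend on your Node.js version and platform, so treat the result as a hint rather than a guarantee.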

Solution: To read a large file, use Node's built-in readline module, which reads from a stream one line at a time instead of loading the whole file into memory:

const fs = require('fs');
const readline = require('readline');

// Stream the file instead of loading it all into memory
const rl = readline.createInterface({
  input: fs.createReadStream('path/largeFile.csv'),
  output: process.stdout,
  terminal: false
});

rl.on('line', (line) => {
  console.log(line);
})

// 'close' is emitted once the input stream has ended
rl.on('close', () => {
  console.log('Done!');
})

Replace the file path with the path to your own large file. You can process the file line by line inside the on('line') handler, for example by parsing each line as JSON and incrementing a counter. The final total can be printed in the on('close') handler, which runs once the whole file has been read.
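
As a minimal sketch of that idea, the function below counts log entries per level. It assumes each line of the file is a JSON object with a level field. That format, the field name and the helper name countByLevel are assumptions for illustration only, so adapt them to your own data. The result is returned from the 'close' event, which readline emits once the input stream has ended.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: streams a file and counts entries per "level".
// Assumes one JSON object per line, e.g. {"level":"info", ...}
function countByLevel(filePath) {
  return new Promise((resolve, reject) => {
    const counts = {};
    let skipped = 0;

    const input = fs.createReadStream(filePath, { encoding: 'utf-8' });
    // Surface file errors (e.g. a missing file) to the caller
    input.on('error', reject);

    const rl = readline.createInterface({ input, crlfDelay: Infinity });

    rl.on('line', (line) => {
      if (line.trim() === '') return; // ignore blank lines
      try {
        const entry = JSON.parse(line);
        const level = entry.level || 'unknown';
        counts[level] = (counts[level] || 0) + 1;
      } catch (e) {
        // Lines that cannot be parsed or used are counted, not fatal
        skipped += 1;
      }
    });

    // All lines have been read: hand back the totals
    rl.on('close', () => resolve({ counts, skipped }));
  });
}
```

You could call it as countByLevel('path/largeFile.log').then((result) => console.log(result)), where the path is a placeholder for your own file. Only the running totals are kept in memory, so the size of the file should not matter much.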

Now you should be able to process massive datasets with Node.js. For more information, see the official documentation: https://nodejs.org/api/readline.html