Reading Large Files Using Node.js


I recently had to analyze a massive log file. When I tried to open it in Excel, my laptop simply froze. Given the limitations of the tools available, I decided to parse the file using a Node.js script.

Problem

To read a small file, you might use the following script:

const fs = require("fs")

fs.readFile("path/mySmallFile.txt", "utf-8", (err, data) => {
  if (err) {
    throw err
  }
  console.log(data)
})

This script works for small files because fs.readFile loads the entire file into memory before calling the callback. For large files, that is exactly the problem: the contents can exceed the maximum size of a Buffer or string, and you might see an error such as RangeError: Attempt to allocate Buffer larger than maximum size. The script terminates with output similar to the following:

Error: "toString" failed
  at stringSlice (buffer.js)
  at Buffer.toString (buffer.js)
  at FSReqWrap.readFileAfterClose [as oncomplete]

Solution

To read a large file, you can use Node.js's built-in readline module, which reads from a stream and hands you one line at a time instead of loading the whole file into memory:

const fs = require("fs")
const readline = require("readline")

// Read the file as a stream; no output stream is needed just to read lines
const rl = readline.createInterface({
  input: fs.createReadStream("path/largeFile.csv"),
  crlfDelay: Infinity, // treat \r\n as a single line break
})

rl.on("line", line => {
  console.log(line)
})

// "close" is emitted once the input stream has ended
rl.on("close", () => {
  console.log("Done!")
})

Replace the file path with the path to your large file. Inside the on("line") handler, you can process the file line by line, for example by parsing each line as JSON and incrementing a counter. The final sum can be displayed in the on("close") handler, which runs once the file has been completely read.
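As a rough sketch of that idea, the following assumes each line of the file is a JSON object with an optional "level" field. That format is an assumption made for illustration, not something the dataset above is known to use. The helper names tallyLine and summarizeFile are hypothetical. The totals are reported when readline emits its "close" event, which happens once the input stream has ended.

```javascript
const fs = require("fs")
const readline = require("readline")

// Hypothetical helper: update the running totals with one line of input.
// Assumes each line is a JSON value, typically an object with a "level" field.
function tallyLine(line, totals) {
  if (line.trim() === "") return // skip blank lines

  let entry
  try {
    entry = JSON.parse(line)
  } catch (err) {
    totals.malformed += 1 // not valid JSON; count it and move on
    return
  }

  totals.parsed += 1
  if (entry !== null && entry.level === "error") {
    totals.errors += 1
  }
}

// Stream the file line by line and resolve with the totals
// once the whole file has been read.
function summarizeFile(filePath) {
  return new Promise((resolve, reject) => {
    const totals = { parsed: 0, errors: 0, malformed: 0 }

    const input = fs.createReadStream(filePath)
    input.on("error", reject) // e.g. the file does not exist

    const rl = readline.createInterface({ input, crlfDelay: Infinity })
    rl.on("line", line => tallyLine(line, totals))
    rl.on("close", () => resolve(totals))
  })
}

// Usage (the path is a placeholder):
// summarizeFile("path/largeFile.log").then(totals => console.log(totals))
```

Keeping the per-line logic in its own function means only the running totals stay in memory, and the counting can be tested without touching the file system.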

With this approach, you should now be able to process massive datasets using Node.js. For more information, please refer to the official documentation: Node.js Readline API.