Notes from Node Streams


These are my notes from a talk titled Node Streams given by Marco Rogers on July 3, 2012 during NodeConf 2012. Marco works at Yammer and you can follow him on Google+.

I'm going to go fast in 20 minutes! We know how node works from a high level: you have evented I/O, you set callbacks, and you use event emitters to get data incrementally. What does that mean? Let's walk through it. When you hit an operation that needs I/O, you don't block execution; you set a callback and go do something else. But on most platforms you don't read in all the data at once; it comes in incrementally, as fast as the operating system can reasonably make it available. A single callback doesn't fit that, so we need a different paradigm, and this is where event emitters come in handy. The pattern coalesced around the idea that some things are going to happen more than once: any time data is available, call this callback. That's great, and really important, because it gave us the construct we needed to make node more efficient in how it handles data, and it lets the application work with that data as it arrives rather than from an all-at-once perspective.
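To make the "happens more than once" idea concrete, here is a minimal sketch using a plain EventEmitter. The emitChunks helper and the chunk strings are hypothetical, made up for illustration; only EventEmitter, on() and emit() come from node itself.

```javascript
const { EventEmitter } = require('events');

// Hypothetical producer: hands out `chunks` one at a time, then signals completion.
function emitChunks(emitter, chunks) {
  for (const chunk of chunks) {
    emitter.emit('data', chunk); // fires once per chunk
  }
  emitter.emit('end'); // fires exactly once
}

const source = new EventEmitter();
const received = [];
let ended = false;

// Unlike a one-shot callback, this listener runs every time 'data' is emitted.
source.on('data', (chunk) => received.push(chunk));
source.on('end', () => { ended = true; });

emitChunks(source, ['first ', 'second ', 'third']);
console.log(received.length, ended); // prints: 3 true
```

The same listener is called three times here, which is exactly what a single "done" callback cannot express.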

Streams are just special event emitters. Event emitters can emit request events, for example, but streams only emit data. Streams are the way node tells you about incremental data as it comes into your system. You'll get a bunch of 'data' events and one 'end' event. We now have a construct that lets us process our data without pulling it all in at once.
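As a rough sketch of the 'data' and 'end' events on a real stream, the snippet below counts chunks as they arrive without ever holding the whole payload. It assumes a recent version of node, where Readable.from() exists (it did not in 2012); the summarize helper and the sample strings are hypothetical.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: tallies chunks as they arrive instead of buffering them.
function summarize(stream) {
  return new Promise((resolve, reject) => {
    let dataEvents = 0;
    let totalLength = 0;
    stream.on('data', (chunk) => {
      dataEvents += 1;
      totalLength += chunk.length;
    });
    stream.on('end', () => resolve({ dataEvents, totalLength }));
    stream.on('error', reject);
  });
}

// Readable.from() turns an array into a stream that emits one item per 'data' event.
const summary = summarize(Readable.from(['one', 'two', 'three']));
summary.then((result) => console.log(result));
```

Only the running totals live in memory, so the same helper would work on a stream far larger than available RAM.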

Node is meant to be fast; it's a better programming paradigm than what we're used to, like PHP, where a script eats all your RAM and then, because you're on a shared host, you kill everybody else's website. We, on the contrary, have one of the fastest dynamic VMs on the planet. So why, when we use node, do we procrastinate? Streams encourage you not to procrastinate with your data. Buffering all your data into memory is inefficient; it's not the best way to keep data flowing through our app. Streams are meant to be composed together into a pipe of continuous data.

Streams are the abstract idea that you're going to get incremental data. Node has coalesced this idea into an API called the Streams API, the constructs that node gives you to handle streams efficiently. pipe() is the main method in the Streams API. Instead of dealing with individual chunks of data yourself, you can do req.pipe(file).
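Here is one way req.pipe(file) might look in practice. This is a sketch, not the code from the talk: saveToFile, fakeReq and the temp file name are hypothetical, and fakeReq is only a stand-in for the request object an http server would hand you (built with Readable.from(), which assumes a recent node).

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable } = require('stream');

// Pipe any readable stream (for example an incoming HTTP request) into a file.
function saveToFile(readable, destination) {
  return new Promise((resolve, reject) => {
    const file = fs.createWriteStream(destination);
    readable.on('error', reject);
    file.on('error', reject);
    file.on('finish', () => resolve(destination)); // all data has been written
    readable.pipe(file); // pipe() moves each chunk across as it arrives
  });
}

// Stand-in for `req`; in a real server this would come from http.createServer().
const fakeReq = Readable.from([Buffer.from('hello '), Buffer.from('world')]);
const target = path.join(os.tmpdir(), 'node-streams-notes-upload.txt');
const saved = saveToFile(fakeReq, target);
saved.then((file) => console.log('wrote', file));
```

Note that pipe() does not forward errors from one stream to the next, which is why the sketch listens for 'error' on both ends.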

In the same way, if you need to gzip a file, you can do fs.createReadStream('./file.txt').pipe(gzipper).pipe(response);
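The gzipper in that chain is not defined in these notes; one plausible choice is zlib.createGzip() from node's built-in zlib module, which is what this sketch assumes. The gzipFileInto helper and the temp file names are hypothetical, and a file write stream stands in for the HTTP response so the example runs on its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const zlib = require('zlib');

// Stream a file through gzip into any writable (an HTTP response, a file, ...).
function gzipFileInto(sourcePath, writable) {
  return new Promise((resolve, reject) => {
    const source = fs.createReadStream(sourcePath);
    const gzipper = zlib.createGzip(); // assumption: the talk's `gzipper` is a zlib stream
    source.on('error', reject);
    gzipper.on('error', reject);
    writable.on('error', reject);
    writable.on('finish', resolve);
    source.pipe(gzipper).pipe(writable);
  });
}

// Demo input; the write stream below plays the role of `response`.
const input = path.join(os.tmpdir(), 'node-streams-notes-input.txt');
const output = input + '.gz';
fs.writeFileSync(input, 'streams all the way down\n');
const gzipped = gzipFileInto(input, fs.createWriteStream(output));
gzipped.then(() => console.log('compressed to', output));
```

Because pipe() returns its destination, the calls chain left to right in the same order the data flows.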

You might need to do some asynchronous work before piping; you may want to decide whether to use a streaming pipe chain or not! Probably the most common question is, why doesn't pause() work? We are not going to go into that right now.
