How NodeJS saved my web application
Recently I've been working on a new web application. One of the features allows users to upload audio files to the application. This is the story about how tackling an unforeseen bug led to a massive improvement in the application architecture, and an appreciation of a whole new approach to building web-apps.
First of all, let's get clear what's going on in this particular feature. The flow looks something like this:
It's pretty straightforward: the user selects a file, and while they're filling in some additional details and watching a progress meter, the file is being uploaded, processed on the application server and then saved to Amazon S3. Once all the boxes are ticked, everybody can move on.
Now a little bit of background. I've got previous experience building websites similar to this one. Last year I launched DonkDJ.com, which revolved around almost the same principle. The user chose a file to upload to the DonkDJ app servers; then, while they watched a progress meter and (occasionally) clicked on some ads, the song was uploaded, rhythmically analyzed, mutilated usually beyond recognition and returned to the user.
At the time, I used what were probably the latest web technologies to achieve this. I used Starling as a message queue, and a batch of Workling worker processes which handled the processing jobs as they appeared on the queue. Basically, the server-side flow went something like this (I've left out the UI part this time):
Now there's nothing fundamentally wrong with this approach, and it certainly worked. DonkDJ turned out to be a viral media smash hit, and within 24 hours of launch it was processing up to 5,000 uploads per day, without ever really falling over. Not bad for a 1GB Slicehost server.
There are, however, three big disadvantages to this setup:
- It should be clear from the diagram above that the architecture is serial. First of all, the complete file has to be uploaded to the server. Then (assuming a worker process is available immediately) the file is processed. Then, after it's processed, it's transferred to S3. Finally, the UI can be informed that we're ready to move on. This isn't necessarily the most efficient way to work. What if some of the processing tasks could take place in parallel? That could mean a much shorter wait, a better user experience and quite possibly a higher conversion or repeat-visit rate as a result.
- The communication between different threads (one handling the UI, the other processing the file) was complicated. This wasn't helped by the fact that while making a request from the UI to the server was just a simple AJAX request, it was difficult-to-impossible to send a message in the other direction, from the server thread to the UI. This meant that constant polling, and a lot of SQL attribute-juggling, was necessary to make the different sides communicate properly. Far from elegant.
- The design of Workling meant that the entire Rails environment had to be loaded into memory each time the application needed a new worker. Busy periods required upwards of 8 worker processes to keep up, plus 2-4 extra Rails processes to serve the website itself. Running 10-12 Rails processes on a 1GB server is no laughing matter, especially when most of them are also processing large binary files.
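To put rough numbers on the first point, here is a back-of-envelope sketch of how much overlapping the stages could save. The stage durations and the chunk count are invented for illustration (they are not measurements from DonkDJ), and the model is idealized: it assumes every stage can start on a chunk as soon as the previous stage hands it over.

```javascript
// Hypothetical per-stage durations in seconds for one upload (illustrative only).
const stages = { upload: 30, process: 20, store: 10 };

// Serial: each stage waits for the previous one to finish with the whole file.
function serialSeconds(s) {
  return s.upload + s.process + s.store;
}

// Overlapped: the file is split into `chunks` equal pieces and each stage
// starts on a piece as soon as the previous stage hands it over.
// Total = time for the first piece to pass through every stage
//       + (chunks - 1) further pieces, paced by the slowest stage.
function overlappedSeconds(s, chunks) {
  const perChunk = Object.values(s).map((t) => t / chunks);
  const firstPiece = perChunk.reduce((a, b) => a + b, 0);
  const slowest = Math.max(...perChunk);
  return firstPiece + (chunks - 1) * slowest;
}

console.log(serialSeconds(stages));         // 60
console.log(overlappedSeconds(stages, 10)); // 33
```

Under these assumptions the wait drops from 60 seconds to 33, and the total approaches the duration of the slowest single stage as the chunks get smaller.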
Now web technology changes a lot in a year, but when I started work on my new project (which requires, as you'll recall, a broadly similar architecture), I was naturally inclined to go with the experience I already had. After all, it had scaled successfully to a relatively high load, and I could see nothing in principle which would stop it scaling further. Just create some EC2 or Slicehost instances with more worker processes and employ nginx or haproxy as a load balancer. And so off I went, building my application, and having quite a lot of fun.
The problem surfaced deep into the development process. As you can see in the first diagram, this application (unlike DonkDJ) asks the user to fill in an additional details form while they are waiting for their upload to be processed.
Paperclip (the Rails plugin I used in both applications to handle uploaded files) prides itself on handling file attachments as normal ActiveRecord attributes. In other words, you can do something very roughly like this:
This means that the attachment is processed and saved, all during the ActiveRecord callback cycle. Because Rails opens an SQL transaction every time an instance is saved, this means that I was faced with a flow that looked something like this:
In other words, when the user tries to submit their "additional details" form before file processing is complete, MySQL is unable to save their information: the database row is locked! Worse, there's no easily catchable exception: MySQL just hangs for 30 seconds and then times out. All this time, the user has a spinny AJAX wheel to stare at. This was a big, big problem.
So I weighed up my options. As I saw it, they were:
- Use a MyISAM table instead of InnoDB. MyISAM tables don't support transactions, and so Rails (presumably) wouldn't lock the row and the problem would be solved. Not really ideal though: I didn't have any need for super-blazing SQL speed, and I didn't feel inclined to sacrifice the benefits of InnoDB for the sake of a Rails plugin. Update: See janl's comment below.
- Hack Rails so that it didn't open a transaction on this particular save. Really, really far from being either elegant or ideal.
- Alter the UI so that the user wasn't able to save their "additional details" form until their upload had been completed. The easiest option, but again it didn't seem right to sacrifice so much flexibility because of the possibly unsuitable design of a Rails plugin.
- Create a proxy class that was instantiated in place of ActiveRecord::Base if the database was locked. Upon saving, the data would be saved not to the database, but to Redis, which of course is not affected by SQL locks. Upon being processed, the worker process would fetch the additional information from Redis and save it to the database. Yes, I really did implement this, but please: nobody should have to go to this much trouble to implement concurrency in a web application.
At this point I remembered NodeJS. I wasn't totally sure what it was about, but I needed to solve this concurrency problem, and I was looking for a good way to implement an upload progress bar (hopefully) without using Flash. Why not take a look, I thought.
It turns out that NodeJS is phenomenally useful. It is, in short, a lightweight implementation of server-side JavaScript which is perfectly suited to handling file uploads and background processing. Others have done fine jobs of describing its potential, but I will simply emphasise that its architecture is designed to be event-driven, rather than threaded. Just like programming JavaScript in the browser, that means that it is perfectly suited to designing a system which reacts to events as they happen, instead of waiting in a queue to fetch a job.
Over this weekend, I've totally re-written the backend of my project using NodeJS. This is the new flow:
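As a minimal sketch of what "reacting to events as they happen" looks like in practice, the snippet below registers handlers on an `EventEmitter` and then fires a few events at them. The event names (`chunk`, `complete`) and the payload shape are made up for this example; they are not taken from my application.

```javascript
const { EventEmitter } = require('events');

// Hypothetical event names and payloads, chosen purely for illustration.
const uploads = new EventEmitter();
const log = [];

// Handlers run whenever the matching event fires; nothing polls a queue.
uploads.on('chunk', ({ id, bytes }) => {
  log.push(`upload ${id}: received ${bytes} bytes`);
});
uploads.on('complete', ({ id }) => {
  log.push(`upload ${id}: ready`);
});

// Two uploads interleave freely; each event is handled as it arrives.
uploads.emit('chunk', { id: 1, bytes: 65536 });
uploads.emit('chunk', { id: 2, bytes: 1024 });
uploads.emit('complete', { id: 2 });
uploads.emit('complete', { id: 1 });

console.log(log.join('\n'));
```

The point is the shape of the code rather than the details: there is no worker loop asking "is there a job yet?", only handlers that fire when something happens.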
Here are the advantages:
- Whereas in DonkDJ I was using a Flash plugin to handle the file upload, using NodeJS means that I can get my hands on the data as it arrives, chunk-by-chunk. Simply by piping the data I receive into the file-system and back, I can process and transcode the audio as it arrives on the server. The result is much quicker processing times and happier users.
- The only reason I can't also simultaneously upload to S3 is that Amazon requires an accurate "Content-Length" header before it starts uploading. Because transcoding changes the size of the file, I have no way of knowing this value until the processing stage is complete. However: what if for some reason the file doesn't need processing? Perhaps it's already compressed enough, or is too small to worry about bandwidth. In this case, I could use the "Content-Length" header of the original request to upload to S3 while simultaneously uploading to the server. (Alternatively, send a message back to the browser and ask it to upload to S3 directly using a Flash object, therefore saving the bandwidth cost, and time, of sending the file twice.)
- NodeJS is handsomely lightweight, handles lots of simultaneous jobs well, and doesn't load the Rails environment at all. Instead, I use Redis to communicate between Rails and the backend processes. This means that I will typically only need a couple of instances of Rails loaded, which is enough to serve a quite busy website if there's no background processing involved.
- I can remove Paperclip, Resque and all other background processing tasks from my Rails application entirely. Rails now handles serving dynamic web pages and nothing else. This is clearly the way it ought to be.
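The chunk-by-chunk idea from the first advantage can be sketched with Node's stream API. This is an illustration under stated assumptions, not my production code: the "transcoder" here just upper-cases text as a stand-in for real audio processing, the sink collects chunks in memory instead of writing to disk, and `makeTranscoder` and `processUpload` are hypothetical names.

```javascript
const { Readable, Transform, Writable, pipeline } = require('stream');

// Placeholder for a transcoder: upper-cases each chunk as it passes through.
// A real audio transcoder would be far more involved.
function makeTranscoder() {
  return new Transform({
    transform(chunk, _encoding, callback) {
      callback(null, chunk.toString('utf8').toUpperCase());
    },
  });
}

// Pipes `source` through the transcoder into an in-memory sink and resolves
// with the processed result once the source has ended.
function processUpload(source) {
  return new Promise((resolve, reject) => {
    const stored = [];
    const sink = new Writable({
      write(chunk, _encoding, callback) {
        stored.push(chunk);
        callback();
      },
    });
    pipeline(source, makeTranscoder(), sink, (err) => {
      if (err) reject(err);
      else resolve(Buffer.concat(stored).toString('utf8'));
    });
  });
}

// In a real server, `source` would be the incoming HTTP request stream.
const fakeRequest = Readable.from(['first chunk, ', 'second chunk, ', 'last chunk']);
processUpload(fakeRequest).then((result) => console.log(result));
```

Each chunk is transformed and stored as soon as it arrives, so processing is nearly finished by the time the last byte of the upload lands.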
On top of these advantages, NodeJS is also perfectly suited to using WebSockets, which I've employed to allow bi-directional communication between the worker processes and the browser. Because both server and client are using event-driven JavaScript, this makes it wonderfully easy to implement features like upload progress bars and dynamic status messages, which in DonkDJ involved messy hacks, Rails controllers, Flash movies and who knows what else to achieve. (Check out web-socket-js for seamless crap-browser compatibility when using WebSockets.)
So, in conclusion: there is no doubt that some of the advantages of NodeJS could have been achieved in DonkDJ with a better-designed architecture, but it would have been very painful to achieve. If you're looking to build a web application which handles any kind of file-uploading, you should look into it further. It's awesome.
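To give a flavour of the progress-bar side, here is a small sketch of turning raw byte counts into progress messages. The `UploadProgress` class and the message format are hypothetical, and the array `sent` stands in for the WebSocket connection, so that the example runs without any socket library.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: converts byte counts into whole-percent progress events.
class UploadProgress extends EventEmitter {
  constructor(totalBytes) {
    super();
    this.totalBytes = totalBytes; // e.g. taken from the request's Content-Length
    this.received = 0;
    this.lastPercent = -1;
  }

  // Call once per incoming chunk, e.g. from the request's 'data' handler.
  add(chunkLength) {
    this.received += chunkLength;
    const percent = Math.min(100, Math.floor((this.received / this.totalBytes) * 100));
    // Only emit when the whole-number percentage changes, to avoid flooding the client.
    if (percent !== this.lastPercent) {
      this.lastPercent = percent;
      this.emit('progress', { percent });
    }
  }
}

// `sent` stands in for the socket; a real handler would send each message to the browser.
const sent = [];
const progress = new UploadProgress(1000);
progress.on('progress', (msg) => sent.push(JSON.stringify(msg)));

[250, 250, 4, 496].forEach((n) => progress.add(n));
console.log(sent);
```

The third chunk changes nothing visible (still 50%), so only three messages go out: 25, 50 and 100 percent.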
Follow rfwatson on Twitter.



