How did we serve more than 20,000 IPython notebooks for Nature readers?



The IPython/Jupyter notebook is a wonderful environment for computations, prose, plots, and interactive widgets that you can share with collaborators. People use the notebook in all kinds of settings and across many programming languages: data scientists, researchers, analysts, developers, and people in between.
As I alluded to in a writeup on Instant Temporary Notebooks, a group of us from IPython/Jupyter and Rackspace were prepping for a big demo as part of a Nature article on IPython Notebooks by Helen Shen. The impetus behind the demo was to show off the IPython notebook to readers in an interactive format. What better way than to provide a live notebook server to readers on demand?
To do this, we created a temporary notebook service in collaboration with the IPython/Jupyter team.
How does this temporary notebook service work?
tmpnb is a service that spawns a new notebook server, backed by Docker, for each user. Everyone gets their own sandbox to play in, served at a unique path.

When a user visits a tmpnb instance, they actually hit an HTTP proxy, which routes initial traffic to tmpnb's orchestrator. From there, a new user container is set up and a new route, for example /user/fX104pghHEha/tree, is assigned on the proxy.
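The flow above can be sketched in a few lines of Python. This is a toy, in-memory model rather than tmpnb's actual code: the names ToyProxy, new_user_path, and spawn_for_visitor, the 12-character token length, and the container addresses are all assumptions made for illustration.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def new_user_path(length=12):
    """Return a random, hard-to-guess path prefix such as /user/fX104pghHEha."""
    token = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"/user/{token}"


class ToyProxy:
    """In-memory stand-in for the HTTP proxy's routing table (hypothetical)."""

    def __init__(self, default_target):
        # Traffic that matches no user route goes to the orchestrator.
        self.default_target = default_target
        self.routes = {}

    def add_route(self, prefix, target):
        self.routes[prefix] = target

    def resolve(self, path):
        # Longest matching prefix wins; otherwise fall back to the default.
        best = ""
        for prefix in self.routes:
            matches = path == prefix or path.startswith(prefix + "/")
            if matches and len(prefix) > len(best):
                best = prefix
        return self.routes[best] if best else self.default_target


def spawn_for_visitor(proxy, container_address):
    """Register a fresh route for a new container and return the URL to visit."""
    prefix = new_user_path()
    proxy.add_route(prefix, container_address)
    return prefix + "/tree"
```

A first visit resolves to the orchestrator (the default target). Once spawn_for_visitor has registered the route, every request under that user's prefix resolves to that user's container instead.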
Planning the Notebook Demo
When Brian Granger from Cal Poly and Richard Van Noorden from Nature asked for a demo, what that could mean was wide open. Should we have people log in to a JupyterHub installation? Refer them to Wakari or SageMathCloud?
The goal Richard stated was to support up to 150 concurrent users. In the back of our minds, we in the IPython/Jupyter project knew the initial spike in traffic would be far greater, and that we needed to be able to handle it.
I was incredibly lucky to teach at and attend the wonderfully crazy event that is the Mozilla Festival. Richard Van Noorden from Nature was in attendance, as were my colleagues from the IPython community, including Matthias Bussonnier, Aron Ahmadia, and Jeramia Ory. While we were all in one place, we swarmed Richard with ideas about what should be in the notebook Nature readers would get to interact with. After those iterations, we soon had more collaborators: David Ketcheson, a researcher at KAUST; Stefan van der Walt, a scikit-image lead; and others.
You can tell this is a community with a lot of passion, aligned around a common format that has helped propel its members' research.
The other great benefit of being in London was that I got to go to the Nature offices to talk about the architecture backing the demo and make plans for operations. Chris Ryan, art editor at Nature, would embed an iframe in the article, expanding it into a lightbox when readers clicked. For us, this just meant providing one URL they could rely on for content, as well as adjusting Content Security Policy (CSP) or X-Frame-Options restrictions so the notebook could be displayed inside the frame.
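To make the header adjustment concrete, here is a minimal sketch of relaxing framing restrictions for a single embedding site. It is not the configuration we actually deployed: the function name allow_embedding and the example origin are assumptions. The Content-Security-Policy frame-ancestors directive and the X-Frame-Options header are the standard mechanisms browsers use to decide whether a page may be framed.

```python
def allow_embedding(headers, embedder_origin):
    """Return a copy of response headers that lets embedder_origin frame the page.

    Hypothetical helper: drops X-Frame-Options and replaces any existing
    frame-ancestors directive, keeping other CSP directives untouched.
    """
    updated = {}
    existing_csp = ""
    for name, value in headers.items():
        lowered = name.lower()
        if lowered == "x-frame-options":
            # A restrictive X-Frame-Options value would block the iframe.
            continue
        if lowered == "content-security-policy":
            existing_csp = value
            continue
        updated[name] = value

    # Keep every directive except a previous frame-ancestors rule.
    directives = [d.strip() for d in existing_csp.split(";") if d.strip()]
    directives = [d for d in directives
                  if not d.lower().startswith("frame-ancestors")]
    directives.append(f"frame-ancestors 'self' {embedder_origin}")

    updated["Content-Security-Policy"] = "; ".join(directives)
    return updated
```

For example, calling allow_embedding({"X-Frame-Options": "SAMEORIGIN"}, "https://example.org") returns headers whose only framing rule is frame-ancestors 'self' https://example.org. Naming one origin, rather than allowing every site, keeps the notebook from being framed by arbitrary pages.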
Kicking the Nature Notebooks into Operation
On the day of launch, we watched as the notebooks started getting gobbled up and recycled.
After some smooth sailing, we watched as usage ticked toward our 512-user mark. After reading comments on various social media sites, we decided to kick it up a notch and allow for thousands of concurrent users while the demo was live.
This bit us in a couple of ways. To scale across hosts, we would need to put the proxy and tmpnb in front of multiple Docker hosts. This was before Docker Swarm. Trying to swap largely untested pieces out from underneath production while dealing with proxy issues did not sound ideal. Instead, Min RK quickly whipped up the tmpnb-redirector, which uses each server's /stats endpoint to redirect users to servers with room for them. This made rotating old nodes out easy as well.
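The redirect decision can be sketched roughly as follows. This is an illustration of the idea, not Min RK's actual implementation: the function names, the host names, and the assumption that each /stats response is a JSON object with an "available" count are all hypothetical, and the stats are passed in as plain dictionaries so the example runs without a network.

```python
def pick_host(stats_by_host):
    """Choose the host reporting the most free notebook slots.

    stats_by_host maps a host name to its already-parsed /stats payload.
    The "available" field name is an assumption for this sketch.
    """
    free = {
        host: stats.get("available", 0)
        for host, stats in stats_by_host.items()
        if stats.get("available", 0) > 0
    }
    if not free:
        return None  # every host is full (or none are registered)
    return max(free, key=free.get)


def redirect_location(stats_by_host, path="/"):
    """Build the Location value for an HTTP redirect, or None if all are full."""
    host = pick_host(stats_by_host)
    if host is None:
        return None
    return f"https://{host}{path}"
```

In this model, rotating an old node out amounts to leaving it out of stats_by_host: new visitors stop being sent there, while notebooks already running on it are untouched.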

Closing Up
In the end, we served more than 20,000 notebook servers, and counting.
We love IPython notebooks and the overall architecture that has been built out here, and we hope to keep supporting open-source projects doing interesting things on the internet in a way that benefits the community, the technology, and the whole ecosystem.

