UPDATE: The project is now Scale0 https://github.com/joerussbowman/Scale0
Looks like I am going to take a break from working on the unscatter.com interface for a bit. There are a couple of bugs I’m going to fix, but adding new APIs and other features is going to wait a bit.
My new project is going to be something that will eventually support the site, though. I’ve been reading about zeromq, and this is something I just have to play with.
All along the core design for unscatter.com has been to create something that scales horizontally. The idea being to take it to something that could scale beyond a single datacenter (or cloud host) with relative ease, and could run on lots of small machines.
The current framework is an nginx frontend, with multiple Python Tornado backends using MongoDB as a shared resource. The data unscatter.com stores is generally cache; the site interfaces with search, social and eventually media APIs to get data. Even the real-time streaming information I intend to implement will be temporary storage, unless I do decide to follow up on making a Twitter/Facebook/more client. I’m still on the fence on that one.
What I intend to attack is the nginx frontend. I want something more than a caching proxy; I want something that can help me truly scale.
This is where treating unscatter.com as a hobby, rather than a business, is having a huge personal payoff. If I was treating it as a business I’d focus on the product, which is the website, and build this scaling layer later. Nginx easily supports any load I can expect to see for quite some time. Heck, I don’t even have the MongoDB caching in, because I don’t get the traffic to warrant it. I’m not doing any marketing right now; according to Clicky I’ve had 9 visitors today.
However, it’s a hobby, and I’m really excited to get down and dirty with zeromq. This layer, I’m going to open source. It’s not usable yet, and in fact the code hosted now probably won’t be used at all. I’m calling the project 0mqproxy, and it will be the key to creating a scaling architecture for websites. The GitHub link is here - https://github.com/joerussbowman/0mqproxy
That’s right, I will be making this a 100% open source project hosted under an Apache 2.0 license. If you’re interested, just watch the project on GitHub. I’ll be using that as the master server for my repo and will update as I work on it. I’ve still got a lot of reading to do on the zeromq guide, and you can likely expect a lot of refactoring in the first commits as I figure things out. The goals I’m looking at for the project are:
- Smart proxying. For example, if multiple requests come in for a URL, I want only one request to go to a backend to generate that result; the rest of the requests should receive that same result.
- Pluggable caching. I want the front end proxy to be a caching server. The initial implementation of this will be file based; however, I want to make a plugin to replace the filesystem layer with MongoDB. Using asyncmongo I can make a system that is 100% asynchronous for cache read/write and also take advantage of MongoDB capped collections to keep storage bounded.
- Multiple datacenter/cloud support. Ok, say I want to host unscatter.com on both Rackspace and Amazon. Under the current setup I could do round robin DNS and have a cluster at each location. Each one would run independently. My idea is to be able to tie the two together. This way, if the Rackspace location backends got overloaded while the Amazon group was running light, requests could be routed to Amazon for processing as well. This also could be a way to keep both systems in sync data wise, though that idea is really rough and may be a different product.
- Proxy server capable of acting as an application server. With the power of Tornado at the front I can do things like serve static files with aggressive caching, which is a feature of the framework, or use the framework itself to parse and manage requests without needing to pass them on to the backend processing servers.
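To make the first two goals a bit more concrete, here is a rough sketch of how request coalescing and a pluggable cache might fit together. It is not code from the project: it uses plain Python asyncio instead of Tornado or zeromq, and the names `CacheBackend`, `MemoryCache`, `CoalescingProxy` and `fake_backend` are all made up for illustration. A file-based or MongoDB-backed cache would be another class with the same `get`/`set` methods.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Protocol


class CacheBackend(Protocol):
    """Hypothetical plugin interface; a file or MongoDB store would implement it."""

    async def get(self, url: str) -> Optional[bytes]: ...

    async def set(self, url: str, body: bytes) -> None: ...


class MemoryCache:
    """Stand-in cache backend that keeps responses in a dict."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    async def get(self, url: str) -> Optional[bytes]:
        return self._store.get(url)

    async def set(self, url: str, body: bytes) -> None:
        self._store[url] = body


class CoalescingProxy:
    """Sends one backend request per URL and shares the result with all waiters."""

    def __init__(self, fetch: Callable[[str], Awaitable[bytes]],
                 cache: CacheBackend) -> None:
        self._fetch = fetch
        self._cache = cache
        self._in_flight: Dict[str, asyncio.Future] = {}

    async def handle(self, url: str) -> bytes:
        cached = await self._cache.get(url)
        if cached is not None:
            return cached
        pending = self._in_flight.get(url)
        if pending is not None:
            # Someone is already fetching this URL; wait for their result.
            return await pending
        future = asyncio.get_running_loop().create_future()
        self._in_flight[url] = future
        try:
            body = await self._fetch(url)
            await self._cache.set(url, body)
            future.set_result(body)
            return body
        except Exception as exc:
            # Pass the failure on to any requests waiting on this fetch.
            future.set_exception(exc)
            raise
        finally:
            del self._in_flight[url]


async def fake_backend(url: str) -> bytes:
    """Pretend backend (an assumption) that takes a moment to respond."""
    await asyncio.sleep(0.05)
    return b"rendered " + url.encode()


async def main() -> None:
    proxy = CoalescingProxy(fake_backend, MemoryCache())
    # Five simultaneous requests for the same URL share one backend call.
    results = await asyncio.gather(*(proxy.handle("/search?q=zeromq") for _ in range(5)))
    print(len(results), "responses served")


if __name__ == "__main__":
    asyncio.run(main())
```

A real version would also need cache expiry and some care around cancelled requests, which this sketch leaves out.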
This is the kind of large scale stuff I’ve been wanting to build for a while; zeromq just made it a lot more possible for me to put something together. This is going to be fun. I still think I need a better name, so expect that to change at some point.