Sysadmin by day, developer by night

So, if you’re reading this, then you’ve probably already read http://www.mikealrogers.com/2010/07/mongodb-performance-durability/

It was an interesting article, and it did make me think a little, but in the end I’m going to stick with MongoDB. While I do agree that a transaction log approach would be useful, I’m comfortable using it in a production environment until they get around to adding one.

A transaction log is still prone to failure for the same reasons any disk write is. A transaction log can be corrupted on write like any other file can. A transaction log is a convenient way to recover a database in the event of a failure, up to the point of the last log file. That’s it.

MongoDB took the replication approach. A write goes to the master server, and then to x slaves. If the master gets corrupted, you fail over to a slave. You aren’t down during the time it takes to do a restore, and really, this is pretty standard practice for most database solutions I’ve seen where high availability is a requirement. They also state that if you’re looking for single server durability, MongoDB isn’t the right approach. If you’re really trying to avoid losing even one minute of data, the replication approach is the safer way to go. If an array controller loses everything in its cache, you could have lost a few transactions right there because they never got flushed to disk.

Now, the one thing I’d touch on that not many people have commented on is the assertion about writes not returning anything by default. First, the key words are “by default”. You can configure your write so that not only does it return after the write, but it also waits until x slaves have replicated the write as well.
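As a rough illustration, here’s what that looks like with today’s PyMongo API (the database and collection names are just placeholders, and your driver of choice may spell this differently, so treat the exact calls as an assumption rather than gospel):

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017/")
    db = client.mydb

    # Fire and forget: w=0 means don't wait for any acknowledgement at all.
    fast = db.events.with_options(write_concern=WriteConcern(w=0))
    fast.insert_one({"type": "pageview", "path": "/"})

    # Safer: w=2 means the write is acknowledged by the master plus one
    # slave (replica set member) before the call returns.
    safe = db.orders.with_options(write_concern=WriteConcern(w=2))
    safe.insert_one({"order_id": 1234, "total": 19.99})

Same data, same driver; the only thing that changes is how long you’re willing to block before moving on.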

So, now let’s look at a use for having an immediate return on write attempts. I believe on Hacker News one of the 10gen guys brought up the case of monitoring data. That’s a pretty specific example and not something a lot of people are going to do. Now, MongoDB is something that’s looked at a lot by web application developers. Here’s where it’s a perfect fit for them: caching in dynamic websites.

In a dynamic website you can’t rely on reverse proxy caching for everything. Sometimes you have to cache specific page elements for a specific session. The most common thing to reach for in these cases is memcached, due to its speed from keeping everything in memory. However, MongoDB could potentially be a little quicker.

Let’s look at how caching works.

A web request is made which requires processing or network fetching to produce a value.
The first thing that is checked is the Cache collection, to determine whether the value has already been calculated.
If not, the value is calculated and then sent to MongoDB to store. No wait on the write.

Rinse, repeat, with the cached value being returned more often than not. If the attempt to write to the cache fails, then it will just be written on the next request anyway; you, however, have shaved some I/O blocking off the non-cached responses. It also means you can use MongoDB to handle all of this, so you don’t need to go install memcached. That’s where MongoDB’s focus on performance comes into play. A rough sketch of the pattern is below.
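Here’s a minimal sketch of that pattern in Python with PyMongo. The collection name, the compute_value helper and the key scheme are all made up for illustration; the point is just the fire-and-forget write on the cache insert:

    import time
    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017/")
    # Unacknowledged writes: we don't block waiting for the cache insert.
    cache = client.mydb.get_collection(
        "cache", write_concern=WriteConcern(w=0))

    def compute_value(key):
        # Stand-in for whatever expensive processing or network fetch
        # actually produces the page fragment.
        time.sleep(1)
        return "rendered fragment for %s" % key

    def cached_fragment(key):
        doc = cache.find_one({"key": key})
        if doc is not None:
            return doc["value"]            # cache hit
        value = compute_value(key)         # cache miss: do the work
        # Fire-and-forget insert; if it fails we just recompute next time.
        cache.insert_one({"key": key, "value": value, "ts": time.time()})
        return value

The non-cached request pays for the computation either way; what it doesn’t pay for is waiting on the database to confirm the cache write.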

MongoDB also allows you to specify the size of a collection, meaning your Cache collection can only ever take up x amount of space, dropping older data in order to make room for new data. Personally I’d prefer an approach similar to memcached, where you can set how long a piece of data is valid for, but you can handle that at your application layer by setting a timestamp in the document alongside your cached value.
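For example, a capped collection of a fixed size can be created once up front, and staleness checked against the stored timestamp at read time. Again this uses PyMongo, and the 50 MB figure, collection name and MAX_AGE are just placeholders:

    import time
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/")
    db = client.mydb

    # One-time setup: a capped collection is a fixed-size circular buffer;
    # once it hits ~50 MB the oldest documents are dropped automatically.
    if "cache" not in db.list_collection_names():
        db.create_collection("cache", capped=True, size=50 * 1024 * 1024)

    MAX_AGE = 300  # seconds a cached value stays valid, enforced by the app

    def get_if_fresh(key):
        doc = db.cache.find_one({"key": key})
        if doc and time.time() - doc["ts"] < MAX_AGE:
            return doc["value"]
        return None  # missing or stale; the caller recomputes and re-inserts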

Overall, I really enjoy working with MongoDB. I finally settled on it because it’s just plain easy to use, and is really flexible and able to meet all my data requirements. Nothing is 100% durable, and that’s why we systems administrators spend so much time working on backup solutions, offsite, onsite, hot, cold, and even more time validating that the restore methods work on a monthly basis. Your Oracle, MySQL, CouchDB, MongoDB and raw key file hash system can all die in amazingly creative ways, and no transaction log is ever going to give a 100% guarantee on every attempted write transaction. That’s just the way it is.
