A collection of computer systems and programming tips that you may find useful.
Brought to you by Craic Computing LLC, a bioinformatics consulting company.

Tuesday, February 15, 2011

Always index columns that you want to sort on in Mongo

I'm using Mongo as a non-relational database for a few projects. In general it's working out great. MySQL would work too but I like not having to explicitly create a database or run migrations. Plus I figure you can't really understand the strengths and weaknesses of a technology unless you build a real application with it.

I work in Ruby and use the MongoMapper and Mongoid Object Data Mappers to talk to Mongo.

One thing I do not like is the requirement that you explicitly create an index for every column that you think you will want to sort on. If you don't, Mongo has to sort the results in memory, and once the data exceeds its in-memory sort limit you get an error like this:

[...]/gems/mongo-1.2.1/lib/mongo/cursor.rb:86:in `next_document': 
too much data for sort() with no index (Mongo::OperationFailure)

And if you want to sort on two columns then you need a single compound index on the combination of the two, with the columns listed in the same order as in the sort.
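As a rough illustration of what "an index on the combination" means, here is a minimal sketch in plain Ruby that builds the key document for an index matching a planned sort. The helper name `index_keys_for` and the field names `created_at`, `last_name` and `first_name` are made up for this example, and nothing here talks to a database. It only shows the shape of the index you would ask Mongo to create.

```ruby
# Build the key document for an index that can serve a given sort.
# `sort_fields` is an ordered list of [field, direction] pairs, where
# direction is :asc or :desc. For a compound index the order of the
# fields matters, so the order of the pairs is preserved.
# (Hypothetical helper, not part of any Mongo library.)
def index_keys_for(sort_fields)
  sort_fields.each_with_object({}) do |(field, direction), keys|
    unless [:asc, :desc].include?(direction)
      raise ArgumentError, "unknown sort direction: #{direction.inspect}"
    end
    # Mongo describes index direction as 1 (ascending) or -1 (descending).
    keys[field.to_s] = (direction == :asc ? 1 : -1)
  end
end

# Sorting on one column needs an index on that column.
single = index_keys_for([[:created_at, :desc]])

# Sorting on two columns needs one compound index covering both.
compound = index_keys_for([[:last_name, :asc], [:first_name, :asc]])

puts single.inspect
puts compound.inspect

# With a live connection these hashes could be handed to the driver.
# Assuming a recent version of the Ruby driver and a hypothetical
# `people` collection object, that would look something like:
#   people.indexes.create_one(compound)
```

The exact call for creating the index depends on which driver or mapper version you are running, so check the documentation for yours before copying the commented line.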

You can add indexes at any point - it takes an explicit step but it's not that big a deal. But it doesn't 'just work'... in MySQL it does - an index might give you better performance but a sort doesn't blow up without one.

You'll hear people claim that the NoSQL databases are schema-free, giving you a lot of flexibility. I don't really buy that argument - in most applications you want a clear schema.

Where I do see the benefit is that, with NoSQL databases, your schema resides in your Model - not in the DB itself - and that is where it belongs. When you want to change the schema you just change the Model - no database migrations - very flexible.

But, with Mongo at least, if you have to define indexes ahead of time in order to sort even relatively small numbers of objects then that nullifies some of that benefit.

