Simple example of using MongoMapper with Ruby

The MongoMapper web site is really lacking in even simple examples, especially ones that don't use Rails. So, here's a simple example that might help someone.

From the Gemfile:

source 'https://rubygems.org'

gem 'mongo_mapper'

And then the application:

require 'mongo_mapper'

# jam it into the database "mm"
MongoMapper.database = "mm"

class App
  def create_user
    user = User.new(:name => 'Johnny')
    user.save!
    puts "user created"
  end

  def find_user
    query = User.where(:name => 'Johnny')
    user = query.first # just the first
    puts user.id unless user.nil?
  end

  def delete_user
    query = User.where(:name => 'Johnny')
    user = query.first # just the first

    user.destroy
  end
end

class User
  include MongoMapper::Document
  key :name, String
  timestamps!
end

app = App.new
app.create_user
app.find_user
app.delete_user

The code does a few things:

  1. Creates a new user with a single field called name.
  2. Finds the user using the where function.
  3. Removes (destroys/deletes) the user.

The key thing to note is that the where function returns a query, not the actual results; the results are fetched on demand. This is very similar to extension methods and LINQ in .NET, where those functions build a query that is executed only when the results are first requested.

The same is true of MongoMapper in this case: the results are not returned until the first function is called. Alternatively, all or last could have been used, with all of course returning a list of results that could be iterated in a loop.

If there were no results, calling first in this example would return nil, leaving the user variable set to nil.
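
The mongo shell behaves the same way, which may make the lazy evaluation easier to picture (this is just an analogy in shell JavaScript, not MongoMapper code; the users collection name is assumed from MongoMapper's default pluralization of the model name):

// find() only builds a cursor; nothing is fetched from the server
// until the cursor is iterated.
var cursor = db.users.find({ name : "Johnny" })
var user = cursor.hasNext() ? cursor.next() : null // null when there is no match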

The delete_user function above has absolutely no error checking.

Mongoose plugin runs for every new Schema

If you want to consistently apply changes to every Schema in Mongoose, it’s simple. Below is an example.

var mongoose = require('mongoose');

// anchored at both ends so only 24-character hex strings match
var checkForHexRegExp = new RegExp("^[0-9a-fA-F]{24}$");

mongoose.plugin(function(schema, opts) {
    schema.statics.isObjectId = function(id) {
        if (id) {
            return checkForHexRegExp.test(id);
        }
        return false;
    };
});

var AnimalSchema = mongoose.Schema({ name: String });
var Animal = mongoose.model("Animal", AnimalSchema);

if (Animal.isObjectId("521b4891039857e07aae695a")) {
    var animal = new Animal();
    // something more interesting here ...
}

The code above uses the plugin function on the global mongoose object to add a new plugin. When you add a plugin using that technique, it runs for every schema that is constructed.

So, in this case, I've added a static function called isObjectId to every Schema, which uses a simple regular expression (liberally borrowed straight from the bson ObjectId source code) to test whether a string looks like a valid ObjectId (in fact, a 24-character hex string).

Now, as new models are created (mongoose.model), the Schema is first passed to any defined plugins. The plugin adds the new function, so the resulting Model has a static method called isObjectId.

The plugin will not affect any Schemas/Models that were defined before the plugin was added.

Using this technique, you could add standardized fields, indexes, etc. without repeating the same code in every Schema; a sketch of that idea follows. Of course, you can also use the plugin method defined on an individual Schema to selectively add functionality, as the second example below shows.
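
As an illustration of the standardized-fields idea, here's a minimal sketch of a global plugin that stamps every schema with a lastModified date kept current in a pre-save hook (the field name and the Book model are hypothetical, not from the original code; connection setup is omitted):

var mongoose = require('mongoose');

// Hypothetical global plugin: every schema gets a lastModified Date field
// that is refreshed before each save.
mongoose.plugin(function(schema, opts) {
    schema.add({ lastModified: Date });

    schema.pre('save', function(next) {
        this.lastModified = new Date();
        next();
    });
});

var BookSchema = mongoose.Schema({ title: String });
var Book = mongoose.model("Book", BookSchema); // the plugin is applied here

new Book({ title: "Dune" }).save(function(err, book) {
    // book.lastModified is set automatically
});

And here is the selective version, using the plugin method on the Schema itself: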

var mongoose = require('mongoose');

var checkForHexRegExp = new RegExp("^[0-9a-fA-F]{24}$");

var isObjectIdPlugin = function(schema, opts) {
    schema.statics.isObjectId = function(id) {
        if (id) {
            return checkForHexRegExp.test(id);
        }
        return false;
    };
};


var AnimalSchema = mongoose.Schema({
    name: String
});

AnimalSchema.plugin(isObjectIdPlugin);

var Animal = mongoose.model("Animal", AnimalSchema);

if(Animal.isObjectId("521b4891039857e07aae695a")) {
    var animal = new Animal();
    // something more interesting here ...
}

Above, the code applies the plugin to only the AnimalSchema using the plugin method.

Of course, adding the same method statically to every model may not be very useful; it probably belongs somewhere in a utility module instead (and really, it's too bad it's not just exposed directly by the BSON ObjectId class).
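
If you did go the utility route, it might look something like this minimal sketch (the module layout and names here are hypothetical, not from the original post):

// util/objectid.js
var checkForHexRegExp = new RegExp("^[0-9a-fA-F]{24}$");

module.exports.isObjectId = function(id) {
    return !!id && checkForHexRegExp.test(id);
};

// elsewhere:
// var objectid = require('./util/objectid');
// if (objectid.isObjectId(req.params.id)) { ... }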

Using $inc to increment a field in a sub-document in an array and a field in the main document

(Blog post inspired by a question I answered on StackOverflow)

Let's say you have a document in MongoDB that looks something like this:

{
  '_id' : 'star_wars',
  'count' : 1234,
  'spellings' : [ 
    { spelling: 'Star wars', total: 10}, 
    { spelling: 'Star Wars', total : 15}, 
    { spelling: 'sTaR WaRs', total : 5} ]
}

You'd like to atomically update two fields at one time: the total for a particular spelling, which lives in a sub-document inside an array, and the overall count field.

The way to handle this is to take advantage of the positional array operator and the $inc operator.

Starting with just updating the count, that’s easy:

db.movies.update( 
    {_id: "star_wars"}, 
    { $inc : { 'count' : 1 }}
)

The query matches on the document with the _id of star_wars, and increments the count by 1.

The positional operator is where the mongo-magic comes into play here. If you want to update just a single sub-document in the array, add a match on that sub-document to the query. First, we'll try finding the right document:

db.movies.find( {
    _id: "star_wars", 
    'spellings.spelling' : "Star Wars" }
)

That matches the right document, but also returns all of the elements of the array.

But, wait, there's more! When you match on an element/document of an array, the position of the match is remembered and can be used in the update phase. You do that using the positional operator; with the document above, the path is spellings.$.total. If you knew the specific index into the array, the $ could be replaced with the zero-based index number (for example, spellings.1.total in this case).
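
For instance, if you already knew the matching spelling was the element at index 1, the update could name that element directly (this variant is implied by the paragraph above rather than taken from the original post):

db.movies.update(
    { _id: "star_wars" },
    { $inc : { 'spellings.1.total' : 1 } }  // index 1 holds the 'Star Wars' entry
)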

Putting it all together then results in a slick and simple way of incrementing multiple fields in a document.

db.movies.update(
    { _id: "star_wars",
      'spellings.spelling' : "Star Wars" },
    { $inc :
        { 'spellings.$.total' : 1,
          'count' : 1 } }
)

Results:

{
  '_id' : 'star_wars',
  'count' : 1235,
  'spellings' : [ 
    { spelling: 'Star wars', total: 10}, 
    { spelling: 'Star Wars', total : 16}, 
    { spelling: 'sTaR WaRs', total : 5} ]
}

Finding duplicates in MongoDB via the shell

I thought this was an interesting question to answer on StackOverflow (summarized here):

I’m trying to create an index, but an error is returned that duplicates exist for the field I want to index. What should I do?

I answered with one possibility.

The summary is that you can use the power of MongoDB's aggregation framework to search for and return the duplicates. It's really quite slick.

For example, in the question, Wall documents had a field called event_time. Here’s one approach:

db.Wall.aggregate([
       {$group : { _id: "$event_time" ,  count : { $sum: 1}}},
       {$match : { count : { $gt : 1 } }} ])

The trick is to use the $group pipeline operator to select and count each unique event_time. Then, match on only those groups that contained more than one match.

While it's not necessarily as readable as the equivalent SQL statement, it's still easy to follow. The only really odd thing is the mapping of the event_time into the _id. As documents pass through the pipeline, the event_time is used as the key of the new aggregate document. The $ sign marks a field reference to a property of the document in the pipeline (a Wall document). Remember that the _id field of a MongoDB document must be unique (and that is how the $group pipeline operator does its magic).

So, if the following event_times were in the documents:

event_time
4:00am
5:00am
4:00am
6:00pm
7:00am

The $group stage would produce an aggregate set of documents like this:

_id      count
4:00am   2
5:00am   1
6:00pm   1
7:00am   1

Notice how the _id is now the event_time. After the $match stage drops the groups with a count of 1, the final aggregate result looks like this:

{
        "result" : [
                {
                        "_id" : "4:00am",
                        "count" : 2
                }
        ],
        "ok" : 1
}
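
If you also want to know which documents collide (say, to clean them up before building a unique index), one option is to $push the _ids into each group; this extension is mine, not part of the original answer:

db.Wall.aggregate([
       {$group : { _id: "$event_time",
                   count : { $sum : 1 },
                   ids : { $push : "$_id" } }},
       {$match : { count : { $gt : 1 } }} ])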

How to rewrite a MongoDB C# LINQ with a Projection Requirement using a MongoCursor

The LINQ provider for MongoDB does not currently handle data projections efficiently when returning data. This can mean that you're returning more data from the database than is needed.

So, I’m going to show you the pattern I applied as a replacement for the LINQ queries when I need to use a projection.

Given the following simple LINQ statement:

var query = 
    (from r in DataLayer.Database
         .GetCollection<Research>()
         .AsQueryable<Research>()
        where !r.Deleted
        select new
        {
            Id = r.Id,
            Title = r.Title,
            Created = r.Created
        }).Skip(PageSize * page).Take(PageSize);

it can be converted to a MongoCursor style search like this:

var cursor = DataLayer.Database.GetCollection<Research>()
    .FindAs<Research>(Query<Research>.NE(r => r.Deleted, true))
        .SetFields(
            Fields<Research>.Include(
                r => r.Id, 
                r => r.Title, 
                r => r.Created))
        .SetLimit(PageSize)
        .SetSkip(PageSize * page);           

I've attempted to format it in a similar way, mostly for my own sanity. As you can see, both queries first get access to the database collection. But, instead of using AsQueryable<T>, you'll use the FindAs<T> method. The query is more verbose in the second example, although not overly so. I chose to keep it strongly typed by using the generic version Query<Research>. That way, I could use an Expression to specify the field/property being queried rather than rely on a string (I could have just passed "Deleted" as a string in the code).

By strongly typing the parameters in this way, the compiler can catch things like type mismatches (verifying, for example, that the value being compared to the Deleted property is a Boolean).

Secondly, and this addresses the requirement of a result projection, I’ve included just those fields that are required by the UI rather than all of the fields in the document. One of the fields is potentially quite long (up to 1MB of text), and in this case, unnecessary in a summary list display in my web application. Here, I used the SetFields method of the MongoCursor.

The C# driver includes a static class called Fields (in a generic and non-generic form) which can be used to express the set of fields to be included/excluded. I’ll point out that there is an option to just pass in a list of strings to SetFields, but it’s not strongly typed. So again, no compile-time checking that I’ve got the property names correct. I’m going for safety here, so I chose the strongly-typed generic implementation of Fields<Research>. Then, using the Expression syntax again, I’ve selected the fields I needed as a parameter list.

Finally, I added calls to limit the result size and skip ahead, equivalent to the Take and Skip in the original LINQ query.
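
For reference, the cursor above boils down to roughly this shell query (the Research collection name and the field names are assumptions based on the C# classes; page and pageSize are placeholders):

db.Research.find(
    { Deleted : { $ne : true } },          // Query<Research>.NE(r => r.Deleted, true)
    { _id : 1, Title : 1, Created : 1 }    // the SetFields projection
).skip(pageSize * page).limit(pageSize)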

There are a number of other Query methods that you can use to build more complex operations.

For example:

var q = Query.And(
    Query<Research>.Exists(r => r.Title), 
    Query<Research>.Matches(
        r => r.Title, BsonRegularExpression.Create(new Regex("^R"))));

The above maps to the following MongoDB query:

{ "$and" : [{ "Title" : { "$exists" : true } }, { "Title" : /^R/ }] }

That is, the Title field exists and the Title field starts with an uppercase "R".

While the Query style syntax is more verbose than the equivalent LINQ statement, the result still is expressive and very readable and maintainable.

FYI: If there's an index on Title, then the /^R/ syntax returns the results most efficiently in MongoDB, because the anchored, case-sensitive prefix lets the query use the index and limit the scan to keys starting with "R".
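
A quick way to convince yourself of that in the shell (a hypothetical check, not from the original post):

db.Research.ensureIndex({ Title : 1 })
db.Research.find({ Title : /^R/ }).explain()  // should show an index-bounded scan rather than a full collection scan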