How to rewrite a MongoDB C# LINQ with a Projection Requirement using a MongoCursor

The LINQ Provider for MongoDB does not currently take into account data projections efficiently when returning data. This could mean that you’re unnecessarily returning more data from the database than is needed.

So, I’m going to show you the pattern I applied as a replacement for the LINQ queries when I need to use a projection.

Given the following simple LINQ statement:

var query = 
    (from r in DataLayer.Database
         .GetCollection<Research>()
         .AsQueryable<Research>()
        where !r.Deleted
        select new
        {
            Id = r.Id,
            Title = r.Title,
            Created = r.Created
        }).Skip(PageSize * page).Take(PageSize);

it can be converted to a MongoCursor style search like this:

var cursor = DataLayer.Database.GetCollection<Research>()
    .FindAs<Research>(Query<Research>.NE(r => r.Deleted, true))
        .SetFields(
            Fields<Research>.Include(
                r => r.Id, 
                r => r.Title, 
                r => r.Created))
        .SetLimit(PageSize)
        .SetSkip(PageSize * page);           

I’ve attempted to format it in a similar way mostly for my own sanity. As you see, both queries first get access to the database collection. But, instead of using AsQueryable<T>, you’ll use the FindAs<T> method. The query is more verbose in the second example, although not overly so. I chose to keep it strongly typed by using the generic version Query<Research>. By doing so, it meant that I could use an Expression to set the field/property that was being queried, rather than rely on a string (I could have just passed “Deleted” as a string in the code).

By strongly typing the parameters in this way, it meant that the compile process can catch things like type mismatches (verifying that the value being compared is a Boolean for example as is the Deleted property).

Secondly, and this addresses the requirement of a result projection, I’ve included just those fields that are required by the UI rather than all of the fields in the document. One of the fields is potentially quite long (up to 1MB of text), and in this case, unnecessary in a summary list display in my web application. Here, I used the SetFields method of the MongoCursor.

The C# driver includes a static class called Fields (in a generic and non-generic form) which can be used to express the set of fields to be included/excluded. I’ll point out that there is an option to just pass in a list of strings to SetFields, but it’s not strongly typed. So again, no compile-time checking that I’ve got the property names correct. I’m going for safety here, so I chose the strongly-typed generic implementation of Fields<Research>. Then, using the Expression syntax again, I’ve selected the fields I needed as a parameter list.

Finally, I added some code to limit the result size and set the skip equivalent to the original LINQ query.

There are a number of other Query methods that you can use to build more complex operations.

For example:

var q = Query.And(
    Query<Research>.Exists(r => r.Title), 
    Query<Research>.Matches(
        r => r.Title, BsonRegularExpression.Create(new Regex("^R"))));

The above maps to the following MongoDB query:

{ "$and" : [{ "Title" : { "$exists" : true } }, { "Title" : /^R/ }] }

Title field exists and the Title field starts with an uppercase “R”.

While the Query style syntax is more verbose than the equivalent LINQ statement, the result still is expressive and very readable and maintainable.

FYI: If there’s an index on Title, then the /^R/ syntax returns the results the most efficiently in MongoDB (as it stops searching after the first character).

How to view the MongoDB Query when using the C# LINQ Provider

If you’re using the Official MongoDB C# Driver from 10gen, you may want to occasionally verify that the generated query matches your LINQ query (or at least that it’s building something efficient).

Take for example this query:

var query = 
    (from r in DataLayer.Database.GetCollection<Research>().AsQueryable<Research>()
        where !r.Deleted
        select new
        {
            Id = r.Id,
            Title = r.Title,
            Created = r.Created
        }).Skip(PageSize * page).Take(PageSize);

query.DebugWriteMongoQueryText();

I wanted to verify that the query was checking the Deleted property, and see if the projection was considered as part of the query, so a few lines of code later …

public static string ToMongoQueryText<TQueryable>(this IQueryable<TQueryable> query)
{
    return (query as MongoQueryable<TQueryable>).GetMongoQuery().ToString();            
}

[Conditional("DEBUG")]
public static void DebugWriteMongoQueryText<TQuerable>(this IQueryable<TQuerable> query)
{
    Debug.WriteLine("QUERY: " + query.ToMongoQueryText());
}

I added two extension methods to IQueryable. As the return type for the query in my sample is actually an anonymous type, it wasn’t something I could easily type in the Immediate Window in Visual Studio for example.

<>f__AnonymousType2<string,string,System.DateTime>

No thanks  <>f__AnonymousType2<string,string,System.DateTime>.

I added the Conditional Attribute so that the code would only execute in a Debug build of my application.

query.DebugWriteMongoQueryText();

The output of the query above was:

QUERY: { "Deleted" : { "$ne" : true } }

As you’ll see, the anonymous projection wasn’t considered. Unfortunately, that’s a known issue with the current driver. This mattered in my above example, as some of the fields could contain a large amount of data that wasn’t needed by the current display.