Why I’m not a great fan of LINQ queries for MongoDB

I’m not a great fan of the LINQ provider in Mongo, because I think that developers who use only LINQ miss the best part of working with a Document Database. The usual risk is that developers always resort to LINQ queries to load-modify-save a document instead of using the powerful update operators available in Mongo.
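To make the point concrete, here is a minimal sketch of what I mean by using update operators instead of load-modify-save, written with the legacy 1.x C# driver syntax used in this post; the collection, field names and values are purely hypothetical:

```csharp
// Hypothetical example: modify a document atomically on the server,
// without loading it into memory, changing it and saving it back.
var query = Query.EQ("_id", documentId);
var update = Update
    .Set("Title", "New title")      // change a single field
    .Inc("Version", 1)              // atomic increment, done server side
    .AddToSet("Tags", "reviewed");  // set semantics, no read-modify-write race
collection.Update(query, update);
```

A single round trip to the server, and no risk of overwriting changes made by another client between the load and the save.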

Despite this consideration, if you need to retrieve the full document content, writing a LINQ query is sometimes the simplest approach, but, as always, not every valid LINQ statement you can write can be translated to a Mongo query. This is the situation with this query.

//apply security filtering.
documentsQuery = documentsQuery
  .Where(d => d.Aces.Any(a => permittingAces.Contains(a)))
  .Where(d => !d.Aces.Any(a => denyingAces.Contains(a)));

I need to filter all documents, finding those whose Aces property (a simple HashSet&lt;String&gt;) contains at least one of the ACEs in the permittingAces list but none of the ACEs listed in the denyingAces collection. While this is a perfectly valid LINQ query, if you try to issue it to Mongo you get:

Any is only support for items that serialize into documents. The current serializer is StringSerializer and must implement IBsonDocumentSerializer for participation in Any queries.

You can use Any with sub-objects, but expressing an Any condition on an array of strings is not supported. To overcome this limitation, the .NET provider for MongoDB provides a convenient ContainsAny extension method to write the previous query.

documentsQuery = documentsQuery
  .Where(d => d.Aces.ContainsAny(permittingAces))
  .Where(d => !d.Aces.ContainsAny(denyingAces));

This LINQ query works perfectly, and if you are curious how it is translated to a standard Mongo query, you can use the GetMongoQuery() method, as I described in a previous post.

This simple example shows some of the limitations you can encounter using the LINQ provider for MongoDB, and my suggestion is to always prefer standard Mongo queries, because they give you a lot more flexibility, especially for update operations.

Another reason to stay away from the LINQ provider in the past is that older versions of the driver, still used by a large number of people, had a really bad implementation of the Select LINQ operator: the projection was done client side, as stated here:


Select does not result in fewer fields being returned from the server. The entire document is pulled back and passed to the native Select method. Therefore, the projection is performed client side.

This is a serious problem, because the whole document is always returned from the server, using more bandwidth and more server-side resources. Remember that one of the standard optimizations when you query a MongoDB instance is reducing the number of fields you load from your documents. If you use the old LINQ provider and you do a Select to retrieve fewer fields from the server, you are wasting your time, because you are always loading the whole document.
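With the legacy driver you can instead push the projection to the server with SetFields; a minimal sketch (the collection and field names are hypothetical):

```csharp
// Ask the server to return only two fields (plus _id):
// the projection is performed by MongoDB, not client side.
var cursor = collection
    .Find(Query.EQ("Status", "active"))
    .SetFields(Fields.Include("Title", "LastUpdated"));
```

Only the included fields travel over the wire, which is exactly the optimization the client-side Select silently throws away.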

Gian Maria.

Start ElasticSearch in windows with a different configuration file

When you start Elasticsearch by double clicking Elasticsearch.bat on Windows, it uses the standard config/elasticsearch.yml file contained in the installation directory. Especially for development, it is really useful to be able to start ES with a different configuration file.

Probably my google-fu is not perfect, but every time I need to find the correct option to pass to the Elasticsearch.bat batch file I am not able to find it with the first search and I always lose some time, which probably means this information is not indexed well.

If you are interested, the configuration option is called -Des.config and permits you to specify the config file used to start your ES node.

elasticsearch.bat -Des.config=Z:\xxxx\config\elasticsearch1.yml

You can now create as many config files as you need, and simply create multiple links to the original bat file, each with a different config file, to start ES with your preferred options.
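For example, a small wrapper batch file per node could look like this (the paths are hypothetical; adapt them to your installation):

```bat
REM es-node1.bat - start ES with the configuration of node 1
call "C:\elasticsearch\bin\elasticsearch.bat" -Des.config=C:\es-config\node1\elasticsearch.yml
```

One wrapper per configuration, and you can start any node flavor with a double click.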

Gian Maria.

Mongo compression with Wired Tiger Engine

With version 3.0 of the Mongo database the most welcome feature was the introduction of pluggable storage engines. This implies that we are no longer forced to use the standard MMAPv1 storage engine, but we can use other ways of storing data on our filesystem. The first official alternative storage engine is Wired Tiger.

One of the most interesting aspects of Wired Tiger is data compression, a feature that can reduce the space occupied by your database on disk, and that is especially effective since Mongo stores documents as BSON, where most of the data is text. Wired Tiger has three options for compression: none, snappy and zlib, but even with no compression, the space occupied by your database on disk is usually lower than with MMAPv1. Here is a simple and quick test done on a customer database.

  • MMAPv1: 3.250.453 KB
  • WiredTiger no compression: 1.219.696 KB
  • WiredTiger snappy: 603.674 KB
  • WiredTiger zlib: 466.548 KB

This particular database is full of text, which explains why Wired Tiger is so superior with respect to the space occupied by the database, but the gain is really impressive. The version with snappy compression is only a fraction of the size of the MMAPv1 database, and with less disk space occupied there is less disk I/O activity to read data. The further gain you obtain with zlib comes at the cost of more CPU usage, and you need to measure to understand if it is worth it in your deployment.
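If you want to run the same comparison yourself, the compressor is selected in the mongod YAML configuration file (Mongo 3.0 format); a minimal sketch:

```yaml
# mongod.conf - select the Wired Tiger engine and its block compressor
storage:
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: snappy   # valid values: none, snappy, zlib
```

Restarting mongod with a different blockCompressor only affects newly created collections, which is another reason the backup/restore migration described below is needed for existing data.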

The major drawbacks of using the Wired Tiger engine are that RoboMongo, one of the most interesting UIs for Mongo, does not work with it because it still uses an old version of the shell, and that there is no automatic migration from MMAPv1 to Wired Tiger (you need to do a backup, then change the storage engine, and restore).
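The migration can be sketched with the standard tools; the paths and data directories below are hypothetical, adapt them to your deployment:

```
mongodump --out /backup/dump            # backup while the old MMAPv1 node is running
# stop mongod, point it at an empty data directory, restart with Wired Tiger:
mongod --storageEngine wiredTiger --dbpath /data/wiredtiger
mongorestore /backup/dump               # reload the data into the new engine
```

The restore rewrites every collection with the new engine, so this is also the moment the compression settings take effect.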

Gian Maria

Mixing native query and LINQ in Mongo Query

Let’s look at the following query issued against a standard MongoCollection&lt;T&gt; instance:

return _bufferCollection.Find(
        GetNextBlockQuery(lastTick, lastRevisionId))
    .OrderBy(d => d.LastUpdated)
    .ThenBy(d => d.RevisionIdNumeric);

The method GetNextBlockQuery simply returns a Query&lt;T&gt; object expressed with the C# Mongo query syntax. In this query the result of the Find() method is simply sorted using standard LINQ syntax.

Do you spot where the problem is?

The Find() method returns an object of type MongoCursor&lt;T&gt;, which implements IEnumerable&lt;T&gt; but not IQueryable&lt;T&gt;.

If you query a MongoCollection with LINQ using the AsQueryable() extension method, there is no problem using the OrderBy() or ThenBy() LINQ extension methods. In this situation the IQueryable implementation inside the Mongo C# driver translates everything to standard Mongo query syntax, executes the translated query on the server and returns the objects to the caller.

In the previous example, instead, the OrderBy() LINQ operator is invoked against a MongoCursor, so the ordering is done in memory: OrderBy operates on an IEnumerable and iterates all the objects to return them in the correct order.

If you use LINQ operators against a standard MongoCursor, they operate in memory, hurting performance.

This hurts the performance of the application: each time the query is executed, the entire resultset is loaded into memory and then sorted. To avoid this problem, you must not mix native Mongo C# queries with LINQ operators. The correct query is the following:

return _bufferCollection
    .Find(GetNextBlockQuery(lastTick, lastRevisionId))
    .SetSortOrder(SortBy.Ascending("LastUpdated", "RevisionIdNumeric"));

This new version uses the SetSortOrder() method of the Mongo C# driver, so the sort is performed directly by the Mongo server and objects are loaded in memory during standard foreach enumeration. The above problem is really bad if you want to limit the number of returned objects: if you use Take(50) to obtain only 50 objects, you are actually loading the entire collection into memory and then returning the first 50 elements. This is really different from asking Mongo to return only 50 elements directly in the query.

One of the greatest problems is that if you limit the number of records with the LINQ Take() operator on the first query, you are doing client-side pagination, with a significant performance loss.
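With the native query syntax the limit is pushed to the server as well; a sketch reusing the names from the query above:

```csharp
// Sort and limit are both executed by MongoDB: only 50 documents
// travel over the wire, instead of the whole collection.
return _bufferCollection
    .Find(GetNextBlockQuery(lastTick, lastRevisionId))
    .SetSortOrder(SortBy.Ascending("LastUpdated", "RevisionIdNumeric"))
    .SetLimit(50);
```

SetLimit() is the server-side equivalent of Take(), and combined with SetSkip() it gives you real server-side pagination.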

As a general rule, avoid mixing LINQ and Mongo query classes when you issue queries to your Mongo server, and prefer native query syntax over LINQ because it offers the whole set of Mongo capabilities. LINQ, on the contrary, exposes only a subset of the possible queries, and beware that the Select operator operates in memory instead of limiting the number of returned fields directly on the server.

Gian Maria.

Upgrading ReplicaSet with MMS

My Replica Set of three Mongo instances is running Mongo 2.6.6 and now I want to upgrade to the latest version. Thanks to MMS I can simply start the upgrade process directly from a web page, without needing access to my real servers. The really good part is that the upgrade process is completely managed by MMS for me, and the upgrade is done without stopping my Replica Set.

Since this is a test/dev environment running on my home server, my bandwidth is not that high and it is not uncommon that one of the nodes finishes downloading the latest Mongo version before the others. The nice thing is that MMS takes care of everything, and starts upgrading my secondary instances.


Figure 1: One of the server is upgrading, while the others are still running old version.

The super-nice feature is that while one of the servers has been upgraded to the new version and the others are still running the old one, the Replica Set runs with mixed versions, because I have nodes with different versions. I can connect to it with RoboMongo and work as usual. At a certain moment the primary node is upgraded; now we are in a state where we have only secondary nodes, and in this moment users cannot write to the Replica Set:


Figure 2: Status is 2, indicating that a member in SECONDARY state is replicating; now all nodes are SECONDARY

As soon as the primary node starts the new mongo process with the new version, the Replica Set is fully operative again. The downtime is really small and you upgrade everything with a few clicks, letting the MMS agents take care of everything.

Gian Maria.