Bulk insert in MongoDb with C# driver

There are situations where you need to save a lot of documents in a MongoDb collection. My scenario is a migration of documents from a collection to another database, with in-memory manipulation of the documents.

The most common error in these situations is to read the documents from the original collection, execute a function that modifies each document in memory, and finally issue one insert per document in the destination collection. This is wrong because you pay a round trip to MongoDb for every document you save.

Whenever you call the Insert or Save function, you pay the penalty of a call to the MongoDb process (network latency, etc.), so whenever possible you should reduce the number of calls to the database engine.

In such a scenario the MongoDb driver has a function called InsertBatch that allows you to insert documents in batches, and the fun part is that it simply accepts an IEnumerable. As an example, I have a function that manipulates a BsonDocument, stored in a variable called Action, and a source and a dest database between which I need to copy documents with manipulation. This is the code that does everything:

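var sourceQueue = source.GetCollection(queue);
var destQueue = dest.GetCollection(queue);

// nothing to migrate if the source collection is empty
if (sourceQueue.Count() == 0) return;

// migrate the queue collection
Console.WriteLine("Migrating Queue " + queue);

// Select is lazy: each document is read from the cursor and manipulated
// by Action only when InsertBatch enumerates the sequence
var allElement = sourceQueue
	.FindAll()
	.AsEnumerable()
	.Select(document => {
		Action(document);
		return document;
	});

// InsertBatch sends the documents in batches instead of one call per document
destQueue.InsertBatch(allElement);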

The name of the collection is contained in the queue variable (I’m actually transforming software that manages jobs). As you can see, I simply enumerate all source documents with FindAll (this code uses the old 1.10 driver), call the Action function that manipulates each document, and finally use InsertBatch to insert the documents in batches.

This runs much faster than saving each document with a separate call, even when the MongoDb instance runs on the very same machine, where there is no network latency to pay.
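To make the difference in round trips concrete, here is a small language-neutral illustration, written in JavaScript (the language of the mongo shell) rather than C#. It is a sketch under stated assumptions: `mapLazy`, `batches`, `migrate` and `fakeInsert` are hypothetical names, and `fakeInsert` only counts calls instead of talking to a server. The real driver (InsertBatch, or InsertMany in newer drivers) does its own splitting of the documents into messages; this only shows why grouping documents reduces the number of calls.

```javascript
// Lazily apply `transform` to every document, like Select in the C# code.
function* mapLazy(documents, transform) {
  for (const doc of documents) {
    yield transform(doc);
  }
}

// Group an iterable into arrays of at most `size` elements.
function* batches(iterable, size) {
  let current = [];
  for (const item of iterable) {
    current.push(item);
    if (current.length === size) {
      yield current;
      current = [];
    }
  }
  if (current.length > 0) {
    yield current; // last, partially filled batch
  }
}

// Copies `documents` and returns how many simulated calls were needed.
function migrate(documents, transform, batchSize) {
  let roundTrips = 0;
  const destination = [];
  // Hypothetical stand-in for a call to the database: one call per batch.
  const fakeInsert = (docs) => {
    roundTrips += 1;
    destination.push(...docs);
  };
  for (const batch of batches(mapLazy(documents, transform), batchSize)) {
    fakeInsert(batch);
  }
  return { roundTrips, destination };
}

const source = Array.from({ length: 10000 }, (_, i) => ({ _id: i }));
const addFlag = (doc) => ({ ...doc, migrated: true });

console.log(migrate(source, addFlag, 1).roundTrips);    // 10000: one call per document
console.log(migrate(source, addFlag, 1000).roundTrips); // 10: one call per batch
```

The generators keep the pipeline lazy, as the C# Select does, so documents are transformed only while the batches are being filled.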

If you use the latest versions of the driver, you have the InsertMany method, which offers even more options and basically does the same operation as InsertBatch.

Gian Maria.

Long numbers are truncated in MongoDb shell

Let’s try this simple code in a mongo shell:

db.TestCollection.insert({"_id" : 1, "Value" : NumberLong(636002954392732556) })
db.TestCollection.find()

What you expect is that mongo inserts one record and then returns it. A record is indeed inserted, but the returned value can surprise you. Here is the output I got from RoboMongo:

{
    "_id" : 1.0,
    "Value" : NumberLong(636002954392732544)
}

Property “Value” does not contain the number you inserted: the number seems rounded and some precision is lost, even though it is a NumberLong and 636002954392732556 is a perfectly valid Int64 number. This behavior surprised me, because I expected rounding to happen only with doubles, not with an Int64.

A double precision floating point number also uses 64 bits, but it cannot have the same precision as an Int64, because part of those 64 bits store the sign and the exponent, leaving only 53 bits of precision for the significand. If you try to represent a big number like 636002954392732556 as a double, some rounding is going to happen. If you are not convinced, try this online converter with 636002954392732556; here is the result.

Figure 1: Floating point number rounding. Screenshot of the online converter, showing that the rounding happens because of the conversion to a floating point number.

This confirms that my problem was indeed caused by rounding: the number is somehow converted to floating point format, even though I used the NumberLong BSON extension to specify that I want a long and not a floating point type.
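The same rounding can be reproduced without the online converter. This short JavaScript snippet (plain Node, no MongoDB involved) parses the number as a double and compares it with the exact value held in a BigInt. Between 2^59 and 2^60 two adjacent doubles are 2^7 = 128 apart, so 636002954392732556 is rounded to the nearest multiple of 128, which is exactly the value RoboMongo showed.

```javascript
// BigInt literal: arbitrary precision, no rounding.
const exact = 636002954392732556n;
// Number: an IEEE 754 double with 53 bits of precision.
const asDouble = Number("636002954392732556");

console.log(Number.MAX_SAFE_INTEGER);        // 9007199254740991 (2^53 - 1)
console.log(Number.isSafeInteger(asDouble)); // false: beyond the exact integer range
console.log(BigInt(asDouble));               // 636002954392732544n, same as the shell output
console.log(exact - BigInt(asDouble));       // 12n lost to rounding
```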

The reason behind this is subtle. Let’s try another example: just type NumberLong(636002954392732556) in a mongo shell (I used RoboMongo) and check the result.

Figure 2: Calling NumberLong(636002954392732556) directly from the shell returns a rounded number.

This reveals the error: the shell returns the number surrounded by quotes, which suggests that the missing quotes are the problem. In JavaScript every number is a double, so when you write NumberLong(636002954392732556), JavaScript translates it into a call to the NumberLong function with the number 636002954392732556 as argument. Since that literal is a double, it is rounded before it is passed to the NumberLong function.

If you surround the number with quotes, you pass a string to NumberLong; in this scenario no rounding occurs, and the NumberLong function is perfectly capable of converting the string to a 64-bit number.
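The order of evaluation is the whole point, and it can be shown in plain JavaScript. The `numberLong` function below is a hypothetical stand-in, not the shell’s real NumberLong: it simply converts its argument with BigInt. When the argument is a numeric literal, the function only ever sees the double that JavaScript already produced; when it is a string, every digit arrives intact.

```javascript
// Hypothetical stand-in for the shell's NumberLong, for illustration only.
function numberLong(value) {
  // BigInt(string) parses every digit; BigInt(number) can only convert
  // the double that the literal was already rounded to.
  return BigInt(value);
}

// The literal is evaluated as a double *before* numberLong is called.
const fromNumber = numberLong(636002954392732556);
// The string reaches the function unchanged.
const fromString = numberLong("636002954392732556");

console.log(fromNumber); // 636002954392732544n
console.log(fromString); // 636002954392732556n
```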

In the mongo shell, always use quotes when you create numbers with NumberLong, e.g. NumberLong("636002954392732556").

This error only happens with really big numbers, larger than 2^53 (9007199254740992), because up to that value a double represents every integer exactly; but you need to be aware of it if you are writing scripts that use NumberLong.
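If you generate such scripts from code, one option is to always emit the quoted form. The helper below is a hypothetical sketch, not part of any MongoDB tool: it takes a string or a BigInt (never a Number, which may already be rounded), checks that the value fits in an Int64, and returns a NumberLong literal with the digits in quotes.

```javascript
// Hypothetical helper: build a mongo shell NumberLong literal with quotes.
// Pass strings or BigInts; a Number above 2^53 may already be rounded.
function shellNumberLong(value) {
  const big = BigInt(value);
  const min = -(2n ** 63n);      // Int64 minimum
  const max = 2n ** 63n - 1n;    // Int64 maximum
  if (big < min || big > max) {
    throw new RangeError(`${big} does not fit in an Int64`);
  }
  return `NumberLong("${big.toString()}")`;
}

const script =
  `db.TestCollection.insert({ "_id": 1, "Value": ${shellNumberLong("636002954392732556")} })`;
console.log(script);
// db.TestCollection.insert({ "_id": 1, "Value": NumberLong("636002954392732556") })
```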

Gian Maria.

Grant right to use $eval on Mongodb 3.2

One of the side effects of enabling authorization on MongoDb is that, even if you create a user with the “root” role, this account is not able to execute the $eval command. The symptom is that, when you try to execute $eval, you get this error:

mongodb Command '$eval' failed: not authorized on jarvis-framework-saga-test to execute command

This happens because $eval is deprecated (since MongoDB 3.0) and it should not be used. Since it is a dangerous command, a user needs access to all actions on all resources to run it, so you need to create a role that grants anyAction on anyResource.

If you really need to use $eval, create such a role: connect to the admin database and run this command.

db.createRole( 
	{ 
		role: "executeEval", 
		privileges: [ { 
			resource: { anyResource: true }, 
			actions: [ "anyAction" ] } ], 
		roles: []
 } ) 

Now that you have this new role, grant it to all the users that need to use $eval. As an example, if you have a single admin user in the admin database, run this against the admin db:

db.grantRolesToUser("admin", [ { role: "executeEval", db: "admin" } ])

And now the admin user can execute $eval against all databases.

Gian Maria.

Secure your MongoDb installation

In recent months a lot of news has spread about MongoDb and data leaks, because people found many MongoDb instances exposed on the internet without any protection.

The root of the problem is probably a bad default: MongoDb starts without any authentication. Developers usually download MongoDb, run it without authentication and access the instance without any knowledge of the MongoDb security model. This ease of use can lead to insecure installations in production.

While this can be tolerable for MongoDb instances that live in intranets, it is never a good strategy to leave MongoDb completely unauthenticated. It turns out that enabling really basic authentication is simple, even in the community edition.

Once you have started your MongoDb instance without authentication, connect with your tool of choice (e.g. RoboMongo) and create an admin user in the admin database:

use admin
db.createUser(
  {
    user: "admin",
    pwd: "mybeautifulpassword",
    roles: [ { role: "root", db: "admin" } ]
  }
)

Once this user is created, stop MongoDb and change the configuration to enable authentication:

security:
   authorization: enabled

If authorization is enabled in the configuration file, MongoDb requires every connection to the server to be authenticated. There is a nice tutorial on the MongoDb site, but basically, once authorization is enabled, you can authenticate against a single database or against the admin db. With the above instructions I’ve created a user admin in the admin database with the root role. This is the minimum level of authentication you should have: a single user that can do anything.

This configuration is far from being really secure, but at least it prevents access to the MongoDb instance without a password. It is equivalent to enabling only the “sa” user on SQL Server.

The next step is changing the connection string inside your software to specify user and password. The format of the URL is:

mongodb://user:password@localhost/newDb?authSource=admin

As with native authentication in SQL Server, username and password are stored in the connection string. Pay attention to the authSource parameter: if you omit it, the C# driver tries to authenticate against the specified database (newDb in this example) and fails, because the only user is in the admin database. The authSource parameter lets you specify the database to authenticate against.
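When the password is not as tame as the one above, be careful when building the connection string: characters such as @, : and / in the username or password must be percent-encoded, otherwise they break the user:password@host part of the URL. The `mongoUri` helper below is a hypothetical sketch in JavaScript that only builds the string; it assumes the format shown above and uses the standard encodeURIComponent function for the encoding.

```javascript
// Hypothetical helper: build a MongoDB connection string with credentials.
// encodeURIComponent percent-encodes characters such as '@', ':' and '/'.
function mongoUri({ user, password, host, database, authSource }) {
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  // authSource tells the driver which database holds the user.
  const query = authSource ? `?authSource=${encodeURIComponent(authSource)}` : "";
  return `mongodb://${credentials}@${host}/${database}${query}`;
}

console.log(mongoUri({
  user: "admin",
  password: "mybeautifulpassword",
  host: "localhost",
  database: "newDb",
  authSource: "admin",
}));
// mongodb://admin:mybeautifulpassword@localhost/newDb?authSource=admin

console.log(mongoUri({
  user: "admin",
  password: "p@ss:word/1",
  host: "localhost",
  database: "newDb",
  authSource: "admin",
}));
// mongodb://admin:p%40ss%3Aword%2F1@localhost/newDb?authSource=admin
```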

You don’t need to change anything else in your code, because the connection string contains all the information needed to authenticate the connection.

To avoid having insecure MongoDb instances in production, start securing the database right from the development phase, so that every person involved in the process knows that a password is needed to access the database.

Gian Maria.

“Unsupported filter” using ContainsAny in Mongo 2.x driver

Porting code from the legacy .NET MongoDb driver to the new driver syntax is quite annoying. In the new drivers almost everything has changed, and unless you want to keep using the old legacy syntax, creating a mess of old and new syntax, you should convert all the code to the new syntax.

One of the annoying problems is ContainsAny in the LINQ compatibility layer. With the old drivers, if you have an object that contains an array of strings and you want to filter for objects that have at least one of the values contained in a list of allowed values, you had to resort to this syntax:

  return Containers.AllUnsorted
                .Any(c => c.PathId.Contains(containerIdString) &&
                c.Aces.ContainsAny(aceList));

In this situation the Aces property is a HashSet<String> and aceList is a simple String[]; the last part of the query uses the ContainsAny extension method from the legacy MongoDb driver. That extension was needed because the old driver did not fully support the LINQ Any syntax.

The problem that arises with the new driver is that, after migrating, the above code still compiles, because it references the legacy drivers, but it throws an “Unsupported Filter” exception at run time. The solution is really simple: the new driver supports the full LINQ Any syntax, so you should write:

return Containers.AllUnsorted
	 .Any(c => c.PathId.Contains(containerIdString) &&
			c.Aces.Any(a => aceList.Contains(a)));

As you can see, you can now write the query with standard LINQ syntax, without resorting to ContainsAny.
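To make the meaning of the Any condition explicit, here is a small JavaScript sketch (no driver involved) of the semantics: a document matches when at least one element of its Aces array is in the allowed list. On the server this condition is normally expressed with the $in operator, which on an array field matches when any element equals any value in the list; I’m assuming, without having checked the generated query, that this is roughly what the driver produces. The ACE values and the `hasAnyAllowed`/`acesFilter` names are hypothetical, and the field name “Aces” assumes no custom element-name mapping.

```javascript
// In-memory equivalent of c.Aces.Any(a => aceList.Contains(a)).
function hasAnyAllowed(aces, allowed) {
  const allowedSet = new Set(allowed);
  return aces.some((ace) => allowedSet.has(ace));
}

// Sketch of the equivalent MongoDB filter document using $in.
function acesFilter(allowed) {
  return { Aces: { $in: [...allowed] } };
}

const aceList = ["user:alice", "group:admins"];
console.log(hasAnyAllowed(["group:admins", "group:devs"], aceList)); // true
console.log(hasAnyAllowed(["group:devs"], aceList));                 // false
console.log(JSON.stringify(acesFilter(aceList)));
// {"Aces":{"$in":["user:alice","group:admins"]}}
```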

While I really appreciate that in the new Drivers LINQ support is improved, it is quite annoying that the old code still compiles but it throws at run-time.

Gian Maria.