Long numbers are truncated in the MongoDb shell

Let’s try this simple code in a mongo shell:

db.TestCollection.insert({"_id" : 1, "Value" : NumberLong(636002954392732556) })
db.TestCollection.find()

You would expect mongo to insert one record and then return that same record. A record is indeed inserted, but the value that comes back can surprise you. Here is the output I got from RoboMongo:

{
    "_id" : 1.0,
    "Value" : NumberLong(636002954392732544)
}

The "Value" property does not contain the number you inserted: it seems to have been rounded, and some precision was lost, even though it is a NumberLong and 636002954392732556 is a perfectly valid Int64. This behavior surprised me, because I expected rounding to happen only with doubles, not with an Int64.

Actually a double precision floating point number, which uses 64 bits of storage, cannot offer the same precision as an Int64, because part of those 64 bits is used to store the exponent. If you try to represent a big number like 636002954392732556 in double precision, some rounding is going to happen. If you are not convinced, try an online floating point converter on 636002954392732556; here is the result.

Figure 1: Screenshot of the online converter, showing that the rounding happens when the number is converted to floating point format.

This confirms that my problem was indeed caused by rounding: the number gets converted to floating point format somewhere along the way, even though I used the NumberLong BSON extension to state that I want a long and not a floating point value.
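You can verify the rounding outside the shell too. Here is a minimal C# sketch (a plain snippet, no driver involved) that round-trips the number through a double:

long original = 636002954392732556;
double asDouble = original;            // implicit conversion to double: precision is lost here
long roundTripped = (long)asDouble;

Console.WriteLine(roundTripped);       // prints 636002954392732544, the same value the shell returned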

The reason behind this is subtle. Let's try another example: just type NumberLong(636002954392732556) in a mongo shell (I used RoboMongo) and verify the result.

Figure 2: Calling NumberLong(636002954392732556) directly in the shell returns a rounded number.

This unveils the cause: the number is returned surrounded by quotes, which suggests that quotes are the key. In JavaScript every number is a double, so when you write NumberLong(636002954392732556), JavaScript translates this into a call to the NumberLong function passing the number 636002954392732556 as argument. Since every number in JavaScript is a double, 636002954392732556 gets rounded before it is even passed to the NumberLong function.

If you surround the number with quotes you are passing a string to NumberLong; in this scenario no rounding occurs, because the NumberLong function is perfectly capable of converting the string to a 64 bit integer.
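For example, here is the corrected version of the insert from the beginning of the post (using a different _id to avoid a duplicate key); note the quotes around the number:

db.TestCollection.insert({"_id" : 2, "Value" : NumberLong("636002954392732556") })
db.TestCollection.find({"_id" : 2})
// the shell now returns NumberLong("636002954392732556"), with no precision loss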

In the mongo shell, always use quotes when you create numbers with NumberLong.

This error only happens with really big numbers (ones too large to be represented exactly as a double), but you need to be aware of it whenever you write scripts that use NumberLong.

Gian Maria.

Secure your MongoDb installation

In recent months a lot of rumors spread about MongoDb and data leaks, because people found lots of MongoDb instances exposed on the internet without any protection.

The root of the problem is probably a bad default: MongoDb starts without any authentication. Developers usually download MongoDb, configure it without authentication and access the instance without any knowledge of the MongoDb security model. This simplicity of use can lead to insecure installations in production.

While this can be tolerable for MongoDb instances that live in an intranet, leaving MongoDb completely unauthenticated is never a good strategy. It turns out that enabling really basic authentication is simple, even in the community edition.

Once you have started your MongoDb instance without authentication, just connect with your tool of choice (e.g. RoboMongo) and create an admin user in the admin database:

use admin
db.createUser(
  {
    user: "admin",
    pwd: "mybeautifulpassword",
    roles: [ { role: "root", db: "admin" } ]
  }
)

Once this user is created, stop MongoDb and change the configuration to enable authentication:

security:
   authorization: enabled

If authorization is enabled in the configuration file, MongoDb requires every connection to the server to be authenticated. There is a nice tutorial on the MongoDb site, but basically once authorization is enabled you can authenticate against a single database or against the admin db. With the above instruction I created an admin user in the admin database with the root role. This is the minimum level of authentication you should have: a single user that can do anything.
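As a quick check from the mongo shell (using the user created above), you can verify that the credentials work by authenticating against the admin database:

use admin
db.auth("admin", "mybeautifulpassword")

db.auth returns 1 if authentication succeeded and 0 otherwise.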

This configuration is far from being really secure, but at least it prevents access to the MongoDb instance without a password. It is equivalent to enabling only the "sa" user on a Sql Server.

The next step is changing the connection string inside your software to specify user and password. The format of the URL is:

mongodb://user:password@localhost/newDb?authSource=admin

As with native authentication in Sql Server, username and password are stored in the connection string. Pay attention to the authSource parameter: if you omit it, the C# driver tries to authenticate against the specified database (newDb in this example) and fails, because the only user is in the admin database. Thanks to the authSource parameter you can specify which database should be used for authentication.

You don't need to change anything else in your code, because the connection string contains all the information needed to authenticate the connection.
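As a minimal sketch with the C# driver (I assume version 2.x; the database name newDb comes from the example above), the only change is the connection string you pass to MongoClient:

using MongoDB.Driver;

// The connection string carries user, password and authSource,
// so the rest of the data access code stays untouched.
var client = new MongoClient("mongodb://admin:mybeautifulpassword@localhost/newDb?authSource=admin");
var database = client.GetDatabase("newDb");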

To avoid having insecure instances of MongoDb in production, start securing the database during the development phase, so that everyone involved in the process knows that a password is needed to access the database.

Gian Maria.

Misusing an ORM

I blogged some time ago that I'm starting to consider ORMs an antipattern, and recently Mr. Fowler posted similar thoughts on his bliki; moreover, I had the pleasure of being one of the organizers of the first official RavenDB course in Italy, with my dear friend Mauro as teacher.

Since I'm strongly convinced that in a fully OOP approach objects should have neither setters nor getters, most of the work and complexity of an ORM is simply not needed, because you usually retrieve objects from the storage with a single function, GetById, and nothing else. In my long experience with NHibernate I verified that most of the problems arise when you need to show data in the UI in a specific format: you start writing complex queries in HQL, ICriteria or LINQ, you spend time with NHProfiler to understand whether the queries are good enough for the production system, and when the objects change a little you need to rewrite a lot of code to suit the new Object Model. This last point is the real pain in DDD, where you usually manipulate the Object Model a lot before reaching a good shape; after all, the main value of the DDD approach is being able to create a dialog with a DOMAIN EXPERT, and it is impossible to find a good Object Model at the first attempt. If refactoring a model becomes painful, so that you are not free to modify it easily, you are moving away from the DDD approach.

This is where CQRS can help you: for all objects belonging to the domain you only need Save, LoadById, Update and Delete, because every Read Model is defined somewhere else. In such a scenario an ORM is really useful, because if you need to store objects inside a relational database you can leave to the ORM all the work needed to satisfy the CRUD part, where the R is the GetById method. To start easily with this approach you can create SQL views or stored procedures for all the Read Models you need; this implies that whenever the structure of the Domain Model changes, you only need to update the affected Read Models, some views and some stored procedures, with no need to refactor the code.
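To make the idea concrete, here is a minimal sketch (hypothetical names, the key type could be anything) of the whole persistence surface the domain side needs with this approach:

using System;

// CRUD surface for domain objects: the only read on the domain side is LoadById.
// Every other read goes through dedicated Read Models (views, stored procedures, denormalizers).
public interface IDomainRepository<T> where T : class
{
    void Save(T aggregate);
    T LoadById(Guid id);
    void Update(T aggregate);
    void Delete(Guid id);
}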

In this situation the ORM can really help you: if you change the Domain Model you only have to change the mapping, or let some mapping-by-convention tool do it for you (ConfORM for NHibernate is an example), regenerate the database and update only the affected Read Models. If instead your domain is anemic and you expose properties from objects, even only with getters, whenever you change a domain class you must answer the questions: "If I change this property, what other domain objects will be affected? How many service classes will be affected? How many queries issued from views will be affected?". If you cannot build a Read Model with a SQL view or a stored procedure, you can write a denormalizer that listens for DOMAIN EVENTS and populates the Read Model accordingly. In my opinion this is the scenario where an ORM really helps.

In such a situation a NoSql database can dramatically simplify your life: you do not need an ORM anymore, because you can save object graphs into the storage directly, and you can create Read Models with Map/Reduce or with denormalizers.

Sadly enough, ORMs are primarily used to avoid writing SQL while persisting a completely anemic domain, where all the logic resides in services. In such a scenario it is easy to abuse the ORM, and in the long term the ORM will probably become much more a pain than a real help.

Gian Maria.

Logging objects with circular references with the Mongo appender crashes your process

I blogged some days ago about the possibility of saving log4net logs inside a Mongo database, but you should be aware that this technique can be dangerous if your objects have circular references. A circular reference happens when object A references object B and object B, directly or indirectly, references object A again; this is a concrete risk when you work with the Mongo serializer.
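As a minimal illustration (hypothetical classes), this is the kind of object graph that breaks the serializer:

public class Parent
{
    public String Name { get; set; }
    public Child Child { get; set; }
}

public class Child
{
    // The back reference closes the cycle: Parent -> Child -> Parent.
    public Parent Parent { get; set; }
}

// Serializing a Parent with ToBsonDocument() recurses through the cycle forever
// and terminates the process with a StackOverflowException.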

The Mongo serializer does not like circular references (which is perfectly acceptable, because documents with circular references cannot be saved in a document database), but the problem is that if you try to serialize an object that has a circular reference you get a StackOverflowException and your process crashes, as stated in the official MSDN documentation:

Starting with the .NET Framework version 2.0, a StackOverflowException object cannot be caught by a try-catch block and the corresponding process is terminated by default. Consequently, users are advised to write their code to detect and prevent a stack overflow.

If you remember how I modified the MongoDb log4net appender, I decided to save complex objects into MongoDb with this code:

if (compositeProperties != null && compositeProperties.Count > 0)
{
    var properties = new BsonDocument();
    foreach (DictionaryEntry entry in compositeProperties)
    {
        BsonValue value;
        if (!BsonTypeMapper.TryMapToBsonValue(entry.Value, out value))
        {
            // The value is not a primitive BSON type: serialize the whole object graph.
            // This is the dangerous call when the graph contains a circular reference.
            properties[entry.Key.ToString()] = entry.Value.ToBsonDocument();
        }
        else
        {
            properties[entry.Key.ToString()] = value;
        }
    }
    toReturn["customproperties"] = properties;
}

The key point is the call to entry.Value.ToBsonDocument(): if someone stores in the log4net global context an object that contains a circular reference, your program will be terminated at the next call to log4net, because the StackOverflowException cannot be caught.

This is especially annoying when you want to log objects that come from the database through an ORM like NHibernate, because every object with a bag reference usually gets hydrated with a PersistentBag, an internal NHibernate class that contains a circular reference. A simple solution to this problem is telling Mongo which serializer to use for these specific types.

The technique is simple, because the Mongo driver makes it easy to register a custom serialization provider:

BsonSerializer.RegisterSerializationProvider(new LoggerBsonSerializerProvider());

And this is the code of the class that implements the IBsonSerializationProvider interface:

public class LoggerBsonSerializerProvider : IBsonSerializationProvider
{
    public IBsonSerializer GetSerializer(Type type)
    {
        // Match every NHibernate internal type by name, case insensitively.
        if (type.FullName.IndexOf("NHibernate", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            return BsonNullSerializer.Instance;
        }
        return null;
    }
}

The only function you need to implement is GetSerializer. In this simple example, for every type whose full name contains the NHibernate string, I simply return a BsonNullSerializer, which basically tells the Mongo serializer to ignore those types. In my opinion this is the best approach, because it avoids the risk of serializing NHibernate internal classes that can throw a StackOverflowException. If you want to serialize the NHibernate PersistentGenericBag, but you do not want to risk a circular reference, you can use this code instead:

public class LoggerBsonSerializerProvider : IBsonSerializationProvider
{
    public IBsonSerializer GetSerializer(Type type)
    {
        if (type.FullName.IndexOf("NHibernate", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            if (type.TypeImplementsGenericInterface(typeof(IList<>)))
            {
                // Persistent collections are serialized as plain lists.
                return EnumerableSerializer.Instance;
            }
            return BsonNullSerializer.Instance;
        }
        return null;
    }
}

public static class TypeExtensions
{
    public static Boolean TypeImplementsGenericInterface(this Type typeToCheck, Type interfaceToLookFor)
    {
        return typeToCheck.GetInterface(interfaceToLookFor.Name) != null;
    }
}

The main difference is that for each NHibernate internal type that implements the generic IList<> interface I tell Mongo to serialize with the EnumerableSerializer. This serializer avoids the circular reference problem, because the PersistentGenericBag is handled as a plain IList<>, ignoring its real properties. This approach is still not completely safe: you need to be sure that the collection was already loaded from the database and that the object is not detached, otherwise the collection cannot be initialized and you get an exception during logging. This type of exception is catchable, though, so it is a minor issue: you can handle it with a simple try/catch inside the Mongo appender.
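A minimal sketch of that guard, applied to the serialization loop shown at the beginning of the post (I assume you prefer to log a marker instead of the offending value):

try
{
    properties[entry.Key.ToString()] = entry.Value.ToBsonDocument();
}
catch (Exception ex)
{
    // Unlike StackOverflowException, a lazy initialization failure is catchable:
    // store a marker instead of the value, so a logging call never kills the process.
    properties[entry.Key.ToString()] = "Unable to serialize: " + ex.Message;
}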

Gian Maria

Custom XML Serialization

Another advantage of storing the properties of entities in a state object based on a dictionary is the ability to easily serialize objects in custom formats. As an example, I created an XML serializer capable of serializing an entity into a custom XML format.

I used this simple serializer to create an NHibernate User Type that lets me save a child entity in a single XML column of Sql Server, a feature useful when you need to save objects whose schema changes quite often and you do not want to keep the database schema updated, or when you need to store dynamic data in the DB. I know that all of you are screaming "USE A NOSQL DB", like Raven, but it is not simple to introduce a new technology into an existing project only to justify the need of saving 2% of the objects.

Thanks to the custom serializer and the ability to do deep cloning, writing such a User Type is really simple. First of all, I implemented a method called IsEquivalentTo that compares two entities based on their state; this makes writing the Equals method of the UserType trivial:

bool IUserType.Equals(object x, object y)
{
    if (ReferenceEquals(x, y)) return true;
    if (x == null || y == null) return false;

    if (x is BaseEntity)
        return ((BaseEntity)x).IsEquivalentTo(y as BaseEntity);

    return x.Equals(y);
}

The same consideration applies to the IUserType.DeepCopy method, which is based on the Clone method of the base class. Saving and loading the object is just a matter of using the BaseEntity serialization methods:

public object NullSafeGet(System.Data.IDataReader rs, string[] names, object owner)
{
    Int32 index = rs.GetOrdinal(names[0]);
    if (rs.IsDBNull(index))
    {
        return null;
    }
    String databaseContent = (String)rs[index];
    return Deserialize(databaseContent);
}

internal Object Deserialize(String content)
{
    XElement element = XElement.Parse(content);
    return element.DeserializeToBaseEntity();
}

public void NullSafeSet(System.Data.IDbCommand cmd, object value, int index)
{
    if (value == null || value == DBNull.Value)
    {
        NHibernateUtil.String.NullSafeSet(cmd, null, index);
    }
    else
    {
        NHibernateUtil.String.Set(cmd, Serialize(value), index);
    }
}

internal String Serialize(Object obj)
{
    if (!(obj is BaseEntity))
        throw new ArgumentException("Only BaseEntity based entities could be serialized with this usertype", "obj");
    return (obj as BaseEntity).SerializeToXml().ToString();
}

The advantage of this approach is that I have another base entity class called BaseExpandoEntity that permits storing properties in the state object by name; it is like a dynamic object, but I used it in .NET 3.5, where dynamic does not exist yet. This kind of entity is clearly not really OOP and not well encapsulated, because external code can set a property of any name, and it is used mainly as a data class, just to store information in the database without being bound to a schema or a class. Now suppose you have a class called Father with a property of type SpecificNotes, based on this BaseExpandoEntity and saved to the database with the above User Type.
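As a rough sketch (a hypothetical simplification; the real class also participates in cloning and XML serialization), BaseExpandoEntity stores its state in a dictionary keyed by property name:

using System;
using System.Collections.Generic;

public class BaseExpandoEntity : BaseEntity
{
    private readonly IDictionary<String, Object> _state = new Dictionary<String, Object>();

    // Any property name is accepted: this class is a data container, not a real domain object.
    public void SetProperty(String name, Object value)
    {
        _state[name] = value;
    }

    public Object GetProperty(String name)
    {
        Object value;
        return _state.TryGetValue(name, out value) ? value : null;
    }
}

With such a class in place you can write code like this: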

Father d1 = new Father() { ... };
d1.SpecificNotes.SetProperty("Ciao", "mondo");
d1.SpecificNotes.SetProperty("Age", 80);
Repository.Father.Save(d1);

In the above code the Father class has a property called SpecificNotes, mapped as XML in the database; I store two new properties with SetProperty, and this is what gets saved to the database:

INSERT INTO Father
            (Id,
             xxx,
             yyy,
             SpecificNotes)
VALUES      (1,
             ...,
             ...,
             '<SpecificNotes fname="Myproject.Entities.SpecificNotes, Myproject.Entities"><Ciao>mondo</Ciao>
              <Age type="System.Int32">80</Age></SpecificNotes>')

Now suppose you want to retrieve from the database all Father objects that have a SpecificNotes property named Ciao with value 'mondo': how can you issue the query, since the object is stored as XML? The solution is creating a custom criterion. I do not want to bother you with the details, but the key method is something like this:

public override NHibernate.SqlCommand.SqlString ToSqlString(
    NHibernate.ICriteria criteria,
    ICriteriaQuery criteriaQuery,
    IDictionary<string, NHibernate.IFilter> enabledFilters)
{
    string objectPropertyColumnName = criteriaQuery
        .GetColumnsUsingProjection(criteria, PropertyName)[0];

    StringBuilder criteriaText = new StringBuilder();
    criteriaText.Append(objectPropertyColumnName);
    criteriaText.Append(".value('(/*/");
    criteriaText.Append(XmlCriterionParameters.ChildPropertyName);
    criteriaText.Append(")[1]', 'nvarchar(max)')");
    switch (XmlCriterionParameters.CriteriaOperator)
    {
        case CriteriaOperator.Equal:
            criteriaText.Append("=");
            break;
        case CriteriaOperator.Like:
            criteriaText.Append(" like ");
            break;
        default:
            throw new NotSupportedException("Still not supported operator");
    }

    criteriaText.AppendFormat("'{0}'", Value);
    return new SqlString(criteriaText.ToString());
}

In this example I implemented only the "=" and "like" operators, but that is enough for this sample. Now I can issue the following query:

Query query = Query.CreateXml("SpecificNotes", "Ciao", CriteriaOperator.Equal, "mondo");
var result = Repository.Father.GetByCriteria(query);

This snippet uses a Query Model to specify the query, but as you can see the important aspect is that I am able to create an XML criterion on the SpecificNotes sub-object, where the expando property Ciao is equal to the value "mondo". Here is the query that was issued to the DB:

SELECT ...
FROM   Father this_
WHERE  this_.SpecificNotes.value('(/*/Ciao)[1]', 'nvarchar(max)') = 'mondo'

This is quite a primitive query, because the /*/ basically means I am searching for a property Ciao contained in any sub-object, but you can modify your custom criterion to adapt the XPath to your needs. The important aspect is that I am now able to store and retrieve my expando objects transparently from a standard Sql Server, thanks to the great flexibility of NHibernate.

Gian Maria.