Using guid id in nhibernate, index fragmentation

This is a quick consideration about using id of type GUID in nhibernate. If in SQL server the cluster index is on the id (default choiche), if you use a simple guid generator you will end in high index fragmentation. This happens because if you insert a lot of objects into the table, since the physical ordering of the records (the clustered index) is on the Id field, inserting a sequence of objects with random id will insert these object randomly into the physical space of the DB. Remember that the index is a Tree that was kept ordered by its clustered index.

Figure 1: A simple picture that shows the index organization in a database.

A simple solution is using guid.comb generator

   1: <id name="Id" column="Id" type="Guid" unsaved-value="00000000-0000-0000-0000-000000000000">

   2:    <generator class="guid.comb" />

   3:  </id>

And look at generated guid when I insert a bunch of elements in unit test.

image

Figure 2: The id sequence generated by guid.comb generator

As you can see, it is quite clear that guid are generated sequentially, but the very first part is highly different, so it seems to me that those guid are really not so similar. Another problem is that I hate guid for human readability and inserting a sequence of elements results in highly different guid. The solution is using another guid generation strategy

   1: public static class SequentialGuidGenerator 

   2: {

   3:     [DllImport("rpcrt4.dll", SetLastError = true)]

   4:     internal static extern int UuidCreateSequential(out Guid guid);

   5:  

   6:     public static Guid CreateSequentialGuid()

   7:     {

   8:         Guid guid;

   9:         UuidCreateSequential(out guid);

  10:         return guid;

  11:     }

  12:  

  13: }

This function permits me to use the UuidCreateSequential operating system function to generate a sequential guid. Now you can write another nhibernate generator.

   1: class SequentialGuidComboIdGenerator : IIdentifierGenerator

   2: {

   3:     public object Generate(NHibernate.Engine.ISessionImplementor session, object obj)

   4:     {

   5:         return SequentialGuidGenerator.CreateSequentialGuid();

   6:     }

   7: }

Now I change the id generator and run again the test.

image

Figure 3: Sequential guid generator are much more equals.

As you can verify now guid are really similar each ones, and I highlighted only the part that is different. This generator can reduce indexes fragmentation, and sequentially inserted element are really simple, because they differ only for few digit, and in my opinion this lead to a better human readability.

Clearly performance consideration are valid only when the clustered key is on the id, if clustered index is on other fields, all consideration about performance of guid should be not taken into consideration.

Alk.

Published by

Ricci Gian Maria

.Net programmer, User group and community enthusiast, programmer - aspiring architect - and guitar player :). Visual Studio ALM MVP

6 thoughts on “Using guid id in nhibernate, index fragmentation”

  1. Great article GianMaria, very interesting.

    NH has so many “hidden” gems, it’s quite difficult to manage them all.

    GG

  2. Hi,

    there is a catch: T-SQL sequential guid do not follow the specification.
    There is a higher theoretical chance to have guid conflicts, if one inserts custom guid values from the client…

  3. OMG! Why on earth use such a guid generator? Guids are used for their randomness which you totally remove this way. If you just want human readable keys in sequence then please use the hilo key generator.

    Indeed, you do not want a clustered index on a primarykey that is random so just create it on a NON primary key column. For example its creation date of maybe even something that makes sense. For example partioning your data based on customers, offices, users, documents or what ever so that your aggregate reads are fast.

  4. @liviu: this method does not use T-SQL sequential guid, it uses the UuidCreateSequential (http://msdn.microsoft.com/en-us/library/aa379322(v=vs.85).aspx) from operating system API.

    To be sure that the creation is unique, we should check the return value, and if it is RPC_S_UUID_LOCAL_ONLY we know that the uniqueness is limited only to this computer.

    Since this happens when the computer has no network card, and such a computer cannot reach the database (it is remote) this technique should be safe :) or you can simply check the return value and use the standard Guid.CreateNew().

    @Ramon: using guid guarantees me a value that is unique between all tables and all databases, and this value can be generated on multiple client without any problems. I do not care about randomness, the reason behind guid is uniqueness, and sequential guid are unique. The consideration you did on clustered index are ok, but since clustered index on the ID is the default of SQL SERVER, I see in many project that people leave as is. This is really really sad, but true. I have very few tables with Clustered index on primary key, and it only happens when there is really no valid alternative.

  5. Okey, these are unique on a specific machine. Then you still have the problem that when you create such guids on several machines (for example worker nodes, at the client side or your front-end servers) that the indexes will still get fragmented. Having a front-end machine being able to create a key upfront to correlate data between different message and data is usually my reason to work with guids.

    So, yes, your article makes sense but only when creating these keys on ONE machine so to not get fragmented indexes. Using this on several machines results in unique keys but fragmented indexes. NHibernate its guid comb version has the same issues AFAIK.

    My opinion is that you should *never* use clustered indexes on any guid columns but always based on how you want your data to be layed out on disk so how you want it to be partioned. Usually this is optimized for reads. Your guid examples and nhibernate its guid comb are optimized for writes and less fragmentation thus less heavy maintenance is required in your database optimization schedule.

  6. @ramon: you should read MSDN documentation more deeply, the UuidCreateSequential creates unique guid among all machines because it uses unique MAC address of network cards. So guid are guarantee to be unique.
    In the remote situation when the computer does not have a network card, but only in that situation, the guid are guarantee to be unique only on that machine, in that scenario the function returns the COM return value RPC_S_UUID_LOCAL_ONLY , so you can simply return a Guid.NewGuid (I’ll blog about this) :)
    I agree with you that a clustered index on Primary guid indes is indeed a bad choice, but since this is the default, I wish people to be aware of this. Using UuidCreateSequential guarantees me uniqueness, but more human readability, and less index fragmentation when developer does not change default cluster index.. thats all :).

Comments are closed.