In a previous post I explained how to create a simple LINQ extension method to use the Distinct() method with a lambda and not an IEqualityComparer. That solution uses a simple List<T> as temporary storage to implement the concept of distinct elements. To verify if an object was already returned I simply searched into the temp List<T> looking if an element that satisfy the Predicate, this will permit me to return only distinct elements.
The question is, it is slow? The answer is yes, because List<T> performs really bad when you search elements in it, because it needs to scan all the elements until it find an element that satisfy the predicate or you finish all the elements.
I created a list with 320 elements and benchmark the time needed to execute 2000 times an iteration with the Distinct() method I wrote.
It executes in about 3800 milliseconds, but is it slow? is it fast? can we optimize it?
The solution is running this simple test under a profiler, like ANT profiler, just to see if there is some Hot path
From this image it is clear that 98% of the time is spent inside the Exists() method of the List<T>, and this is something we already know, but the profiler is confirming our suspects, searching elements inside a simple IList<T> is really slow. A simple solution is writing this helper to create an IEqualityComparer with lambda.
This is a class that implements IComparer and IEqualityComparer based on lambda. IEqualityComparer<T> is a special interface, because it has the GetHasCode function that permits to use Hashtable to store objects and use fast search algorithm. Thanks to a simple helper factory, you can create an IEqualityComparer<T> based on a lambda with a single instruction.
Thanks to the Factory class I can create a Comparer<T> based on two functions, one implements the concepts of equality, the other return an hashcode for an object. The general rule is that two object that are equals must return the same hashcode. Since I want to compare instance only by BlaBla property, I implement the hash function simply using BlaBla.GetHashCode(). Now I can use the base Distinct() LINQ function.
If you compare this code with the one that use the lambda it is really similar, in the end all you want is to be able to use the Distinct operator with a Lambda function that implements the concept of Equality; in the first sample you can pass the lambda directly to the distinct() operator, in the second example you need to write some extra code to create the comparer passing the lambda object and an HashCode function, but the result is the same: being able to use Distinct() method with a Lambda.
If you run this second snippet, the time of execution drops from 3800 ms to 90ms, because the base implementation of Distinct, (thanks to IEqualityComparer) can use some hash based container to store objects and perform faster searches.