NoSql is not a replacemente for SQL databases, but it is a valid alternative for a lot of situations where standard SQL is not the best approach to store your data. Since we were taught that whenever you need to store data on a “data store” and you need to query that data for retrieval, SQL is the best solution, you have only to decide what Sql Engine to use and the game is done.
In 2012 this sentence has proven wrong, what I mean is that is not possible to assume anymore that SQL is the “only way to go” to store data; but you should be aware that other alternative exists and it is called NO SQL. Under this term we have various storage engine that are not based on SQL and in .NET we have an exceptional product called RavenDB (you can find a really good introduction to RavenDb in Mauro’s Blog).
The first big difference with standard Sql is being Schemaless. One of the most annoying restriction of Sql Server is the need to specify exactly the format of the data you want to store inside your storage. This is needed for a lot of good reason, but there are situation when you really does not care about it, especially if your software is heavily based on OOP concepts. Suppose you have this object
1: class Player
2: {
3: public String Name { get; set; }
4:
5: public DateTime RegistrationDate { get; set; }
6:
7: public Int32 Age { get; set; }
8: }
For a moment do not care about the fact that this object is not well encapsulated (it has public getter and setter) but focus only on the need to “store” this object somewhere. If you use a standard Sql storage, first of all you need to create a table, then define columns, decide maximum length for the Name column and finally decide an ORM to use or build a dedicate data layer and finally you can save the object.
If you work with raven, this is the only code you need
1: var store = new DocumentStore { Url = "http://localhost:8080" };
2: store.Initialize();
3: using (var session = store.OpenSession())
4: {
5: var player = new Player
6: {
7: Age = 30,
8: RegistrationDate = DateTime.Now,
9: Name = "Alkampfer",
10: };
11: session.Store(player);
12: session.SaveChanges();
13: }
I simply created a DocumentStore based on a local server, opened a session and saved an object, I did not defined anything on the server, I did not need to have an ORM, the server simply takes the object and save it, period!
I liked very much this approach because
I needed to save an object to a data storage and everything I need is just a two function all, Store to tell the storage the object I want to save and SaveChanges that actually do the save.
What I got with this simple snippet of code? Just browse with a standard browser to the address of the server and you should see the content of the database.
Figure 1: Content of the database after insertion of a simple object
From Figure 1 you can see content of the raven database, it contains a player and the little 1 beside the object is the Id that Raven uses internally to uniquely identify that object. The other object called Sys Doc Hilo/players takes care of id generation for Players object with an Hilo algorithm.
That’s all folks, no need to define schema, no need to have special Id property or any other requirement to make the object compatible with the store, just call Store method on whatever .NET object and your object is inside the database, Period!.
This is only a scratch of the many functionalities of RavenDb
, more to come in my blog and in Mauro’s one.
Gian Maria.
Tags: RavenDB
I’ve a stored procedure with a query that runs on a quite big database, it was slow (more than one minute to run) and was optimized using a temp table. The result is that execution time dropped to ~2 secs, and since this was acceptable the optimization stopped.
After a couple of months, the query become really slow again, it got executed in ~30 secs and I started to investigate why.
A quick tour with SSMS and STATISTICS I/O did not reveal some news, but the execution plan have a strange sort operation that takes 90% of the time of the overall query and this is really strange, but I did not find any reason why it should slow the query so much. To have a better picture of what is happening I decided to fire the Activity Monitor to check if the query stops for any lock in table, but I found that the task that is executing the query goes into heavy parallelism (you see a lot of row in the activity monitor with the same id), but each subtask is waiting a lot with a CXPACKET wait type and everything seems stuck. CXPACKET wait time happens when the execution of the query is parallelized so I decided to disable parallelism of the query with OPTION (MAXDOP 1) to verify if the situation change. The result is that the query now executed in ~2 secs, like it did two months before, so I decided to leave it with parallelism disabled and it start to run just fine.
The conclusion is,: when database is big, and query are complex, it is not so simple to understand why a query is slow, sometimes you need to tweak how SQL Server issue the query with Query Hint and you should use all the tools you have to understand what is really happening
.
Gian Maria.
Tags: Sql Server
In a production Sql Server database we had some issue with the hardware, the result is that one very big database started to gave us errors on DBCC CHECKDB, the error is the following one.
Msg 8929, Level 16, State 1, Line 1
Object ID xxxxxx, index ID 1, partition ID xxxxxx, alloc unit ID xxxxx (type In-row data): Errors found in off-row data with ID xxxxxxx owned by data record identified by RID = (3:34252:4)
This happens in one of the most central and important table of the database, that contains millions of record, and only 20 rows gave us error, so I’d like to verify witch rows had the error, because in that table there are lots of records in logical status “deleted” and the corresponding physical row can be removed from the database without much pain and reinserted with the very same data to avoid dataloss.
So my question is… how can I identify the row given the RID (Row Identifier?) First of all from Object ID you can easily find the name of the table with select Object_Name(<Object_ID>) but what about the RID. After a little search I find the sys.fn_physLocFormatter function and I created this query
1: SELECT sys.fn_PhysLocFormatter (%%physloc%%) AS RID,
2: *
3: into debug.RIDTABLE
4: FROM TableName
I selected into a table in Debug schema the RID of the Row as well as all the other columns of the table with error, now I take the output of the DBCC CHECKDB, selected all the RID of rows with errors and I could execute the query
1: select * from debug.RIDTABLE
2: where RID IN ('(3:34252:4)', '(x:xxxxx:x)'.....
To extract from the RIDTAble all the row with errors, then I can simply delete them from the original table, and insert them again from the RIDTABLE. Checking the data inside RIDTABLE it seems that the data is correct, except for a NVARCHAR(MAX) field that contains long string of text.
After the process DBCC CHECKDB did not give any other error, so I was quite happy to solve this problem without the need to put the database in single-user-mode, but the most important aspect is that I was able to identify the rows with error, to understand the impact of errors in the software. It turned out that only a couple of lines contained really important data, the others are in Deleted status, so we did not lost anything so important.
Gian Maria.
Tags: Sql Server
If you work with Sql Server and like Red-Gate tools, you can find a couple of interesting webinars to see some features of their exceptional tools in action. If you are interested you can find all the details here; webinars are free and require only a registration, so go and take a look at what Red Gate can offer
.
Gian Maria.
Tags: RedGate
I’ve more to say about the Traffic Light example I explained in a previous post, if you look at that post, you can notice that the Domain is composed by properties that have only getter and not setters. This needs is explained in this post, where I explain why you should protect the status of your entities from external direct manipulation.
If I wish to use my simple Traffic-Light example in a class to teach OOP principle, it is not good enough, because it breaks the concept of Encapsulation. As stated in Wikipedia
encapsulation means that the internal representation of an object is generally hidden from view outside of the object’s definition. Typically, only the object’s own methods can directly inspect or manipulate its fields
Having no setters prevent status from external modification, but the getters permits to the outside world to view inside the object. If you think that getters are not an antipattern and that you can always use them in your domain classes without problem, probably you are still thinking in procedural code, or at least not fully Object Oriented.
If you permit to the outside world to read one property of an object, you risk that the outside world will use that value to implement logic that should be contained in the object, so you are breaking encapsulation.
Whenever you ask yourself “I need to search in code all part of the software that access this property because I’m going to modify it”, you are implicitly looking for logic that is actually using that property but it is outside the object. The result is that you have piece of Business Logic that
- Read properties of several objects
- implement logic based on these properties.
Then all these objects are bounded together by that external logic and you loose encapsulation and you are not fully OOP. Based on this considerations I took my traffic light sample and decide to try to remove all getters and thanks to DOMAIN EVENTS it was really simple, more than I tought. Here is my new domain.
Even if this is a trivial domain, I appreciate a lot that I ended having a couple of DOMAIN EVENTS composed only by readonly properties (events are in the past and are immutable) and a couple of Domain Classes that contains only methods. Removing all properties actually does not required me to modify any test, and actually I found that each time I found some code that use a getter, that code looked really better after the access to the getter is removed.
If you really can structure your core domain with object that contains only methods, you have a really modular domain where each class is really an autonomous piece of software that does some business in the domain without the need of a direct relation with other entities except communicating with Domain Events.
I know that this is a super simple trivial example (a couple of traffic light is not even close to a real business case), but the result of this stupid and little experiment convinced me that it is the right road.
Gian Maria.
Tags: DDD





