Custom XML Serialization

Another advantage of storing the properties of entities in a state object based on a Dictionary is the ability to easily serialize objects in custom formats. As an example, I created an XML serializer that can serialize an entity into a custom XML format.

I used this simple serializer to create an NHibernate User Type that lets me save a child entity in a single XML column of SQL Server. This is useful when you need to save objects whose schema changes quite often and you do not want to keep the database schema updated, or when you need to store dynamic data in the DB. I know that all of you are screaming “USE A NOSQL DB”, like Raven, but it is not simple to introduce new technologies into existing projects, especially when the only justification is the need to save 2% of the objects.

Thanks to the custom serializer and the ability to do deep cloning, writing such a User Type is really simple. First of all, I’ve implemented a method called IsEquivalentTo that lets me compare two entities based on their state; this makes writing the Equals method of the User Type trivial.

bool IUserType.Equals(object x, object y)
{
    if (ReferenceEquals(x, y)) return true;
    if (x == null || y == null) return false;

    if (x is BaseEntity)
        return ((BaseEntity)x).IsEquivalentTo(y as BaseEntity);

    return x.Equals(y);
}

The same consideration applies to the IUserType.DeepCopy method, which is based on the Clone method of the base class. Saving and loading the object is just a matter of using the BaseEntity serialization methods.

public object NullSafeGet(System.Data.IDataReader rs, string[] names, object owner)
{
    Int32 index = rs.GetOrdinal(names[0]);
    if (rs.IsDBNull(index))
    {
        return null;
    }
    String databaseContent = (String)rs[index];
    return Deserialize(databaseContent);
}

internal Object Deserialize(String content)
{
    XElement element = XElement.Parse(content);
    return element.DeserializeToBaseEntity();
}

public void NullSafeSet(System.Data.IDbCommand cmd, object value, int index)
{
    if (value == null || value == DBNull.Value)
    {
        NHibernateUtil.String.NullSafeSet(cmd, null, index);
    }
    else
    {
        NHibernateUtil.String.Set(cmd, Serialize(value), index);
    }
}

internal String Serialize(Object obj)
{
    if (!(obj is BaseEntity))
        throw new ArgumentException("Only BaseEntity based entities could be serialized with this usertype", "obj");
    return (obj as BaseEntity).SerializeToXml().ToString();
}

Another advantage of this approach is that I have a second base entity class, called BaseExpandoEntity, that lets me store properties in the state object by name. It is like a dynamic object, but I used it in .NET 3.5, where dynamics did not exist yet. This kind of entity is clearly not really OOP: it is not well encapsulated, because external code can set a property with any name, and it is used mainly as a data class, just to store information in the database without being bound to a schema or a class. Now suppose you have a class called Father with a property of type SpecificNotes, based on this BaseExpandoEntity and saved in the database with the above User Type; you can write this code.

Father d1 = new Father () {...};
d1.SpecificNotes.SetProperty("Ciao", "mondo");
d1.SpecificNotes.SetProperty("Age", 80);
Repository.Father.Save(d1);

In the above code, the Father class has a property called SpecificNotes mapped as XML in the database. I can store two new properties with SetProperty, and this is what is saved to the database.

INSERT INTO Father
            (Id,
             xxx,
             yyy,
             SpecificNotes)
VALUES      (1,
             ...,
             ...,
             '<SpecificNotes fname="Myproject.Entities.SpecificNotes, Myproject.Entities" ><Ciao>mondo</Ciao>
  <Age type="System.Int32">80</Age></SpecificNotes>' )

Now suppose you want to retrieve from the database all Father objects that have a SpecificNotes with a property named Ciao whose value is ‘mondo’. How can you issue the query, since the object is stored as XML? The solution is to create a custom Criteria Operator. I do not want to bother you with the details, but the key method is something like this:

public override NHibernate.SqlCommand.SqlString ToSqlString(
    NHibernate.ICriteria criteria, 
    ICriteriaQuery criteriaQuery, 
    IDictionary<string, NHibernate.IFilter> enabledFilters)
{
    // Column that stores the serialized XML of the child entity.
    string objectPropertyColumnName = criteriaQuery
        .GetColumnsUsingProjection(criteria, PropertyName)[0];

    // Builds: column.value('(/*/ChildProperty)[1]', 'nvarchar(max)')
    StringBuilder criteriaText = new StringBuilder();
    criteriaText.Append(objectPropertyColumnName);
    criteriaText.Append(".value('(/*/");
    criteriaText.Append(XmlCriterionParameters.ChildPropertyName);
    criteriaText.Append(")[1]', 'nvarchar(max)')");
    switch (XmlCriterionParameters.CriteriaOperator)
    {
        case CriteriaOperator.Equal:
            criteriaText.Append("=");
            break;
        case CriteriaOperator.Like:
            criteriaText.Append(" like ");
            break;
        default :
            throw new NotSupportedException("Still not supported operator");
    }

    // NOTE: Value is concatenated as a literal into the SQL text;
    // real code should escape it or pass it as a parameter.
    criteriaText.AppendFormat("'{0}'", Value);
    return new SqlString(criteriaText.ToString());
}

In this example I’ve implemented only the “=” and “like” operators, but that is enough for this sample. Now I can issue the following query.

Query query = Query.CreateXml("SpecificNotes", "Ciao", CriteriaOperator.Equal, "mondo");
var result = Repository.Father.GetByCriteria(query);

This snippet uses a Query Model to specify the query, but as you can see, the important aspect is that I’m able to create an XML criterion on the “SpecificNotes” sub-object, where the expando property “Ciao” is equal to the value “mondo”. Here is the query that was issued to the DB:

SELECT ...
FROM   Father this_
WHERE  this_.SpecificNotes.value('(/*/Ciao)[1]', 'nvarchar(max)') = 'mondo'

This is quite a primitive query, because the /*/ basically means I’m searching for a Ciao element directly under the root element, whatever the root is called, but you can modify your custom criterion to adapt the XPath to your needs. The important aspect is that I’m now able to store and retrieve my expando objects transparently from a standard SQL Server, thanks to the great flexibility of NHibernate.
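To make the generated SQL concrete, here is a minimal T-SQL sketch that can be run on its own in SSMS. The table variable @Father and the variable @wanted are hypothetical and exist only for illustration; the XML mirrors what the serializer saved above. The second query is an assumption of mine, not something the criterion above generates: it uses exist() with sql:variable() so that the searched value travels as a variable instead of being concatenated into the SQL text.

```sql
-- Hypothetical stand-in for the Father table, reduced to the XML column.
DECLARE @Father TABLE (Id int PRIMARY KEY, SpecificNotes xml);

INSERT INTO @Father (Id, SpecificNotes)
VALUES (1, N'<SpecificNotes><Ciao>mondo</Ciao><Age type="System.Int32">80</Age></SpecificNotes>'),
       (2, N'<SpecificNotes><Ciao>world</Ciao></SpecificNotes>');

-- Same shape as the query issued by the custom criterion:
-- read the first Ciao element under the root and compare it as text.
SELECT Id
FROM   @Father
WHERE  SpecificNotes.value('(/*/Ciao)[1]', 'nvarchar(max)') = 'mondo';

-- Variant (an assumption, not the criterion's output):
-- the value is supplied through a T-SQL variable.
DECLARE @wanted nvarchar(100) = N'mondo';

SELECT Id
FROM   @Father
WHERE  SpecificNotes.exist('/*/Ciao[. = sql:variable("@wanted")]') = 1;
```

Both queries should select only the row whose Ciao element contains mondo. If the criterion were adapted to the second form, the value could be bound as a normal query parameter.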

Gian Maria.

Trim all non-alpha characters from a SQL string

I have a table with some dirty data. If you select it, everything seems OK, but since it was imported from an external source, some string fields actually end with a strange non-alpha character.

The symptom is some strange behavior on some data. To verify the content of the row, I issued a Select * from xxx where Name = ‘Azioni’ and got no result, so I selected everything that contains Azioni and found the record. This is a clear sign that the field contains some strange stuff, so I simply copied and pasted it into the editor to see exactly what is stored in the field, and I found this:

[Image: the pasted field value, with a strange character at the end]

Do you see the strange char at the end? So I need a trim function that removes every non-alphabetic character from the beginning and the end of the string. Looking around the internet I found this interesting article that lists a function that removes all non-alpha characters from a string, so I modified it to remove non-alpha characters only from the beginning and the end.

Here it is if you need it:

create Function [dbo].[TrimNonAlphaCharacters](@Temp nVarChar(max))
Returns nVarChar(max)
AS
Begin

    -- Remove non-alphabetic characters from the beginning of the string
    While PatIndex('[^a-z]%', @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex('[^a-z]%', @Temp), 1, '')

    -- Remove non-alphabetic characters from the end of the string
    While PatIndex('%[^a-z]', @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex('%[^a-z]', @Temp), 1, '')

    Return @Temp
End

I haven’t tested it extensively; I just ran it against my data and verified that it solves my problem, so I’m pretty confident that it works, but feel free to post any correction or better implementation if you have one :)
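As a usage sketch, the script below first finds the dirty values and then cleans them with the function. It assumes the function above has already been created in the current database and that the collation is case-insensitive, so that [a-z] also matches uppercase letters. The table variable @Names and the sample values are hypothetical, and NCHAR(160) (a non-breaking space) is just one example of an invisible trailing character, not necessarily the one I had in my data.

```sql
-- Hypothetical sample data: one dirty value, one clean value.
DECLARE @Names TABLE (Name nvarchar(50));

INSERT INTO @Names (Name)
VALUES (N'Azioni' + NCHAR(160)),   -- ends with a non-breaking space
       (N'Azioni');

-- Diagnose: list values that start or end with a non-alphabetic character,
-- together with their length and the code of the last character.
SELECT Name,
       LEN(Name)               AS NameLength,
       UNICODE(RIGHT(Name, 1)) AS LastCharCode
FROM   @Names
WHERE  Name LIKE '[^a-z]%' OR Name LIKE '%[^a-z]';

-- Clean only the rows that need it.
UPDATE @Names
SET    Name = dbo.TrimNonAlphaCharacters(Name)
WHERE  Name LIKE '[^a-z]%' OR Name LIKE '%[^a-z]';

-- Verify the result.
SELECT Name, LEN(Name) AS NameLength
FROM   @Names;
```

Keep in mind that the function also strips digits, punctuation and accented letters that fall outside the a-z range of your collation, so check a sample of the rows selected by the first query before running the update on real data.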

Gian Maria.

Always pay attention to the uniqueness of SQL indexes

I have a database with several gigabytes of data, and query performance is usually an issue, so we need to take great care of indexes and DB optimization. Since the vast majority of data access is done with NHibernate, we also have some read-only views that we use to ease access to the data.

One of these views has four left outer joins from a main table to four other tables, and we have a SELECT COUNT query on it that is quite slow, so we decided to understand how to optimize it. I started by looking at the execution plan and found this.

Figure 1: Original execution plan.

The execution plan clearly shows a join between two tables, but I’m issuing a SELECT COUNT with conditions on fields of the first table only. The question is: why does SQL Server perform a join with another table, if that table does not contain any field involved in the query?

The answer is really simple: a JOIN between two tables influences the number of rows returned. Suppose that for each record of the main table we have 2 records in the joined table; this affects the result of the SELECT COUNT, so SQL Server has no choice, it needs to do the JOIN. Our data model enforces in business logic that each record in the main table can have at most one related record in the other table, so I decided to make the index that supports the foreign key unique (it was not unique because we usually enforce this type of constraint in business logic).

Once the index is unique, SQL Server knows that the join cannot affect the result of the SELECT COUNT; from the execution plan I verified that the join is gone, and performance is greatly improved.

The moral is: if your business logic enforces uniqueness on a field that you use in joins, creating a unique index at the database level can lead to better execution plans.
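A minimal sketch of the change, assuming a simplified schema: dbo.MainTable, dbo.RelatedTable, the Status column and the index name are all hypothetical stand-ins for my real tables, and the script is meant for a scratch database. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical tables standing in for the main table and one of the joined tables.
CREATE TABLE dbo.MainTable (
    Id     int NOT NULL PRIMARY KEY,
    Status int NOT NULL
);
CREATE TABLE dbo.RelatedTable (
    Id          int NOT NULL PRIMARY KEY,
    MainTableId int NOT NULL REFERENCES dbo.MainTable (Id)
);
-- Starting point: a non-unique index supporting the foreign key.
CREATE NONCLUSTERED INDEX IX_RelatedTable_MainTableId
    ON dbo.RelatedTable (MainTableId);
GO

-- 1. Check that the data really respects the business rule:
--    any row returned here would make the unique index creation fail.
SELECT MainTableId, COUNT(*) AS Occurrences
FROM   dbo.RelatedTable
GROUP  BY MainTableId
HAVING COUNT(*) > 1;

-- 2. Replace the index with a unique one.
DROP INDEX IX_RelatedTable_MainTableId ON dbo.RelatedTable;
CREATE UNIQUE NONCLUSTERED INDEX IX_RelatedTable_MainTableId
    ON dbo.RelatedTable (MainTableId);
GO

-- 3. Query shape discussed above: a count with conditions on the main table only.
--    With the unique index the left join can no longer change the number of rows.
SELECT COUNT(*)
FROM   dbo.MainTable AS m
       LEFT OUTER JOIN dbo.RelatedTable AS r ON r.MainTableId = m.Id
WHERE  m.Status = 1;
```

Comparing the actual execution plan of the last query before and after step 2 is the easiest way to check whether the optimizer really drops the join in your case.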

Gian Maria.

Run an SSIS package from another server in a workgroup

Today I needed to move some SSIS packages that currently run on the same server where the database resides to another server dedicated to running SSIS packages. I do not have a domain, and the second server is simply on the same network as the first one. I simply reconfigured a job to run the package from the file system (as it was configured on the original server), but now I cannot use Integrated Security because I’m running the SSIS package from another machine. So I changed the connection string to include user and pwd (SQL authentication), but when I saved the job step the password simply disappeared from the connection string.

This happens for security reasons: everyone who has the right to look at the job properties could see the password used to connect to the other database server, so passwords are not saved in the job properties. An answer can be found here, and I decided to save the package to the database server to store everything on the server.

First of all, I connected to the Integration Services server where I want the SSIS packages to run and added a specific folder for my packages. You can see this in Figure 1; the new folder is blurred because it contains the name of my customer’s product.

Figure 1: Adding a folder to contain all my packages.

Now I open the package in Business Intelligence Development Studio, right-click on an empty part of the designer, then choose Properties (Figure 2).

Figure 2: Changing the protection level to Server Storage

I changed the protection level to Server Storage, then went to the menu File –> Save copy of xxxxx.dtsx as to save a copy of this package to the SSIS server. You should now see your package inside the new folder I created in Figure 1.

Figure 3: I saved a copy of the dtsx package inside the SSIS server

After saving the package to the server, I tried to execute it again, with no luck :(. When I insert userid=xxx;pwd=yyyy in the configuration of the job step, the password is still not stored, for security reasons, and the package execution still fails. The solution was to right-click on the designer of the package and open the package configurations.

Then I decided to store all the configuration inside a table on the server.

Figure 4: Saving configuration inside a table in a database.

I chose to save the configuration inside a database table because it is much simpler to configure: everything is inside SQL Server, with no need for external XML or text files, registry keys, environment variables, etc. After you choose the table used to store the settings, you need to decide what to export into this configuration; I simply chose to move the two connection strings.

Figure 5: Save all connection string properties inside the table.

You can choose to save only some of the properties of the connection string, but saving everything is not a problem. Now I can open the table and modify the connections as needed.

Figure 6: All the information about the connection strings is stored inside a table.
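Once the configuration lives in a table, the connection strings can also be inspected and changed with plain T-SQL. The sketch below assumes that the wizard's default table name and columns were kept (dbo.[SSIS Configurations] with ConfigurationFilter, ConfiguredValue, PackagePath and ConfiguredValueType); the filter name MyFilter, the connection manager name MyConnection and the connection string itself are hypothetical placeholders, so adapt them to what you see in your own table.

```sql
-- List the configured connection strings.
-- In a LIKE pattern, [[] matches a literal '[' character.
SELECT ConfigurationFilter, PackagePath, ConfiguredValue, ConfiguredValueType
FROM   dbo.[SSIS Configurations]
WHERE  PackagePath LIKE '%Properties[[]ConnectionString]';

-- Point one connection to the other server using SQL authentication.
-- All the values below are placeholders.
UPDATE dbo.[SSIS Configurations]
SET    ConfiguredValue = N'Data Source=OTHERSERVER;Initial Catalog=MyDb;User ID=xxx;Password=yyyy;'
WHERE  ConfigurationFilter = N'MyFilter'
  AND  PackagePath = N'\Package.Connections[MyConnection].Properties[ConnectionString]';
```

Since the password now sits in clear text in this table, it is worth restricting who can read it.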

Now I can schedule the execution of the SSIS package with SQL Server Agent with no problem. I must admit that I’m not an SSIS guru, but deploying an SSIS package is not the easiest task to accomplish, especially if you do not use a domain.

Alk.

Sql Server, Schema and scope

This morning I spent 20 minutes completely puzzled by a stored procedure in SQL Server. This stored procedure is not too complex: it incrementally moves data into a denormalized table to speed up some searches. The concept is simple: I run a series of queries to obtain the list of ids of the entities modified since the last run of the stored procedure, then I update those rows and insert the new ones.

I ran all the queries inside SSMS, verified that everything was good, and then copied and pasted the T-SQL code into a stored procedure. Then I reset the last run date, executed the stored procedure and… surprise… no row was loaded into the table. I was puzzled, so I opened the stored procedure’s code, selected the whole T-SQL content (except the ALTER PROCEDURE), pressed F5, and magically the table got populated. Then I executed the stored procedure from another query window and still no row was inserted… How is it possible that a query behaves differently when you execute the T-SQL code directly and when the very same T-SQL code is executed inside a stored procedure?

The problem is subtle and originates from the design of the database: I have a view called CampaignCategoryView that is really slow (it has many joins and a lot of subqueries), so I decided to create a table called datawarehouse.CampaignCategoryView to store the materialized content of the view. At a certain point, the code that populates the table issues a query like this one:

INSERT INTO [datawarehouse].[CampaignCategoryView]
(xxx, yyy, zzz)
SELECT xxx, yyy, zzz FROM CampaignCategoryView 
WHERE ........

Basically, I simply insert into the datawarehouse table, selecting rows from the original slow view with a WHERE condition that identifies all the new rows to be inserted.

When you execute the code directly it works perfectly, because the SELECT clause selects from the dbo.CampaignCategoryView view. When the very same code gets executed inside a stored procedure called [datawarehouse].[CampaignCategoryViewIncrementalPopulation], unqualified names are resolved first in the datawarehouse schema, because that is the schema the stored procedure lives in. This completely changes how the previous SQL snippet gets executed, because the SELECT FROM part is now selecting from [datawarehouse].[CampaignCategoryView]. This means that nothing gets inserted into the table, because it is inserting data from itself. The solution was to specify the schema for the source view:

INSERT INTO [datawarehouse].[CampaignCategoryView]
(xxx, yyy, zzz)
SELECT xxx, yyy, zzz FROM dbo.CampaignCategoryView 
WHERE ........

This problem taught me two different lessons. The first one is that in SQL it is always a good idea to specify the schema of objects. The second one is not to create two objects in a database with the very same name but in different schemas, because if you forget the first lesson, the code executes differently depending on the context in which it runs.
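The behavior can be reproduced in a scratch database with a few lines. This is only a sketch under stated assumptions: the schema demo, the two Items tables and the procedure demo.CountItems are hypothetical names, and the user running the script is assumed to have dbo as default schema. GO is the batch separator used by SSMS and sqlcmd.

```sql
CREATE SCHEMA demo;
GO
-- Two objects with the same name in different schemas.
CREATE TABLE dbo.Items  (Id int);
CREATE TABLE demo.Items (Id int);
INSERT INTO dbo.Items (Id) VALUES (1);   -- only the dbo table has a row
GO
-- The procedure lives in the demo schema and uses an unqualified name.
CREATE PROCEDURE demo.CountItems
AS
    SELECT COUNT(*) AS RowsSeen FROM Items;
GO

-- Ad hoc batch: Items is resolved through the caller's default schema (dbo here).
SELECT COUNT(*) AS RowsSeen FROM Items;

-- Inside the procedure: Items is resolved first in the procedure's schema (demo).
EXEC demo.CountItems;
GO

-- Cleanup.
DROP PROCEDURE demo.CountItems;
DROP TABLE demo.Items;
DROP TABLE dbo.Items;
DROP SCHEMA demo;
```

The two counts differ even though the SELECT text is identical, which is exactly the trap described above; writing dbo.Items or demo.Items removes the ambiguity.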

alk.
