Yesterday a friend from DotNetMarche asked me this question: “I need to store serialized objects in a database. I can choose between binary and XML format; which one is smaller in size?”
My first answer was “Binary should occupy less space because it is more compact”, but he told me that the DBA had checked and the XML entities actually use less space than the binary ones.
This morning I ran some tests with a simple class.
I tried this code:
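The class itself is not shown here. Judging from the test code and its output, it was presumably something like the following; treat this as a hypothetical reconstruction, not the original source.

```csharp
using System;

// Hypothetical reconstruction of the class under test (an assumption:
// the original definition does not appear in the post).
[Serializable]        // needed by BinaryFormatter
public class Test     // public type with a default constructor, as XmlSerializer expects
{
    // Auto-property: the compiler generates a hidden field named
    // <Property>k__BackingField, which is the name BinaryFormatter writes out.
    public string Property { get; set; }
}
```

The [Serializable] attribute only matters for BinaryFormatter; XmlSerializer ignores it and works from the public members.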
private static void Test(string testString)
{
    MemoryStream bms = new MemoryStream();
    BinaryFormatter bf = new BinaryFormatter();
    Test t = new Test() { Property = testString };
    bf.Serialize(bms, t);
    Console.WriteLine("BinaryFormatterSize = {0}", bms.Length);

    MemoryStream xms = new MemoryStream();
    XmlSerializer xs = new XmlSerializer(typeof(Test));
    xs.Serialize(xms, t);
    Console.WriteLine("XmlSerializerSize = {0}", xms.Length);

    Console.WriteLine(Encoding.UTF8.GetString(bms.ToArray()));
    Console.WriteLine(Encoding.UTF8.GetString(xms.ToArray()));
}
Basically, I create an object of type Test and serialize it with both BinaryFormatter and XmlSerializer, printing the size of each serialized stream as well as its content decoded as UTF-8 text. Then I invoke the function with Test("abcdefghi this is a longer string to test for a different situation");
The result was:
BinaryFormatterSize = 236
XmlSerializerSize = 229
? ????? ?? JConsoleApplication4, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null?? ?ConsoleApplication4.Test? ?<Property>k__BackingField?? ?? Cabcdefghi this is a longer string to test for a different situation?
<?xml version="1.0"?>
<Test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Property>abcdefghi this is a longer string to test for a different situation</Property>
</Test>
As you can verify, the binary output is actually larger than the XML one. This is because binary serialization stores the full name of the assembly and of the class, as well as the name of the serialized field (<Property>k__BackingField, because it is an auto-property). The XML version is already smaller, and it shrinks further if you change the XML serialization in this way:
XmlWriterSettings settings = new XmlWriterSettings();
settings.OmitXmlDeclaration = true;
settings.Indent = true;
settings.NewLineOnAttributes = true;
XmlSerializerNamespaces blank = new XmlSerializerNamespaces();
blank.Add("", "");
using (XmlWriter writer = XmlWriter.Create(xms, settings))
{
    xs.Serialize(writer, t, blank);
}
Here you are asking the serializer to suppress the XML declaration and to emit no namespaces. The result of this test is:
BinaryFormatterSize = 236
XmlSerializerSize = 110
? ????? ?? JConsoleApplication4, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null?? ?ConsoleApplication4.Test? ?<Property>k__BackingField?? ?? Cabcdefghi this is a longer string to test for a different situation?
?<Test>
  <Property>abcdefghi this is a longer string to test for a different situation</Property>
</Test>
Wow! The XML serialization is less than half the size of the binary one. This is because XmlSerializer does not need to store the type of the object in the serialized format, since you specify the type in the XmlSerializer constructor. Moreover, the XML format is human-readable and can be validated with XSD or manipulated with XSLT.
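To illustrate the point about the type, here is a minimal sketch of reading the compact XML back. The XML contains only element names, so the caller has to tell XmlSerializer which type to build. The Test class is redeclared to keep the sample self-contained, and the XmlRoundTrip helper is a hypothetical name used only for this example.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Test
{
    public string Property { get; set; }
}

public static class XmlRoundTrip
{
    // The target type comes from the caller, not from the serialized data.
    public static Test FromXml(string xml)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Test));
        using (StringReader reader = new StringReader(xml))
        {
            return (Test)xs.Deserialize(reader);
        }
    }

    public static void Main()
    {
        // Same shape as the compact output: no declaration, no namespaces.
        Test t = FromXml("<Test><Property>abcdefghi</Property></Test>");
        Console.WriteLine(t.Property);
    }
}
```

BinaryFormatter works the other way round: Deserialize takes only the stream, so the assembly and class names have to travel with the data.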
Alk.
Tags: Serialization .NET Framework






October 31st, 2008 at 6:00 am
Hi Alk,
I strongly discourage using XmlSerializer to store entities, simply because it doesn’t guarantee that the entities will be the same once deserialized. Just to give a few examples:
- entity reference equality is not supported
- circular references are not supported either
- it doesn’t serialize anything but public members
- properties are serialized as if they were fields (what if they contain some additional logic?)
- entities need to have a parameterless constructor
- that constructor is invoked when the entity is deserialized
XmlSerializer has a completely different purpose, that is building xml starting from an object, or populating an object graph starting from an xml representation, but it isn’t meant to be used as a storage system.
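A minimal sketch of two of the points above (non-public members are skipped, and the parameterless constructor runs on deserialization). The Account class and the XmlLossDemo helper are hypothetical names invented for this example.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Account
{
    private int secret;                  // non-public state: XmlSerializer ignores it
    public string Name { get; set; }     // public read/write property: serialized

    public Account() { }                 // invoked when the object is deserialized
    public void SetSecret(int value) { secret = value; }
    public int GetSecret() { return secret; }
}

public static class XmlLossDemo
{
    // Serialize to an XML string and read it straight back.
    public static Account RoundTrip(Account source)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Account));
        using (StringWriter writer = new StringWriter())
        {
            xs.Serialize(writer, source);
            using (StringReader reader = new StringReader(writer.ToString()))
            {
                return (Account)xs.Deserialize(reader);
            }
        }
    }

    public static void Main()
    {
        Account original = new Account { Name = "alk" };
        original.SetSecret(42);

        Account copy = RoundTrip(original);
        Console.WriteLine(copy.Name);        // the public property survives
        Console.WriteLine(copy.GetSecret()); // the private field is back to its default
    }
}
```

The copy is therefore not equal to the original, even though no error is reported anywhere.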
m.
October 31st, 2008 at 8:28 am
Yep, you are right. I always use binary serialization if I need a raw way to store the state of an object, but the original question was only about the size of the generated stream, so I wanted to show that it is possible to obtain a smaller one.
I already told Roberto that I use binary or SOAP serialization if I need a raw storage system, not the XML one.
XML serialization, as you explain, is a really different thing from binary or SOAP serialization. In fact it resides in System.Xml.Serialization, while the binary and SOAP formatters are in System.Runtime.Serialization, which gives you the idea that they are a real part of the runtime.
Thanks for the comment.
alk.
March 19th, 2009 at 2:18 am
Dear All,
Hope you all will be fine.
We are using typed DataSets and DataTables in our application. We get a DataTable from a web service (application server) into the business layer on the client side. The data is becoming huge day by day, and the response from server to client is getting slow.
Now we want compressed data from the web service.
What we did is get the DataTable in the web service, serialize it and compress it using System.IO.Compression. There is no issue in the web service, and we get the compressed data in the business layer on the client. The problem occurs when we deserialize the array of bytes after decompressing it. We get 2 different errors:
Unable to find assembly , version, culture……
End of Stream encountered before parsing was completed
please help in this regard
Mohid
March 20th, 2009 at 4:31 am
I think that the problem arises because the client tries to use the generated proxy class to deserialize the stream, and it does not find the original assembly on disk. What you need is to implement a SOAP extension (http://msdn.microsoft.com/en-u.....64007.aspx) that will compress your message (maybe someone has already done it).
This will make compression a completely transparent process.
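As for the “End of Stream” error, one possible cause (an assumption, since the failing code is not shown) is reading the compressed buffer before the GZipStream has been closed, which leaves the data truncated. A minimal sketch of a compress/decompress pair on plain byte arrays follows; GzipHelper is a hypothetical name.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipHelper
{
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Dispose the GZipStream before reading the buffer, so that the
            // last compressed block and the gzip footer are written out.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray(); // ToArray still works on a closed MemoryStream
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            // Read until the stream reports no more data.
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            return output.ToArray();
        }
    }

    public static void Main()
    {
        byte[] original = Encoding.UTF8.GetBytes(new string('a', 1000));
        byte[] packed = Compress(original);
        byte[] unpacked = Decompress(packed);
        Console.WriteLine("{0} -> {1} -> {2}", original.Length, packed.Length, unpacked.Length);
    }
}
```

The serialized bytes go into Compress on the server, and the output of Decompress is what gets handed to the deserializer on the client.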
alk.
July 13th, 2009 at 6:59 am
Is this example incomplete?
The Test class only contains a string. It is not very compactable by the binary formatter. Consider this:
float fValue = 46573847.5f;
fValue binary size: 32 bits
string sValue = "46573847.5"
sValue binary size: 8×10 = 80 bits (minimum)
And then you have this:
string sValue = "3"
sValue binary size: 8×1 = 8 bits (minimum)
Strings can’t be compacted unless you compress them, so I think comparing these two methods using a single string is a bit weird.
Am I in outer space?
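The byte counts above can be checked with a short sketch. SizeDemo is a hypothetical helper, and the text sizes assume UTF-8 with no length prefix or markup around the value.

```csharp
using System;
using System.Text;

public static class SizeDemo
{
    // Raw in-memory size of a float, in bytes.
    public static int FloatBytes(float value)
    {
        return BitConverter.GetBytes(value).Length;
    }

    // Size of the same value written out as UTF-8 text, in bytes.
    public static int TextBytes(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        // Note: a float keeps roughly 7 significant digits, so this literal
        // is rounded when stored; only the storage size matters here.
        Console.WriteLine(FloatBytes(46573847.5f));  // 4 bytes = 32 bits
        Console.WriteLine(TextBytes("46573847.5"));  // 10 bytes = 80 bits
        Console.WriteLine(TextBytes("3"));           // 1 byte = 8 bits
    }
}
```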
July 13th, 2009 at 8:11 am
Yes, you are right. The problem in finding the best way to serialize data is understanding the type of data you are storing. Basically, the .NET serializers do not compress the serialized output, but you can compress it afterwards with gzip. Moreover, XML serialization is really different from binary serialization; they are different techniques.
I think that if space is your primary problem, it is probably better to resort to some form of custom serialization with a custom compression algorithm. If space is not so vital, binary serialization is surely a good and simple solution, since it is natively implemented by .NET.
alk.
August 20th, 2009 at 2:46 am
And I guess alkampfer’s comparison only holds for small objects; as the size of the object increases, the binary serializer would definitely prove better.