Some time ago, while working on some performance-critical serialization code for a C# project, I looked closely at every serialization benchmark I could find. One thing that struck me at the time was that there weren't many of them. The most comprehensive one I found was this: http://www.servicestack.net/benchmarks/NorthwindDatabaseRowsSerialization.100000-times.2010-08-17.html It was published by ServiceStack, so it may be a little biased, but it is pretty neat nonetheless.
The published results were pretty much in line with what I had observed: protobuf was the clear winner, but if you didn't want to (or couldn't) use open source, DataContractSerializer was the best option available to you.
After some more testing, I started finding discrepancies between this report and my own tests, and they were pretty big: DataContractSerializer seemed to be better than reported, and the overall ranking was sometimes shuffled. It all boiled down to two major differences, which I'll explain here.
Data Contract Serializer performance
In the quoted tests, DataContractSerializer was almost 7 times slower than protobuf. That's a lot. In my own tests it was 3 times slower. Big difference, huh? When I ran their tests on my own machine (the thing I love about the people at servicestack.net is that they published the source of their tests on Git, so you can reproduce them at home), I found that DataContractSerializer was indeed 7 times slower. So obviously there had to be a difference in the way we used it! And indeed there was: when I added my version, it was 2.8 times slower than protobuf on the same test set. Here's the difference:
Original:
```csharp
using (var ms = new MemoryStream())
using (var xw = XmlWriter.Create(ms)) // plain text XML writer
{
    var serializer = new DataContractSerializer(from.GetType());
    serializer.WriteObject(xw, from);
}
```
My version:
```csharp
using (var ms = new MemoryStream())
using (var xw = XmlDictionaryWriter.CreateBinaryWriter(ms)) // binary XML writer
{
    var serializer = new DataContractSerializer(from.GetType());
    serializer.WriteObject(xw, from);
}
```
See the difference? The only change is the writer: XmlDictionaryWriter.CreateBinaryWriter instead of XmlWriter.Create.
Sure, it's binary, so you can't always use it, but in most cases it will do, and it's way faster. I also noticed that the payload size is about 1/3 more than that of JsonDataContractSerializer, which is actually pretty good, so if you're worried about bandwidth usage, that's an additional plus. Here are the results of running the Northwind test data with just one additional serialization option (the binary one) on my machine:
| Serializer | Larger than best | Serialization | Deserialization | Slower than best |
|---|---|---|---|---|
| MS DataContractSerializer | 5.89x | 10.24x | 5.38x | 7.24x |
| MS DataContractSerializerBinary | 4x | 3.01x | 2.75x | 2.85x |
| MS JsonDataContractSerializer | 2.22x | 8.07x | 9.59x | 9.01x |
| MS BinaryFormatter | 6.44x | 10.35x | 6.23x | 7.81x |
| ProtoBuf.net | 1x | 1x | 1x | 1x |
| NewtonSoft.Json | 2.22x | 2.83x | 3.00x | 2.94x |
| ServiceStack Json | 2.11x | 3.43x | 2.48x | 2.84x |
| ServiceStack TypeSerializer | 1.67x | 2.44x | 2.05x | 2.19x |
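
For reference, a complete round trip through the binary writer might look like the sketch below. It is not the benchmark's code: `SampleDto`, `BinaryXmlRoundTrip`, `ToBinaryXml`, and `FromBinaryXml` are hypothetical names made up for illustration. The reading side assumes `XmlDictionaryReader.CreateBinaryReader`, the counterpart of the binary writer.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

// Hypothetical DTO for this sketch only; it is not part of the Northwind test set.
[DataContract]
public class SampleDto
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}

public static class BinaryXmlRoundTrip
{
    // Serializes a value with DataContractSerializer through the binary XML writer.
    public static byte[] ToBinaryXml<T>(T value)
    {
        using (var ms = new MemoryStream())
        {
            using (var xw = XmlDictionaryWriter.CreateBinaryWriter(ms))
            {
                var serializer = new DataContractSerializer(typeof(T));
                serializer.WriteObject(xw, value);
            } // disposing the writer flushes buffered data into the stream

            // ToArray still works after the writer has closed the stream.
            return ms.ToArray();
        }
    }

    // Reads the payload back with the matching binary XML reader.
    public static T FromBinaryXml<T>(byte[] payload)
    {
        using (var xr = XmlDictionaryReader.CreateBinaryReader(payload, XmlDictionaryReaderQuotas.Max))
        {
            var serializer = new DataContractSerializer(typeof(T));
            return (T)serializer.ReadObject(xr);
        }
    }
}
```

The writer has to be flushed or disposed before the bytes are read from the stream, otherwise the payload may be incomplete. Both sides must also agree on the binary format, which is why it is not an option when the consumer expects text XML.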
With this difference set aside, the results were more consistent with my tests, but there were still big differences in some cases, which leads us to the second important point.
It’s all about data
Nothing will be best for all cases (not even protobuf), so you should consider your use case carefully. In my tests, the data was larger and usually more complex than the Northwind data set that servicestack.net used for benchmarking. Even in their results, the smaller the data set, the bigger the difference: for RegionDto, which has only one int and one string, DataContractSerializer is 14 times slower (10 times slower on my machine, and binary XML for DataContractSerializer is 3.71 times slower).
So, does DataContractSerializer perform better (especially in its binary version) when larger objects are involved? It does indeed.
I created an EmployeeComplexDto class that inherits from EmployeeDto and adds some data as follows:
```csharp
[ProtoContract(ImplicitFields = ImplicitFields.AllPublic, InferTagFromName = true)]
[DataContract]
[Serializable]
public class EmployeeComplexDto : EmployeeDto
{
    [DataMember]
    public OrderDto[] OrdersHandled { get; set; }

    [DataMember]
    public CustomerDto[] CustomersHandled { get; set; }

    [DataMember]
    public List<EmployeeDto> Friends { get; set; }
}
```
The object being serialized contained no more than 10 orders, 10 customers, and 10 friends (the same amount and the same data for every test), and here's what I got from serializing and deserializing it 100,000 times:
| Serializer | Payload size | Larger than best | Serialization | Deserialization | Total | Avg per iteration | Slower than best |
|---|---|---|---|---|---|---|---|
| MS DataContractSerializer | 1155 | 1.88x | 6407245 | 6123807 | 12531052 | 125.3105 | 4.11x |
| MS DataContractSerializerBinary | 920 | 1.50x | 2380570 | 3452640 | 5833210 | 58.3321 | 1.91x |
| MS JsonDataContractSerializer | 865 | 1.41x | 7386162 | 14391807 | 21777969 | 217.7797 | 7.14x |
| MS BinaryFormatter | 1449 | 2.36x | 9734509 | 7949369 | 17683878 | 176.8388 | 5.80x |
| ProtoBuf.net | 613 | 1x | 1099788 | 1948251 | 3048039 | 30.4804 | 1x |
| NewtonSoft.Json | 844 | 1.38x | 2844681 | 4272574 | 7117255 | 71.1726 | 2.34x |
| ServiceStack Json | 844 | 1.38x | 4904168 | 5747964 | 10652132 | 106.5213 | 3.49x |
| ServiceStack TypeSerializer | 753 | 1.23x | 3055495 | 4606597 | 7662092 | 76.6209 | 2.51x |
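
The derived columns in this table follow from the raw numbers: Total is Serialization plus Deserialization, Avg per iteration is Total divided by the 100,000 iterations, and the two "than best" columns divide a value by the best (ProtoBuf.net) value. A small sketch of that arithmetic, with `BenchmarkColumns` and its members being hypothetical helper names and the time unit left as whatever the table uses:

```csharp
using System;

public static class BenchmarkColumns
{
    // Number of serialize/deserialize round trips per test, as stated in the text.
    public const int Iterations = 100000;

    // "Total" column: serialization time plus deserialization time.
    public static long Total(long serialization, long deserialization)
    {
        return serialization + deserialization;
    }

    // "Avg per iteration" column: total time divided by the iteration count.
    public static double AvgPerIteration(long total)
    {
        return (double)total / Iterations;
    }

    // "Larger than best" / "Slower than best" columns: ratio of a value to the best value.
    public static double TimesBest(long value, long best)
    {
        return (double)value / best;
    }
}
```

For the DataContractSerializer row, `Total(6407245, 6123807)` gives 12531052, `AvgPerIteration(12531052)` gives roughly 125.31, and `TimesBest(12531052, 3048039)` gives roughly 4.11, matching the table.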
So, are you really "stuck" with DataContractSerializer, or is it quite good? The answer is, of course, that it depends. Whenever you can, use the binary writer, as it is way faster. Even then, whether it is the best choice can only be answered with a specific use case in mind: think about what data you will be serializing, how often, and so on.
As always when reasoning about performance, the best option is to test with your own data, in close-to-real scenarios, and (preferably) on hardware similar to what the application will run on in production. The closer you get to this, the more accurate your tests will be.
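
If you want to run such a test yourself, a very small timing harness is enough to get started. The sketch below is not the harness used by the servicestack.net benchmarks. `MicroBenchmark` and `Measure` are hypothetical names, and the warm-up count is an arbitrary assumption.

```csharp
using System;
using System.Diagnostics;

public static class MicroBenchmark
{
    // Runs the action a few times untimed (so JIT compilation and one-time
    // initialization are not measured), then times the requested iterations.
    public static TimeSpan Measure(Action action, int iterations, int warmup = 10)
    {
        for (int i = 0; i < warmup; i++)
        {
            action();
        }

        // Start from a clean heap so garbage left over from the warm-up
        // is less likely to be collected during the timed loop.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed;
    }
}
```

You would pass in a lambda that serializes (or deserializes) one of your own objects and compare the elapsed times across serializers, using the same object and the same iteration count for each.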