Thursday, July 17, 2008

Serialization != XmlSerializer

I have seen some interesting code in recent months around serialization. It appears that an old habit, that any serialization would be done with the XmlSerializer, has turned into a belief that serialization has to be done with the XmlSerializer.

Now while XmlSerializer is great, it has some pitfalls. Notably for this post, it can’t serialize object graphs with recursive (circular) references. To illustrate, consider an Order object with a collection of OrderLine objects. Each OrderLine object has a property that references the Order it belongs to. In .NET this is no problem, and it can be kind of handy for an Order to know about its OrderLines and to be able to get back to the Order from an OrderLine. However, this is hard to describe in XML (at least in a standardized way), as XML is inherently hierarchical.

using System.Collections.ObjectModel;

public class Order
{
    private Collection<OrderLine> _lines = new Collection<OrderLine>();
    public int OrderNumber { get; set; }

    // Getter only: callers add to the existing collection rather than replace it.
    public Collection<OrderLine> Lines
    {
        get { return _lines; }
    }
}

public class OrderLine
{
    public string ItemCode { get; set; }
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
    public string Description { get; set; }

    // Back-reference to the owning Order; this is what makes the graph recursive.
    public Order Parent { get; set; }

    // XmlSerializer requires a public parameterless constructor.
    public OrderLine()
    {
    }

    public OrderLine(string itemCode, decimal unitPrice, int quantity, string description, Order parent)
    {
        this.ItemCode = itemCode;
        this.UnitPrice = unitPrice;
        this.Quantity = quantity;
        this.Description = description;
        this.Parent = parent;
    }
}

I also thought it would be interesting to consider the impact of best practices regarding properties that are collection types.  It is considered to be best practice in .NET code to only provide a getter on properties that are collections.  [see Framework Design Guidelines 8.3.2]

This however throws a wobbly when used as a DTO with WCF. Now my assumption here is that to be interoperable with other systems (thinking Java), a getter and a setter would need to be provided to allow serialization and then de-serialization over the wire.

From my observations, this constraint has somehow been perceived to apply to all serialization. I have seen setters added to collection properties for no reason other than the misinformed notion that they are required for serialization. The thing that really surprises me is that it is so easy to verify your assumptions regarding these problems. So, let's play!

1st up:  Serializing Hierarchies (well really collections)

If we take a simplified version of the code above, where the Parent property is removed, we just have an object with a collection property. This serializes fine. I also do not need to provide a setter for my collection property.

// Simplified OrderLine: the Parent property and its constructor argument are removed.
Order myOrder = new Order();
myOrder.OrderNumber = 1;
myOrder.Lines.Add(new OrderLine("ABC", 10, 1, "First Item"));
myOrder.Lines.Add(new OrderLine("ABCD", 4, 1, "Some content"));
myOrder.Lines.Add(new OrderLine("ABD", 10, 4, "This Item"));
myOrder.Lines.Add(new OrderLine("ACD", 166, 1, "Test"));
myOrder.Lines.Add(new OrderLine("BCDE", 24, 9, "Last Item"));

System.Xml.Serialization.XmlSerializer oXS = new System.Xml.Serialization.XmlSerializer(typeof(Order));
// Dispose the writer so the file is flushed and closed.
using (System.IO.StreamWriter ostrW = new System.IO.StreamWriter(@"C:\SavedOrder.xml"))
{
    oXS.Serialize(ostrW, myOrder);
}
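
To check the claim that no setter is needed in both directions, here is a minimal sketch that round-trips an order through XML in memory. The trimmed-down classes and the RoundTripDemo helper (ToXml, FromXml) are illustrative names of my own, not part of the original sample. On the way back in, the XmlSerializer should populate the getter-only collection by calling Add on the instance the property returns.

```csharp
using System;
using System.Collections.ObjectModel;
using System.IO;
using System.Xml.Serialization;

public class OrderLine
{
    public string ItemCode { get; set; }
    public int Quantity { get; set; }
}

public class Order
{
    private readonly Collection<OrderLine> _lines = new Collection<OrderLine>();
    public int OrderNumber { get; set; }

    // Getter only: no setter is provided for the collection.
    public Collection<OrderLine> Lines
    {
        get { return _lines; }
    }
}

// Hypothetical helper, for illustration only.
public static class RoundTripDemo
{
    public static string ToXml(Order order)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Order));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, order);
            return writer.ToString();
        }
    }

    public static Order FromXml(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Order));
        using (StringReader reader = new StringReader(xml))
        {
            return (Order)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Order original = new Order { OrderNumber = 1 };
        original.Lines.Add(new OrderLine { ItemCode = "ABC", Quantity = 1 });
        original.Lines.Add(new OrderLine { ItemCode = "ABD", Quantity = 4 });

        // Serialize to a string, then rebuild a fresh Order from that string.
        Order copy = FromXml(ToXml(original));
        Console.WriteLine("{0} lines restored for order {1}", copy.Lines.Count, copy.OrderNumber);
    }
}
```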

2nd Test: recursive graphs (adding the Parent property back in)

If we add the Parent property back in and set it correctly, we will get an InvalidOperationException when we try to serialize the order. Now to me, this is not a time to throw our hands in the air and start coming up with wild work-arounds for how to serialize our object. My first question would be “Does the result of the serialization have to be human readable?” If the answer is no, then Binary Serialization works a treat.

This little helper method will serialize any old thing to disk for you.

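// Needs System.IO, System.Runtime.Serialization and System.Runtime.Serialization.Formatters.Binary; serialized types must be marked [Serializable].
public static void SerializeToBinaryFile(Object entity, string path)
{
    using (Stream stream = new FileStream(path, System.IO.FileMode.Create))
    {
        IFormatter formatter = new BinaryFormatter();
        formatter.Serialize(stream, entity);
    }
}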

Now it's not exactly production quality code, but it proves a point.
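
If you want to see the XmlSerializer failure from the second test for yourself, a minimal sketch along these lines should reproduce it. The trimmed-down classes and the CircularReferenceDemo helper are illustrative names of my own, not the downloadable sample. The detail about the circular reference normally sits in the inner exception.

```csharp
using System;
using System.Collections.ObjectModel;
using System.IO;
using System.Xml.Serialization;

public class Order
{
    private readonly Collection<OrderLine> _lines = new Collection<OrderLine>();
    public int OrderNumber { get; set; }
    public Collection<OrderLine> Lines { get { return _lines; } }
}

public class OrderLine
{
    public string ItemCode { get; set; }

    // Back-reference that creates the cycle Order -> OrderLine -> Order.
    public Order Parent { get; set; }
}

// Hypothetical helper, for illustration only.
public static class CircularReferenceDemo
{
    // Returns true if XmlSerializer refuses to serialize the order.
    public static bool FailsToSerialize(Order order)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Order));
        try
        {
            using (StringWriter writer = new StringWriter())
            {
                serializer.Serialize(writer, order);
            }
            return false;
        }
        catch (InvalidOperationException ex)
        {
            // Print the most specific message available.
            Console.WriteLine(ex.InnerException != null ? ex.InnerException.Message : ex.Message);
            return true;
        }
    }

    public static void Main()
    {
        Order order = new Order { OrderNumber = 1 };
        order.Lines.Add(new OrderLine { ItemCode = "ABC", Parent = order });
        Console.WriteLine(FailsToSerialize(order));
    }
}
```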

If the answer to the above question of readability was yes, then we may have to ask some other questions of our requirements. It is very likely that we are leaking our domain model into another layer. If serialization is required to save to disk, then the disk can be considered a repository and may require a less specific model. The same can be said if serialization is for communication between systems. Neither of these scenarios requires the recursive nature of our model to be described. The recursion in our implementation can be inferred from the raw data.
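
One way to act on that, sketched under my own assumptions rather than taken from the downloadable file: keep Parent out of the XML with [XmlIgnore] and re-attach each line to its order after loading. The OrderXml helper and its Save and Load methods are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.ObjectModel;
using System.IO;
using System.Xml.Serialization;

public class Order
{
    private readonly Collection<OrderLine> _lines = new Collection<OrderLine>();
    public int OrderNumber { get; set; }
    public Collection<OrderLine> Lines { get { return _lines; } }
}

public class OrderLine
{
    public string ItemCode { get; set; }
    public int Quantity { get; set; }

    // Not written to the XML; the nesting under <Lines> already implies the owner.
    [XmlIgnore]
    public Order Parent { get; set; }
}

// Hypothetical helper, for illustration only.
public static class OrderXml
{
    public static string Save(Order order)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Order));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, order);
            return writer.ToString();
        }
    }

    public static Order Load(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Order));
        using (StringReader reader = new StringReader(xml))
        {
            Order order = (Order)serializer.Deserialize(reader);

            // Rebuild the back-references from the hierarchy in the raw data.
            foreach (OrderLine line in order.Lines)
            {
                line.Parent = order;
            }
            return order;
        }
    }

    public static void Main()
    {
        Order order = new Order { OrderNumber = 1 };
        order.Lines.Add(new OrderLine { ItemCode = "ABC", Quantity = 1, Parent = order });

        Order copy = Load(Save(order));
        Console.WriteLine(ReferenceEquals(copy.Lines[0].Parent, copy));
    }
}
```

The back-reference stays available to developers working with the hydrated objects, while the persisted form carries only the raw data.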

For a working example of this code download this C# file.


Unknown said...

Curious to know how you would approach the scenario if XML was required for the DTOs?
Would you just drop the parent reference because the parent-child relation is implied by XML's hierarchical structure? Maybe only include the id of the parent instead of an object ref?

Lee Campbell said...

Sorry about the massively delayed reply. That is exactly what I would do. We are just persisting to disk or sending over the wire, so why expose our implementation details? And that is what "Parent" is in this case: it adds no value to the raw data, but provides a handy property for the developer of the hydrated objects. Really it should be attributed as [XmlIgnore] for the XmlSerializer (or have its backing field marked [NonSerialized()] for binary serialization). Next question is, why don't we use a DTO for this anyway?