Electively serialize to/from XML or binary in C++

During development, it’s nice to have data in a structured, human-readable format that can be quickly edited with a text editor. For production, it’s often preferable to have a compact format that saves space, is quick to read and write, and hides information from users’ curious eyes. The following implementation provides a simple way to serialize to and from either XML or binary in C++.
Since we’re using C++, all serializable data will be encapsulated in classes, so it’s a good idea to provide a common base class that declares methods to serialize and deserialize an object. We’ll call these two methods readFrom and writeTo, since those names describe what the methods do. Next, we need something those methods can actually read from and write to. Since it should be transparent to an object where it serializes itself to or deserializes itself from, we’ll define two more abstract classes, Serializer and Deserializer, which declare the methods available to (de)serialize a Serializable object but not the concrete implementation of a (de)serialization format.
Thus, the Serializable header file looks like this:

#ifndef VTE_CORE_SERIALIZABLE_H
#define VTE_CORE_SERIALIZABLE_H
namespace vte
{
namespace core
{
class Serializer;
class Deserializer;
class Serializable
{
public:
	virtual ~Serializable()
	{
	}
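	// Restores this object's state from the given source.
	virtual void readFrom( Deserializer* source ) = 0;
	// Writes this object's state to the given destination.
	virtual void writeTo( Serializer* destination ) = 0;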
};
} // namespace core
} // namespace vte
#endif

Note that the sources are taken straight from a project of mine, so don’t worry about namespaces or types which are not discussed in this article. If you want to copy and paste the code, just remove them or replace them with your own.
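
To make the role of the base class more concrete, here is a minimal sketch of how a class might implement it. The header above only forward-declares Serializer and Deserializer, so the writeString and readString methods, the Address class and the MemoryStore backend are all assumptions made up for this illustration, and the namespaces are left out for brevity.

```cpp
#include <deque>
#include <string>

// Hypothetical interfaces: the method names are assumptions for
// illustration, not the final design.
class Serializer
{
public:
	virtual ~Serializer() {}
	virtual void writeString( const char* name, const std::string& value ) = 0;
};

class Deserializer
{
public:
	virtual ~Deserializer() {}
	virtual std::string readString( const char* name ) = 0;
};

class Serializable
{
public:
	virtual ~Serializable() {}
	virtual void readFrom( Deserializer* source ) = 0;
	virtual void writeTo( Serializer* destination ) = 0;
};

// A serializable class only talks to the abstract interfaces, so it
// does not know which format it is written to or read from.
class Address : public Serializable
{
public:
	std::string name;
	std::string city;

	virtual void readFrom( Deserializer* source )
	{
		name = source->readString( "name" );
		city = source->readString( "city" );
	}
	virtual void writeTo( Serializer* destination )
	{
		destination->writeString( "name", name );
		destination->writeString( "city", city );
	}
};

// Toy in-memory backend (hypothetical), just enough for a round trip.
// It ignores the names and hands values back in the order written.
class MemoryStore : public Serializer, public Deserializer
{
public:
	virtual void writeString( const char*, const std::string& value )
	{
		values.push_back( value );
	}
	virtual std::string readString( const char* )
	{
		// Assumes a value is available; reading from an empty store
		// is not handled in this sketch.
		std::string value = values.front();
		values.pop_front();
		return value;
	}
private:
	std::deque<std::string> values;
};
```

Writing an Address to a MemoryStore with writeTo and then calling readFrom on a second Address with the same store should leave both objects with identical name and city values. Swapping MemoryStore for an XML or binary backend would not require any change to Address.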

Differences between XML and binary

Next, we need to put some thought into the Serializer and Deserializer classes. First, each Serializable object has a number of attributes to (de)serialize, and those attributes have different types. Consequently, the classes need to provide means to deal with different types. Second, and more importantly, we are going to serialize to two different formats, so their shared interface must cover everything needed to account for the individual requirements of each.
Imagine you want to serialize an arbitrary number of address records.
If you saved the address information as XML, you would first write an XML declaration stating the version and the encoding you chose, so that other editors can open and interpret the data correctly. You would then create an enclosing tag, e.g. <addressBook>, since well-formed XML requires a single root element. Next, you would probably create a tag for each entry, e.g. address, and for each attribute, e.g. name, like so:

<addressBook>
	<address>
		<name>Bill Gates</name>
		<city>Redmond, WA, United States of America</city>
	</address>
	<address>
		<name>Matthias Gall</name>
		<city>Cologne, Germany</city>
	</address>
</addressBook>

Now, imagine the same information as binary. I’ll give a short overview of types and values you’d possibly write.

LONG    The number of entries in the file
BYTE    The length of the first name in bytes
BYTE*   The bytes of the first name
BYTE    The length of the first city in bytes
BYTE*   The bytes of the first city
...
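
The layout above could be produced along the following lines. This is only a sketch under a few assumptions: LONG is taken to be an unsigned 32-bit value stored in little-endian order, text longer than 255 bytes is truncated because a single BYTE cannot describe more, and the names NameAndCity, appendLong, appendText and encodeAddressBook are invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// first = name, second = city (hypothetical record type)
typedef std::pair<std::string, std::string> NameAndCity;

// Appends a 32-bit value in little-endian byte order (an assumption;
// the layout above does not fix a byte order for LONG).
static void appendLong( std::vector<std::uint8_t>& out, std::uint32_t value )
{
	for ( int shift = 0; shift < 32; shift += 8 )
		out.push_back( static_cast<std::uint8_t>( ( value >> shift ) & 0xFF ) );
}

// Appends a one-byte length followed by the raw bytes of the text.
// Text longer than 255 bytes is truncated to fit the length byte.
static void appendText( std::vector<std::uint8_t>& out, const std::string& text )
{
	const std::size_t length = text.size() < 255 ? text.size() : 255;
	out.push_back( static_cast<std::uint8_t>( length ) );
	out.insert( out.end(), text.begin(), text.begin() + length );
}

// Writes the entry count first, then name and city of every entry.
std::vector<std::uint8_t> encodeAddressBook( const std::vector<NameAndCity>& entries )
{
	std::vector<std::uint8_t> out;
	appendLong( out, static_cast<std::uint32_t>( entries.size() ) );
	for ( std::size_t i = 0; i < entries.size(); ++i )
	{
		appendText( out, entries[i].first );
		appendText( out, entries[i].second );
	}
	return out;
}
```

Note that nothing in the resulting byte sequence says what a value means. A reader has to know the order of the fields in advance, which is exactly what keeps the format small.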

The first difference is that your (proprietary) binary format does not require any information about how data is stored, since it’s not meant to be read by others anyway. The second difference is that the number of address entries in the XML format is given implicitly by the number of address children under the enclosing addressBook tag, while the binary format stores the count explicitly. The third difference is that a logical group of information (i.e. an address in our example) is enclosed by a named tag in XML, while there’s no such element in binary. The fourth and last difference is that XML also requires a tag or attribute name for each piece of information it saves, while the binary format can simply save the data.
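
One way a shared interface might absorb these four differences is sketched below. Every name in it (beginList, endList, beginGroup, endGroup, writeString, XmlSerializer, BinarySerializer, writeAddressBook) is an assumption for illustration rather than the actual design, and the byte order and the one-byte string length are likewise assumed. The idea is that the caller always supplies names, group boundaries and counts, and each backend uses only what its format needs.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical shared interface; every name here is an assumption.
class Serializer
{
public:
	virtual ~Serializer() {}
	// A list announces its element count up front (binary needs it).
	virtual void beginList( const char* name, std::uint32_t count ) = 0;
	virtual void endList( const char* name ) = 0;
	// A group bundles the attributes of one logical record.
	virtual void beginGroup( const char* name ) = 0;
	virtual void endGroup( const char* name ) = 0;
	virtual void writeString( const char* name, const std::string& value ) = 0;
};

// XML backend: emits a declaration and named tags, ignores the count.
class XmlSerializer : public Serializer
{
public:
	XmlSerializer() : text( "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" ) {}
	virtual void beginList( const char* name, std::uint32_t ) { open( name ); }
	virtual void endList( const char* name ) { close( name ); }
	virtual void beginGroup( const char* name ) { open( name ); }
	virtual void endGroup( const char* name ) { close( name ); }
	virtual void writeString( const char* name, const std::string& value )
	{
		// No escaping of <, > or & here; a real writer would need it.
		open( name );
		text += value;
		close( name );
	}
	std::string text;
private:
	void open( const char* name ) { text += std::string( "<" ) + name + ">"; }
	void close( const char* name ) { text += std::string( "</" ) + name + ">"; }
};

// Binary backend: no header, no names, but an explicit element count.
class BinarySerializer : public Serializer
{
public:
	virtual void beginList( const char*, std::uint32_t count )
	{
		for ( int shift = 0; shift < 32; shift += 8 ) // little-endian (assumed)
			bytes.push_back( static_cast<std::uint8_t>( ( count >> shift ) & 0xFF ) );
	}
	virtual void endList( const char* ) {}
	virtual void beginGroup( const char* ) {}
	virtual void endGroup( const char* ) {}
	virtual void writeString( const char*, const std::string& value )
	{
		// One length byte, so values are capped at 255 bytes.
		const std::size_t length = value.size() < 255 ? value.size() : 255;
		bytes.push_back( static_cast<std::uint8_t>( length ) );
		bytes.insert( bytes.end(), value.begin(), value.begin() + length );
	}
	std::vector<std::uint8_t> bytes;
};

// The calling code is identical for both formats.
void writeAddressBook( Serializer* destination )
{
	destination->beginList( "addressBook", 1 );
	destination->beginGroup( "address" );
	destination->writeString( "name", "Matthias Gall" );
	destination->writeString( "city", "Cologne, Germany" );
	destination->endGroup( "address" );
	destination->endList( "addressBook" );
}
```

Passing an XmlSerializer to writeAddressBook fills its text member with the declaration followed by nested tags (without indentation in this sketch), while passing a BinarySerializer fills its bytes member with the count and the length-prefixed strings only. The price of this design is that the XML backend receives a count it never uses and the binary backend receives names it throws away.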
