Quite simply, code generation involves taking a small amount of input and generating a large amount of code from it. Often the structure of the output is very complicated and verbose. One key feature of generated code is that it takes only a small variation in input to produce a vastly different set of output. The input to a code generator is an intermediate data format that describes the scenario for which you intend to produce a solution.
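To make the input-to-output ratio concrete, here is a minimal sketch of a generator. It is written in Python purely for brevity (the generator this article builds targets C#), and the names generate_property and generate_class are hypothetical, not part of any library. It assumes every field is a string and emits a C# backing field and property for each name in the input.

```python
def generate_property(name, cs_type="string"):
    """Return the C# lines for one backing field plus its property."""
    # Assumption: backing fields are named "_" + camelCase property name.
    backing_field = "_" + name[0].lower() + name[1:]
    return [
        f"    private {cs_type} {backing_field};",
        f"    public {cs_type} {name}",
        "    {",
        f"        get {{ return {backing_field}; }}",
        f"        set {{ {backing_field} = value; }}",
        "    }",
    ]


def generate_class(class_name, field_names):
    """Return C# source text for a class with one property per field name."""
    lines = [f"public class {class_name}", "{"]
    for name in field_names:
        lines.extend(generate_property(name))
    lines.append("}")
    return "\n".join(lines)


# The whole input: one class name and four field names.
address_source = generate_class("Address", ["Street", "City", "State", "Zip"])
print(address_source)
```

Four field names in the input become 27 lines of C#. Adding a fifth name to the list adds six more lines without any change to the generator itself.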
When you might use code generation depends on how comfortable you are with generating code. Technically, in every program you write, unless you're writing in machine language, you're most likely using some form of it already. The C# compiler, for example, uses your C# source code as a set of cues that tell the system what sort of Microsoft Intermediate Language (MSIL) code to create when building your program.
This article isn't going to show you how to write a fully functional compiler, but the code generator you'll build here uses essentially the same process as a compiler.
The first thing to look for in your code to indicate that a generator might be of some use is a set of semi-repetitive steps: steps that follow similar patterns of execution, but don't qualify as an operation that you could abstract into a single class method. In other words, you look for steps that are similar, but vary enough to require a function for each variation. Another key indication that a code generator might be useful is the likelihood that you'll have to do the same kind of programming multiple times.
Writing code generators is itself a procedural process. I'll walk you through the steps first, and then apply the steps to actual code.
Step 1: Find good sample output. The first step in writing a code generator is to figure out what you want the output to look like. I usually take a small example, not necessarily one that represents all possibilities of output, but one that's representative of the manually written code.
Step 2: Define an intermediate format. This intermediate format will be the input for your code generator. This is arguably the hardest part. You must identify the problem you're trying to solve, and make sure that your intermediate form can accurately represent the full spectrum of possible cases of the problem, so that you can tell your generator what to produce. Even though this is the most difficult step, it's still very manageable.
- Start by taking your example output, and breaking it down into sections (in list form). Name the major methods of your class, or classes, and give each a line item in your list.
- Next, begin adding subsections for more detail. Each subsection represents the major sub-functions of your major sections. For example, your major sections might consist of classes you want to generate, so your subsections can be the various characteristics that classes have, such as fields, properties, constructors, and methods.
- After laying out all of the sections, you can then add all the different variations for each of the small sections. For example, in the 'Fields' section you can vary each field by cardinality (such as single-valued or array-valued) or by type, such as String, Int32, or Double.
When you have completed your list, you will have a set of abstract details already in hierarchical order for you. This is basically the 'syntax' for the input to your code generator.
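One possible way to capture that hierarchy in code is sketched below, in Python for brevity. The class names FieldSpec and ClassSpec, the describe helper, and the Order example are all hypothetical. They only illustrate a class section that contains a Fields subsection whose entries vary by type and cardinality.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldSpec:
    """One entry in the 'Fields' subsection (hypothetical structure)."""
    name: str
    type_name: str = "String"   # variation by type: String, Int32, Double
    is_array: bool = False      # variation by cardinality: single or array


@dataclass
class ClassSpec:
    """One major section: a class to generate, with its fields."""
    name: str
    fields: List[FieldSpec] = field(default_factory=list)


def describe(spec):
    """Render the spec as the indented outline the steps above produce."""
    lines = [spec.name, "  Fields"]
    for field_spec in spec.fields:
        cardinality = "array" if field_spec.is_array else "single"
        lines.append(f"    {field_spec.name}: {field_spec.type_name} ({cardinality})")
    return lines


# Hypothetical example input for a generator.
order = ClassSpec("Order", [
    FieldSpec("Id", "Int32"),
    FieldSpec("Total", "Double"),
    FieldSpec("Notes", "String", is_array=True),
])
print("\n".join(describe(order)))
```

A real intermediate format would add further subsections (properties, constructors, methods) in the same way, and could just as easily live in an XML or text file as in in-memory objects.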
A Real-World Code Generator Example
Keeping the basic process steps in mind, it's time to pick a real-world example. This example walks through the process of writing a code generator to generate Data Transfer Objects (DTOs) that implement IXmlSerializable, that is, classes that are fully capable of telling the CLR how to serialize and deserialize themselves to and from XML, without allowing it to infer a serialization structure. If you aren't familiar with DTOs and the Domain-Object Model, here's a quick overview. In programming, the scope of a problem is its domain. You need a format to represent data in each particular domain. In the Domain-Object Model, each chunk of data is put into an object that models a real-world concept or thing, reflected by the value of the object's properties.
In this case, the DTOs are just dumb data-holders. They don't have any real functionality other than to accurately convey the structure and type of each bit of data that they carry. For example, you can represent a postal address with an Address object. The Address object is an instance of a specially defined Address class that has properties such as "Street," "City," "State," and "Zip." When a consumer of this model needs address-related data, it gets it in the form of an Address object. Likewise, when a service, or an object in the persistence layer (such as a Business Object, or an ASP.NET Web service) gets data to be written to a database, it will also get an Address object from the client (which might be an ASP.NET Web Form or a Windows Forms application).
Now suppose you're writing a three-tier application that has an ASP.NET front end and an ASP.NET Web service as the data-producing and consuming backend. Of course, Addresses will not be the only objects in the entire domain, but I'll start with Addresses for the sake of illustration.
Because these Address objects must be transferred from the Web service to the clients, they must be serializable to a form suitable for transfer over the wire. XML will serve the purpose nicely. But to force the serialized XML into a custom format, you have to tell the framework exactly how you want the XML to look, which you do by implementing the IXmlSerializable interface.
IXmlSerializable is a somewhat arcane interface that instructs the built-in XmlSerializer in Microsoft .NET to skip the generation of a default set of serialization methods because you, the architect, have already written code to handle the task. On the surface, the interface is relatively simple; it has only three methods:
- void ReadXml(XmlReader reader): The framework expects this method to populate an instance of the object from XML text in the reader's input stream.
- void WriteXml(XmlWriter writer): The framework expects this method to write XML text representing the contents of the object to the writer's output stream.
- XmlSchema GetSchema(): The framework expects this method to either return an XML Schema, or to return null. It's also OK to throw a NotImplementedException, although Microsoft's documentation recommends returning null.
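The read/write pair is easier to picture with a small round trip. The sketch below is a Python analogue built on the standard xml.etree.ElementTree module, not the .NET API: write_xml and read_xml are hypothetical stand-ins for WriteXml and ReadXml, and the one-child-element-per-property layout is an assumption chosen for illustration.

```python
import xml.etree.ElementTree as ET


class Address:
    """A dumb data holder that knows how to write and read its own XML."""
    FIELDS = ("Street", "City", "State", "Zip")

    def __init__(self, **values):
        # Any field not supplied defaults to an empty string.
        for name in self.FIELDS:
            setattr(self, name, values.get(name, ""))

    def write_xml(self):
        """Stand-in for WriteXml: build an element holding this object's data."""
        root = ET.Element("Address")
        for name in self.FIELDS:
            ET.SubElement(root, name).text = getattr(self, name)
        return root

    def read_xml(self, element):
        """Stand-in for ReadXml: populate this object from an element."""
        for name in self.FIELDS:
            setattr(self, name, element.findtext(name, default=""))


original = Address(Street="1 Main St", City="Springfield", State="IL", Zip="62701")
xml_text = ET.tostring(original.write_xml(), encoding="unicode")

# Rebuild a second object purely from the serialized text.
copy = Address()
copy.read_xml(ET.fromstring(xml_text))
print(xml_text)
```

The object, rather than the framework, decides the shape of the XML here, which is the same division of responsibility that IXmlSerializable sets up in .NET.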
In addition, the .NET Framework 2.0 adds a concept called a SchemaProvider to the mix, letting the internal system extract the schema in a prescribed manner so that it will not have to infer how to form and validate the input and output. This article doesn't cover the "what and why" of IXmlSerializable itself, so if you want to get a quick and enlightening run-down on how it works, check out Thiru Thangarathinam's article "Customizing XML Serialization in .NET 2.0 Using the IXmlSerializable Interface."
The sample generator you'll see in this article uses custom XML serialization to control the XML that the Web service uses to send DTOs to clients. In fact, that's largely the point of the generator: the custom XML serialization code is different for each DTO class, so it's far easier to generate it than to write it manually.
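As a rough illustration of why generation pays off here, the following sketch emits a C# WriteXml method from nothing more than a list of property names. It is written in Python for brevity and is not the generator this article builds. The function generate_write_xml and the Customer class are hypothetical, and the sketch assumes every property is a string written as a single child element with XmlWriter.WriteElementString. ReadXml is omitted.

```python
def generate_write_xml(field_names):
    """Return C# source text for a WriteXml method covering the given properties."""
    lines = [
        "public void WriteXml(XmlWriter writer)",
        "{",
    ]
    for name in field_names:
        # Assumption: each property is a string emitted as <Name>value</Name>.
        lines.append(f'    writer.WriteElementString("{name}", {name});')
    lines.append("}")
    return "\n".join(lines)


# Two DTO classes, two different method bodies, one generator.
address_method = generate_write_xml(["Street", "City", "State", "Zip"])
customer_method = generate_write_xml(["Name", "Email"])  # hypothetical DTO

print(address_method)
print()
print(customer_method)
```

Each DTO gets a method body that follows the same pattern but differs in every line, which is exactly the kind of semi-repetitive code that is tedious to write by hand and easy to generate.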