Build Custom Code Generators in C#

Quite simply, code generation involves taking a small amount of input and generating a large amount of code from it. Often the structure of the output is very complicated and verbose. One key feature of generated code is that it takes only a small variation in input to produce a vastly different set of output. The input to a code generator is an intermediate data format that describes the scenario for which you intend to produce a solution.

When you might use code generation depends on how comfortable you are with generating code. Technically, in every program you write, unless you’re writing in machine language, you’re most likely using some form of it already. The C# compiler, for example, uses your C# source code as a set of cues that tell the system what sort of Microsoft Intermediate Language (MSIL) code to create when building your program.

This article isn’t going to show you how to write a fully functional compiler, but the code generator you’ll build here uses essentially the same process as a compiler.

The first thing to look for in your code to indicate that a generator might be of some use is a set of semi-repetitive steps: steps that follow similar patterns of execution, but don’t qualify as an operation that you could abstract into a single class method. In other words, you look for steps that are similar, but vary enough to require a function for each variation. Another key indication that a code generator might be useful is the likelihood that you’ll have to do the same kind of programming multiple times.

Getting Started
Writing code generators is itself a procedural process. I’ll walk you through the steps first, and then apply the steps to actual code.

Step 1: Find good sample output. The first step in writing a code generator is to figure out what you want the output to look like. I usually take a small example, not necessarily one that represents ALL possibilities of output, but one that’s representative of the manually-written code.

Step 2: Define an intermediate format. This intermediate format will be the input for your code generator. This is arguably the hardest part. You must identify the problem you’re trying to solve, and make sure that your intermediate form is able to accurately represent the full spectrum (all the possible cases of the problem) so that you can instruct your generator what to produce. Even though this is the most difficult step, it’s still very manageable.

  • Start by taking your example output, and breaking it down into sections (in list form). Name the major methods of your class, or classes, and give each a line item in your list.
  • Next, begin adding subsections for more detail. Each subsection represents the major sub-functions of your major sections. For example, your major sections might consist of classes you want to generate, so your subsections can be the various characteristics that classes have, such as fields, properties, constructors, and methods.
  • After laying out all of the sections, you can then add all the different variations for each of the small sections. For example, in the ‘Fields’ section you can vary each field by cardinality (such as single-valued or array-valued) or by type, such as String, Int32, or Double.

When you have completed your list, you will have a set of abstract details already in hierarchical order for you. This is basically the ‘syntax’ for the input to your code generator.

A Real-World Code Generator Example
Keeping the basic process steps in mind, it’s time to pick a real-world example. This example walks through the process of writing a code generator to generate Data Transfer Objects (DTOs) that implement IXmlSerializable (that is, classes that are fully capable of telling the CLR how to serialize and deserialize themselves to and from XML, without allowing it to infer a serialization structure). If you aren’t familiar with DTOs and the Domain-Object Model, here’s a quick overview. In programming, the scope of a problem is its domain. You need a format to represent data in each particular domain. In the Domain-Object model, each chunk of data is put into an object that models a real-world concept or thing, reflected by the value of the object’s properties.

In this case, the DTOs are just dumb data-holders. They don’t have any real functionality other than to convey the structure and type of each bit of data that they carry accurately. For example, you can represent a postal address with an Address object. The Address object is an instance of a specially defined Address class that has properties such as “Street,” “City,” “State,” and “Zip.” When a consumer of this model needs address-related data, it gets it in the form of an Address object. Likewise, when a service, or an object in the persistence layer (such as a Business Object, or an ASP.NET Web service) gets data to be written to a database, it will also get an Address object from the client (which might be an ASP.NET Web Form or a Windows Forms application.)

Now suppose you’re writing a three-tier application that has an ASP.NET front end and an ASP.NET Web service as the data-producing and consuming backend. Of course, Addresses will not be the only objects in the entire domain, but I’ll start with Addresses for the sake of illustration.

Because these Address objects must be transferred from the Web service to the clients, they must be serializable to a form suitable for transfer over the wire. XML will serve the purpose nicely. But to force the serialized XML into a custom format, you have to tell the framework exactly how you want the XML to look, which you do by implementing the IXmlSerializable interface.

Implementing IXmlSerializable
IXmlSerializable is a somewhat arcane interface that instructs the .NET Framework’s built-in XmlSerializer to skip generating a default set of serialization methods because you, the architect, have already written code to handle the task. On the surface, the interface is relatively simple; it has only three methods:

  • void ReadXml(XmlReader reader): The framework expects this method to populate an instance of the object from XML text in the reader’s input stream.
  • void WriteXml(XmlWriter writer): The framework expects this method to write XML text representing the contents of the object to the writer’s output stream.
  • XmlSchema GetSchema(): The framework expects this method to either return an XML Schema, or to return null. It’s also OK to throw a NotImplementedException, although Microsoft’s documentation recommends simply returning null.
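To make the three methods concrete, here is a minimal hand-written sketch of a DTO that implements them and survives a round trip through XmlSerializer. It is not the generator’s output: the SimpleAddress class, its two fields, and the RoundTripDemo helper are hypothetical names invented for this illustration, and the generated code shown later in the article uses a different reading loop.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical DTO for illustration only; not the article's generated Address class.
[XmlRoot("Address")]
public class SimpleAddress : IXmlSerializable
{
    public string State;
    public int Zip;

    // Returning null tells the framework that no schema is supplied here.
    public XmlSchema GetSchema() { return null; }

    // Write only the child elements; the serializer writes the root element itself.
    public void WriteXml(XmlWriter writer)
    {
        writer.WriteElementString("State", State);
        writer.WriteElementString("Zip", XmlConvert.ToString(Zip));
    }

    // The reader arrives positioned on the root start tag.
    public void ReadXml(XmlReader reader)
    {
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement();              // consume <Address>
        if (isEmpty) return;

        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            if (reader.LocalName == "State")
                State = reader.ReadElementContentAsString();
            else if (reader.LocalName == "Zip")
                Zip = reader.ReadElementContentAsInt();
            else
                reader.Skip();                  // ignore elements this class doesn't know
        }
        reader.ReadEndElement();                // consume </Address>
    }
}

public static class RoundTripDemo
{
    // Serializes the object to a string and reads it back into a new instance.
    public static SimpleAddress RoundTrip(SimpleAddress input)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SimpleAddress));
        StringWriter text = new StringWriter();
        serializer.Serialize(text, input);
        return (SimpleAddress)serializer.Deserialize(new StringReader(text.ToString()));
    }
}
```

Even for two properties the reading and writing code is mechanical and repetitive, which is exactly the kind of code a generator can take over.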

In addition, the .NET Framework 2.0 adds a concept called a SchemaProvider to the mix, letting the internal system extract the schema in a prescribed manner so that it will not have to infer how to form and validate the input and output. This article doesn’t cover the “what and why” of IXmlSerializable itself, so if you want to get a quick and enlightening run-down on how it works, check out Thiru Thangarathinam’s article “Customizing XML Serialization in .NET 2.0 Using the IXmlSerializable Interface.”
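As a rough sketch of how a schema provider is wired up, the XmlSchemaProvider attribute names a static method that adds the type’s schema to an XmlSchemaSet and returns the qualified name of the type. The class name SchemaAwareAddress, the namespace URI, and the inline schema text below are assumptions made up for this illustration; the generated classes in the download build their schemas differently.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical class for illustration; the attribute points at the static method below.
[XmlSchemaProvider("GetProviderSchema")]
public class SchemaAwareAddress : IXmlSerializable
{
    // Assumed namespace URI, chosen only for this sketch.
    public const string Namespace = "urn:example:dto";

    public string State;

    // Adds this type's schema to the set and reports which schema type describes it.
    public static XmlQualifiedName GetProviderSchema(XmlSchemaSet schemas)
    {
        string xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' " +
            "targetNamespace='" + Namespace + "' elementFormDefault='qualified'>" +
            "<xs:complexType name='Address'><xs:sequence>" +
            "<xs:element name='State' type='xs:string' minOccurs='0'/>" +
            "</xs:sequence></xs:complexType></xs:schema>";

        using (StringReader reader = new StringReader(xsd))
        {
            schemas.Add(XmlSchema.Read(reader, null));
        }
        return new XmlQualifiedName("Address", Namespace);
    }

    // With a schema provider in place, GetSchema simply returns null.
    public XmlSchema GetSchema() { return null; }

    public void WriteXml(XmlWriter writer)
    {
        writer.WriteElementString("State", Namespace, State);
    }

    public void ReadXml(XmlReader reader)
    {
        bool isEmpty = reader.IsEmptyElement;
        reader.ReadStartElement();
        if (isEmpty) return;

        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            if (reader.LocalName == "State")
                State = reader.ReadElementContentAsString();
            else
                reader.Skip();
        }
        reader.ReadEndElement();
    }
}
```

The provider method must be static because the framework calls it without creating an instance of the class.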

The sample generator you’ll see in this article uses custom XML serialization to control the XML that the Web service uses to send DTOs to clients. In fact, that’s largely the point of the generator: the custom XML serialization code is different for each DTO class, so it’s far easier to generate it than to write it manually.

Planning a Generator
I said that I like to start with the output first, so prepare yourself a bit here. I’m not trying to intimidate you, but the long and intricate output in Listing 1 illustrates the reasons why you might want to use code generation to begin with. If you take a close look at the code in Listing 1, you’ll see that it’s not that complex, but writing such code repeatedly takes a lot of time, and you certainly wouldn’t want to do it very many times by hand.

With the sample output in hand, you can start listing items, as described earlier. The main sections of the output fall into these four major operations:

  1. Properties and Fields
  2. The Schema stuff
  3. ReadXml
  4. WriteXml

These four major operations are, you’ll find, pretty much the same in every case when generating a DTO that implements IXmlSerializable. That’s good news, because the list describes not only the Address object, but any object that this generator will be able to produce. You want to make sure that the list remains generalizable to all output cases as you develop it. In this case, if the outline becomes Address-specific, the generator won’t be able to produce anything but addresses.

Still looking at Listing 1 and starting from the top of the file, here are some more recognizable (yet still neutral across the domain) details:

  • File Header Info
      • Opening Comment
      • Using Statements
  • Class Declaration
      • XmlRoot attribute
      • SchemaProviderAttribute
  • Class Body
      • Properties and Fields
      • Helper Methods (ToXml, CreateFromXml)
      • GetProviderSchema (SchemaProvider attribute wires to this)
      • ReadXml
          • Read each property
              • Read atoms
              • Read lists of atoms (arrays)
      • WriteXml
          • Write each property (NOT the root element!)
              • Write atoms
              • Write lists of atoms (arrays)
  • Close the Class ( “}”)
  • Close the File header info (Namespace) ( “}”)

Now that looks fairly organized, and it completely covers the list of things in the Address example specifically, but you also want to be able to support properties that are entire objects, not just single values or lists of values. In fact, because objects often contain collections of other objects, you must allow for that possibility as well. I like to call sub-objects composites, because that’s essentially what they are: a composition of sub-objects or atoms, and of lists or arrays of either. Here’s the final abstract, which adds entries for reading and writing composites and lists of composites:

  • File Header Info
      • Opening Comment
      • Using Statements
  • Class Declaration
      • XmlRoot attribute
      • SchemaProviderAttribute
  • Class Body
      • Properties and Fields
      • Helper Methods (ToXml, CreateFromXml)
      • GetProviderSchema (SchemaProvider attribute wires to this)
      • ReadXml
          • Read each property
              • Read atoms
              • Read composites (sub-objects)
              • Read lists of atoms (arrays)
              • Read lists of composites (arrays of sub-objects)
      • WriteXml
          • Write each property (NOT the root element!)
              • Write atoms
              • Write composites
              • Write lists of atoms
              • Write lists of composites
  • Close the Class ( “}”)
  • Close the File header info (Namespace) ( “}”)

Now it’s time to create the intermediate input form. When building code generators, you should plan to spend the bulk of your time and consideration on designing the intermediate form. It’s useful to be able to easily store this intermediate form on disk and retrieve it again. .NET makes it easy to store and retrieve XML, so XML makes a good intermediate format. The intermediate structures will be read from and written to XML files.

It’s important that the intermediate form be capable of encapsulating the entire domain’s set of variations, so consider the general case. You want to generate DTO objects. Basically these objects consist of a set of properties. Each property can vary by category (atom or composite), cardinality (single, or list) and by type. For demo purposes, I’ll constrain atoms to only a few types: String, Boolean, Int16, Int32, Int64, DateTime, Float, and Double. Taken together, category and cardinality have only a few possibilities, so we can fuse those together into one variation point. Here’s a set of enumerations and small classes to hold intermediate data:

   enum DataType
   {
      String,
      Boolean,
      Int16,
      Int32,
      Int64,
      DateTime,
      Float,
      Double
   }

   enum PropertyType
   {
      Atom,
      Composite,
      ListAtoms,
      ListComposites
   }

   class ObjectDefinition
   {
      public String TypeName;
      public String XmlNameSpace;
      public List<PropertyDefinition> Property;
      ...
   }

   class PropertyDefinition
   {
      public String DeclaringTypeName;
      public String CompositeTypeName;
      public String PropertyName;
      public PropertyType PropType;
      public DataType AtomicType;
      ...
   }

Note: The preceding enumerations and classes are trimmed-down versions of the actual code for these. I actually generated the ObjectDefinition, PropertyDefinition, DataType, and PropertyType from an XML Schema Definition file (XSD), which is included in the download, but beyond the scope of the article. The relevant point is that the types are easily serializable to and from XML using the default serializer (it’s OK to use the default serializer for simple things).
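As a minimal sketch of what “serializable using the default serializer” buys you, the following trimmed-down stand-ins can be written to and read from XML with a few lines each. The Sketch-prefixed type names and the LoadFromXml and ToXml helpers are hypothetical and work on strings to stay self-contained; the article’s own ObjectDefinition exposes a LoadFromFile method whose implementation isn’t shown here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical, trimmed-down stand-ins for the article's intermediate types.
public enum SketchPropertyType { Atom, Composite, ListAtoms, ListComposites }

public class SketchPropertyDefinition
{
    public string PropertyName;
    public SketchPropertyType PropType;
}

public class SketchObjectDefinition
{
    public string TypeName;
    public List<SketchPropertyDefinition> Property = new List<SketchPropertyDefinition>();

    // Reads a definition from XML text using the default, reflection-based serializer.
    public static SketchObjectDefinition LoadFromXml(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SketchObjectDefinition));
        using (StringReader reader = new StringReader(xml))
        {
            return (SketchObjectDefinition)serializer.Deserialize(reader);
        }
    }

    // Writes this definition out as XML text.
    public string ToXml()
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SketchObjectDefinition));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, this);
            return writer.ToString();
        }
    }
}
```

Because the serializer handles public fields, lists, and enumerations on its own, the intermediate format needs no hand-written parsing code at all.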

Because these object definitions are serializable to XML, you can define them using XML files, and let the generator read the definitions from the files. Listing 2 shows the XML for the intermediate form of an Address object.

Listing 2 may look complicated, but it’s actually just peanuts compared to what we’re about to get. In the XML above, Address is a class with an integer (AtomicType: Int32) property, ID. It has several string properties: Street, City, and State, and an integer (Int32) Zip property. All these properties are singular values, so the intermediate form specifies their PropType attribute as Atom. Just to be tricky, I stuck a nonsense property in the code called ArrayDemo, which has nothing to do with an Address, but is there to illustrate the concept of handling arrays.

Now we’re getting somewhere! You have a generic set of classes and a serializable intermediate format that fully describes an Address object. The only thing left to do is create a way to transform that XML into fully functional C# output similar to the sample file.

Example Input and Output
The code below contains several samples of the input XML and the resulting generated code. Notice how the frame around each of these “property reads” is the same, but the part in the middle varies based on both the type and the cardinality:

Input XML:

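      <DeclaringTypeName>Address</DeclaringTypeName>
      <PropertyName>State</PropertyName>
      <PropType>Atom</PropType>
      <AtomicType>String</AtomicType>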

Generated Code:

   if (reader.LocalName == "State")
   {
       // generated code for 'State' by GeneratePropertyFromXml()
       readString = reader.ReadString();
       if (!string.IsNullOrEmpty(readString))
           this.State = readString;
       continue;
   }

Input XML:

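      <DeclaringTypeName>Address</DeclaringTypeName>
      <PropertyName>Zip</PropertyName>
      <PropType>Atom</PropType>
      <AtomicType>Int32</AtomicType>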

Generated Code:

   if (reader.LocalName == "Zip")
   {
       // generated code for 'Zip' by GeneratePropertyFromXml()
       readString = reader.ReadString();
       if (!string.IsNullOrEmpty(readString))
           this.Zip = XmlConvert.ToInt32(readString);
       continue;
   }

Input XML:

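      <DeclaringTypeName>Address</DeclaringTypeName>
      <PropertyName>ArrayDemo</PropertyName>
      <PropType>ListAtoms</PropType>
      <AtomicType>Int32</AtomicType>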

Generated Code:

   if (reader.LocalName == "ArrayDemo")
   {
       // generated code for 'ArrayDemo' by GeneratePropertyFromXml()
       readString = reader.ReadString();
       if (!string.IsNullOrEmpty(readString))
           this.ArrayDemo.Add(XmlConvert.ToInt32(readString));
       continue;
   }
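The shared frame and the varying middle in the samples above can be sketched as a single function. This is an illustration of the idea, not the downloadable generator’s code: the ReadFrameSketch class, its GeneratePropertyRead method, and both parameters beyond the property name are hypothetical, and the sketch builds a plain string rather than writing through the generator’s own writer.

```csharp
using System;
using System.Text;

// Hypothetical helper that emits the "property read" frame for one property.
public static class ReadFrameSketch
{
    // isList selects between assigning a value and appending to a list;
    // valueExpression is the C# text that turns readString into the target type.
    public static string GeneratePropertyRead(
        string propertyName, bool isList, string valueExpression)
    {
        StringBuilder code = new StringBuilder();

        // The frame: identical for every property apart from the name.
        code.AppendLine("if (reader.LocalName == \"" + propertyName + "\")");
        code.AppendLine("{");
        code.AppendLine("    readString = reader.ReadString();");
        code.AppendLine("    if (!string.IsNullOrEmpty(readString))");

        // The middle: varies with cardinality (single value versus list).
        if (isList)
            code.AppendLine("        this." + propertyName + ".Add(" + valueExpression + ");");
        else
            code.AppendLine("        this." + propertyName + " = " + valueExpression + ";");

        code.AppendLine("    continue;");
        code.AppendLine("}");
        return code.ToString();
    }
}
```

Calling GeneratePropertyRead("Zip", false, "XmlConvert.ToInt32(readString)") produces the assignment form, while passing true for isList produces the Add form used for ArrayDemo.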

For some operations, it doesn’t make sense to write an entire generator method for each variation, because the variations differ only slightly (such as by System.Type or by property name). For these, it’s best to write short reusable helper methods that return small strings that the calling method can insert into the output. For example, this GetTypeName() function is a helper method that takes a PropertyDefinition as input and returns the .NET type name for the property it represents.

Author’s Note: I purposely named the DataType enumerations to match their .NET counterparts. This makes outputting the name a simple conversion to string operation, as you can see in the code pDef.AtomicType.ToString().

   private static string GetTypeName(PropertyDefinition pDef)
   {
      if (pDef.PropType == PropertyType.Composite ||
          pDef.PropType == PropertyType.ListComposites)
      {
         return pDef.CompositeTypeName;
      }
      return pDef.AtomicType.ToString();
   }

The code near the bottom of the hierarchy of methods generates the actual nitty-gritty output. Because the higher level (more generic) methods pass the IndentingWriter down through each step, all the output will go into the same place in the right order.

   static void GenerateReadAtmoicListValue(PropertyDefinition pDef,
      IndentingWriter tw)
   {
      tw.WriteLine("readString = reader.ReadString();");
      tw.WriteLine("if(!string.IsNullOrEmpty(readString))");
      tw.Indent();
      tw.WriteLine("this.{0}.Add({1});", pDef.PropertyName,
         ConvertAtomFromXml(pDef));
      tw.OutDent();
   }

ConvertAtomFromXml, called in the preceding method, is another helper method of the same kind: the conversion expression is very similar for all properties, so it’s built in one place and reused.
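Neither the IndentingWriter nor ConvertAtomFromXml is listed in the article, so the following is only a sketch of what such helpers might look like, inferred from how they’re called above and from the generated output shown earlier. Every type here is a hypothetical stand-in (note the Sketch prefix), the conversion helper takes a data type rather than a PropertyDefinition to keep the example small, and the Float case is left out.

```csharp
using System;
using System.IO;

// Hypothetical subset of the article's DataType enumeration.
public enum SketchDataType { String, Boolean, Int16, Int32, Int64, DateTime, Double }

// Hypothetical stand-in for IndentingWriter: prefixes each line with the current indent.
public class SketchIndentingWriter
{
    private readonly TextWriter _inner;
    private int _depth;

    public SketchIndentingWriter(TextWriter inner) { _inner = inner; }

    public void Indent() { _depth++; }

    public void OutDent() { if (_depth > 0) _depth--; }

    public void WriteLine(string format, params object[] args)
    {
        // Only run string.Format when arguments are supplied, so literal lines pass through.
        string text = args.Length == 0 ? format : string.Format(format, args);
        _inner.WriteLine(new string(' ', _depth * 4) + text);
    }
}

public static class ConversionSketch
{
    // Returns the C# expression that converts readString to the given atomic type.
    public static string ConvertAtomFromXml(SketchDataType type)
    {
        switch (type)
        {
            case SketchDataType.String:
                return "readString";
            case SketchDataType.DateTime:
                return "XmlConvert.ToDateTime(readString, " +
                       "XmlDateTimeSerializationMode.RoundtripKind)";
            default:
                // The enum names match XmlConvert method suffixes, e.g. ToInt32.
                return "XmlConvert.To" + type + "(readString)";
        }
    }

    // Mirrors the shape of the list-reading generator method for one property.
    public static string GenerateListRead(string propertyName, SketchDataType type)
    {
        StringWriter output = new StringWriter();
        SketchIndentingWriter tw = new SketchIndentingWriter(output);
        tw.WriteLine("readString = reader.ReadString();");
        tw.WriteLine("if(!string.IsNullOrEmpty(readString))");
        tw.Indent();
        tw.WriteLine("this.{0}.Add({1});", propertyName, ConvertAtomFromXml(type));
        tw.OutDent();
        return output.ToString();
    }
}
```

Keeping the indent level inside the writer means the generator methods never have to count spaces themselves; they only say when a nested block begins and ends.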

The downloadable source includes three test projects: one for testing the generator (TestGenerator.csproj), one for testing the serialization of the generated classes (TestSerialization.csproj), and the example Web service.

Author’s Note: In the projects, the two “…ODef.xml” files, AddressODef.xml and PersonODef.xml are marked to be copied to the output directory, so the project code can refer to them without any extra path info. However, the generated output files are coded to go into the Serialization test directory, so if you don’t preserve the directory structure of the demo projects, it won’t work. It’s not apocalyptic, but just be aware that you’ll have to modify those directory values to get the demo working if you re-arrange things.

Completing and Testing the Generator
Now that the generator is complete, you can generate and include the finished classes in your code just like any other classes. Here’s how to invoke the generator:

   namespace TestApplication
   {
       class Program
       {
           static void Main(string[] args)
           {
               ObjectDefinition addressDef =
                   ObjectDefinition.LoadFromFile("AddressODef.xml");
               ObjectDefinition personDef =
                   ObjectDefinition.LoadFromFile("PersonODef.xml");

               ClassGenerator AddressGenerator = new ClassGenerator(
                  "Address", "Dolan.TestCode");
               AddressGenerator.SaveCode(
                  @"......TestSerializationAddress.cs");

               ClassGenerator PersonGenerator = new ClassGenerator(
                  "Person", "Dolan.TestCode");
               PersonGenerator.SaveCode(
                  @"......TestSerializationPerson.cs");

               Console.Out.Write("done...");
               ...

To expose the generated classes in a Web service, you need only return the class as you would have before implementing IXmlSerializable. One of the special things about this interface is that the XmlSerializer looks for it before deciding how to serialize your class. If you specify that your class implements IXmlSerializable, the serializer will call your ReadXml and WriteXml methods, which (depending on how much data you’re sending over the wire) can be a lot faster than using any old default serializable class.

The sample Web service project (TestService) functions as an example of how to return objects. Notice that because the class implements IXmlSerializable, the XmlSerializer invoked by the ASMX architecture will automatically call the generated methods rather than initiating a reflection-based generation using the intrinsic serialization code.

A side note here: The Schema Provider as it stands in the .NET Framework version 2.0 is a little limited. It doesn’t know how to handle objects that are included as part of a complex hierarchy and that are also defined to the schema provider by their own schema provider methods. Therefore, it will throw an exception because it perceives them to be declared twice (for example, Person uses a Schema Include for Address, and Address defines itself via the Schema provider). The workaround here is to use the same namespace for all of your classes. That way, the schema provider will essentially allow you to “overwrite” any previous definition with the next one. This has the desired effect, but unfortunately doesn’t follow the spirit of what a schema provider should logically be doing.

Here’s a recap of the entire process for writing a code generator:

  • Determine that your code will be doing a finite set of repeatable steps a potentially infinite number of times (really this can be applied to any code, but practically speaking, you have to write these steps individually).
  • Write a single instance of the output you wish to generate. This is not necessarily easy, and can be downright grueling because it’s precisely what you don’t want to do. In other words, avoiding such work is why you decided to write a generator in the first place.
  • Map out the major structures (classes, sections of code, and methods) in outline form, and assign an intermediate representation to each instance of variation.
  • Create an intermediate format to hold the abstractions you wish to represent. This is a little vague, I know, but the applications can vary from generating data-holding classes to generating entire services or application tiers, so you have to make the intermediate representation fit the model you wish to implement. (This is what ObjectDefinition, PropertyDefinition, DataType, and PropertyType are in my example.) It’s a good idea to make this intermediate format able to be loaded and stored easily to disk. That will allow you to prepare your target structures without writing code to declare them.
  • Implement the descending generator which generates output based on the values of the intermediate representation. In this example, it’s the ClassGenerator class.
  • Test your code. Though my examples work, they aren’t quite complete. Before you can consider your generator to be “domain-complete” (covering all aspects of the target domain), you must have at least one test object that covers each variation that you are encapsulating.

Code generation can be intimidating at first. After all, it involves a complex set of steps, and at times the output can be frustrating to debug. Just take it one step at a time, and keep in mind all the benefits that not writing repetitive code offers in the end. If you break it down step-by-step and keep your cool, you’ll eventually be able to pull code generation out of your toolbox of techniques at will. You may even find generating code produces cleaner and more consistent results than some of your own long-winded efforts. A whole new world of possibilities opens up for you now that what you once considered to be ‘too much typing’ or ‘too much code to tackle’ is well within your reach.
