Defining a Generator
Building a generator is about defining how to map model concepts to code or other output. In the simplest case, each modeling symbol produces certain fixed code that includes the values the modeler entered into the symbol as arguments. Going a bit further, a generator can also take into account relationships with other model elements or other model information, such as sub-models, models made with other languages, or pre-existing library code. The way the generator takes information from models and translates it into code depends on what the generated output should look like. The examples you've seen here, which serialize models by navigating their connections, generate function calls from flow models, and generate state machines using switch cases, are just a few examples of what you can do.
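To make the idea concrete, here is a minimal sketch of such a generator in Python. The model structure and names are hypothetical, not taken from any particular tool: a dictionary stands in for a state-machine model, and each state symbol is mapped to a case in a C-style switch statement, with the modeler's values appearing as identifiers in the emitted code.

```python
# Hypothetical state-machine model: the structure is invented for this
# sketch, standing in for design data a modeler would enter in a tool.
model = {
    "states": ["Idle", "Running", "Stopped"],
    "transitions": [
        ("Idle", "start", "Running"),
        ("Running", "stop", "Stopped"),
    ],
}

def generate_state_machine(model):
    """Emit a C-style switch statement: one case per state symbol,
    one transition test per outgoing transition of that state."""
    lines = ["switch (state) {"]
    for state in model["states"]:
        lines.append(f"  case {state.upper()}:")
        for src, event, dst in model["transitions"]:
            if src == state:
                lines.append(f"    if (event == {event.upper()}) state = {dst.upper()};")
        lines.append("    break;")
    lines.append("}")
    return "\n".join(lines)

print(generate_state_machine(model))
```

The generator itself stays a simple traversal: it walks the model's concepts in order and emits fixed code templates filled with the modeler's values.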
The generator definition depends on the availability of a good reference implementation; in other words, you need to know what you are developing before you can automate it! The reference implementation for DSM should be available as a pair: the design data (a model) and the produced output (the code to be generated). In my experience, the generator definition process is usually test-case driven; you work backwards, starting with a reference implementation of the code to be generated. A good model has multiple alternative implementations; therefore, only by locking down the details of the output can you know exactly what you want out of the generator. For example, in the XML generation example, the schema pretty much defines the whole scope of the generator output. In other situations, you might take the test case from an existing application or feature, or write it from scratch just for the needs of generator definition.
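As an illustration of such a pair, the sketch below (with hypothetical model data and a made-up XML format) keeps the design data and the exact output to be generated side by side, so the generator can be checked against the reference at any time:

```python
# Hypothetical reference pair: the design data as a modeler would enter it,
# and the exact output the generator must produce for it.
reference_model = {"name": "Sensor", "fields": [("id", "int"), ("value", "float")]}

reference_output = """\
<type name="Sensor">
  <field name="id" type="int"/>
  <field name="value" type="float"/>
</type>"""

def generate_xml(model):
    """Emit the XML for one type definition from the model."""
    lines = [f'<type name="{model["name"]}">']
    for fname, ftype in model["fields"]:
        lines.append(f'  <field name="{fname}" type="{ftype}"/>')
    lines.append("</type>")
    return "\n".join(lines)

# Working backwards: the generator is done when it reproduces the reference.
assert generate_xml(reference_model) == reference_output
```

The assertion at the end is the whole point: the reference output, written first, defines what "correct" means for the generator.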
|The best practice is to ask the most experienced programmers to come up with the reference code.|
The best practice is to ask the most experienced programmers to come up with the reference code. It's important to ask them to write in the same style they would use to teach other developers; otherwise the code may include too many special tricks for particular cases, rather than good standard code that can be generalized to all applications in that domain. Generating that code makes a good impression. Even if you later abandon the generator for some reason, you still end up with standardized, generalizable expert code. Using code from experienced developers also speeds up generator creation, simply because it requires less discussion about different coding practices and standards. The experts should rule here.
Having reference code proves that generated code can look familiar, follow the required programming model, include appropriate comments, and follow the local standards for code style. I even know of cases of generating legislation and compliance information into the code to prove it satisfies customer requirements. Although modelers generally don't need to look at the generated code later on, good-looking code creates confidence in the generative approach. It is just like buying a car: you want to see that there is a motor, but later on you don't want to bother looking under the hood. The structure of the generated code actually has a bigger impact on the person who takes care of the generator afterwards; well-structured generated code is easier to read, and test cases made earlier can be applied when the generator is modified.
Having a reference implementation as a basis for the generator, or for just part of it, lets you ensure that the generator produces the expected result; otherwise, you change the generator. You may also find that it is useful to alter the modeling language or create some framework code to keep the code generator simple and enable better code generation. For instance, you should try to avoid cases where the code generator needs to check that the input it gets is correct. The modeling language rules should normally cover this already. Generally speaking, it is best to get the modeling language right early, because later, when there are many existing models, there may be some restrictions on what kinds of modeling language changes you can make. However, even at that stage it is still possible to make changes to the generators. The simpler your code generators are to start with, the simpler it will be to make changes to them later. Keeping the generators simple means you have to do less work to update them in the early days, when the modeling language is still evolving.
You can apply reference implementations as test cases during all phases of generator definition. You usually start from a few typical structures to be generated, and then extend the generator bit by bit. For example, in the XML case you can start by choosing just a portion of the schema, and then gradually extend the generator to handle the entire schema. Similarly, during maintenance and enhancement of the generator, getting a test case beforehand makes the generator definition simpler.
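This test-case-driven style can be sketched as a growing list of pairs; the model fragments and the output format here are hypothetical. Each pair holds a model fragment and the reference output it must produce, and new pairs are added as the generator is extended to cover more of the schema:

```python
# Hypothetical test-case pairs: (model fragment, expected output).
# Coverage of the schema grows simply by appending new pairs here.
test_cases = [
    ({"tag": "note", "text": "hello"}, "<note>hello</note>"),
    ({"tag": "title", "text": "DSM"}, "<title>DSM</title>"),
]

def generate(fragment):
    """Emit one element from a model fragment."""
    return f'<{fragment["tag"]}>{fragment["text"]}</{fragment["tag"]}>'

# Re-run all accumulated cases whenever the generator is modified.
for model_fragment, expected in test_cases:
    assert generate(model_fragment) == expected
```

Because the cases accumulate, every later change to the generator is automatically checked against everything it was already required to produce.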
Good Generator Properties
|Model-based generators should target code directly instead of producing intermediate models that need to be extended during the development process.|
In my opinion, a good generator produces complete code. This has been the cornerstone for automation and raising abstraction in the past. We would not be happy if, after writing C and compiling it, we needed to modify and rework the created assembly language or machine code. For the same reason, I personally have difficulty understanding how OMG's MDA could work. The idea sounds attractive: making a model that gets transformed to another set of models, which are then modified and transformed to still more models, and ultimately, to code. However, this approach leads to the same results as wizards: lots of code (and models) that you didn't write yourself, but that you are expected to maintain. Such wizards can sometimes be helpful, and they do offer increased productivity at the start, but over time, creating a mass of unfamiliar code that needs maintaining tarnishes the picture considerably. The MDA idea gets even worse when you consider round-tripping: would you like to update the manually made changes to the code and lower-level models back into all the higher-level models? Or, after you make a change to the top-level model, to successfully integrate your hand-made changes at the lower levels with the new code caused by the top-level change? And if you wouldn't, would you trust a tool to get it right? That was not the success pattern we saw when assemblers were replaced with compilers; nobody tries to maintain their code in both C and assembler.
I would also advocate that model-based generators should target code directly instead of producing intermediate models that need to be extended during the development process. Naturally, you may need to generate some supporting intermediate models, but the process usually breaks if the generated models need to be changed. So, if you face a situation of model-to-model transformations, do the following: look at what kind of information modelers add to the lower-level generated models, and then extend the higher-level language to capture it. As a side benefit, you usually find a way to record the information in significantly fewer data elements in the higher-level language. Finally, merge the transformations into one. If you can do this, you make life easier for everybody. It is easier for the modelers because there is only one language, one model, and no round trip that might desynchronize the model and the result. It is also easier for the language developer because there is only one language and a single one-way, non-updating generator. Experience shows that a single-model approach that targets code works, and scales in larger teams too.
Code generation is not the only place for automation. The power of models increases even further if you can also generate things such as configuration data, test cases, simulation material, documentation, automated build processes, and so forth from the same source. Having a single source and multiple targets can be very beneficial, because when making modifications, developers need to make a change in only one place, and the tool takes care of the rest.
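The single-source, multiple-target idea can be sketched as follows; the model and the output formats are made up for illustration. One model drives code, configuration, and documentation generators, so a change to the model propagates to every artifact:

```python
# Hypothetical model as the single source for several targets.
model = {"service": "Logger", "port": 5140, "level": "INFO"}

def gen_code(m):
    """Target 1: a line of Java-like initialization code."""
    return f'Logger logger = new Logger({m["port"]}, Level.{m["level"]});'

def gen_config(m):
    """Target 2: an INI-style deployment configuration."""
    return f'[{m["service"].lower()}]\nport = {m["port"]}\nlevel = {m["level"]}'

def gen_docs(m):
    """Target 3: a fragment of user documentation."""
    return f'{m["service"]}: listens on port {m["port"]} at log level {m["level"]}.'

# Changing the port in the model updates all three artifacts at once.
for target in (gen_code, gen_config, gen_docs):
    print(target(model))
```

Each generator is trivial on its own; the benefit comes from all of them reading the same model, so the developer changes a value in one place only.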
Not all code can be generated from models. Some parts will still be hand-written and moved to domain framework components. However, from the designer's perspective, models can be used to generate complete working code. The four cases presented here are concrete demonstrations of this, from different application domains and with different generation requirements. The design rules of the examples depended on the application domain, and the modeling languages were defined to give first-class support for specification work, making code generation, optimization, early error detection, and correct reuse easier to achieve. On the generation side, the code produced is functional, readable, and efficient, ideally looking like code hand-written by the experienced developer who defined the generator.
If you look at the examples above, it would be hard to imagine how one single language or generator could have produced the correct code in each different case. General-purpose modeling languages such as UML are well suited for documentation, but not as well suited for generation. Code generation requires that details are correct too.
Recently, open and customizable technologies have emerged that allow developers to change the design languages, the code generators, or both to meet the different requirements of software development. Hence, experienced developers in a company can adapt the design languages and generators to a specific domain, and then model actual products using those domain-specific languages, generating code directly from the models. Finally, it bears mentioning that for an expert, building model-based code generation is not only an interesting challenge; it is also a lot of fun!