Why do people use computers? We use computers because they make us more productive. Using a word processor is more efficient than writing a manuscript in pencil. An electronic spreadsheet is more efficient, accurate, and valuable than its paper counterpart. As engineers we build time-saving applications for others but never think to apply the power of computers to our own problems.
Code generation is a time-saving technique that helps engineers do better, more creative, and useful work by reducing redundant hand-coding. In this world of increasingly code-intensive frameworks, the value of replacing laborious hand-coding with code generation is acute and, thus, its popularity is increasing.
EJB is an excellent example of a complex and code-intensive framework. Supporting a single database table in an EJB framework requires building up to seven different classes and interfaces. And of course, even a simple database application today has 20 or more tables. The coding work adds up fast. EJB’s infrastructure code is laborious scaffolding-style code where the best you can hope for is to not screw it up. It’s repetitive and therefore difficult to maintain. It’s not code where you can excel or that requires any creativity to write. For these reasons alone EJB code should always be built using an active generator.
Dealing with Skepticism
Almost invariably, when code generation is introduced for the first time to a development group, opposition flares up. That’s not a bad thing. Skepticism is a valuable trait in engineers and new approaches and “silver bullets” should always be subjected to that skepticism. So let’s start by addressing some of the well-founded concerns about code generation.
Code generation is copy-and-paste coding.
Code that is copied and pasted to multiple places is difficult to maintain properly across all of the copies. Active code generation does not suffer from the same maintainability issues as copy-and-paste coding. When you need to fix something, you apply the bug fix to the templates used to generate the code, and regenerating then propagates the fix to all of the code maintained by the generator. This design ensures that no code that needs fixing is left scattered around and forgotten.
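The difference from copy-and-paste can be sketched in a few lines. This is a minimal illustration, not a real EJB generator: the template text, the table list, and the `generate` function are all hypothetical, and Python's standard `string.Template` stands in for whatever template engine a real generator would use.

```python
from string import Template

# Hypothetical template for one generated class; a real generator would
# keep this in a template file rather than in a string.
CLASS_TEMPLATE = Template(
    "class ${name}Record:\n"
    "    TABLE = \"${table}\"\n"
    "\n"
    "    def find_sql(self):\n"
    "        return \"SELECT * FROM ${table} WHERE id = ?\"\n"
)

# Hypothetical table list standing in for a real schema definition.
TABLES = ["customer", "invoice", "payment"]


def generate(template, tables):
    """Render one source file per table from a single shared template."""
    return {
        f"{table}_record.py": template.substitute(name=table.capitalize(), table=table)
        for table in tables
    }


files = generate(CLASS_TEMPLATE, TABLES)

# Fixing the template once (here, selecting explicit columns instead of *)
# and regenerating carries the fix into every generated file.
FIXED_TEMPLATE = Template(CLASS_TEMPLATE.template.replace("SELECT *", "SELECT id, name"))
fixed_files = generate(FIXED_TEMPLATE, TABLES)
```

With pasted code, the same fix would have to be found and repeated in each of the three files by hand.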
Generating all of the code now means I won’t have a job tomorrow.
Code generation isn’t a magic wand to build excellent software. It’s a tool that gets you where you want to go faster, which in turn means that you will be able to fix more bugs and implement more new features than you would by hand-coding.
Code generators follow the 80/20 rule. They solve most of the problems, but not all of the problems. There are always features and edge cases that will need hand-coding. Even if code generation could build 100 percent of the application, there will still be an endless supply of boring meetings about feature design.
Generating code is a design smell.
For those not in on the new lingo, a design smell is a heuristic experience-based signal to an engineer that bad decisions about the logical or physical design of the system have been made upstream from the implementation. For example, when a lot of code needs to be written to perform relatively simple tasks, engineers can smell bad design. The EJB platform requires the construction of a bunch of classes and interfaces for each database table. Some consider this a design smell.
The choice of platform is often not up to the engineer on the front lines. If EJB is the chosen platform, and it often is, then you need EJB code to complete the project. What you can choose is how you build the code. You can write the code by hand or generate it. Particularly on EJB projects, generation will get you complete code quickly.
Nobody will understand the generator but me.
You should get the generator started on the right foot by getting all of the stakeholders (the people who will use the generator) involved in the design and deployment. Once the generator has been built, hold a brown bag lunch to familiarize everyone on the team with the generator. Explain the design of the generator and show how it’s used.
If there are several teams you may want to hold a different brown bag to show the architecture of the generator and also to brainstorm other uses of the generators. For example, you may come up with great ideas on how to use multiple generators together, how to generate build processes, or how to generate documentation.
Documentation of the generator itself is critical to widespread acceptance. You need introductory documentation for new engineers and in-depth documentation for those maintaining the generator. Even if you use an off-the-shelf generator such as XDoclet or UML2EJB you still have to write up documentation for how the generator is configured for your environment.
Engineers will start disregarding the ‘do not edit’ comments and eventually the generator will fall out of use.
Generated code should have comments that instruct engineers about how and when to edit the generated code. In most cases the code should never be edited directly and the code comments should warn about that prominently. However, sometimes these comments are ignored and this is a symptom of an issue with the deployment of the generator.
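One way to make the warning hard to miss, and to notice when it has been ignored, is for the generator to stamp each file with a header. The sketch below is an assumption about how that might look, not a feature of any particular generator: the header wording, the `stamp` and `was_hand_edited` functions, and the idea of recording a digest of the body are all illustrative.

```python
import hashlib

# Hypothetical warning header written at the top of every generated file.
HEADER = (
    "# GENERATED CODE - DO NOT EDIT.\n"
    "# Change the template and rerun the generator instead.\n"
)


def stamp(body):
    """Prefix generated code with a warning header and a digest of the body."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"{HEADER}# digest: {digest}\n{body}"


def was_hand_edited(source):
    """Return True when the body no longer matches the digest in the header."""
    lines = source.splitlines(keepends=True)
    header_len = HEADER.count("\n")          # number of warning lines
    digest_line = lines[header_len]          # the "# digest: ..." line
    body = "".join(lines[header_len + 1:])   # everything after the header
    recorded = digest_line.split("digest:", 1)[1].strip()
    return hashlib.sha256(body.encode("utf-8")).hexdigest() != recorded


generated = stamp("def total():\n    return 0\n")
edited = generated.replace("return 0", "return 1")
```

Here `was_hand_edited(generated)` is false and `was_hand_edited(edited)` is true, so a build step could flag the edited file before the next generation cycle silently overwrites the change.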
You can mitigate the problem in two ways. The first is accurate and well-maintained documentation that focuses on what the generator does and how it is used. The other is the usability of the generator itself. Generators need to have reasonably friendly command-line interfaces that report intelligent errors and positively report success. If possible, the generator should be integrated into the IDE or the automated build process when appropriate.
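As a rough sketch of what a friendly command-line interface means in practice, the entry point below reports a specific error when something goes wrong and says plainly what it did when it succeeds. Everything here is hypothetical: the tool name `gen`, the `run` function, and the schema format (one table name per line) are assumptions made for the example, and the actual code generation is left out.

```python
import argparse
import sys


def run(argv):
    """Hypothetical generator entry point; returns a process exit code."""
    parser = argparse.ArgumentParser(
        prog="gen", description="Generate code from a schema file."
    )
    parser.add_argument("schema", help="path to the schema definition")
    args = parser.parse_args(argv)

    try:
        # Assumed schema format: one table name per line, blank lines ignored.
        with open(args.schema, encoding="utf-8") as handle:
            tables = [line.strip() for line in handle if line.strip()]
    except OSError as err:
        # Say what failed and why, instead of dumping a stack trace.
        print(f"gen: cannot read schema '{args.schema}': {err}", file=sys.stderr)
        return 1

    if not tables:
        print(f"gen: schema '{args.schema}' defines no tables", file=sys.stderr)
        return 1

    # Positive confirmation, so silence is never mistaken for success.
    print(f"gen: generated {len(tables)} classes from {args.schema}")
    return 0
```

Returning an exit code rather than exiting directly keeps the function easy to call from an automated build, which can pass the result to `sys.exit`.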
It comes down to a team dynamic. If you have a team that is supportive of the generator and maintains it properly, you won’t have issues. If the generator is maintained by one guy that nobody talks to because he sits in a corner, you’ll have problems.
My schema is too complex; a generator will never be able to handle all of the edge cases.
Complex legacy schemas are always a problem to code around, regardless of technique. A complex schema is not made more difficult by using XDoclet because the generator is handling the infrastructural EJB code that is mainly repetitive grunt work to write. Generators that take on larger tasks, such as UML2EJB, which builds the entire EJB tier, will need customization to handle the edge cases.
I recommend building a few EJBs to represent the various types of tables and relationships in your schema. This will give you a feel for what you would like to see as output from a generator. Then use a couple of different generators to see which one works for your project and will produce code that matches your design and coding standards.
The generator will limit our creativity.
The primary role of a generator is to offload mundane work from engineers so they can concentrate on creative design and implementation work. If your idea of creative coding is hand-building cookie cutter EJBs in a Swiss watchmaker style, perhaps your creativity would be constrained by a generator. On the other hand, if your creativity centers around shipping quality features to customers quickly and being able to move in an agile fashion to match changing requirements, then code generation is going to make you more creative, not less.
I’ll push a button and something will make lots of code that I don’t understand.
In The Pragmatic Programmer (Addison-Wesley, 1999), Dave Thomas and Andrew Hunt say “You should never ship code you don’t understand.” I couldn’t agree more. There is a problem with off-the-shelf generators that build code for you using templates that come with the product. You should inspect every line of the code that a generator builds when you do your product evaluation.
My project isn’t ready for a generator.
If you are ready to code you are ready to generate code. The ability to quickly generate code to implement your database backend means that you can complete iterative cycles of development more quickly than you could with hand-coding. This allows engineers to prototype earlier and more often, which flushes out design issues early when they are cheaper to fix.
The Advantages of Code Generation
Having addressed the issues engineers have with generators, let’s talk about the positives behind generation: the things that show the value of a generated code base over its hand-coded equivalent.
The quality of generated source code is directly related to the quality of the templates. This is a big advantage because as you increase the quality of the templates over time you will also increase the quality of the entire code base. You can start out with a relatively lightweight pass at the templates and then make them more robust in an iterative process as you gain understanding of the framework and proper error handling. Contrast this process with hand-written code, which will have unreliable quality over time as interest in the project ebbs and flows.
There’s another quality advantage, too. Let’s face it, bug fixing is always about effort vs. reward. Fixing 300 hand-coded classes one by one is a lot harder than fixing one template and generating the 300 classes again. Systemic bugs are much more likely to get fixed in generated code bases. In addition, optional enhancements to increase the stability or functionality of the code base are much more likely to happen. For all of these reasons, a generated code base is much more agile and robust than its hand-coded equivalent.
Generated code bases are extremely consistent in interface definition. This makes it easier to hand-write code on top of generated APIs. It also makes it easier to build other layers of generated code on top of them.
The advantage of componentization and APIs is to create complexity barriers. The interior of a component may be extremely complex but its interface should present the abstraction of the component in as simple and consistent a manner as possible. Code generation does not guarantee that a set function will set things and that a get will get things, but it does guarantee that all the sets and gets will work the same way. This means fewer surprises for the engineers, which equates to more productivity and fewer bugs.
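The consistency argument can be illustrated with a small sketch in which every accessor is produced from one pattern. The function names, the `Customer` class, and its fields are hypothetical, and the generated source is loaded with `exec` only so the example can exercise it; a real generator would write the source to files instead.

```python
def make_accessors(field_names):
    """Generate get_/set_ method source for each field from one pattern."""
    chunks = []
    for name in field_names:
        chunks.append(
            f"    def get_{name}(self):\n"
            f"        return self._{name}\n"
            f"\n"
            f"    def set_{name}(self, value):\n"
            f"        self._{name} = value\n"
        )
    return "\n".join(chunks)


def make_class(class_name, field_names):
    """Assemble the source for a class whose accessors all share one shape."""
    init_body = "".join(f"        self._{name} = None\n" for name in field_names)
    return (
        f"class {class_name}:\n"
        f"    def __init__(self):\n"
        f"{init_body}\n"
        f"{make_accessors(field_names)}"
    )


source = make_class("Customer", ["name", "email"])
namespace = {}
exec(source, namespace)  # load the generated class so it can be exercised
customer = namespace["Customer"]()
customer.set_email("a@example.com")
```

Because every getter and setter comes from the same few lines, an engineer who has used one of them knows how all of them are named and how they behave.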
Once code is hand-written by an engineer he turns his attention to something else and the old code starts to atrophy until someone gives it some more attention. Generated code is continuously maintained by the generator. Every generation cycle replaces the entire generated code base with fresh code. When bugs are fixed in the templates they are propagated across all of the code consistently. So the output code is always better maintained than its hand-coded equivalent.
The productivity of the engineering group is unquestionably enhanced by code generation. On its face a generator that builds hundreds of cookie cutter EJBs in seconds will outperform any human attempting the same task. But that’s just the initial productivity increase.
The more important productivity increases are tied to the morale improvement you’ll see in the engineers working on the project. They’ll be happier because they will be making much more intelligent use of their time. Working smarter rather than harder is a clich