Domain-Specific Modeling: How to Start Defining Your Own Language

Domain-specific languages (DSLs) are becoming more and more difficult to avoid. A growing number of vendors are announcing support for DSLs and, in the process, moving away from all-purpose UML. But why?

DSLs, also known as domain-specific modeling languages, are more expressive and therefore tackle complexity better, making modeling easier and more convenient. More importantly, they allow automatic, full code generation, similar to the way today’s compilers generate assembly code from a higher-level programming language.

Domain-specific modeling (DSM) puts models at the core of development, making it truly model-driven. Of key importance is the ability to define and maintain your own DSL. In my opinion, model-driven development works best if the language really is domain-specific: Ideally, the language is limited to just one company or even one development area in that company.

If there are no compromises in the syntax of your design language, there need to be no compromises in the way you generate artifacts from designs in that language. This is the true essence of model-driven development and the path to the biggest productivity and quality improvements. A carefully executed and narrowly defined DSL can reap an order of magnitude improvement in productivity and quality.

The problem that companies, or rather their expert developers, will face is how to come up with a suitable DSM language. It is very likely that defining a new modeling language is not part of their current skill set. Most developers know that they can achieve the biggest benefits by raising the abstraction level of the modeling language. However, when given the task of creating such a language for the first time, it is relatively easy to end up with a language that reflects the code they normally write. Such a language does not raise the level of abstraction, and limits the possibility for increasing productivity with generators.

In this article, I will draw upon 15 years of experience in this field to provide some guidelines for creating an effective DSL. I will focus on modeling languages that go beyond documentation, targeting full production code generation from the modeler’s perspective.

You may be apt to believe that creating a modeling language must be a very difficult task. This may certainly be true when building an all-purpose modeling language. How do you create a modeling language if you do not know the type of applications it will model? It is very difficult indeed and you would probably end up with something like… UML, which, as we know, is poorly suited for generating full production code.

The language definition task becomes considerably easier when the language needs to work only for one problem domain in one company. You can then focus on a restricted domain: the domain in which you already work. Narrowing the scope of the language also helps to raise design abstraction, and makes it easier to create code generators to automate development.

Steps for Defining DSM
Over the years, I have found that the best way to build modeling languages is to build them incrementally: build a little of the language, model a little, define generators, make some changes to the language, model some more etc.

You can divide the process for implementing model-based development into four phases:

  1. Identifying abstractions and how they work together
  2. Specifying the language concepts and their rules (metamodel)
  3. Creating the visual representation of the language (notation)
  4. Defining the generators for model checking, code, documentation, etc.

Usually the process starts in this order too. Now I’ll look at each step in a bit more detail.

Figure 1. Physical Structure: This physical structure-based DSM language is used for modeling the hardware architecture in automobiles (based on EAST-ADL)

1. Abstractions: Do not think of models as visualizing code; abstractions are essential for the success of any DSL. The most important part is clearly finding the right abstractions. Although it can seem as if low-level language concepts (e.g. mapping a class diagram to a class structure) would be the easiest tactic, it is better to map concepts to the problem domain and thus raise the abstraction level. This way, you can help prevent errors early in the design phase, minimize specification work, and at the same time make the language more suitable for true generation. Describing things in problem domain terms instead of implementation concepts is also good future-proofing. In other words, what is important is what your application does, not how it does it or what language or framework it uses.

2. Language constructs: Obviously, you want your developers to follow the abstractions, obey architectural rules, reuse model components when appropriate, etc. Putting these abstractions and rules right in the language saves your developers from having to refer constantly to in-house “design guidelines” documents, which are probably out of date anyway. During this phase, you specify the modeling concepts, their properties, and the rules that constrain the use of the language and enforce model correctness. You will generally map the major domain concepts to modeling language objects, while other concepts will be captured as object properties, connections, sub-models, or links to models in other languages.

3. Notation: Next, you need a visual representation of the language, usually a diagram, but sometimes a matrix, table, or plain text. The pictures in this article illustrate representations of different graphical modeling languages where the symbols and icons inside the pictures represent different language concepts. Good DSM tools allow you to define your own notation as it makes models much easier to create, read, and maintain. Using UML-like rectangles for all the different concepts is like trying to understand a foreign language where the only letter is A, with 20 slight variations of inflection!

4. Generators: Ultimately, you want to transform your models into code for interpretation or compilation into an executable. Building a generator is about defining how model concepts are mapped to code or other output. In the simplest case, each modeling symbol produces certain fixed code that includes the values entered into the symbol as arguments by the modeler. Generally, generators also take relationships with other symbols or other model information into account.
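As a minimal sketch of the simplest case described in step 4, the following Python snippet maps each symbol type to a fixed code template and fills in the values the modeler entered. The symbol types (`Popup`, `SendSMS`), their properties, and the emitted function names are hypothetical, chosen only for illustration; a real generator would be written in whatever form your DSM tool supports.

```python
from string import Template

# Hypothetical mapping from symbol type to a fixed code template.
TEMPLATES = {
    "Popup": Template('show_popup("$text")'),
    "SendSMS": Template('send_sms("$number", "$message")'),
}

def generate(model):
    """Emit one line of code per symbol, in model order."""
    lines = []
    for symbol in model:
        template = TEMPLATES[symbol["type"]]
        # Values entered by the modeler become arguments in the output.
        lines.append(template.substitute(symbol["properties"]))
    return "\n".join(lines)

# A tiny model: two symbols with the property values the modeler typed in.
model = [
    {"type": "Popup", "properties": {"text": "Registration sent"}},
    {"type": "SendSMS", "properties": {"number": "555-0100", "message": "REG"}},
]

print(generate(model))
```

This sketch ignores relationships between symbols; a generator that follows connections would walk the model as a graph rather than as a flat list.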

Figure 2. The Look and Feel Language: This look and feel language is based on S60 Symbian phones and PDAs.

Naturally, we need tools that not only provide the editors for modeling with DSM languages, but also support language and generator definition, that share new language versions with developers, update existing models based on language change, etc. Fortunately, these tools already exist, ranging from code frameworks through research tools to commercial environments, with increasing cost corresponding to decreasing work needed to build the language. A list of these tools is available at www.dsmforum.org/tools.html. Using a tool that already offers these features, an experienced developer in a company can thus tailor design languages and generators to a specific domain. Other developers can then design with the resulting DSM languages and tools, and generate actual products from their models.

This article will focus specifically on DSM language creation: in essence, the first two steps. The code generator is of course tightly related to your language and defining it is an interesting topic that I’ll cover in a future article.

Finding Abstractions and Language Concepts
The goal of defining a domain-specific language is to provide the software modelers and developers with a higher-level language with which they can build systems. When identifying the language concepts, it is of key importance to focus on a narrow application domain and your actual needs for it, knowing that you can change the language when your requirements change. This support for language evolution is essential when choosing a DSM tool. Good tools allow such evolution, automatically updating all the models created previously with the language, whereas with less mature tools all your models are lost.

I recommend assigning the creation of the DSM language to developers who have built several similar products in the company’s problem domain, or who were responsible for forming the component library or framework. These people are more familiar with the company’s problem domain and will therefore find it easier to identify the modeling concepts and associated rules.

Because every domain differs from another, the language concepts and abstractions between languages differ too. The best places to find your language concepts are the terminology used in your domain, system architecture, existing system descriptions, and component services. In other words, you should borrow from the domain-specific jargon or vocabulary used in your organization. This vocabulary provides you with natural concepts that describe your industry in ways that people already understand; people do not think of solutions in coding terms. Starting from the existing vocabulary also means that there is no need to introduce a new, unfamiliar set of terms, or create a mapping between two sets of terms.

In my experience, creation of a domain-specific modeling language often starts from a certain viewpoint on the domain. Language concepts are then chosen, or their identification is started, based on:

  • Physical product structure
  • Look and feel of the system
  • Variability space
  • Domain (expert) concepts
  • Generation output

I will discuss each category in more detail and provide some examples.

Physical Structure
Physical structures are easy to identify and clearly defined, thus making them a good starting point for language definition. In the case of a power plant or paper factory you will have concepts like valves, motors, sensors, and controls. A valve will have attributes such as ‘size’ and ‘direction’, and rules on how it may relate to motors and sensors. Your DSM language should use these same concepts directly as language constructs.
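To make this concrete, here is a minimal Python sketch of how the valve, motor, and sensor concepts could become language constructs with a rule on legal connections. The attribute types, the unit for `size`, and the specific connection rules are assumptions made for illustration, not taken from any real plant model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Valve:
    name: str
    size: int        # hypothetical unit: millimetres
    direction: str   # hypothetical values: "in" or "out"

@dataclass(frozen=True)
class Motor:
    name: str

@dataclass(frozen=True)
class Sensor:
    name: str

# Hypothetical rule set: which concept may connect to which.
LEGAL_CONNECTIONS = {
    (Motor, Valve),   # a motor drives a valve
    (Sensor, Valve),  # a sensor monitors a valve
}

def check_connection(source, target):
    """Return True if the language allows a connection source -> target."""
    return (type(source), type(target)) in LEGAL_CONNECTIONS

inlet = Valve("inlet", size=50, direction="in")
pump_motor = Motor("pump_motor")
flow_sensor = Sensor("flow_sensor")

print(check_connection(pump_motor, inlet))        # True
print(check_connection(flow_sensor, pump_motor))  # False
```

Because the rule lives in the language definition, an editor built on it could refuse the second connection at modeling time instead of leaving the error to be found later.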

Figure 3. Spectrum of Variability: The spectrum of variability shows the range of decisions and features encompassed by the language syntax. (Courtesy of “Generative Programming: Methods, Tools, and Applications,” Czarnecki & Eisenecker, Addison-Wesley 2000.)

Languages based on physical structures usually focus on static declarative models but may also include behavioral elements. Designs in such a language usually provide configuration data for the rest of the generation process and are usually linked to other models in order to achieve more comprehensive code generation.

Figure 1 illustrates part of a physical structure-based DSM language, based on EAST-ADL, a DSL that focuses in part on describing the hardware architecture in automobiles. The figure shows the architecture for electronic control units (ECU) with processors and memories connected through a CAN bus. The language provides several alternatives for bus types and constraints on the application of buses in the described hardware architecture.

Look and Feel of the System
You can also define a language based on the viewpoint of its end users’ navigation and product use. I call this approach basing your language on the “look and feel” of your product or system, although it may include any kind of perception or interaction with the system. A language for defining voice menus can include concepts like ‘menu,’ ‘prompt,’ and ‘voice entry’ as well as guidelines on how these may be linked to achieve user navigation. This type of language is quite easy to build and test, as it has “visible” counterparts in the actual product. The main challenge is in finding the mapping to other non-GUI concepts.
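As one illustration of why such a language is easy to test, the sketch below represents a voice-menu design as linked elements and checks that a caller can reach every element from the starting menu. The element names, the `links` structure, and the reachability check itself are assumptions added for illustration; the actual navigation guidelines would come from your own product.

```python
from collections import deque

# Hypothetical voice-menu model: each element has a kind and outgoing links.
MODEL = {
    "main": {"kind": "menu", "links": ["balance", "transfer"]},
    "balance": {"kind": "prompt", "links": ["main"]},
    "transfer": {"kind": "voice entry", "links": ["confirm"]},
    "confirm": {"kind": "prompt", "links": ["main"]},
    "help": {"kind": "prompt", "links": ["main"]},  # nothing links here
}

def unreachable(model, start):
    """Return the elements a caller can never reach from the start menu."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in model[current]["links"]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(model) - seen)

print(unreachable(MODEL, "main"))  # ['help']
```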

Figure 2 shows an application design in a language based on look and feel. The language targets Symbian-based cellular phones and PDAs, where it allows definition of the behavioral logic of the applications. Look and feel are represented here by using the actual user interface widgets of the phone as well as the services provided by its platform, like the sending of an SMS message or connecting to the web. If you are familiar with using a phone book or calendar on a mobile device, then you will most likely understand what the application does, just by looking at this single model.

Variability
Focusing on variability is another approach to start defining a language: you define the language so that variability options are captured by the language concepts, and the modeler’s role is to concentrate on the areas that differ between different products or features.

Figure 4. The Domain Expert Method: Modeling financial and insurance products for a J2EE web app with a DSM language based on domain expert concepts.

Success in defining these types of languages depends largely on your ability to predict what kind of variation space is needed in future variants. Variability languages are suitable for product-line development, where you often find them. Preparing for language definition comes down to conducting a thorough domain analysis: identifying which abstractions are the same for all products and which are different. The task of the language is then to describe just that which can be different. Static variation is usually easy to cope with: developers have been making parameter tables and wizards to choose among alternatives for decades.

Things get more complicated when the parameter choices depend on other parameter choices. In practice, parameter and feature choice approaches usually break down when you also require the ability to create new features, functionality, and variants. DSM offers a solution that supports situations where not all possible product features have been decided upon.
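The following Python sketch shows a parameter table in which one choice depends on another. The parameter names, their values, and the single dependency rule are hypothetical and stand in for whatever varies in your own product line.

```python
# Hypothetical product-line parameters and their allowed values.
OPTIONS = {
    "display": {"mono", "color"},
    "camera": {"none", "basic", "hd"},
}

# Hypothetical dependency rules: (condition, requirement, message).
RULES = [
    (lambda c: c["camera"] == "hd",
     lambda c: c["display"] == "color",
     "an hd camera requires a color display"),
]

def validate(config):
    """Return a list of problems; an empty list means the variant is legal."""
    problems = []
    for name, allowed in OPTIONS.items():
        if config.get(name) not in allowed:
            problems.append(f"{name}: {config.get(name)!r} is not a known option")
    if problems:
        return problems  # skip dependency rules until the basic choices are valid
    for condition, requirement, message in RULES:
        if condition(config) and not requirement(config):
            problems.append(message)
    return problems

print(validate({"display": "color", "camera": "hd"}))  # []
print(validate({"display": "mono", "camera": "hd"}))
```

Note that the table can only select among values someone has already listed. Adding a genuinely new feature means editing both `OPTIONS` and `RULES`, which is the point at which a pure parameter approach starts to strain.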

Figure 3 illustrates the spectrum of variability. Wizards and feature-based configuration focus on making choices among known decisions and features. DSM languages do not set choices explicitly but give a practically infinite space to set variation. You do not know all variants, as they can be numerous. More importantly, you can define the language so that it allows you to make new features as well. Naturally, the language should then constrain the modeler to making only legal features and products.

Domain Experts’ Concepts
Because domain experts (e.g. test or commissioning engineers, configurators, and service creators) are usually not programmers, a language for them to program with needs to raise the level of abstraction far beyond programming concepts. Languages that are based on domain experts’ concepts are relatively easy to define because for an expert to exist, the domain must already have established semantics. You can derive many of the modeling concepts directly from the domain model. The same holds true for some of the constraints.

Figure 5. IP Processing: Modeling call processing in IP telephony, a DSM language based on its XML generation target.

Figure 4 shows a language based on domain experts’ concepts. For this particular language the modeling concepts are related to financial and insurance products. Concepts like ‘Risk,’ ‘Bonus,’ and ‘Damage’ capture relevant facts about insurance products. Using this language an insurance expert, and thus a non-programmer, draws models to define different insurance products. Generators take care of transforming these designs into code for a J2EE web application. This way the expert Java programmer can build the mapping from the language to code once, and neither the programmer nor the insurance experts need to know the intricacies of the other’s area of expertise. The higher abstraction in models using domain experts’ concepts also means that the generated output can be easily changed to some other implementation language.

Generated Output
The fifth and last category bases the language on its generation target. While these languages are easy to build, their ability to increase productivity and quality is questionable. I see very little point in modeling a program class, depicted as a rectangle, and then editing the details of that same class in a text file. I do recommend this type of language when the generated output is already in a domain-specific language. The best example I can give is those cases that target generation in a particular XML format. The XML schema provides a wealth of information for identifying the language concepts and constraints. To follow the XML metaphor, designs can be considered valid and well-formed right at the modeling stage. Graphical models can also help overcome many of the limitations of XML.

An example of this kind of DSM language is the Call Processing Language (CPL), which is used to describe and control Internet telephony services (see Figure 5). The language constructs include ‘proxy,’ ‘location,’ and ‘signaling actions,’ essential for specifying IP telephony servers. These same concepts are already defined as elements in XML, and the property values of the modeling constructs are attributes of the XML elements. Having generators produce the configuration in XML gives significant and obvious productivity and quality improvements. With the language illustrated in Figure 5 it is far more difficult to design services that are erroneous or internally inconsistent: something that is all too easy in hand-written CPL/XML.
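The mapping described here, modeling constructs to XML elements and property values to attributes, can be sketched in a few lines of Python. The model structure below is hypothetical, and the output is a simplified fragment that only loosely follows CPL: it omits the namespace and has not been validated against the CPL schema.

```python
import xml.etree.ElementTree as ET

def to_xml(node):
    """Map a model object to an element and its properties to attributes."""
    element = ET.Element(node["concept"], node.get("properties", {}))
    for child in node.get("children", []):
        element.append(to_xml(child))
    return element

# Hypothetical model: route every incoming call to one address via a proxy.
model = {
    "concept": "cpl",
    "children": [{
        "concept": "incoming",
        "children": [{
            "concept": "location",
            "properties": {"url": "sip:jones@example.com"},
            "children": [{"concept": "proxy"}],
        }],
    }],
}

# Serialize the generated element tree as a string.
print(ET.tostring(to_xml(model), encoding="unicode"))
```

Because the generator builds the element tree itself, the output is well-formed by construction; whether it is also valid depends on the rules the modeling language enforces before generation.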

Finalizing your Language Spec
You finalize your language by formalizing it into a metamodel. The form of the metamodel depends on the DSM tool you use, but at a minimum it should allow you to define the concepts of your language, their properties, the legal connections between elements of your language, model hierarchy structures, and correctness rules. In all but the smallest cases, support for reuse and different model integration approaches is also essential.

Metamodeling simply means modeling your language: mapping your domain concepts to various language elements, such as objects, their properties, and their connections, specified as relationships, and the roles that objects play in them. You will find you can specify some of the language concepts directly and others by combining some domain concepts. In making a decision about which concepts to include, it helps to use your language and generate artifacts from it. Therefore it is best to try out your language immediately after you have defined some of the concepts. Here tools can help you, as ideally they should allow you to focus on language definition only and provide various modeling editors for your language instantly and automatically. This makes language creation agile: you can easily test and learn what the language looks like in practice, how easily it allows you to make and reuse models etc. This really minimizes the risks of making a bad language, or a good language but for the wrong task, and greatly helps in finding good mappings for code generation.

Keeping the language running
Changes to your DSM language will be more the norm than the exception. Your understanding of a domain, even if you are an expert in it, will improve while you define a language for it. Even after you take your language into use, your understanding of your domain will improve through modeling or from getting feedback from other users of the language. Partly you will understand the domain better, and partly you will see possible improvements for your language.

I believe it is vital for any DSM language creator to be able to make and test a language quickly. You need to be able to focus on making a good language and not be distracted by details of how you need to implement it in a DSM tool. Best is if you can test your language the instant you define it. This makes language creation agile and has a major influence on how quickly you can build the language. More importantly, it improves the quality of your language, and any improvement there will be multiplied many times when developers start using the language.

Giving the language to candidate users early gets them involved and ensures you get feedback early. Just like your domain, your language will evolve over time and you need to be able to adjust it whenever you need to. The moment your language stops evolving is the moment it starts to lose its usefulness.

Freedom of Choice
Defining a whole new language is often considered a difficult task. However, once you realize that it means you get to apply the knowledge you already have about your domain you will perceive this task as considerably easier. DSM means you no longer have to force your application or system designs into several prescribed diagrams or a “de-facto” syntax that does not suit your design requirements, and leaves models as no more than mere documentation. DSM gives you the freedom of choice regarding what design language fits your requirements best at what time.

Besides providing a real opportunity to raise the level of abstraction that developers work on, DSM also allows you, the language definer, to encapsulate your expert knowledge about your evolving domain in the language, so that others automatically follow best practices without having to remember them.

When you give your team a good DSM language, they will become faster and automatically follow best practices. They will not have to learn a third party one-size-fits-all “standard” notation that offers 700 additional concepts to the 100 they might use. Instead, they get to use one that reflects the concepts and rules of the domain they know and already work with. Perhaps most importantly you can move your models from being throwaway documentation of one stage of the design process, to being actual executable specifications. That last part requires defining your code generators, the topic of my next article on DevX.
