Building Domain Specific Languages in C#

Domain specific languages have been around since Lisp, and abound in the Unix world of "little languages." A convergence of research has recently brought domain specific languages to the forefront of both language and API design.

At the JAOO conference in Aarhus, Denmark, this year, domain specific languages came up in virtually every conversation: every keynote mentioned them, many sessions discussed them (including a pre-conference workshop by Martin Fowler and myself), and you could hear "DSL" in most of the hallway conversations. Why this, and why now?

To hear some people talk, you'd think DSLs solve world hunger, cure cancer, and make software write itself (perhaps this is a bit of an exaggeration). DSLs are really nothing more than abstraction mechanisms. The current interest lies in the dawning realization that some abstractions resist easy representation in modern languages such as C#.

For the last 20 years or so, developers have used objects as their primary abstraction mechanism. Objects work well because much of the world is hierarchical. But some problems don't fit that model neatly. Querying relational data is a good example, because tables and joins don't map naturally onto objects. LINQ provides an elegant solution to that problem, and it's a DSL: a language for building structured data queries that fits naturally into C#.
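As a small illustration of LINQ querying relational-shaped data through objects, the sketch below joins two in-memory collections using C# query syntax. The Customer and Order types and the BigSpenders method are hypothetical. With a database-backed LINQ provider the query would look much the same, but plain lists keep the example self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical "tables" modeled as plain objects.
public record Customer(int Id, string Name);
public record Order(int CustomerId, decimal Total);

public static class LinqJoinSketch
{
    // Names of customers with an order above the threshold, sorted by name.
    // The query reads like a relational join but is ordinary C#.
    public static List<string> BigSpenders(
        IEnumerable<Customer> customers, IEnumerable<Order> orders, decimal threshold)
    {
        var query =
            from c in customers
            join o in orders on c.Id equals o.CustomerId
            where o.Total > threshold
            orderby c.Name
            select c.Name;
        return query.ToList();
    }
}
```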

A more formal definition appears later, but for now, a working definition of a domain specific language is a computer language limited to a very specific problem domain. In essence, a DSL is an abstraction mechanism that allows very concise representations of complex data. This article covers what constitutes a DSL, what kinds of DSLs exist, and how to build a particular type of DSL known as a fluent interface. First, though, some definitions.

Does Starbucks Use a DSL?

DSLs use language as an abstraction mechanism the way that objects use hierarchy. One challenge when talking about an abstraction mechanism as flexible as language lies in defining it. The thing that makes a DSL such a compelling abstraction is the parallel with a common human communication technique, jargon. Here's a pop quiz. What "languages" are these?

  • Venti, half-caf no foam latte with whip
  • Scattered, smothered, and covered
  • Just before the Tea Interval, the batsman was out LBW.
The first is easy: Starbucks. The second you probably know only if you eat at a Waffle House regularly: it's the way you order hash browns. (The Waffle House hash brown language consists of eight keywords, all transitive verbs: scattered, smothered, covered, chunked, topped, diced, peppered, and capped.) I hear something similar to the third example often because I have lots of colleagues who play cricket, but the language makes no sense to me. And that's really the point: people create jargon as a shorthand way to convey large amounts of information succinctly. Consider the Waffle House example. Here's an alternative way to order hash browns:

There is a plant called a potato, which is a tuber, a root plant. Take the potato, harvest it, wash it off, and chop it into little cubes. Put those in a pan with oil and fry them until they are just turning brown, and then drain the grease away and put them on a plate. OK, next I want cheese. There's an animal called a cow...

Don't ever try to order hash browns like this in a Waffle House because the person who's next in line will kill you. All these examples represent jargon: common, abbreviated ways that people talk. You could consider the Waffle House hash brown language a domain specific language (after all, it is a language specific to a domain), but doing so leads to a slippery slope where everything is a DSL. Thus, I'm going to lean on Martin Fowler's definition of a DSL, extracted from the book on DSL patterns he's writing.
A domain specific language is a limited form of computer language designed for a specific class of problems.
He adds another related definition to this:
Language-oriented programming is a general style of development which operates about the idea of building software around a set of domain specific languages.
With these definitions in hand, I'll limit the term DSL to software, which keeps it inside reasonable bounds.

Why use language as an abstraction mechanism? It allows you to leverage one of the key features of jargon: implicit context. Consider the elaborate Waffle House example above. You don't talk like that to people because people already understand the context, which is one of the important aspects of DSLs. If you think of code as speech, most code reads like the context-free version above. Consider ordering coffee using C#:

   Latte coffee = new Latte();
   coffee.Size = Size.VENTI;
   coffee.Whip = true;
   coffee.Decaf = DecafLimit.HALF;
   coffee.Foam = false;

Compare that to the more human way of ordering in the example above. DSLs allow us to leverage implicit context in our code. Think about all the LINQ examples you've seen: one of LINQ's nicest features is its concise syntax, which eliminates the interfering noise of the underlying APIs it calls.
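To make the "interfering noise" point concrete: the C# compiler translates a query expression into ordinary calls to methods such as Where, OrderBy, and Select. The two hypothetical methods below compute the same result, once in query syntax and once with those calls written out by hand, lambdas and all.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class QuerySyntaxSketch
{
    // Query syntax: reads like a small query language embedded in C#.
    public static List<string> ShortWordsQuery(IEnumerable<string> words) =>
        (from w in words
         where w.Length <= 4
         orderby w
         select w.ToUpperInvariant()).ToList();

    // The same query as the method calls the compiler generates for it.
    public static List<string> ShortWordsMethods(IEnumerable<string> words) =>
        words.Where(w => w.Length <= 4)
             .OrderBy(w => w)
             .Select(w => w.ToUpperInvariant())
             .ToList();
}
```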

Two types of DSLs exist (again borrowing Martin's terms): internal and external. Internal DSLs are little languages built on top of another, underlying language. LINQ is a good example of an internal DSL, because the LINQ syntax you use is legal C# syntax, yet it reads as a domain specific query language. In contrast, an external DSL is a language with its own syntax, implemented with a lexer and parser. SQL is a good example: someone had to write a grammar for SQL and a way to interpret that grammar into some other executable code. Lexers and parsers make most people flee in fear, so this article focuses instead on the surprisingly rich environment of internal DSLs.
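For contrast, even a deliberately tiny external DSL needs both pieces mentioned above: a lexer that turns text into tokens and a parser that checks those tokens against a grammar. The sketch below applies this to the Waffle House hash brown order from earlier. The HashBrowns enum, the grammar (keywords separated by commas and/or "and"), and the parser names are assumptions for illustration, not any real ordering system.

```csharp
using System;
using System.Collections.Generic;

// The eight keywords listed earlier, as flags so one order can combine several.
// Plain (nothing added) is the empty set and is not itself a keyword here.
[Flags]
public enum HashBrowns
{
    Plain = 0,
    Scattered = 1, Smothered = 2, Covered = 4, Chunked = 8,
    Topped = 16, Diced = 32, Peppered = 64, Capped = 128
}

// Hypothetical grammar:  order     := keyword { separator keyword }
//                        separator := "," | "and" | "," "and"
public static class HashBrownParser
{
    public static HashBrowns Parse(string text)
    {
        List<string> tokens = Tokenize(text);
        int pos = 0;
        HashBrowns order = Keyword(tokens, ref pos);
        while (pos < tokens.Count)
        {
            bool sawSeparator = false;
            if (tokens[pos] == ",") { pos++; sawSeparator = true; }
            if (pos < tokens.Count && tokens[pos] == "and") { pos++; sawSeparator = true; }
            if (!sawSeparator)
                throw new FormatException($"Expected ',' or 'and' before '{tokens[pos]}'");
            order |= Keyword(tokens, ref pos);
        }
        return order;
    }

    // Lexer: turns raw text into lowercase word tokens and "," tokens.
    static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            char ch = text[i];
            if (char.IsWhiteSpace(ch)) { i++; }
            else if (ch == ',') { tokens.Add(","); i++; }
            else if (char.IsLetter(ch))
            {
                int start = i;
                while (i < text.Length && char.IsLetter(text[i])) i++;
                tokens.Add(text.Substring(start, i - start).ToLowerInvariant());
            }
            else throw new FormatException($"Unexpected character '{ch}'");
        }
        return tokens;
    }

    // Parser step: accepts exactly one of the eight keywords at the current position.
    static HashBrowns Keyword(List<string> tokens, ref int pos)
    {
        if (pos >= tokens.Count) throw new FormatException("Expected a keyword");
        string word = tokens[pos++];
        if (word != "plain" && Enum.TryParse(word, true, out HashBrowns value))
            return value;
        throw new FormatException($"Unknown keyword '{word}'");
    }
}
```

Parsing "Scattered, smothered, and covered" yields Scattered | Smothered | Covered, while "scattered smothered" is rejected because the grammar requires a separator between keywords.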

Fluent Interfaces

A fluent interface is just "regular" code, written in a way that eliminates extra syntax and creates sentences. In spoken languages, a sentence is a complete unit of thought. Fluent interfaces try to achieve the same effect through clever use of syntax. For example, consider this version of the coffee API shown above:

   Latte coffee = Latte.describedAs
   .Venti .whip .halfCaf .foam;

This description is almost as concise as the English version, yet it is valid C# code, with some creative indentation. Notice how fluent interfaces try to create a single unit of thought. In the API version above, you know you are finished defining a particular cup of coffee only from the context of the code: you're finished when the code switches to another object of some kind. By contrast, the fluent interface version is a complete unit of thought. In spoken languages, a period marks the end of a complete thought; in the fluent interface, the semicolon plays that role.
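One way to make the fluent coffee order compile is to give each word a property getter that records a choice and returns the same Latte. This is a minimal sketch, not the article's implementation. It assumes hypothetical Size and DecafLimit enums (only VENTI and HALF come from the article) and a Latte class that keeps the plain settable properties from the API version alongside the fluent members. In this sketch, foam simply sets Foam to true.

```csharp
// Hypothetical enums: only VENTI and HALF appear in the article.
public enum Size { TALL, GRANDE, VENTI }
public enum DecafLimit { NONE, HALF, FULL }

public class Latte
{
    // The plain API from the property-setting example. Defaults are the
    // first enum values (TALL, NONE) and false.
    public Size Size { get; set; }
    public bool Whip { get; set; }
    public DecafLimit Decaf { get; set; }
    public bool Foam { get; set; }

    // Entry point of the fluent interface: each access starts a fresh order.
    public static Latte describedAs => new Latte();

    // Each word records one choice and returns the same Latte, so the
    // chain forms one statement that ends at the semicolon.
    public Latte Venti { get { Size = Size.VENTI; return this; } }
    public Latte whip { get { Whip = true; return this; } }
    public Latte halfCaf { get { Decaf = DecafLimit.HALF; return this; } }
    public Latte foam { get { Foam = true; return this; } }
}
```

Getters that change state go against the usual .NET guideline that property getters are free of side effects. The sketch accepts that trade for a sentence without parentheses. Methods such as Venti() would be the more conventional, if noisier, alternative.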

Why would you create a fluent interface like this one? After all, for developers, the API version seems reasonably readable. However, a non-developer would have a hard time reading it. For example, one recent project dealt with leasing rail cars. The rail industry has elaborate rules about uses for certain types of cars: if you normally haul milk in a tanker car and you ever haul tar in it, you can no longer legally carry milk in that car. The project had elaborate test-case setups (sometimes running to several pages of code) to make sure that we were testing the right car characteristics. At one point, we showed analysts code that looked like this:

   ICar car = new Car();
   IMarketingDescription desc = new 
   desc.Type = "Box";
   desc.SubType = "Insulated";
   desc.Length = 50;
   desc.Ladder = true;
   desc.LiningType = LiningType.CORK;
   car.Description = desc;

However, the business analysts pushed back on this. While the code is perfectly readable to developers, the analysts had a hard time ignoring the noise introduced by the necessary code artifacts. To make it easier for them to read, we rewrote the code into a fluent interface:

   ICar car = Car.describedAs.

The business analysts found this so readable that we no longer had to translate its meaning for them. This saved time but, more importantly, it prevented the translation errors that had caused us to waste time testing the wrong car characteristics. The goal of fluent interfaces is not to get code to the point where non-technical people can write it, but to the point where they can read it; it's one less level of mismatch between developers and everyone else.
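A fluent version of the rail-car setup might look something like the following sketch. It is not the project's code. The member names (Box, Insulated, Length, includes, Ladder, has, Lining), the MarketingDescription class, and the NONE lining value are hypothetical. Only the description fields and the CORK lining come from the property-setting example above.

```csharp
// Usage: Car car = Car.describedAs.Box.Insulated.Length(50).includes.Ladder.has.Lining(LiningType.CORK);

// CORK appears in the article; NONE is an assumed default.
public enum LiningType { NONE, CORK }

// Hypothetical stand-in for the project's marketing description.
public class MarketingDescription
{
    public string Type { get; set; } = "";
    public string SubType { get; set; } = "";
    public int Length { get; set; }
    public bool Ladder { get; set; }
    public LiningType LiningType { get; set; }
}

// Hypothetical fluent Car: each member fills in part of the description
// and returns this, so the whole setup reads as one sentence.
public class Car
{
    public MarketingDescription Description { get; } = new MarketingDescription();

    public static Car describedAs => new Car();

    public Car Box { get { Description.Type = "Box"; return this; } }
    public Car Insulated { get { Description.SubType = "Insulated"; return this; } }
    public Car Length(int length) { Description.Length = length; return this; }
    public Car Ladder { get { Description.Ladder = true; return this; } }
    public Car Lining(LiningType lining) { Description.LiningType = lining; return this; }

    // Filler words: they change nothing and exist only so the code reads like prose.
    public Car includes => this;
    public Car has => this;
}
```

Mixing parameterless properties with methods that take an argument, such as Length(50), is one common compromise when a word in the sentence needs a value.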

The rest of this article describes an example of creating a fluent interface in C# using syntax similar to the simple example above but fleshed out with implementation details.

Editor's Note: This article was first published in the January/February 2009 issue of CoDe Magazine, and is reprinted here by permission.
