Building Domain Specific Languages in C#

At the JAOO conference in Aarhus, Denmark this year, domain specific languages came up in virtually every conversation; every keynote mentioned them, a lot of sessions discussed them (including a pre-conference workshop by Martin Fowler and me), and you could hear “DSL” in most of the hallway conversations. Why this, and why now?

To hear some people talk, you’d think DSLs solve world hunger, cure cancer, and make software write itself (perhaps this is a bit of an exaggeration). DSLs are really nothing more than abstraction mechanisms. The current interest lies in the dawning realization that some abstractions resist easy representation in modern languages such as C#.

For the last 20 years or so, developers have used objects as their primary abstraction mechanism. Objects work well because much of the world is hierarchical. But edge cases still pop up. For example, consider the problem of querying relational data in a way that fits the object paradigm nicely. LINQ provides an elegant solution to that problem, and it’s a DSL: one for building structured data queries in a way that fits in nicely with C#.

A more formal definition appears later, but, for now, a working definition for a Domain Specific Language is that it’s a computer language limited to a very specific problem domain. In essence, a DSL is an abstraction mechanism that allows very concise representations of complex data. This article covers some definitions of what constitutes a DSL, what kinds of DSLs exist, and how to build a particular type of DSL known as a fluent interface. First, though, here are some term definitions.

Does Starbucks Use a DSL?

DSLs use language as an abstraction mechanism the way that objects use hierarchy. One challenge when talking about an abstraction mechanism as flexible as language lies in defining it. The thing that makes a DSL such a compelling abstraction is the parallel with a common human communication technique, jargon. Here’s a pop quiz. What “languages” are these?

  • Venti, half-caf no foam latte with whip
  • Scattered, smothered, and covered
  • Just before the Tea Interval, the batsman was out LBW.

The first is easy: Starbucks. The second you probably know only if you eat at a Waffle House regularly: it’s the way you order hash browns. (The Waffle House hash brown language consists of eight keywords, all transitive verbs: scattered, smothered, covered, chunked, topped, diced, peppered, and capped.) I hear something similar to the third example often because I have lots of colleagues who play cricket, but the language makes no sense to me. And that’s really the point: people create jargon as a short-hand way to convey large amounts of information succinctly. Consider the Waffle House example. Here’s an alternative way to order hash browns:

There is a plant called a potato, which is a tuber, a root plant. Take the potato, harvest it, wash it off, and chop it into little cubes. Put those in a pan with oil and fry them until they are just turning brown, and then drain the grease away and put them on a plate. OK, next I want cheese. There’s an animal called a cow…

Don’t ever try to order hash browns like this in a Waffle House because the person who’s next in line will kill you. All these examples represent jargon: common, abbreviated ways that people talk. You could consider the Waffle House hash brown language as a domain specific language (after all, it is a language specific to a domain), but doing so leads to a slippery slope where everything is a DSL. Thus, I’m going to lean on Martin Fowler’s definition of a DSL, extracted from the book on DSL patterns he’s writing.

A domain specific language is a limited form of computer language designed for a specific class of problems.

He adds another related definition to this:

Language-oriented programming is a general style of development which operates about the idea of building software around a set of domain specific languages.

With this definition in hand, I’ll limit the scope of DSLs as a software terminology to keep it inside reasonable bounds.

Why use language as an abstraction mechanism? It allows you to leverage one of the key features of jargon. Consider the elaborate Waffle House example above. You don’t talk like that to people because people already understand context, which is one of the important aspects of DSLs. When you think about writing code as speech, it is more like the context-free version above. Consider ordering coffee using C#:

   Latte coffee = new Latte();
   coffee.Size = Size.VENTI;
   coffee.Whip = true;
   coffee.Decaf = DecafLimit.HALF;
   coffee.Foam = false;

Compare that to the more human way of ordering in the example above. DSLs allow us to leverage implicit context in our code. Think about all the LINQ examples you’ve seen. One of its nicest features is the concise syntax, eliminating the interfering noise of the underlying APIs that it calls.

Two types of DSLs exist (again borrowing Martin’s terms): internal and external. Internal DSLs are little languages built on top of another underlying language. LINQ is a good example of an internal DSL, because the LINQ syntax you use is legal C# syntax that controls an extended (domain specific) syntax. In contrast, an external DSL describes a language created with a lexer and parser, where you create your own syntax. SQL is a good example: Someone had to write a grammar for SQL, and a way to interpret that grammar into some other executable code. Lexers and parsers make most people flee in fear, so this article focuses instead on the surprisingly rich environment of internal DSLs.

Fluent Interfaces

A fluent interface is just “regular” code, written in a way that eliminates extra syntax and creates sentences. In spoken languages, a sentence is a complete unit of thought. Fluent interfaces try to achieve the same effect by clever syntax use. For example, consider this version of the coffee API shown above:

   Latte coffee = Latte.describedAs
      .Venti
      .whip
      .halfCaf
      .foam;

This description is almost as concise as the English version, yet it is valid C# code, with some creative indentation. Notice how fluent interfaces try to create a single unit of thought. In the API version above, you only know that you are finished defining a particular cup of coffee by the context of the code; you’re finished when the code switches to another object of some kind. By contrast, the fluent interface version is a complete unit of thought. In spoken languages, you use a period to indicate a complete unit of thought. In the fluent interface, the semi-colon is the marker that terminates the complete unit of thought.
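How might a chain like this be wired up? The following is only a minimal sketch, assuming a hypothetical Latte class in which each chained property records one choice and returns the same object. The CupSize and DecafLimit enums and the property names Size, Whip, Decaf, and Foam are illustrative stand-ins, not part of any real library, and the sketch takes each chained member at face value (so .foam turns foam on).

```csharp
// Hypothetical enums standing in for the ones the API version implies.
public enum CupSize { Tall, Grande, Venti }
public enum DecafLimit { None, Half, Full }

// Hypothetical Latte class: every chained property changes the
// object and then returns it, so the calls read as one sentence.
public class Latte
{
    // Conventional properties that ordinary code can read.
    public CupSize Size { get; private set; } = CupSize.Tall;
    public DecafLimit Decaf { get; private set; } = DecafLimit.None;
    public bool Whip { get; private set; }
    public bool Foam { get; private set; }

    // Static entry point that starts the sentence without "new".
    public static Latte describedAs
    {
        get { return new Latte(); }
    }

    // Chained properties: set one value, return the host object.
    public Latte Venti   { get { Size = CupSize.Venti; return this; } }
    public Latte whip    { get { Whip = true; return this; } }
    public Latte halfCaf { get { Decaf = DecafLimit.Half; return this; } }
    public Latte foam    { get { Foam = true; return this; } }
}
```

With this class in place, the statement Latte coffee = Latte.describedAs.Venti.whip.halfCaf.foam; compiles, because the chain ends in an assignment and every link returns a Latte.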

Why would you create a fluent interface like this one? After all, for developers, the API version seems reasonably readable. However, a non-developer would have a hard time reading the API. As an example, one recent project dealt with leasing rail cars. The rail industry has elaborate rules about uses for certain types of cars. For example, if you normally haul milk in a tanker car but haul tar in it even once, you can no longer legally carry milk in that car. The project had elaborate test-case setups (sometimes running to several pages of code) to make sure that we were testing the right car characteristics. At one point, we showed analysts code that looked like this:

   ICar car = new Car();
   IMarketingDescription desc = new MarketingDescription();
   desc.Type = "Box";
   desc.SubType = "Insulated";
   desc.Length = 50;
   desc.Ladder = true;
   desc.LiningType = LiningType.CORK;
   car.Description = desc;

However, the business analysts pushed back on this. While the code is perfectly readable to developers, the analysts had a hard time ignoring the noise introduced by the necessary code artifacts. To make it easier for them to read, we rewrote the code into a fluent interface:

   ICar car = Car.describedAs
      .Box
      .Insulated
      .Length(50)
      .Includes(Equipment.LADDER)
      .Has(Lining.CORK);

The business analysts found this so readable that we no longer had to translate the meaning for them. This saved time but, more importantly, it prevented translation errors that caused us to waste time testing the wrong type of characteristics. The goal of fluent interfaces is not to get code to the point where non-technical people can write the code, but to the point where they can read the code; it’s one less level of mismatch between developers and everyone else.

The rest of this article describes an example of creating a fluent interface in C# using syntax similar to the simple example above but fleshed out with implementation details.

Editor’s Note: This article was first published in the January/February 2009 issue of CoDe Magazine, and is reprinted here by permission.

A .NET Bakery DSL

Suppose you run a bakery, and are in cutthroat competition with the bakery across the street. To remain competitive, you must have flexible pricing rules for assets such as day-old bread (because every time you change your prices, the guys across the street do too). The driving force is flexible business rules.

To that end, you create the idea of discount rules based on customer profiles. The profiles describe customers, and you base discount incentives on those profiles. You need to be able to define and redefine these rules at the drop of a hat. To solve the problem, you can compose a couple of DSLs.

A Profile Fluent Interface

The first DSL addresses Profiles. Here’s a unit test for the syntax I want:

   [Test]
   public void simple_profile()
   {
      Profile p = new Profile();
      p.member
         .frequency(5)
         .monthlySpending(100);
      Assert.IsTrue(p.Member);
      Assert.AreEqual(5, p.Frequency);
      Assert.AreEqual(100, p.MonthlySpending);
   }

The source for the Profile class appears in Listing 1.

The Profile class uses a DSL technique called member chaining. Member chaining refers to methods and/or properties created with sentence composability in mind. From a C# standpoint, it’s as simple as creating properties and methods that return the host object, or this, rather than a more typical return value. This example plays simple games with case: The member chained methods start with lower case letters (having capital letters embedded within sentences would look a little odd) and the “normal” properties use the standard convention. Using syntactic tricks like this is fairly common in the DSL world; it’s a technique to bend the language to make it more readable.
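Listing 1 is not reproduced in this text, so here is a minimal sketch of what a member-chained Profile class might look like. It is reconstructed only from the names the unit tests use (member, frequency, monthlySpending, describedAs, and the capitalized read-only properties); the parameter names, the int types, and everything else are assumptions rather than the author's actual listing.

```csharp
// A guess at the shape of the Profile class, based on the tests.
public class Profile
{
    // "Normal" properties, following the standard casing convention.
    public bool Member { get; private set; }
    public int Frequency { get; private set; }
    public int MonthlySpending { get; private set; }

    // Static entry point so a chain can begin without "new".
    public static Profile describedAs
    {
        get { return new Profile(); }
    }

    // Chained property: flags the customer as a member, then
    // returns the host object so the sentence can continue.
    public Profile member
    {
        get { Member = true; return this; }
    }

    // Chained methods: store one value each and return this.
    public Profile frequency(int visits)
    {
        Frequency = visits;
        return this;
    }

    public Profile monthlySpending(int amount)
    {
        MonthlySpending = amount;
        return this;
    }
}
```

Because C# is case-sensitive, member and Member can coexist in one class. The price is that the class is not CLS-compliant, which would matter if it had to be consumed from a case-insensitive .NET language.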

Ideally, you want to remove as much syntactic noise as possible, yet a little bit still lurks in the constructor invocation: You must create a new Profile object before allowing the fluent methods to engage. One way to solve this is to create a static factory member on the class that serves as the first part of the chain. In Listing 1, that’s the describedAs property within Profile:

   public static Profile describedAs
   {
      get
      {
         return new Profile();
      }
   }

The factory method allows the consumption of the fluent interface to be more graceful, as illustrated in the unit test shown below:

   [Test]
   public void factory_constructor()
   {
      Profile p =
         Profile.describedAs
            .member
            .frequency(20)
            .monthlySpending(150);
      Assert.IsTrue(p.Member);
      Assert.AreEqual(20, p.Frequency);
      Assert.AreEqual(150, p.MonthlySpending);
   }

Notice that using chained properties violates a common rule of properties, the “Command Query Rule” (more widely known as command-query separation), which says that get properties shouldn’t modify the underlying object. However, to make this style of DSL work, you need to ignore that rule and allow get properties to set an internal field value and still return this.

A Discount Fluent Interface

From a technical standpoint, the Discount implementation looks pretty much like Profile, so I won’t show most of the code. Here’s the unit test that demonstrates the use of the class:

   [Test]
   public void get_discount_based_on_profile()
   {
      Discount discount = Discount
         .basedOn(Profile.describedAs
            .member
            .monthlySpending(100)
            .frequency(20))
         .forMembership(10.0)
         .forMonthlySpending(75, 5.0)
         .forNumberOfVisits(10, 10.0);
      Assert.AreEqual(25.0, discount.DiscountAmount);
   }

In this example, the Discount class relies on the Profile (created via the fluent interface described above). The other part of the Discount class sets threshold values based on the Profile, determining the amount of the discount for this profile. Each of the forXXX methods simply sets internal values and then returns this, which enables fluent interface invocation. Here are the methods:

   public Discount forMembership(double discount)
   {
      _discountForMembership = discount;
      return this;
   }

   public Discount forNumberOfVisits(int numOfVisits, double discount)
   {
      _numberOfVisits = numOfVisits;
      _discountForVisits = discount;
      return this;
   }

   public Discount forMonthlySpending(int monthlySpending,
      double discount)
   {
      _monthlySpending = monthlySpending;
      _discountForMonthlySpending = discount;
      return this;
   }

The only other interesting part of Discount is the DiscountAmount property, which applies the discount rules to determine an overall discount percentage:

   public double DiscountAmount
   {
      get
      {
         var discount = 0.0;
         if (_profile.Member)
            discount += _discountForMembership;
         if (_profile.Frequency > _numberOfVisits)
            discount += _discountForVisits;
         if (_profile.MonthlySpending > _monthlySpending)
            discount += _discountForMonthlySpending;
         return discount;
      }
   }
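The parts of Discount that the article leaves out are presumably the private fields and the static basedOn factory. One way they might fit together is sketched below, with a trimmed-down Profile included so the example stands alone. The private constructor and the field types are assumptions; only the member names come from the code above.

```csharp
// Trimmed Profile so the sketch compiles by itself (assumed shape).
public class Profile
{
    public bool Member { get; private set; }
    public int Frequency { get; private set; }
    public int MonthlySpending { get; private set; }

    public static Profile describedAs { get { return new Profile(); } }
    public Profile member { get { Member = true; return this; } }
    public Profile frequency(int v) { Frequency = v; return this; }
    public Profile monthlySpending(int v) { MonthlySpending = v; return this; }
}

public class Discount
{
    private readonly Profile _profile;
    private double _discountForMembership;
    private int _numberOfVisits;
    private double _discountForVisits;
    private int _monthlySpending;
    private double _discountForMonthlySpending;

    // Assumption: the constructor is hidden behind basedOn.
    private Discount(Profile profile) { _profile = profile; }

    // Static entry point of the chain; it captures the profile.
    public static Discount basedOn(Profile profile)
    {
        return new Discount(profile);
    }

    public Discount forMembership(double discount)
    {
        _discountForMembership = discount;
        return this;
    }

    public Discount forNumberOfVisits(int numOfVisits, double discount)
    {
        _numberOfVisits = numOfVisits;
        _discountForVisits = discount;
        return this;
    }

    public Discount forMonthlySpending(int monthlySpending, double discount)
    {
        _monthlySpending = monthlySpending;
        _discountForMonthlySpending = discount;
        return this;
    }

    // Adds up every rule whose threshold the profile exceeds.
    public double DiscountAmount
    {
        get
        {
            var discount = 0.0;
            if (_profile.Member) discount += _discountForMembership;
            if (_profile.Frequency > _numberOfVisits) discount += _discountForVisits;
            if (_profile.MonthlySpending > _monthlySpending) discount += _discountForMonthlySpending;
            return discount;
        }
    }
}
```

For the profile in the unit test, all three rules fire: 10.0 for membership, 10.0 because 20 visits exceed the threshold of 10, and 5.0 because spending of 100 exceeds 75, which gives the expected 25.0. The comparisons are strict, so a customer who sits exactly on a threshold earns nothing for that rule.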

The Rule List Class

The last piece is the RuleListChained class that builds lists of Discount rules (see Listing 2).

The only chained method in the class is the addDiscount() method. The unit test (see Listing 3) shows how all the pieces fit together.

Notice how concise the code is. Yes, it looks a little odd if you are primarily used to looking at C# code, but when read from the viewpoint of a non-technical person, there is very little syntactic “cruft.”

If you run the test you can see that you do indeed have a discount list. However, a lurking problem exists. Suppose you want to save the rules in a database during the add operation, or even just print out the rule as you add it. Here’s a modified version of the addDiscount() method that does just that:

   public Discount addDiscount()
   {
      var discount = new Discount();
      _ruleList.Add(discount);
      Console.WriteLine(discount);
      return discount;
   }

But the results are surprising, as you can see by the error shown in Figure 1.

Figure 1. Test Failure: This test failure was caused by inappropriate method chaining.

The Finishing Problem

The problem with using member chaining for the RuleList class is called the finishing problem: When does the call “finish”? When the code in the addDiscount() method executes, the rest of the members of the chain haven’t executed yet, which causes an exception. How do you solve this problem?
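Neither Listing 2 nor Figure 1 is reproduced in this text, so the following is only a guess at the mechanism, reduced to the essentials. It assumes a Discount whose ToString() reads the profile, and a withProfile() method that supplies the profile later in the chain; both are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Trimmed Profile so the sketch compiles by itself (assumed shape).
public class Profile
{
    public bool Member { get; private set; }
    public static Profile describedAs { get { return new Profile(); } }
    public Profile member { get { Member = true; return this; } }
}

public class Discount
{
    private Profile _profile;              // stays null until withProfile runs
    private double _discountForMembership;

    public Discount withProfile(Profile profile)
    {
        _profile = profile;
        return this;
    }

    public Discount forMembership(double discount)
    {
        _discountForMembership = discount;
        return this;
    }

    // Assumption: printing a rule needs its profile, so this throws
    // a NullReferenceException when called before withProfile().
    public override string ToString()
    {
        return "member=" + _profile.Member +
               ", membership discount=" + _discountForMembership;
    }
}

public class RuleListChained
{
    private readonly List<Discount> _ruleList = new List<Discount>();

    public Discount addDiscount()
    {
        var discount = new Discount();
        _ruleList.Add(discount);
        // This line runs before any member that follows
        // addDiscount() in the chain, so the discount is still empty.
        Console.WriteLine(discount);
        return discount;
    }
}
```

In a chain such as list.addDiscount().withProfile(...).forMembership(10.0), the call to addDiscount() is evaluated first and in full. Anything it does with the new Discount therefore sees an object that has not been configured yet.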

One solution creates a special “finished” method at the end of the chain. For example, here’s one way to re-write the test:

   list.addDiscount()
      .withProfile(Profile.describedAs
         .member
         .monthlySpending(100)
         .frequency(20))
      .forMembership(10.0)
      .forMonthlySpending(75, 5.0)
      .forNumberOfVisits(10, 10.0)
      .save();

The save() method marks the end of the chain in this case. Adding a finishing marker works, but it harms the fluency of the interface. When you talk to someone, you don’t say, “Meet me at the place at 5 PM. SAVE!” Instead, chain-finishing should just work.
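A finishing method of this kind could be implemented in several ways. The sketch below shows one possibility, under the assumption that each Discount remembers the list that created it and hands itself over only when save() is called. The Store() helper, the Count property, and the constructor argument are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Trimmed Profile so the sketch compiles by itself (assumed shape).
public class Profile
{
    public bool Member { get; private set; }
    public static Profile describedAs { get { return new Profile(); } }
    public Profile member { get { Member = true; return this; } }
}

public class Discount
{
    private readonly RuleListChained _owner;   // list that will store this rule
    private Profile _profile;
    private double _discountForMembership;

    public Discount(RuleListChained owner) { _owner = owner; }

    public Discount withProfile(Profile profile) { _profile = profile; return this; }
    public Discount forMembership(double discount) { _discountForMembership = discount; return this; }

    // Finishing method: the only point where the rule is known to
    // be complete, so the deferred work happens here.
    public RuleListChained save()
    {
        _owner.Store(this);
        return _owner;
    }

    public override string ToString()
    {
        bool isMember = _profile != null && _profile.Member;
        return "member=" + isMember + ", membership discount=" + _discountForMembership;
    }
}

public class RuleListChained
{
    private readonly List<Discount> _ruleList = new List<Discount>();

    public int Count { get { return _ruleList.Count; } }

    // Starts the chain but stores nothing yet.
    public Discount addDiscount() { return new Discount(this); }

    // Hypothetical helper called by save().
    internal void Store(Discount discount)
    {
        _ruleList.Add(discount);
        Console.WriteLine(discount);
    }
}
```

Besides hurting fluency, this design has a practical weakness: a chain that omits save() still compiles, and the rule is silently dropped.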

As an answer to this particular problem, you can use an alternative resolution technique with nested methods. Instead of implementing the add() method as a chained method, you’ll supply context by changing it to a more traditional method call, leaving the other chained methods in place.

   public RuleList add(Discount d)
   {
      _ruleList.Add(d);
      return this;
   }

Changing the chained addDiscount() method to use method nesting lets you write the rule list definition like the test shown below:

   [Test]
   public void simple_rule_list()
   {
      var list = new RuleList();
      list.add(Discount
         .basedOn(Profile.describedAs
            .member
            .monthlySpending(100)
            .frequency(20))
         .forMembership(10.0)
         .forMonthlySpending(75, 5.0)
         .forNumberOfVisits(10, 10.0));
      list.add(Discount
         .basedOn(Profile.describedAs
            .monthlySpending(50)
            .frequency(10))
         .forMembership(10.0)
         .forMonthlySpending(75, 5.0)
         .forNumberOfVisits(10, 10.0));
      Assert.AreEqual(2, list.Count);
   }

This solves the finishing problem by wrapping the chained methods in a nested method invocation. This is quite common in fluent interfaces. In fact, developers use these rules of thumb when building DSLs:

  • Use method chaining for stateless object construction
  • Use nested methods to control completion criteria
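To see why nesting settles the completion question, here is a small sketch of a RuleList built along these lines. The Count property appears in the unit test, but its implementation here is assumed, as are the trimmed Profile and Discount classes included to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;

// Trimmed Profile and Discount (assumed shapes, membership rule only).
public class Profile
{
    public bool Member { get; private set; }
    public static Profile describedAs { get { return new Profile(); } }
    public Profile member { get { Member = true; return this; } }
}

public class Discount
{
    private readonly Profile _profile;
    private double _discountForMembership;

    private Discount(Profile profile) { _profile = profile; }

    public static Discount basedOn(Profile profile) { return new Discount(profile); }

    public Discount forMembership(double discount)
    {
        _discountForMembership = discount;
        return this;
    }

    public double DiscountAmount
    {
        get { return _profile.Member ? _discountForMembership : 0.0; }
    }
}

public class RuleList
{
    private readonly List<Discount> _ruleList = new List<Discount>();

    public int Count { get { return _ruleList.Count; } }

    // Nested method: C# evaluates the argument expression before
    // entering add(), so the whole chain has already run and the
    // discount is complete when this body executes.
    public RuleList add(Discount d)
    {
        _ruleList.Add(d);
        Console.WriteLine(d.DiscountAmount);   // safe: d is fully built
        return this;
    }
}
```

The closing parenthesis of add(...) plays the role that save() played earlier, but the language supplies it. Because add() still returns this, several rules can be added in a row if that reads better.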

This example of fluent interfaces really just scratches the surface of DSL techniques. As you can see, you can stretch C# in interesting ways to create more readable code. A follow-up article will cover additional DSL techniques, in particular the features added to C# on behalf of LINQ that make it possible to build rich DSLs.
