Login | Register   
RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Prototyping DSLs in F#: Parsing and Semantic Checking

To build a domain specific language, you need a quick and effective prototyping process that can parse your language syntax and perform semantic checking.


ecently, there has been a revival of interest in domain-specific languages (DSLs). Not only are these languages able to provide a higher level of abstraction over a specific domain (and thus help eliminate errors that stem from low-level constructs in a universal language), but they also provide an effective mechanism for users to fine tune your applications by providing extra configuration, custom business logic, and the like. Altogether, DSLs can make your containing applications much more flexible and versatile.

Roughly speaking, there are two main ways of enlisting domain-specific languages to work for you—you can implement them by interpreting source text written in the source DSL, or by compiling this source text to executable code. There are distinct advantages and disadvantages to both approaches. Many of the implementation phases for an interpreter and compiler are similar, or even identical; for example, syntactic and semantic checking are shared in both. After obtaining a suitable internal representation, a compiler implementation includes phases that progressively break down that representation to low-level instructions, producing assembly, native code, or managed code, depending on the target platform. In contrast, interpreters rarely perform these phases. Instead, you implement the so-called "operational semantics" of the DSL; e.g. you write an evaluator for the internal representation.

Figure 1. Simply in Action: You can build upon Simply to spin off new DSLs and plug them in into your own custom development shell. The application shown here, SimplyLogo, was built from scratch in less than 500 lines of F# code.

In this and a follow-up article you will implement an interpreter for a small DSL, which I like to call "Simply" because of its C-like syntax and simplicity, and instantiate it with built-in functions that resemble those in Logo-like languages. You will accomplish this by building upon the expression parser in an earlier, related article, where you saw that active patterns provide the perfect mechanism (albeit with a small price of a speed sacrifice) to build parsers as a type-safe discipline that closely resembles the formal BNF syntax of your grammars, and implementing semantic checking (in the current) and an evaluator (in the next article) on the enhanced AST representation. Using this language, you can quickly produce graphics that can be defined using simple drawing primitives—and you can use the core evaluator in any context you like. One possible embedding of this DSL, and the target of the subsequent article is shown in Figure 1. For this article, we are mainly concerned with building up the parser for Simply, and checking source programs for semantic correctness.

Author's Note: To follow along with this article series, you need to download the F# May 2009 CTP, or Visual Studio 2010 Beta 1.

A Brief Overview of Simply

Simply, the target DSL of this and a future article, is a small programming language with static scoping, nested variable and function declarations, and a simple looping construct. Below is a short Simply program:

var x = 2
fun x(a b) { a + b + x  }
fun x(y)   { y + x(1 2) }
repeat 100 as i { x(i) }

The preceding program is easy to read; it consists of four commands that define a variable, two functions, and a loop. To parse these commands, you'll need to extend the parser described in the previous article.

Extending Simply with a Looping Construct, Variables, and Functions

The parser implemented in the earlier article parsed arithmetic expressions with function calls and translated them to a custom AST type. For Simply, you will need a slightly more enhanced internal representation to parse expressions that include simple variables, and to go hand-in-hand with that, a handful of language extensions that allow you to define variables and functions, and express repetitions in the form of a simple looping construct (repeat blocks).

Recall that you placed the AST definition in its own module, below you can find the new enhanced version:

namespace IntelliFactory.Simply
module Ast =
    type var = string
    type Expr =
        | Number   of float
        | BinOp    of (float -> float -> float) * Expr * Expr
        | Var      of var
        | FunApply of var * Expr list
        static member Sum (e1, e2)   = BinOp (( + ), e1, e2)
        static member Diff (e1, e2)  = BinOp (( - ), e1, e2)
        static member Prod (e1, e2)  = BinOp (( * ), e1, e2)
        static member Ratio (e1, e2) = BinOp (( / ), e1, e2)

You may have noticed that I changed the code style of this (and the forthcoming) modules slightly to better match the recommended F# light syntax (the default as of the 2009 May CTP) coding guidelines. The idea is to use the least amount of code to express the most, while still enabling quick prototyping and minimal changes when growing your code and the functionality that it delivers.

To the preceding code, you can now add the AST enhancements that point beyond ordinary arithmetic expressions:

    type Command =
        | VarDef   of var * Expr
        | FunDef   of var * var list * Command
        | Repeat   of Expr * var * Command
        | Sequence of Command list
        | Yield    of Expr
    type Prog = Program of Command list

These define:

  • A Command type that can encode variable definitions (initialized with a value)
  • Function definitions (with a function name, formal parameter list, and a Command value representing the body of the function)
  • Repeat blocks (with a control variable, an expression for the cycle length, and a Command value to represent the body of the repeat block)
  • Command sequencing (useful for defining function bodies or repeat blocks that need more than a simple expression)
  • Single expression execution. A list of such expressions makes up a program

With these types, you can now extend the expression parser you built earlier. To make that easier for you I have repeated that code in Listing 1, in a slightly enhanced form:

The only change in this core parser (apart from minor stylistic changes) is in the (|Factor|_|) active pattern: this version adds simple variable references (the third rule) to match the corresponding addition in the expression language of the enhanced AST.

At this point you can really pick up some speed and quickly jot down the rest of the parser for your DSL. First you add the rules for keywords and special characters:

    let (|LBRACE|_|) s = "{"      |> MatchSymbol s
    let (|RBRACE|_|) s = "}"      |> MatchSymbol s
    let (|EQ|_|)     s = "="      |> MatchSymbol s
    let (|VAR|_|)    s = "var"    |> MatchSymbol s
    let (|FUN|_|)    s = "fun"    |> MatchSymbol s
    let (|REPEAT|_|) s = "repeat" |> MatchSymbol s
    let (|AS|_|)     s = "as"     |> MatchSymbol s

The rule for parsing commands is a straightforward translation of the syntax rules into the active patterns you defined earlier:

    let rec (|Command|_|) = function
        | VAR (ID (v, EQ (Expression (expr, rest)))) ->
            (Ast.Command.VarDef (v, expr), rest) |> Some
        | FUN (ID (f, LPAREN (Star (|ID|_|) [] (
            pars, RPAREN (Command (body, rest)))))) ->
            (Ast.Command.FunDef (f, pars, body), rest) |> Some
        | REPEAT (Expression (i, AS (ID (v, Command (body, rest))))) ->
            (Ast.Command.Repeat (i, v, body), rest) |> Some
        | LBRACE (Star (|Command|_|) [] (commands, RBRACE rest)) ->
            (Ast.Command.Sequence commands, rest) |> Some
        | Expression (e, rest) ->
            (Ast.Command.Yield e, rest) |> Some
        | _ ->

For instance, consider the first rule above in the (|Command|_|) active pattern. It literally says:

“match the ‘var’ keyword, then an identifier and bind it to ‘v’, then an equal sign, and finally an expression bound to ‘expr’; and return a Command.VarDef value with the variable and its initial value encountered, along with the rest of the input string as a successful match.”

The other rules are similarly easy to read and construct.

One detail that may need further explanation is how you use the (|Star|_|) active pattern in the function definition or the sequencing rule. Recall that this is a parameterized active pattern that takes two arguments before it is applied on the value you are matching against. The first argument is an active pattern that you would like to “run” zero or many times (hence the name Star, which reflects the star operator in formal languages such as BNF or regular expressions), while the second argument is an initial accumulator used to collect the results of those invocations. Because the initial value of the accumulator typically is an empty list, you may choose to rewrite this active pattern in such a way that providing the initial value is not required; thus giving you a slightly more compact way of specifying “zero or many”-type rules.

Finally, you can write the rule that parses entire programs as:

    let (|Prog|_|) = function
        | Star (|Command|_|) [] (commands, rest) ->
            (Ast.Prog.Program commands, rest) |> Some
        | _ ->

Alternatively, a slightly more verbose style would be (and you may decide to use this style for the preceding active patterns also):

    let (|Prog|_|) s =
        match s with
        | Star (|Command|_|) [] (commands, rest) ->
            (Ast.Prog.Program commands, rest) |> Some
        | _ ->

You can quickly test your parser on Simply programs—just load up the AST and Language modules in F# Interactive by highlighting code and pressing Alt+Enter, and try a small Simply program:

open Language
var x=1
var x=2
var x=3
fun foo(y)
    fun bar(foo)
        var x=x+1
repeat 1000 as x {
|> (|Prog|_|)
|> printf "Result=%A\n";;
     [VarDef ("x",Number 1.0);
      VarDef ("x",Number 2.0);
      VarDef ("x",Number 3.0);
                 [VarDef ("x",BinOp (<fun:Sum@44>,Var "x",Number 1.0));
                  Yield (BinOp (<fun:Sum@44>,Var "foo",Var "x"))]);
            Yield (FunApply ("bar",[BinOp (<fun:Prod@46>,Var "y",Number 2.0)]))]);
      Repeat (Number 1000.0,"x",Sequence [Yield (FunApply ("foo",[Var "x"]))])],

What you'll see as output is the dump of the AST value for the program you piped into the main active pattern that recognizes Simply programs. The next section shows how you can check these AST values for correctness to rule out bugs in your programmers’ code.

Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.