devxlogo

Prototyping DSLs in F#: Parsing and Semantic Checking

Prototyping DSLs in F#: Parsing and Semantic Checking

ecently, there has been a revival of interest in domain-specific languages (DSLs). Not only are these languages able to provide a higher level of abstraction over a specific domain (and thus help eliminate errors that stem from low-level constructs in a universal language), but they also provide an effective mechanism for users to fine tune your applications by providing extra configuration, custom business logic, and the like. Altogether, DSLs can make your containing applications much more flexible and versatile.

Roughly speaking, there are two main ways of enlisting domain-specific languages to work for you?you can implement them by interpreting source text written in the source DSL, or by compiling this source text to executable code. There are distinct advantages and disadvantages to both approaches. Many of the implementation phases for an interpreter and compiler are similar, or even identical; for example, syntactic and semantic checking are shared in both. After obtaining a suitable internal representation, a compiler implementation includes phases that progressively break down that representation to low-level instructions, producing assembly, native code, or managed code, depending on the target platform. In contrast, interpreters rarely perform these phases. Instead, you implement the so-called “operational semantics” of the DSL; e.g. you write an evaluator for the internal representation.

?
Figure 1. Simply in Action: You can build upon Simply to spin off new DSLs and plug them in into your own custom development shell. The application shown here, SimplyLogo, was built from scratch in less than 500 lines of F# code.

In this and a follow-up article you will implement an interpreter for a small DSL, which I like to call “Simply” because of its C-like syntax and simplicity, and instantiate it with built-in functions that resemble those in Logo-like languages. You will accomplish this by building upon the expression parser in an earlier, related article, where you saw that active patterns provide the perfect mechanism (albeit with a small price of a speed sacrifice) to build parsers as a type-safe discipline that closely resembles the formal BNF syntax of your grammars, and implementing semantic checking (in the current) and an evaluator (in the next article) on the enhanced AST representation. Using this language, you can quickly produce graphics that can be defined using simple drawing primitives?and you can use the core evaluator in any context you like. One possible embedding of this DSL, and the target of the subsequent article is shown in Figure 1. For this article, we are mainly concerned with building up the parser for Simply, and checking source programs for semantic correctness.

Author’s Note: To follow along with this article series, you need to download the F# May 2009 CTP, or Visual Studio 2010 Beta 1.

A Brief Overview of Simply

Simply, the target DSL of this and a future article, is a small programming language with static scoping, nested variable and function declarations, and a simple looping construct. Below is a short Simply program:

var x = 2fun x(a b) { a + b + x  }fun x(y)   { y + x(1 2) }repeat 100 as i { x(i) }

The preceding program is easy to read; it consists of four commands that define a variable, two functions, and a loop. To parse these commands, you’ll need to extend the parser described in the previous article.

Extending Simply with a Looping Construct, Variables, and Functions

The parser implemented in the earlier article parsed arithmetic expressions with function calls and translated them to a custom AST type. For Simply, you will need a slightly more enhanced internal representation to parse expressions that include simple variables, and to go hand-in-hand with that, a handful of language extensions that allow you to define variables and functions, and express repetitions in the form of a simple looping construct (repeat blocks).

Recall that you placed the AST definition in its own module, below you can find the new enhanced version:

namespace IntelliFactory.Simplymodule Ast =    type var = string    type Expr =        | Number   of float        | BinOp    of (float -> float -> float) * Expr * Expr        | Var      of var        | FunApply of var * Expr list                static member Sum (e1, e2)   = BinOp (( + ), e1, e2)        static member Diff (e1, e2)  = BinOp (( - ), e1, e2)        static member Prod (e1, e2)  = BinOp (( * ), e1, e2)        static member Ratio (e1, e2) = BinOp (( / ), e1, e2)

You may have noticed that I changed the code style of this (and the forthcoming) modules slightly to better match the recommended F# light syntax (the default as of the 2009 May CTP) coding guidelines. The idea is to use the least amount of code to express the most, while still enabling quick prototyping and minimal changes when growing your code and the functionality that it delivers.

To the preceding code, you can now add the AST enhancements that point beyond ordinary arithmetic expressions:

    type Command =        | VarDef   of var * Expr        | FunDef   of var * var list * Command        | Repeat   of Expr * var * Command        | Sequence of Command list        | Yield    of Expr    type Prog = Program of Command list

These define:

  • A Command type that can encode variable definitions (initialized with a value)
  • Function definitions (with a function name, formal parameter list, and a Command value representing the body of the function)
  • Repeat blocks (with a control variable, an expression for the cycle length, and a Command value to represent the body of the repeat block)
  • Command sequencing (useful for defining function bodies or repeat blocks that need more than a simple expression)
  • Single expression execution. A list of such expressions makes up a program

With these types, you can now extend the expression parser you built earlier. To make that easier for you I have repeated that code in Listing 1, in a slightly enhanced form:

The only change in this core parser (apart from minor stylistic changes) is in the (|Factor|_|) active pattern: this version adds simple variable references (the third rule) to match the corresponding addition in the expression language of the enhanced AST.

At this point you can really pick up some speed and quickly jot down the rest of the parser for your DSL. First you add the rules for keywords and special characters:

    let (|LBRACE|_|) s = "{"      |> MatchSymbol s    let (|RBRACE|_|) s = "}"      |> MatchSymbol s    let (|EQ|_|)     s = "="      |> MatchSymbol s    let (|VAR|_|)    s = "var"    |> MatchSymbol s    let (|FUN|_|)    s = "fun"    |> MatchSymbol s    let (|REPEAT|_|) s = "repeat" |> MatchSymbol s    let (|AS|_|)     s = "as"     |> MatchSymbol s

The rule for parsing commands is a straightforward translation of the syntax rules into the active patterns you defined earlier:

    let rec (|Command|_|) = function        | VAR (ID (v, EQ (Expression (expr, rest)))) ->            (Ast.Command.VarDef (v, expr), rest) |> Some        | FUN (ID (f, LPAREN (Star (|ID|_|) [] (            pars, RPAREN (Command (body, rest)))))) ->            (Ast.Command.FunDef (f, pars, body), rest) |> Some        | REPEAT (Expression (i, AS (ID (v, Command (body, rest))))) ->            (Ast.Command.Repeat (i, v, body), rest) |> Some        | LBRACE (Star (|Command|_|) [] (commands, RBRACE rest)) ->            (Ast.Command.Sequence commands, rest) |> Some        | Expression (e, rest) ->            (Ast.Command.Yield e, rest) |> Some        | _ ->            None

For instance, consider the first rule above in the (|Command|_|) active pattern. It literally says:

?match the ?var? keyword, then an identifier and bind it to ?v?, then an equal sign, and finally an expression bound to ?expr?; and return a Command.VarDef value with the variable and its initial value encountered, along with the rest of the input string as a successful match.?

The other rules are similarly easy to read and construct.

One detail that may need further explanation is how you use the (|Star|_|) active pattern in the function definition or the sequencing rule. Recall that this is a parameterized active pattern that takes two arguments before it is applied on the value you are matching against. The first argument is an active pattern that you would like to ?run? zero or many times (hence the name Star, which reflects the star operator in formal languages such as BNF or regular expressions), while the second argument is an initial accumulator used to collect the results of those invocations. Because the initial value of the accumulator typically is an empty list, you may choose to rewrite this active pattern in such a way that providing the initial value is not required; thus giving you a slightly more compact way of specifying ?zero or many?-type rules.

Finally, you can write the rule that parses entire programs as:

    let (|Prog|_|) = function        | Star (|Command|_|) [] (commands, rest) ->            (Ast.Prog.Program commands, rest) |> Some        | _ ->            None

Alternatively, a slightly more verbose style would be (and you may decide to use this style for the preceding active patterns also):

    let (|Prog|_|) s =        match s with        | Star (|Command|_|) [] (commands, rest) ->            (Ast.Prog.Program commands, rest) |> Some        | _ ->            None

You can quickly test your parser on Simply programs?just load up the AST and Language modules in F# Interactive by highlighting code and pressing Alt+Enter, and try a small Simply program:

open Language    "var x=1var x=2var x=3fun foo(y){    fun bar(foo)    {        var x=x+1        foo+x    }    bar(y*2)}repeat 1000 as x {    foo(x)}"|> (|Prog|_|)|> printf "Result=%A
";;> Result=Some  (Program     [VarDef ("x",Number 1.0);      VarDef ("x",Number 2.0);      VarDef ("x",Number 3.0);      FunDef        ("foo",["y"],         Sequence           [FunDef              ("bar",["foo"],               Sequence                 [VarDef ("x",BinOp (,Var "x",Number 1.0));                  Yield (BinOp (,Var "foo",Var "x"))]);            Yield (FunApply ("bar",[BinOp (,Var "y",Number 2.0)]))]);      Repeat (Number 1000.0,"x",Sequence [Yield (FunApply ("foo",[Var "x"]))])],   "")>

What you’ll see as output is the dump of the AST value for the program you piped into the main active pattern that recognizes Simply programs. The next section shows how you can check these AST values for correctness to rule out bugs in your programmers? code.

Semantic Checking

Before programs like the one you just saw can be interpreted, you need to check them for various errors that programmers often make. These errors are categorically called semantic errors, because they violate the semantics of the underlying language or its constructs. For example, adding a number to a string is considered illegal in strongly typed languages; even adding a float to an integer is a semantic error unless the plus operator is overridden to take disparate arguments, along with a reasonable strategy to implement the addition operation on them.

Also, you need to consider the possibility that a programmer might redefine variables or functions while still enforcing static scoping. Here’s a short Simply program to demonstrate:

var x = 2fun x(a b) { a + b + x  }fun x(y)   { y + x(1 2) }x(3)

The preceding code defines x multiple times. First, it’s assigned the value 2. Then it is redefined as a function taking two arguments, adding them together and incrementing the result with the previous value of x. The third line again redefines x as a new function taking a single argument and incrementing it with the result of the previous definition, called with arguments 1 and 2. The point here is that x has different meanings in each line, and it is non-trivial to evaluate such code unless you verify that each reference to x is a valid one, signaling an error otherwise.

Normally, such semantic checking includes type checking; this is the phase in an interpreter or compiler when values are checked against type expectations of consuming functions or signatures. For example, when you write 1 @ [ 2 ] in F#, the compiler verifies that the ampersand (@) function (whose signature is 'A -> 'A list -> 'A list) can be called with parameters of type int and int list. Because these fit the more general polymorphic type signature, the compiler happily accepts this expression. However, when you try to add a string and an integer together, the compiler will flag it as an error because it cannot find an instance of the plus operator that adds values of disparate types.

In Simply, however, all values are floating-point values. This decision eliminates the need for extensive type checking. (It also means that you will not be able to partially apply functions, because for these the return value would be a function type?which does not fit into Simply’s float-only type system).

Overall, you need to accomplish two things when checking Simply programs. First, you need to verify that variable and function references are valid, e.g. that a variable or function exists in scope at any usage site. Second, you need to check that all function calls are complete?that they supply all the required arguments, in other words functions are called with the arity with which they were defined.

You can perform both tasks in a single sweep as you traverse an AST value. To correctly identify each usage site of a definition (either a variable or a function), rename each definition and usage site in such a way that the usage sites always refer to the latest definition in scope (remember that Simply is a statically scoped language just like others you may be familiar with, including VB, C# and F#). To do this, you can use an environment that maps identifiers (recall from the active pattern for identifiers that these can only contain alpha characters) to index*arity pairs, where index is the latest version of the identifier in scope, and arity tells you whether it was a function or a variable (and supplies the actual arity in case of a function, and Arity.NoArity in case of a variable).

Start a new Standardize module to contain the environment-related functionality as a nested module that supplies various helper functions (see Listing 2), each of which takes an environment to operate on. The helper function EnvAdd adds a new identifier given its name and arity (remember that variables should have Arity.NoArity) and returns a new name that includes the version (for example, x12). The helper function EnvRefresh returns the latest binding (I simply number these bindings from 0?see EnvAdd). Finally, EnvArity returns the arity of a given identifier.

Note that the ?next? indices for any given identifier come from a private counters map to make sure that no two bindings have the same name, and across various standardization sessions you may want to reset these counters by calling the ResetEnv function.

The actual walking of the AST takes place in the ConvertProgram function:

    let private thread f acc (lst: 'a list) =        lst        |> List.fold (fun (acc1, acc2) el ->            let _acc1, _acc2 = f acc1 el            _acc1, _acc2 :: acc2) (acc, [])        |> fun (acc1, acc2) ->            acc1, List.rev acc2    let rec ConvertProgram env (Ast.Program commands) =        commands        |> thread CC env        |> fun (env, commands) ->            env, commands |> Ast.Program    and private CC env command =        match command with        | Ast.Command.VarDef (v, expr) ->            let _env, _v = Env.EnvAdd env (v, Arity.NoArity)            _env, Ast.Command.VarDef (_v, CE env expr)        | Ast.Command.FunDef (f, pars, body) ->            let _env, _f = Env.EnvAdd env (               f, pars |> List.length |> Arity.Arity)            pars            |> List.map (fun par -> par, Arity.NoArity)            |> thread Env.EnvAdd env            |> fun (envPars, parsP) ->                _env, Ast.Command.FunDef (                   _f, parsP, CC envPars body |> snd)        | Ast.Command.Repeat (expr, v, body) ->            let _env, _v = Env.EnvAdd env (v, Arity.NoArity)            env, Ast.Command.Repeat (CE env expr, _v, CC _env body |> snd)        | Ast.Command.Sequence commands ->            commands            |> thread CC env            |> fun (_, commands) ->                env, Ast.Command.Sequence commands        | Ast.Yield e ->            env, Yield (CE env e)

ConvertProgram simply threads the environment through invocations of CC using the private function thread?and converts the commands that make up the input program by introducing new bindings where necessary, refreshing usage sites from its operating environment. For example, when defining a new variable, you create a new environment and a new binding (_env and _v, respectively), and return that environment and the refreshed VarDef instance.

Ultimately, you end up traversing each expression by calling CE, which converts an expression:

    and private CE env expr =        match expr with        | Ast.Expr.Number n ->            Ast.Expr.Number n        | Ast.Expr.Var v ->            v            |> Env.EnvArity env            |> function                | NoArity ->                    v                    |> Env.EnvRefresh env                    |> Ast.Expr.Var                | Arity _ ->                    v |> sprintf                     "'%s' is a function and needs arguments" |> failwith        | Ast.Expr.BinOp (f, e1, e2) ->            Ast.Expr.BinOp (f, CE env e1, CE env e2)        | Ast.Expr.FunApply (f, args) ->            Env.EnvRefresh env f            |> fun _f ->                f                |> Env.EnvArity env                |> function                    | Arity ar when ar = List.length args ->                        Ast.Expr.FunApply (_f, List.map (CE env) args)                    | _ ->                        f |> sprintf                          "Arity mismatch on calling '%s'" |> failwith

Here, you are propagating CE to sub-expressions (into BinOp shapes and function arguments of FunApply), checking that variables are not used as functions (Var), verifying that functions are called with the correct arity (FunApply), and simply letting numbers through.

In F# Interactive, you can easily test that renaming and semantic checking work as expected:

"var x = 2fun x(a b) { a + b + x  }fun x(y)   { y + x(1 2) }x(3)"|> fun s ->    match s with    | Language.Prog (prog, Language.Eof) ->        prog        |> Standardize.ConvertProgram Map.Empty        |> fun (_, _prog) ->        _prog |> printf "Result=%A
"    | _ ->        printf "Syntax error
";;

The preceding code produces the underlying AST representation for:

var x0 = 2fun x1(a0 b0) { a0 + b0 + x0  }fun x2(y0)   { y0 + x1(1 2) }x2(3)

Of course, this result would not parse in Simply, because identifiers cannot have numeric characters?but internally it makes a valid representation and shows that your renaming strategy is applied correctly.

Feeding semantically incorrect programs into the standardizer will result in an error. As a test, if you change the arity of the function call inside the body of x(y) in the example above, or refer to identifiers not in scope, you will receive an exception about arity mismatch, or the function or variable not being defined, respectively.

You’ve seen how to build a parser for a full-blown DSL by extending the active-pattern based parser from this earlier article, giving it some new rules for parsing variable and function definitions, a simple looping construct, and sequencing. In addition, you’ve seen how to implement a semantic checking phase for this language by renaming definition and usage sites of identifiers in an unambiguous way (via the so called standardization phase), making sure that usage sites are valid, and that functions are called with the correct arity. Using the techniques described in this article you can quickly prototype your own DSLs, and implement your own static analysis on your source programs to enable programmers to find semantic errors in their code.

The next article shows how to take the Simply architecture introduced here, instantiate it with built-in primitives for a Logo-like language, and erect a development shell around it to make it easy to write and test turtle graphics code.

Related Resources

devxblackblue

About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.

About Our Journalist