RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Prototyping DSLs in F#: Parsing and Semantic Checking : Page 2

To build a domain specific language, you need a quick and effective prototyping process that can parse your language syntax and perform semantic checking.


Semantic Checking

Before programs like the one you just saw can be interpreted, you need to check them for various errors that programmers often make. These errors are categorically called semantic errors, because they violate the semantics of the underlying language or its constructs. For example, adding a number to a string is considered illegal in strongly typed languages; even adding a float to an integer is a semantic error unless the plus operator is overridden to take disparate arguments, along with a reasonable strategy to implement the addition operation on them.

Also, you need to consider the possibility that a programmer might redefine variables or functions while still enforcing static scoping. Here's a short Simply program to demonstrate:

var x = 2
fun x(a b) { a + b + x  }
fun x(y)   { y + x(1 2) }

The preceding code defines x multiple times. First, it's assigned the value 2. Then it is redefined as a function taking two arguments, adding them together and incrementing the result with the previous value of x. The third line again redefines x as a new function taking a single argument and incrementing it with the result of the previous definition, called with arguments 1 and 2. The point here is that x has different meanings in each line, and it is non-trivial to evaluate such code unless you verify that each reference to x is a valid one, signaling an error otherwise.

Normally, such semantic checking includes type checking; this is the phase in an interpreter or compiler when values are checked against type expectations of consuming functions or signatures. For example, when you write 1 @ [ 2 ] in F#, the compiler verifies that the ampersand (@) function (whose signature is 'A -> 'A list -> 'A list) can be called with parameters of type int and int list. Because these fit the more general polymorphic type signature, the compiler happily accepts this expression. However, when you try to add a string and an integer together, the compiler will flag it as an error because it cannot find an instance of the plus operator that adds values of disparate types.

In Simply, however, all values are floating-point values. This decision eliminates the need for extensive type checking. (It also means that you will not be able to partially apply functions, because for these the return value would be a function type—which does not fit into Simply's float-only type system).

Overall, you need to accomplish two things when checking Simply programs. First, you need to verify that variable and function references are valid, e.g. that a variable or function exists in scope at any usage site. Second, you need to check that all function calls are complete—that they supply all the required arguments, in other words functions are called with the arity with which they were defined.

You can perform both tasks in a single sweep as you traverse an AST value. To correctly identify each usage site of a definition (either a variable or a function), rename each definition and usage site in such a way that the usage sites always refer to the latest definition in scope (remember that Simply is a statically scoped language just like others you may be familiar with, including VB, C# and F#). To do this, you can use an environment that maps identifiers (recall from the active pattern for identifiers that these can only contain alpha characters) to index*arity pairs, where index is the latest version of the identifier in scope, and arity tells you whether it was a function or a variable (and supplies the actual arity in case of a function, and Arity.NoArity in case of a variable).

Start a new Standardize module to contain the environment-related functionality as a nested module that supplies various helper functions (see Listing 2), each of which takes an environment to operate on. The helper function EnvAdd adds a new identifier given its name and arity (remember that variables should have Arity.NoArity) and returns a new name that includes the version (for example, x12). The helper function EnvRefresh returns the latest binding (I simply number these bindings from 0—see EnvAdd). Finally, EnvArity returns the arity of a given identifier.

Note that the “next” indices for any given identifier come from a private counters map to make sure that no two bindings have the same name, and across various standardization sessions you may want to reset these counters by calling the ResetEnv function.

The actual walking of the AST takes place in the ConvertProgram function:

    let private thread f acc (lst: 'a list) =
        |> List.fold (fun (acc1, acc2) el ->
            let _acc1, _acc2 = f acc1 el
            _acc1, _acc2 :: acc2) (acc, [])
        |> fun (acc1, acc2) ->
            acc1, List.rev acc2
    let rec ConvertProgram env (Ast.Program commands) =
        |> thread CC env
        |> fun (env, commands) ->
            env, commands |> Ast.Program
    and private CC env command =
        match command with
        | Ast.Command.VarDef (v, expr) ->
            let _env, _v = Env.EnvAdd env (v, Arity.NoArity)
            _env, Ast.Command.VarDef (_v, CE env expr)
        | Ast.Command.FunDef (f, pars, body) ->
            let _env, _f = Env.EnvAdd env (
               f, pars |> List.length |> Arity.Arity)
            |> List.map (fun par -> par, Arity.NoArity)
            |> thread Env.EnvAdd env
            |> fun (envPars, parsP) ->
                _env, Ast.Command.FunDef (
                   _f, parsP, CC envPars body |> snd)
        | Ast.Command.Repeat (expr, v, body) ->
            let _env, _v = Env.EnvAdd env (v, Arity.NoArity)
            env, Ast.Command.Repeat (CE env expr, _v, CC _env body |> snd)
        | Ast.Command.Sequence commands ->
            |> thread CC env
            |> fun (_, commands) ->
                env, Ast.Command.Sequence commands
        | Ast.Yield e ->
            env, Yield (CE env e)

ConvertProgram simply threads the environment through invocations of CC using the private function thread—and converts the commands that make up the input program by introducing new bindings where necessary, refreshing usage sites from its operating environment. For example, when defining a new variable, you create a new environment and a new binding (_env and _v, respectively), and return that environment and the refreshed VarDef instance.

Ultimately, you end up traversing each expression by calling CE, which converts an expression:

    and private CE env expr =
        match expr with
        | Ast.Expr.Number n ->
            Ast.Expr.Number n
        | Ast.Expr.Var v ->
            |> Env.EnvArity env
            |> function
                | NoArity ->
                    |> Env.EnvRefresh env
                    |> Ast.Expr.Var
                | Arity _ ->
                    v |> sprintf 
                    "'%s' is a function and needs arguments" |> failwith
        | Ast.Expr.BinOp (f, e1, e2) ->
            Ast.Expr.BinOp (f, CE env e1, CE env e2)
        | Ast.Expr.FunApply (f, args) ->
            Env.EnvRefresh env f
            |> fun _f ->
                |> Env.EnvArity env
                |> function
                    | Arity ar when ar = List.length args ->
                        Ast.Expr.FunApply (_f, List.map (CE env) args)
                    | _ ->
                        f |> sprintf 
                         "Arity mismatch on calling '%s'" |> failwith

Here, you are propagating CE to sub-expressions (into BinOp shapes and function arguments of FunApply), checking that variables are not used as functions (Var), verifying that functions are called with the correct arity (FunApply), and simply letting numbers through.

In F# Interactive, you can easily test that renaming and semantic checking work as expected:

var x = 2
fun x(a b) { a + b + x  }
fun x(y)   { y + x(1 2) }
|> fun s ->
    match s with
    | Language.Prog (prog, Language.Eof) ->
        |> Standardize.ConvertProgram Map.Empty
        |> fun (_, _prog) ->
        _prog |> printf "Result=%A\n"
    | _ ->
        printf "Syntax error\n";;

The preceding code produces the underlying AST representation for:

var x0 = 2
fun x1(a0 b0) { a0 + b0 + x0  }
fun x2(y0)   { y0 + x1(1 2) }

Of course, this result would not parse in Simply, because identifiers cannot have numeric characters—but internally it makes a valid representation and shows that your renaming strategy is applied correctly.

Feeding semantically incorrect programs into the standardizer will result in an error. As a test, if you change the arity of the function call inside the body of x(y) in the example above, or refer to identifiers not in scope, you will receive an exception about arity mismatch, or the function or variable not being defined, respectively.

You've seen how to build a parser for a full-blown DSL by extending the active-pattern based parser from this earlier article, giving it some new rules for parsing variable and function definitions, a simple looping construct, and sequencing. In addition, you've seen how to implement a semantic checking phase for this language by renaming definition and usage sites of identifiers in an unambiguous way (via the so called standardization phase), making sure that usage sites are valid, and that functions are called with the correct arity. Using the techniques described in this article you can quickly prototype your own DSLs, and implement your own static analysis on your source programs to enable programmers to find semantic errors in their code.

The next article shows how to take the Simply architecture introduced here, instantiate it with built-in primitives for a Logo-like language, and erect a development shell around it to make it easy to write and test turtle graphics code.

Related Resources

Adam Granicz is the CEO of IntelliFactory, the primary place for F# development, training and consulting services, and the coauthor of Expert F#, the most comprehensive guide on F# coauthored with Don Syme, the designer of the language. He has done research in functional programming languages and holds a Masters degree from the California Institute of Technology.
Email AuthorEmail Author
Close Icon
Thanks for your registration, follow us on our social networks to keep up-to-date