Parsing with Active Patterns: Building Up Rules
Now you have all the primitives to start building your parsers. Consider a simple language consisting of arithmetic expressions such as those you saw in the first part of this article. In that language (for which you only defined the abstract syntax earlier) you had numbers and the four basic arithmetic operators. This time you want to build a parser for this language, but also extend it to include floats (you just make all numbers floats) and function calls. A sample input in this language might be the string 1+cos(2+3)/sin(4*5).
This expression language calls for a slightly updated AST, so you need to change the earlier definition a bit (note that you are defining this as part of a new Ast module):
module Ast =
type var = string
type Expr =
| Number of float
| BinOp of (float -> float -> float) * Expr * Expr
| FunApply of var * Expr list
with
static member Sum (e1, e2) = BinOp (( + ), e1, e2)
static member Diff (e1, e2) = BinOp (( - ), e1, e2)
static member Prod (e1, e2) = BinOp (( * ), e1, e2)
static member Ratio (e1, e2) = BinOp (( / ), e1, e2)
The parser (continuing the Language module you defined earlier) needs active patterns to recognize numbers (floats), identifiers (function names that you are calling), and various tokens (parentheses and operators):
let (|NUMBER|_|) s =
"[0-9]+\.?[0-9]*" |> MatchToken s
(fun (n, rest) -> (n |> Double.Parse, rest) |> Some)
let (|ID|_|) s =
"[a-zA-Z]+" |> MatchToken s (fun res -> res |> Some)
let (|PLUS|_|) s = "\+" |> MatchSymbol s
let (|MINUS|_|) s = "-" |> MatchSymbol s
let (|MUL|_|) s = "\*" |> MatchSymbol s
let (|DIV|_|) s = "/" |> MatchSymbol s
let (|LPAREN|_|) s = "\(" |> MatchSymbol s
let (|RPAREN|_|) s = "\)" |> MatchSymbol s
Using these you can express the essential part of your grammar: expressions. To get around operator precedence you can simply introduce a hierarchy of active patterns that parse parts of an expression in decreasing precedence as is commonly done in recursive descent (LL) parsers.
let rec (|Factor|_|) = function
| NUMBER (n, rest) ->
(Ast.Expr.Number n, rest) |> Some
| ID (f, LPAREN (Star (|Expression|_|) [] (args, RPAREN rest))) ->
(Ast.Expr.FunApply (f, args), rest) |> Some
| _ ->
None
and (|Term|_|) = function
| Factor (e1, MUL (Term (e2, rest))) ->
(Ast.Expr.Prod (e1, e2), rest) |> Some
| Factor (e1, DIV (Term (e2, rest))) ->
(Ast.Expr.Ratio (e1, e2), rest) |> Some
| Factor (e, rest) ->
(e, rest) |> Some
| _ ->
None
and (|Sum|_|) = function
| Term (e1, PLUS (Sum (e2, rest))) ->
(Ast.Expr.Sum (e1, e2), rest) |> Some
| Term (e1, MINUS (Sum (e2, rest))) ->
(Ast.Expr.Diff (e1, e2), rest) |> Some
| Term (e, rest) ->
(e, rest) |> Some
| _ ->
None
and (|Expression|_|) = (|Sum|_|)
Note that in this language the arguments passed to a function call are not separated, so you can pass arbitrary number of arguments to functions by separating them simply with whitespace. If you want to add your own separator symbol, you can build another rule that parses expressions followed by that separator symbol, and have the Star active pattern refer to that rule.
Because the rule that you first match successfully preempts all others, you need to be careful to list those rules that are recursive first—which inherently implies that you cannot start a match case with the same active pattern that you are defining; otherwise, you wind up in an infinite recursion. To get around that issue, you can make all binary operators associate to the right, so 1+2+3 will actually reduce to 1+(2+3).
Finally, a useful active pattern is Eof, which recognizes the end of the input string. However, rather than simply checking for an empty remaining input string, you should also check whether the only remaining input is whitespace (recall that whitespace can contain comments also).
let (|Eof|_|) s =
if s |> String.IsNullOrEmpty then
() |> Some
else
match s with
| WhiteSpace (_, rest) when rest |> String.IsNullOrEmpty ->
() |> Some
| _ ->
None
After you load the Ast and the Language module into F# Interactive, you can quickly test your parser:
> let _ =
match "1+cos(2+3)/sin(4*5)" with
| Language.Expression (e, Language.Eof) ->
e |> printf "Match – AST: %A\n"
| _ ->
printf "No match\n";;
>
Match – AST: BinOp
(<fun:Sum@44>,Number 1.0,
BinOp
(<fun:Ratio@47>,
FunApply ("cos",[BinOp (<fun:Sum@44>,Number 2.0,Number 3.0)]),
FunApply ("sin",[BinOp (<fun:Prod@46>,Number 4.0,Number 5.0)])))
F# lets you test your active patterns interactively, which is tremendously useful when developing your parser rules:
> Language.(|Expression|_|) "1+2*3/4";;
val it : (Ast.Expr * string) option =
Some
(BinOp
(<fun:Sum@44>,Number 1.0,
BinOp
(<fun:Prod@46>,Number 2.0,
BinOp (<fun:Ratio@47>,Number 3.0,Number 4.0))), "")
You can now patch up the evaluator you saw in the first part of this article to evaluate this new expression type by adding the evaluation of function calls (here you add only the sine and cosine functions, both of which work on a single argument):
module Evaluator =
open System
let rec Eval = function
| Ast.Number f -> f
| Ast.BinOp (f, e1, e2) -> f (Eval e1) (Eval e2)
| Ast.FunApply (f, [e]) when f.ToLower() = "sin" ->
(Eval e) |> Math.Sin
| Ast.FunApply (f, [e]) when f.ToLower() = "cos" ->
(Eval e) |> Math.Cos
| e ->
e |> sprintf "Unknown function or arity: %A" |> failwith
Finally, you can test this evaluator in F# Interactive:
> let _ =
match "1+cos(2+3)/sin(4*5)" with
| Language.Expression (e, Language.Eof) ->
e |> Evaluator.Eval |> printf "Result=%f\n"
| _ ->
printf "No match\n";;
>
Result=1.310711
You've seen how to define active patterns that convert and partition values, via an example that used partial active patterns to implement a parser for arithmetic expressions with function calls. Effectively, you've seen how to translate the boring and error-prone task of implementing grammars with parser generators into a typed discipline that closely matches your parsing rules expressed in BNF syntax. Using these techniques you are better equipped to quickly prototype and implement your own parsers, define your own domain-specific languages, and work with your own concrete syntax for your data.
For further reading, try these links: