Login | Register   
Twitter
RSS Feed
Download our iPhone app
TODAY'S HEADLINES  |   ARTICLE ARCHIVE  |   FORUMS  |   TIP BANK
Browse DevX
Sign up for e-mail newsletters from DevX


advertisement
 

Parsing with Active Patterns in F# : Page 4

Discover how to use F#'s active patterns to build prototype parsers that you can test interactively.


advertisement

Parsing with Active Patterns: Building Up Rules

Now you have all the primitives to start building your parsers. Consider a simple language consisting of arithmetic expressions such as those you saw in the first part of this article. In that language (for which you only defined the abstract syntax earlier) you had numbers and the four basic arithmetic operators. This time you want to build a parser for this language, but also extend it to include floats (you just make all numbers floats) and function calls. A sample input in this language might be the string 1+cos(2+3)/sin(4*5).

This expression language calls for a slightly updated AST, so you need to change the earlier definition a bit (note that you are defining this as part of a new Ast module):



module Ast = type var = string type Expr = | Number of float | BinOp of (float -> float -> float) * Expr * Expr | FunApply of var * Expr list with static member Sum (e1, e2) = BinOp (( + ), e1, e2) static member Diff (e1, e2) = BinOp (( - ), e1, e2) static member Prod (e1, e2) = BinOp (( * ), e1, e2) static member Ratio (e1, e2) = BinOp (( / ), e1, e2)

The parser (continuing the Language module you defined earlier) needs active patterns to recognize numbers (floats), identifiers (function names that you are calling), and various tokens (parentheses and operators):

let (|NUMBER|_|) s = "[0-9]+\.?[0-9]*" |> MatchToken s (fun (n, rest) -> (n |> Double.Parse, rest) |> Some) let (|ID|_|) s = "[a-zA-Z]+" |> MatchToken s (fun res -> res |> Some) let (|PLUS|_|) s = "\+" |> MatchSymbol s let (|MINUS|_|) s = "-" |> MatchSymbol s let (|MUL|_|) s = "\*" |> MatchSymbol s let (|DIV|_|) s = "/" |> MatchSymbol s let (|LPAREN|_|) s = "\(" |> MatchSymbol s let (|RPAREN|_|) s = "\)" |> MatchSymbol s

Using these you can express the essential part of your grammar: expressions. To get around operator precedence you can simply introduce a hierarchy of active patterns that parse parts of an expression in decreasing precedence as is commonly done in recursive descent (LL) parsers.

let rec (|Factor|_|) = function | NUMBER (n, rest) -> (Ast.Expr.Number n, rest) |> Some | ID (f, LPAREN (Star (|Expression|_|) [] (args, RPAREN rest))) -> (Ast.Expr.FunApply (f, args), rest) |> Some | _ -> None and (|Term|_|) = function | Factor (e1, MUL (Term (e2, rest))) -> (Ast.Expr.Prod (e1, e2), rest) |> Some | Factor (e1, DIV (Term (e2, rest))) -> (Ast.Expr.Ratio (e1, e2), rest) |> Some | Factor (e, rest) -> (e, rest) |> Some | _ -> None and (|Sum|_|) = function | Term (e1, PLUS (Sum (e2, rest))) -> (Ast.Expr.Sum (e1, e2), rest) |> Some | Term (e1, MINUS (Sum (e2, rest))) -> (Ast.Expr.Diff (e1, e2), rest) |> Some | Term (e, rest) -> (e, rest) |> Some | _ -> None and (|Expression|_|) = (|Sum|_|)

Note that in this language the arguments passed to a function call are not separated, so you can pass arbitrary number of arguments to functions by separating them simply with whitespace. If you want to add your own separator symbol, you can build another rule that parses expressions followed by that separator symbol, and have the Star active pattern refer to that rule.

Because the rule that you first match successfully preempts all others, you need to be careful to list those rules that are recursive first—which inherently implies that you cannot start a match case with the same active pattern that you are defining; otherwise, you wind up in an infinite recursion. To get around that issue, you can make all binary operators associate to the right, so 1+2+3 will actually reduce to 1+(2+3).

Finally, a useful active pattern is Eof, which recognizes the end of the input string. However, rather than simply checking for an empty remaining input string, you should also check whether the only remaining input is whitespace (recall that whitespace can contain comments also).

let (|Eof|_|) s = if s |> String.IsNullOrEmpty then () |> Some else match s with | WhiteSpace (_, rest) when rest |> String.IsNullOrEmpty -> () |> Some | _ -> None

After you load the Ast and the Language module into F# Interactive, you can quickly test your parser:

> let _ = match "1+cos(2+3)/sin(4*5)" with | Language.Expression (e, Language.Eof) -> e |> printf "Match – AST: %A\n" | _ -> printf "No match\n";; > Match – AST: BinOp (<fun:Sum@44>,Number 1.0, BinOp (<fun:Ratio@47>, FunApply ("cos",[BinOp (<fun:Sum@44>,Number 2.0,Number 3.0)]), FunApply ("sin",[BinOp (<fun:Prod@46>,Number 4.0,Number 5.0)])))

F# lets you test your active patterns interactively, which is tremendously useful when developing your parser rules:

> Language.(|Expression|_|) "1+2*3/4";; val it : (Ast.Expr * string) option = Some (BinOp (<fun:Sum@44>,Number 1.0, BinOp (<fun:Prod@46>,Number 2.0, BinOp (<fun:Ratio@47>,Number 3.0,Number 4.0))), "")

You can now patch up the evaluator you saw in the first part of this article to evaluate this new expression type by adding the evaluation of function calls (here you add only the sine and cosine functions, both of which work on a single argument):

module Evaluator = open System let rec Eval = function | Ast.Number f -> f | Ast.BinOp (f, e1, e2) -> f (Eval e1) (Eval e2) | Ast.FunApply (f, [e]) when f.ToLower() = "sin" -> (Eval e) |> Math.Sin | Ast.FunApply (f, [e]) when f.ToLower() = "cos" -> (Eval e) |> Math.Cos | e -> e |> sprintf "Unknown function or arity: %A" |> failwith

Finally, you can test this evaluator in F# Interactive:

> let _ = match "1+cos(2+3)/sin(4*5)" with | Language.Expression (e, Language.Eof) -> e |> Evaluator.Eval |> printf "Result=%f\n" | _ -> printf "No match\n";; > Result=1.310711

You've seen how to define active patterns that convert and partition values, via an example that used partial active patterns to implement a parser for arithmetic expressions with function calls. Effectively, you've seen how to translate the boring and error-prone task of implementing grammars with parser generators into a typed discipline that closely matches your parsing rules expressed in BNF syntax. Using these techniques you are better equipped to quickly prototype and implement your own parsers, define your own domain-specific languages, and work with your own concrete syntax for your data.

For further reading, try these links:



Adam Granicz is the CEO of IntelliFactory, the primary place for F# development, training and consulting services, and the coauthor of Expert F#, the most comprehensive guide on F# coauthored with Don Syme, the designer of the language. He has done research in functional programming languages and holds a Masters degree from the California Institute of Technology.
Comment and Contribute

 

 

 

 

 


(Maximum characters: 1200). You have 1200 characters left.

 

 

Sitemap