Login | Register   
RSS Feed
Download our iPhone app
Browse DevX
Sign up for e-mail newsletters from DevX


Create Domain-Specific Languages with ANTLR : Page 4

The latest version of ANTLR provides the tools you need to build a parser for special-purpose languages.

Building an Intermediate AST
Figure 2. Prettified SPARQL: The SPARQL prettifier example color codes key words and indents blocks.
Now that you have a basic SPARQL recognizer, the next step is to transform SPARQL input to some other form of output. This example transforms a SPARQL query into a "prettified" version, depicting the query in HTML with color highlighting and indentation. Figure 2 shows a prettified version of this query:

SELECT ?s1 WHERE { ?s1 ?p1 ?o1 . ?s2 ?p2 ?o2 }".

The first step in the transformation process is to create a simplified representation of the query, from which you can more easily generate the desired output. You can enhance the parser created previously to create a more useful AST. The parser is already creating ASTs (because of the "output = AST" parser option), but by default the parser generates "flat" tree structures that contain all tokens from the stream regardless of their relevance. For example, the query SELECT ?s WHERE { ?s1 ?p1 ?o1. ?s2 ?p2 ?o2 } currently creates an AST similar to Figure 3:

Figure 3. Flat AST: By default ANTLR generates a flat AST where tokens simply follow one another in a series.
Figure 4. Hierarchical AST: Rewrite rules can be defined for rule alternatives that re-arrange and create nodes to form a more easily processed representation.
The goal is to create an AST that can be processed more easily to create a prettified SPARQL query. Therefore, an AST similar to Figure 4 would be more useful.

Notice that the second AST structures information in a way that's easier for programs traversing the tree to process, and that mirrors SPARQL's inherent structure. The tree structure provides information that a simple sequence cannot—for example, that a clause contains two triples, each of which contains three elements. This AST also eliminates unnecessary tokens such as braces and periods.

You can use rewrite rules to add structure to generated trees. These rules optionally follow each rule alternative in the grammar and are distinguished by an arrow symbol (->) as shown below.


Figure 5. Changing Order with Rewrite Rules: Rewrite rules can be used to control the order of the resultant AST.
The AST for this rule looks like the one shown in Figure 5.

The rewrite rule on the right side of the arrow in Figure 5 lists elements from the rule alternative on the left side. The order in which elements are listed defines their position in the tree. The rewrite rule in Figure 5 switches the position of TOKEN_A and TOKEN_B in the tree. You can also introduce hierarchical rewrites into the tree using the carat (^) operator as shown below.


Figure 6. Adding Hierarchy with Rewrite Rules: Rewrite rules can also be used to introduce hierarchy into the resultant AST.
Figure 6 shows the AST for this rule.

Often it is necessary to insert nodes into the AST that do not exist in the token stream. You accomplish this by defining and inserting a node for an imaginary token. You define imaginary tokens in a tokens block in the grammar as follows.

grammar MyGrammar; tokens { MY_TOKEN; }

Rewrite rules refer to these tokens when building imaginary nodes in ASTs:

Figure 7. Introducing Imaginary Tokens: Rewrite rules can insert unparsed tokens from the input stream that are structurally significant to the AST.


At this point the newly formed AST would look like the one shown in Figure 7:

Here's how to enhance the current SPARQL parser output to form triples hierarchically, as shown in Figure 4. The first step is, of course, to create a unit test. You want to assert that the parser not only recognizes the textual description of a triple, but that it correctly emits an AST that meets expectations. Here's one way of representing such a test:

src/test/java/com/devx/sparql/parser/TriplesSameSubjectTest.java public class TriplesSameSubjectTest extends ParserTest { @Test public void variableOrTermAndPropertyListNotEmpty() throws Exception { Tree expected = SparqlAstBuilder.buildTripleTree(); Tree actual = parseTreeFor( "?s ?p ?o" ); assertEquals( expected, actual ); } ... }

The preceding code intentionally pulls the formation of the expected output into a separate class. Because the output of the parser is the input to the Tree Walker (covered shortly), it is highly advantageous to assert that both are using the same AST representation. In this case, the code that creates the expected triple forms a tree of ComparableTree objects as follows.

src/test/java/com/devx/sparql/helper/SparqlAstBuilder.java public class SparqlAstBuilder { ... public static Tree buildTripleTree() { Tree tree = buildTree( SparqlParser.TRIPLE, "TRIPLE" ); tree.addChild( buildVariableTree( "?s" ) ); tree.addChild( buildVariableTree( "?p" ) ); tree.addChild( buildVariableTree( "?o" ) ); return tree; } public static Tree buildVariableTree( String name ) { return buildTree( SparqlParser.QUESTION_VARNAME, name ); } public static Tree buildTree( int type ) { return new ComparableTree( new CommonToken( type ) ); } ... }

ComparableTree implements ANTLR's Tree interface by subclassing CommonTree. ComparableTree is defined in the downloadable source for this article, and provides equals() and hashCode() operations that allow you to compare trees using assertions, as shown in the previous test.

To make this test pass, apply a rewrite rule to the grammar to morph the AST into the expected form in the buildTripleTree() method:

// src/main/antlr/com/devx/sparql/Sparql.g triplesSameSubject : variableOrTerm propertyListNotEmpty -> ^( TRIPLE variableOrTerm propertyListNotEmpty ) | triplesNode propertyList -> ^( TRIPLE triplesNode propertyList ) ;

The preceding rule introduces an imaginary node of type TRIPLE as the parent of the triple tree. Its children will be the ASTs generated by sub-rules such as variableOrTerm and propertyListNotEmpty. You can obtain the completed grammar rules and test cases from the accompanying source bundle.

Comment and Contribute






(Maximum characters: 1200). You have 1200 characters left.



Thanks for your registration, follow us on our social networks to keep up-to-date