Create Domain-Specific Languages with ANTLR


The concept of Domain-Specific Languages (DSLs) has been around for years, but has recently experienced an uptick in popularity. This is due in large part to the success of Ruby on Rails, which exploits the Ruby programming language and its meta-programming capabilities to encourage the development of programs that closely mirror the domains they model. This style of DSL is sometimes called an internal DSL because it uses available constructs from the programming language itself to create code that reads in a manner similar to the underlying domain. Here are a couple of examples of internal DSLs:

   # Ruby - Active Record for Object Relational Mapping
   class Book < ActiveRecord::Base
     belongs_to :category
     has_many   :chapters
   end

   // Java - Using JUnit 4.4 for verifying unit test assertions
   assertThat(list, hasItems("foo", "bar", "baz"));

The above examples are intended to be interpreted sensibly by both subject matter experts and programmers through the use of variable names chosen from the domain's vocabulary, minimizing constructs that read too "tech-y," and organizing syntactic elements in a more "natural" order. The Rails example also uses pluralization rules to further enhance meaning and readability.

Programming languages obviously differ in their ability to express domain concepts and relations using only built-in programming constructs. When a given application requires more power and flexibility than a programming language alone can offer, developers can create external DSLs using tools such as ANTLR (which stands for ANother Tool for Language Recognition) and/or Lex/Yacc. These tools allow developers to specify more general grammars for parsing and processing by computer programs.

ANTLR Basics
Language recognition means the ability to recognize patterns of language within a stream of characters. ANTLR is a powerful tool for creating external DSLs and has recently been vastly upgraded with its 3.0 release. Although ANTLR is best known for its ability to support the development of DSLs, it is more generally a parser generator, characterized by its use of a recursive descent parsing algorithm along with LL(*) lookahead (for more information, consult the related resources at the end of this article).

ANTLR can emit parsers in many languages, including Java, Python, Ruby, and C#; however, ANTLR itself is not a parser. Instead, it processes a specification for a particular grammar and then generates parser components for that grammar.

Figure 1. ANTLR Data Flow: Lexers, parsers, tree walkers, and templates collaborate to recognize languages and transform language input from one form to another.

The major components of ANTLR-generated language recognizers are Lexers, Parsers, and Tree Walkers. Lexers in ANTLR transform streams of characters into streams of tokens in a manner governed by a grammar's lexer rules. Tokens are the smallest atomic units of parser input in ANTLR; they correspond roughly to "words" in natural language. Parsers examine a token stream, looking for groups of tokens that match the grammar's parser rules. Other programs can then meaningfully consume parser output by processing the resultant Abstract Syntax Tree (AST), or by using Tree Walkers. Tree Walkers follow the parsing step and may apply another round of parsing on top of the AST, enabling the calling program to take actions in response to patterns found in the AST. Optionally, Tree Walkers can also create templates for generating output (see Figure 1).

This article showcases ANTLR's features for developing a language recognizer by showing how to develop a SPARQL validator and "pretty-printing" formatter. You'll see the basic steps involved in creating a language recognizer, including how to develop a grammar and a parser, and how to transform a parsed understanding of input into a different format for output. Together, these techniques should put you well on your way to creating your own DSLs.

SPARQL is a recursive acronym for SPARQL Protocol and Resource Description Framework (RDF) Query Language. SPARQL is a language for specifying queries against graphs of subject-predicate-object semantic statements. Both SPARQL and RDF are interesting, and they are cornerstone technologies in the emerging semantic web, but you won't need knowledge of them to follow along with this article.

Getting Started
Creating an ANTLR-based application involves code generation, and thus requires more steps than developing a typical application. The general steps necessary to build an ANTLR-based application are:

  1. Develop a language description (grammar file) for the language to parse, which this article explores later using a Test-Driven Development (TDD) approach.
  2. Execute ANTLR to generate source code for parsing the specified grammar.
  3. Compile the ANTLR-generated source together with the application source.
  4. Package the compiled source, including both application and ANTLR dependencies.
Authors' Note: As proponents of repeatable, reliable automated builds, we used a Maven plug-in to accomplish steps 2-4 in the preceding list. Other forums have documented how to perform such steps using Ant, so we'd like to point out how you can use Maven and the ANTLR 3 Maven plug-in to build an ANTLR 3-based application. Maven's declarative nature and strong encapsulation of commonly repeated tasks make it well suited for this type of task. Also, it's convenient for developing (for example) minimal source bundles such as the downloadable code that accompanies this article.

Maven is heavily convention based. One of these conventions outlines a standard project directory structure. The sample application follows the conventions of Maven and the ANTLR 3 Maven plug-in. Its directory structure looks like:

   /SPARQL-validator
      /src/main/java                  -> Java source files
      /src/main/antlr                 -> ANTLR grammar files
      /src/test/java                  -> Java test source files
      /target/generated-sources/antlr -> generated parser files
      pom.xml                         -> Maven Project Object Model descriptor

By following this directory layout, all you need to leverage ANTLR in a Maven build is to bind the Maven ANTLR 3 plug-in to the generate-sources life-cycle phase of a Maven build, and to declare a run-time dependency on the ANTLR 3 library:

   SPARQL-validator/pom.xml

   <project>
     ...
     <dependencies>
       ...
       <dependency>
         <groupId>org.antlr</groupId>
         <artifactId>antlr</artifactId>
         <version>3.0</version>
       </dependency>
     </dependencies>
     <build>
       <plugins>
         ...
         <plugin>
           <groupId>org.codehaus.mojo</groupId>
           <artifactId>antlr3-maven-plugin</artifactId>
           <version>1.0-20071102.021231-1</version>
           <configuration>
             <grammars>Sparql.g</grammars>
             <outputDirectory>target/generated-sources/antlr</outputDirectory>
           </configuration>
           <executions>
             <execution>
               <phase>generate-sources</phase>
               <goals>
                 <goal>antlr</goal>
               </goals>
             </execution>
           </executions>
         </plugin>
         ...
       </plugins>
     </build>
     ...
   </project>

If you download the accompanying source, you should be able to build the application by running "mvn clean install" from the project's top-level directory.

Building a Grammar with TDD
Basic SPARQL RDF graph queries take the form of:

   SELECT ?subject
   WHERE {
     ?subject <#named> "Joe" .
   }

The SPARQL specification defines the language in Backus-Naur Form (BNF), a notation for representing context-free language grammars—grammars that recognize syntactically correct sentences without making any judgments about the sentences' semantics. This notation describes the hierarchy of a given language's constructs. There is typically a top-level rule representing an entire sentence or sequence of sentences. The top-level rule is matched by particular sequences of input matching lower-level rules. These lower-level rules, in turn, break down into lower-level matches—all the way down to tokens, the most atomic bits of a language.

The task here is to create an ANTLR grammar that can recognize legal SPARQL queries. As it happens, ANTLR grammars specify the rules of the languages they recognize in a BNF-like form, which usually makes for a smooth translation from a written language specification to an ANTLR grammar specification.

Why develop the ANTLR-based recognizer in a test-driven manner? If you mechanically translate the entire SPARQL BNF spec to ANTLR and only then try to test it, errors become much harder to track down. When you present a particular legal SPARQL query to the recognizer and it fails, it can be difficult to ascertain the source of the error: Is it your fault, ANTLR's fault, or both? Which rule is the problem? How can you be sure? By instead building up an ANTLR grammar bit by bit, and testing repeatedly to validate any assumptions made along the way, you can be confident that the recognizer is working as intended, and make it much easier to detect whether future changes invalidate or support those assumptions.

When building a recognizer in TDD fashion, first examine the language's specification and find the lowest-level rules—those with as few (if any) dependencies on other rules as possible. Typically these are the language tokens. When you have built enough of the ANTLR grammar to recognize these atomic bits, with tests to support your claims that the recognizer exhibits the desired behavior, you can tackle successively higher-level rules.

Before you begin writing lexer and parser rules, there are some things about ANTLR grammar files that you should know. The salient parts of the SPARQL grammar created for this article can illustrate:

   // src/main/antlr/com/devx/sparql/Sparql.g
   grammar Sparql;

   options {
       output = AST;
       k = 1;
   }

   @header {
       package com.devx.sparql;
   }

   @lexer::header {
       package com.devx.sparql;
   }

   @members {
       protected void mismatch(
           IntStream input, int tokenType, BitSet follow )
           throws RecognitionException {

           throw new MismatchedTokenException( tokenType, input );
       }

       public void recoverFromMismatchedSet(
           IntStream input, RecognitionException ex,
           BitSet follow )
           throws RecognitionException {

           throw ex;
       }
   }

   @rulecatch {
       catch ( RecognitionException ex ) {
           throw ex;
       }
   }

ANTLR grammars typically reside in files with a .g suffix. The first declaration (grammar Sparql;) indicates the name of the ensuing grammar. Processing this grammar file will generate the Java classes SparqlLexer for the lexer, and SparqlParser for the parser.

The options section allows you to declare certain options for the grammar. In the example above, output = AST means that ANTLR will generate a parser whose rules each yield an abstract syntax tree. k = 1 means that the parser will use only one token's worth of look-ahead into the token stream to decide which rule to try to match the input against. ANTLR parsers are capable of using more look-ahead; in fact, by default, ANTLR 3 can use an arbitrary amount of look-ahead using the LL(*) recognition strategy, as opposed to LL(k) for some fixed value of k. However, because the SPARQL language is declared to be an LL(1) language, an explicit k = 1 setting in the ANTLR grammar file will create a more efficient parser.

@header and @lexer::header let you specify constructs in the language in which the lexer and parser will be generated; these should occur at the beginning of the generated source for the lexer and parser. The example uses those sections to place the generated lexer and parser in a Java package other than the default.

The @members command allows you to write code in the same language as the generated parser; that code is treated as fields or methods of the generated parser class.

The @rulecatch command lets you specify an exception-handling strategy. In the example, whenever the parser or lexer raises a RecognitionException (ANTLR's top-level parsing exception), the application will simply propagate it rather than attempting error recovery. More robust grammars attempt more sophisticated error handling and reporting.

With this boilerplate out of the way, here's a first test and the corresponding lexer rule. The SPARQL specification states that the rule PN_CHARS_BASE describes which Unicode characters can be used in SPARQL-prefixed names, and that the rule depends on no other rules:

   PN_CHARS_BASE ::= [A-Z] | [a-z] |
      [#x00C0-#x00D6] | [#x00D8-#x00F6] |
      [#x00F8-#x02FF] | [#x0370-#x037D] |
      [#x037F-#x1FFF] | [#x200C-#x200D] |
      [#x2070-#x218F] | [#x2C00-#x2FEF] |
      [#x3001-#xD7FF] | [#xF900-#xFDCF] |
      [#xFDF0-#xFFFD]

Translated to ANTLR, the lexer rule becomes:

   PN_CHARS_BASE
       : ( 'A' .. 'Z'
         | 'a' .. 'z'
         | '\u00C0' .. '\u00D6'
         | '\u00D8' .. '\u00F6'
         | '\u00F8' .. '\u02FF'
         | '\u0370' .. '\u037D'
         | '\u037F' .. '\u1FFF'
         | '\u200C' .. '\u200D'
         | '\u2070' .. '\u218F'
         | '\u2C00' .. '\u2FEF'
         | '\u3001' .. '\uD7FF'
         | '\uF900' .. '\uFDCF'
         | '\uFDF0' .. '\uFFFD'
         )
       ;

This rule will match any single character in any of the ranges listed above.

Rules in ANTLR grammars are realized as methods on a generated lexer or parser. By convention, a rule name beginning with an uppercase letter denotes a lexer rule; names beginning with a lowercase letter denote parser rules. Recall that lexers emit tokens that parsers consume to match higher-level grammatical structures. In this case, the generated lexer will have a method called mPN_CHARS_BASE(), which reads a character of input and checks whether it falls within any of the specified ranges. If the character matches, the method consumes it and the lexer emits a token representing the character. Otherwise, ANTLR raises a RecognitionException.

Here's how the first test for this rule might look. Start by asserting that each of the letters "a" through "z" is recognized as a PN_CHARS_BASE:

   // src/test/java/com/devx/sparql/lexer/
   import org.antlr.runtime.*;
   import org.junit.*;
   import static org.junit.Assert.*;

   public class PnameCharactersBaseTest {
       @Test
       public void shouldRecognizeLowercaseLetters()
           throws Exception {

           for ( char ch = 'a'; ch <= 'z'; ++ch ) {
               String input = String.valueOf( ch );
               SparqlLexer lexer = lexerOver( input );
               lexer.mPN_CHARS_BASE();  // void method

               Token token = lexerOver( input ).nextToken();
               assertEquals( "couldn't match " + input + "?",
                   input, token.getText() );
               assertEquals( "token type for " + input + "?",
                   SparqlLexer.PN_CHARS_BASE, token.getType() );
           }
       }

       private SparqlLexer lexerOver( String input ) {
           SparqlLexer lexer = new SparqlLexer();
           lexer.setCharStream( new ANTLRStringStream( input ) );
           return lexer;
       }
   }

This test exercises the lexer rule in two ways. The first, which is typically used by generated lexers and parsers, is to invoke the void method corresponding to the rule name. This method will consume input if a match occurs, and set other lexer states that are difficult to sense in a test. The second way is to invoke the nextToken() method on the lexer, and ensure that it produces a token with the desired type and text. Both are effective ways of ensuring that the correct inputs are either matched or rejected.

Here's a negative test to ensure that an exception is raised when the rule detects a non-matching character:

   @Test( expected = RecognitionException.class )
   public void shouldRejectTheZeroCharacter() throws Exception {
       lexerOver( "\u0000" ).mPN_CHARS_BASE();
   }

Testing that the remaining character ranges match, and that characters outside of the legal ranges are rejected, is left as a reader exercise. You can check out the full ANTLR grammar and accompanying tests in the downloadable code accompanying this article.

As you build up your suite of tests, you will find many opportunities to factor out common code into helper methods. The lexerOver() method above is one example of such a refactoring.

You can further leverage existing tests when certain inputs for rules they test also match other rules. For example, consider the lexer rule PN_CHARS_U:

       PN_CHARS_U ::= PN_CHARS_BASE | '_'

Translated to ANTLR:

       PN_CHARS_U : ( PN_CHARS_BASE | '_' );

This rule matches any single character that would match PN_CHARS_BASE, plus the underscore character. Given their similarity, it would be a shame to have to duplicate all the tests written to verify PN_CHARS_BASE in a test class for PN_CHARS_U. Fortunately, you don't have to—you can leverage inheritance and exploit the way JUnit discovers tests on a test class to eliminate the duplication. Listing 1 shows a refactored abstract LexerTest, a PnameCharactersBaseTest that uses LexerTest's helpers and overrides, and a PnameCharactersPlusUnderscoreTest that derives in turn from PnameCharactersBaseTest.
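Listing 1 shows the real JUnit classes; the underlying pattern can also be sketched stand-alone. The sketch below uses plain Java and hypothetical names (no JUnit, no generated lexer): a base class owns the shared check, and a subclass that widens the matched character set inherits that check unchanged.

```java
// Stand-alone sketch of the test-reuse pattern (hypothetical names;
// the real version in Listing 1 uses JUnit and the generated SparqlLexer).
abstract class CharacterRuleContract {
    // Subclasses say which characters their rule must accept.
    protected abstract boolean matches(char ch);

    // Shared check, inherited by every subclass "for free".
    final boolean acceptsLowercaseLetters() {
        for (char ch = 'a'; ch <= 'z'; ++ch) {
            if (!matches(ch)) return false;
        }
        return true;
    }
}

// Analogue of PN_CHARS_BASE: letters only.
class BaseCharactersContract extends CharacterRuleContract {
    @Override protected boolean matches(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }
}

// Analogue of PN_CHARS_U: everything the base rule matches, plus '_'.
// It inherits acceptsLowercaseLetters() without duplicating it.
class BasePlusUnderscoreContract extends BaseCharactersContract {
    @Override protected boolean matches(char ch) {
        return ch == '_' || super.matches(ch);
    }
}

public class TestReuseSketch {
    public static void main(String[] args) {
        System.out.println(new BaseCharactersContract().acceptsLowercaseLetters());
        System.out.println(new BasePlusUnderscoreContract().acceptsLowercaseLetters());
        System.out.println(new BasePlusUnderscoreContract().matches('_'));
    }
}
```

JUnit runs inherited @Test methods automatically, so the real classes get this reuse without even the explicit calls shown here.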

You follow a similar pattern to write tests for parser rules. Start by finding language rules with the fewest dependencies, test those, and work upward. Here's a snapshot description of the test code for the parser rule triplesSameSubject.

The SPARQL grammar rule is:

   TriplesSameSubject ::= VarOrTerm PropertyListNotEmpty |
       TriplesNode PropertyList

Translated to ANTLR (definitions of lower-level rules omitted here):

   triplesSameSubject
       : varOrTerm propertyListNotEmpty
       | triplesNode propertyList
       ;

Listing 2 shows the test code.

As with lexer rules, you will find that you can use the inheritance technique above to reuse parser rule tests. For example, note that in SPARQL, the rule NumericExpression matches everything that the rule AdditiveExpression matches. If you already have tests for the rule AdditiveExpression, you can use them to recognize and test NumericExpressions as well, just by firing a different parser rule in the tests for NumericExpression.

Building an Intermediate AST

Figure 2. Prettified SPARQL: The SPARQL prettifier example color codes key words and indents blocks.

Now that you have a basic SPARQL recognizer, the next step is to transform SPARQL input to some other form of output. This example transforms a SPARQL query into a "prettified" version, depicting the query in HTML with color highlighting and indentation. Figure 2 shows a prettified version of this query:

   SELECT ?s1 WHERE { ?s1 ?p1 ?o1 . ?s2 ?p2 ?o2 }

The first step in the transformation process is to create a simplified representation of the query, from which you can more easily generate the desired output. You can enhance the parser created previously to create a more useful AST. The parser is already creating ASTs (because of the "output = AST" parser option), but by default the parser generates "flat" tree structures that contain all tokens from the stream regardless of their relevance. For example, the query SELECT ?s WHERE { ?s1 ?p1 ?o1. ?s2 ?p2 ?o2 } currently creates an AST similar to Figure 3:

Figure 3. Flat AST: By default ANTLR generates a flat AST where tokens simply follow one another in a series.
Figure 4. Hierarchical AST: Rewrite rules can be defined for rule alternatives that re-arrange and create nodes to form a more easily processed representation.

The goal is to create an AST that can be processed more easily to create a prettified SPARQL query. Therefore, an AST similar to Figure 4 would be more useful.

Notice that the second AST structures information in a way that's easier for programs traversing the tree to process, and that mirrors SPARQL's inherent structure. The tree structure provides information that a simple sequence cannot—for example, that a clause contains two triples, each of which contains three elements. This AST also eliminates unnecessary tokens such as braces and periods.
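To see why the hierarchy helps, consider a stand-alone sketch (plain Java, with a hypothetical minimal node class rather than ANTLR's API) that answers a structural question—how many triples does the clause contain?—by simple recursion over the tree, something a flat token sequence cannot support without re-parsing:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-alone illustration (hypothetical node class, not ANTLR's).
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();
    Node(String label) { this.label = label; }
    Node add(Node... kids) { for (Node k : kids) children.add(k); return this; }
}

public class AstSketch {
    // Count nodes labeled "TRIPLE" anywhere beneath (and including) the root.
    static int countTriples(Node node) {
        int count = "TRIPLE".equals(node.label) ? 1 : 0;
        for (Node child : node.children) count += countTriples(child);
        return count;
    }

    public static void main(String[] args) {
        // Shape of Figure 4: a WHERE clause holding two triples of three elements.
        Node where = new Node("WHERE").add(
            new Node("TRIPLE").add(new Node("?s1"), new Node("?p1"), new Node("?o1")),
            new Node("TRIPLE").add(new Node("?s2"), new Node("?p2"), new Node("?o2")));
        System.out.println(countTriples(where)); // prints 2
    }
}
```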

You can use rewrite rules to add structure to generated trees. These rules optionally follow each rule alternative in the grammar and are distinguished by an arrow symbol (->) as shown below.

   myRule
       : TOKEN_A TOKEN_B TOKEN_C ->
           TOKEN_B TOKEN_A TOKEN_C
       ;
Figure 5. Changing Order with Rewrite Rules: Rewrite rules can be used to control the order of the resultant AST.

The AST for this rule looks like the one shown in Figure 5.

The rewrite rule on the right side of the arrow in Figure 5 lists elements from the rule alternative on the left side. The order in which elements are listed defines their position in the tree. The rewrite rule in Figure 5 switches the position of TOKEN_A and TOKEN_B in the tree. You can also introduce hierarchical rewrites into the tree using the caret (^) operator as shown below.

   myRule
       : TOKEN_A TOKEN_B TOKEN_C ->
           ^( TOKEN_B TOKEN_A TOKEN_C )
       ;
Figure 6. Adding Hierarchy with Rewrite Rules: Rewrite rules can also be used to introduce hierarchy into the resultant AST.

Figure 6 shows the AST for this rule.

Often it is necessary to insert nodes into the AST that do not exist in the token stream. You accomplish this by defining and inserting a node for an imaginary token. You define imaginary tokens in a tokens block in the grammar as follows.

   grammar MyGrammar;

   tokens {
       MY_TOKEN;
   }

Rewrite rules refer to these tokens when building imaginary nodes in ASTs:

Figure 7. Introducing Imaginary Tokens: Rewrite rules can insert imaginary tokens that never appeared in the input stream but are structurally significant to the AST.
   myRule
       : TOKEN_A TOKEN_B TOKEN_C ->
           ^( MY_TOKEN TOKEN_A TOKEN_C )
       ;

At this point the newly formed AST would look like the one shown in Figure 7:

Here's how to enhance the current SPARQL parser output to form triples hierarchically, as shown in Figure 4. The first step is, of course, to create a unit test. You want to assert that the parser not only recognizes the textual description of a triple, but that it correctly emits an AST that meets expectations. Here's one way of representing such a test:

   // src/test/java/com/devx/sparql/parser/
   public class TriplesSameSubjectTest extends ParserTest {
       @Test
       public void variableOrTermAndPropertyListNotEmpty()
           throws Exception {

           Tree expected = SparqlAstBuilder.buildTripleTree();
           Tree actual = parseTreeFor( "?s ?p ?o" );

           assertEquals( expected, actual );
       }
       ...
   }

The preceding code intentionally pulls the formation of the expected output into a separate class. Because the output of the parser is the input to the Tree Walker (covered shortly), it is highly advantageous to assert that both are using the same AST representation. In this case, the code that creates the expected triple forms a tree of ComparableTree objects as follows.

   // src/test/java/com/devx/sparql/helper/
   public class SparqlAstBuilder {
       ...
       public static Tree buildTripleTree() {
           Tree tree = buildTree( SparqlParser.TRIPLE, "TRIPLE" );
           tree.addChild( buildVariableTree( "?s" ) );
           tree.addChild( buildVariableTree( "?p" ) );
           tree.addChild( buildVariableTree( "?o" ) );

           return tree;
       }

       public static Tree buildVariableTree( String name ) {
           return buildTree( SparqlParser.QUESTION_VARNAME, name );
       }

       public static Tree buildTree( int type, String text ) {
           return new ComparableTree( new CommonToken( type, text ) );
       }
       ...
   }

ComparableTree implements ANTLR's Tree interface by subclassing CommonTree. ComparableTree is defined in the downloadable source for this article, and provides equals() and hashCode() operations that allow you to compare trees using assertions, as shown in the previous test.
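The real ComparableTree lives in the downloadable source; the structural equality it provides can be sketched with a minimal stand-alone class (a hypothetical illustration, not ANTLR's API): two trees compare equal when their roots carry the same token type and text and their children are pairwise equal, recursively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Minimal stand-alone sketch of structural tree equality (hypothetical class;
// the article's ComparableTree subclasses ANTLR's CommonTree instead).
class SketchTree {
    final int type;          // token type, e.g. SparqlParser.TRIPLE
    final String text;       // token text, e.g. "?s"
    final List<SketchTree> children = new ArrayList<>();

    SketchTree(int type, String text) { this.type = type; this.text = text; }

    SketchTree addChild(SketchTree child) { children.add(child); return this; }

    // Equal roots (type and text) plus pairwise-equal children; List.equals
    // performs the recursion by calling equals on each child.
    @Override public boolean equals(Object other) {
        if (!(other instanceof SketchTree)) return false;
        SketchTree that = (SketchTree) other;
        return type == that.type
            && Objects.equals(text, that.text)
            && children.equals(that.children);
    }

    @Override public int hashCode() {
        return Objects.hash(type, text, children);
    }
}
```

With equals() defined this way, assertEquals(expected, actual) on two trees reports a mismatch anywhere in the structure, which is exactly what the parser tests rely on.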

To make this test pass, apply a rewrite rule in the grammar to morph the AST into the form that buildTripleTree() expects:

   // src/main/antlr/com/devx/sparql/Sparql.g
   triplesSameSubject
       : variableOrTerm propertyListNotEmpty ->
           ^( TRIPLE variableOrTerm propertyListNotEmpty )
       | triplesNode propertyList ->
           ^( TRIPLE triplesNode propertyList )
       ;

The preceding rule introduces an imaginary node of type TRIPLE as the parent of the triple tree. Its children will be the ASTs generated by sub-rules such as variableOrTerm and propertyListNotEmpty. You can obtain the completed grammar rules and test cases from the accompanying source bundle.

Tree Walkers
With a simplified version of the SPARQL query in place in the form of an AST, it's time to do something useful with it. As mentioned previously, you can use tree walkers to further process ASTs and take actions such as emitting output. Tree walkers are parsers that take ASTs as input. As such, defining them is extremely similar to defining the parsers you've already seen. Here's an example:

   // src/main/antlr/com/devx/sparql/SparqlWalker.g
   tree grammar SparqlWalker;

   options {
       ASTLabelType = CommonTree;
       tokenVocab = Sparql;
       output = template;
   }

The preceding snippet defines a tree walker named SparqlWalker, which takes trees as input and creates templates as output. This walker is configured to re-use the token vocabulary defined for the parser. This reuse is important, because the expression of the shared AST needs to be consistent between the parser and the walker. Because the walker cannot be generated correctly until after the parser has been generated, you need to bind a separate execution of the ANTLR Maven plug-in to the generate-sources phase of the Maven build. You can see an example in the downloadable source.

Given the AST shown in Figure 4, a rule to walk the AST representation of a triple might look like this:

   triple
       : ^( TRIPLE s=element p=element o=element )
       ;

   element
       : QUESTION_VARNAME
       | DOLLAR_VARNAME
       ;

Notice that the input the walker recognizes matches the output of the parser. The examples that follow use the StringTemplate framework to emit prettified SPARQL. StringTemplate is closely related to the ANTLR project and tightly integrated with it. Instead of creating an AST, tree walkers can output StringTemplates, which can be populated with attributes from the grammar and render output by substituting those attributes into a textual template description. For example, the code below captures the text of an element in an attribute and renders the text into a simple string using an anonymous template:

   element
       : QUESTION_VARNAME ->
           template( name = {$QUESTION_VARNAME.text} ) "$name$"
       | DOLLAR_VARNAME ->
           template( name = {$DOLLAR_VARNAME.text} ) "$name$"
       ;

The template keyword following the arrow identifies an anonymous template to use to render the output. You define anonymous templates inside quoted strings following the template definition. In the example above, this template reads "$name$". In the example, if the text of the matching QUESTION_VARNAME token was ?s, then the output of this template would be simply ?s. That's a pretty basic example; here's one that's slightly more interesting—the triple itself:

   triple
       : ^( TRIPLE s=element p=element o=element ) ->
           triple( subject={$s.st},
                   predicate={$p.st},
                   object={$o.st} )
       ;

The fragment above refers to a named template, triple. This template is populated with three attributes, each of which is the result of evaluating another template—the anonymous template from the element rule. The triple template is defined separately from the walker grammar. In the full example this file is named sparql.stg (STG stands for String Template Group). The file contains definitions for each template referenced in the grammar. The definition for the triple template is simple:

   // src/main/resources/templates/sparql.stg
   triple(subject, predicate, object) ::= <<
   $subject$ $predicate$ $object$ .
   >>

The text between the << and >> delimiters is the basic output of the template. You substitute attributes into the output by wrapping them with dollar signs as shown in the example above. HTML is one possible output medium for prettified SPARQL. HTML allows you to decorate different types of text by defining and applying styles. For example, the following templates prepare a basic HTML page for a query and assign a CSS class named clause to each SELECT clause, which causes them to appear in purple.

   query(selectClause, whereClause) ::= <<
   <html>
     <head>
       <title>CSS Example</title>
       <style type="text/css">
         .clause { color: purple; }
       </style>
     </head>
     <body>
       $selectClause$<br/>
       $whereClause$
     </body>
   </html>
   >>

   selectClause(element) ::= <<
   <span class="clause">SELECT</span> $element$
   >>

Lastly, to prettify the SPARQL further, each WHERE clause should be indented and separated by a line break. The following walker rule populates the whereClause attribute with triples from a SPARQL query.

   whereClause
       : ^( WHERE triples+=triple+ ) ->
           whereClause( triples={$triples} )
       ;

What's different here is that the walker rule populates the triples attribute with a list of StringTemplates. The template for this construct is:

   whereClause(triples) ::= <<
   WHERE {
     $triples; separator="<br/>"$
   }
   >>

Notice the $triples; separator="<br/>"$ construct, which specifies how to separate each element of the list. StringTemplate supports auto-indentation of whitespace, but unfortunately the escaped representation of a space in HTML (&nbsp;) isn't whitespace itself—so the sample application doesn't capitalize on this feature.

Putting it All Together

You now have all the basic components for SPARQL prettification in place. Specifically, you have seen how to specify a grammar to build a lexer, parser, and tree walker. You've also seen how to use StringTemplate to emit structured output. The next step is to wire these components together; the Java code for combining them is fairly straightforward. First, define the expected behavior by writing an integration test:

   // src/test/java/com/devx/sparql/integration/
   public class SparqlPrettifierTest {
       @Test
       public void shouldPrettifyOneTripleSparql() throws Exception {
           SparqlPrettifier prettifier = new SparqlPrettifier();

           String expected = readExpectation( "select-one-triple" );
           String actual = prettifier.prettify(
               "SELECT ?s WHERE { ?s ?p ?o}" );

           assertEquals( expected, actual );
       }
       ...
   }

The preceding test reads an HTML file (select-one-triple-output.html) from disk and compares that to the results of invoking the prettify() method on a SparqlPrettifier object. The SparqlPrettifier sets up all the transformation components to work together as shown below.

   // src/main/java/com/devx/sparql/
   public class SparqlPrettifier {
       public String prettify( String sparql )
           throws Exception {

           ByteArrayInputStream sparqlStream =
               new ByteArrayInputStream( sparql.getBytes() );
           ANTLRInputStream source =
               new ANTLRInputStream( sparqlStream );
           SparqlLexer lexer = new SparqlLexer( source );
           CommonTokenStream tokens =
               new CommonTokenStream( lexer );

           SparqlParser parser = new SparqlParser( tokens );
           Tree ast = (Tree) parser.query().getTree();
           CommonTreeNodeStream nodes =
               new CommonTreeNodeStream( ast );

           Reader templatesIn = new InputStreamReader(
               getClass().getResourceAsStream(
                   "/templates/sparql.stg" ) );
           StringTemplateGroup templates =
               new StringTemplateGroup(
                   templatesIn, DefaultTemplateLexer.class );

           SparqlWalker walker = new SparqlWalker( nodes );
           walker.setTemplateLib( templates );

           return walker.query().toString();
       }
   }

These few lines wire together each of the generated components and return the result of the prettification transformation back to the caller. Even though this simple task requires a fair amount of work, you'll find that the process is the same for larger applications such as in the construction of a DSL, and it scales well.

ANTLR is a sophisticated tool for building language recognizers. You've seen how to use ANTLR to help you build programs that can perform complex tasks such as "pretty printing." You covered a lot of ground in this article, but you'll find you can get a lot of mileage out of ANTLR just by knowing the basics. ANTLR is quite powerful; the effort you invest in learning its capabilities will pay significant dividends. We hope you'll consider ANTLR when confronted with the challenge of creating DSLs for your software project.

Related Resources

