Create Domain-Specific Languages with ANTLR

The latest version of ANTLR provides the tools you need to build a parser for special-purpose languages.

The concept of Domain-Specific Languages (DSLs) has been around for years, but has recently experienced an uptick in popularity. This is due in large part to the success of Ruby on Rails, which exploits the Ruby programming language and its meta-programming capabilities to encourage the development of programs that closely mirror the domains they model. This style of DSL is sometimes called an internal DSL because it uses available constructs from the programming language itself to create code that reads in a manner similar to the underlying domain. Here are a couple of examples of internal DSLs:

# Ruby - Active Record for Object Relational Mapping
class Book < ActiveRecord::Base
  belongs_to :category
  has_many :chapters
end

// Java - Using JUnit 4.4 for verifying unit test assertions
assertThat(list, hasItems("foo", "bar", "baz"));

The above examples are intended to be interpreted sensibly by both subject matter experts and programmers through the use of variable names chosen from the domain's vocabulary, minimizing constructs that read too "tech-y," and organizing syntactic elements in a more "natural" order. The Rails example also uses pluralization rules to further enhance meaning and readability.

Programming languages differ in their ability to express domain concepts and relations using only built-in programming constructs. When a given application requires more power and flexibility than a programming language alone can offer, developers can create external DSLs using tools such as ANTLR (which stands for ANother Tool for Language Recognition) or Lex/Yacc. These tools let developers specify more general grammars, from which programs that parse and process the language can be built.

ANTLR Basics
Language recognition means the ability to recognize patterns of language within a stream of characters. ANTLR is a powerful tool for creating external DSLs and has recently been vastly upgraded with its 3.0 release. Although ANTLR is best known for its ability to support the development of DSLs, it is more generally a parser generator, characterized by its use of a recursive descent parsing algorithm along with LL(*) lookahead (for more information, consult the related resources at the end of this article).

ANTLR can emit parsers in many languages, including Java, Python, Ruby, and C#; however, ANTLR itself is not a parser. Instead, it processes a specification for a particular grammar and then generates parser components for that grammar.

Figure 1. ANTLR Data Flow: Lexers, parsers, tree walkers, and templates collaborate to recognize languages and transform language input from one form to another.
The major components of ANTLR-generated language recognizers are Lexers, Parsers, and Tree Walkers. Lexers in ANTLR transform streams of characters into streams of tokens in a manner governed by a grammar's lexer rules. Tokens are the smallest atomic unit of parser input in ANTLR; they correspond roughly to "words" in natural language. Parsers examine a token stream looking for groups of tokens that match the grammar's parser rules. Other programs can then meaningfully consume parser output by processing the resultant Abstract Syntax Tree (AST), or by using Tree Walkers. Tree Walkers follow the parsing step and may apply another round of parsing on top of the AST, enabling the calling program to take actions in response to patterns found in the AST. Optionally, Tree Walkers can also use templates to generate output (see Figure 1).
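To make the three stages concrete, here is a minimal hand-written sketch in Python of the same flow: characters to tokens, tokens to an AST, and a walk over the AST. It is not ANTLR-generated code and does not use the ANTLR runtime; the tiny arithmetic grammar and the names `lex`, `Parser`, and `evaluate` are assumptions chosen purely for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical token definitions for a tiny arithmetic language (not from the article).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS", r"\+"),
    ("STAR", r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


@dataclass
class Token:
    kind: str
    text: str


def lex(source):
    """Lexer stage: turn a stream of characters into a list of tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":  # whitespace is recognized, then discarded
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens


class Parser:
    """Parser stage: recursive descent over tokens, building a tuple-based AST.

    Grammar (informal notation):
        expr   : term (PLUS term)*
        term   : factor (STAR factor)*
        factor : NUMBER | LPAREN expr RPAREN
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # One token of lookahead; "EOF" marks the end of input.
        return self.tokens[self.pos].kind if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() != "EOF":
            raise SyntaxError(f"unexpected trailing token {self.peek()}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() == "PLUS":
            self.expect("PLUS")
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "STAR":
            self.expect("STAR")
            node = ("*", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "LPAREN":
            self.expect("LPAREN")
            node = self.expr()
            self.expect("RPAREN")
            return node
        return ("num", int(self.expect("NUMBER").text))


def evaluate(node):
    """Tree walker stage: visit the AST and act on each node pattern."""
    if node[0] == "num":
        return node[1]
    op, left, right = node
    left_value, right_value = evaluate(left), evaluate(right)
    return left_value + right_value if op == "+" else left_value * right_value
```

Calling `evaluate(Parser(lex("2 + 3 * (4 + 1)")).parse())` runs all three stages in order. With ANTLR, the lexer and parser would instead be generated from a grammar file, so only the grammar and any tree-walking actions are written by hand.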

This article showcases ANTLR's features for developing a language recognizer by showing how to develop a SPARQL validator and "pretty-printing" formatter. You'll see the basic steps involved in creating a language recognizer, including how to develop a grammar and a parser, and how to transform a parsed understanding of input into a different format for output. Together, these techniques should put you well on your way to creating your own DSLs.

SPARQL is a recursive acronym for SPARQL Protocol and Resource Description Framework (RDF) Query Language. SPARQL is a language for specifying queries against graphs of subject-predicate-object semantic statements. Both SPARQL and RDF are interesting, and they are cornerstone technologies in the emerging semantic web, but you won't need knowledge of them to follow along with this article.
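For readers new to the idea, the following Python sketch gives a rough intuition for querying subject-predicate-object statements. It is not SPARQL and not an RDF library; the sample data, the `?name` variable convention borrowed from SPARQL, and the helper functions `match_pattern` and `query` are all assumptions made for illustration.

```python
# Hypothetical graph: each statement is a (subject, predicate, object) triple.
GRAPH = [
    ("book1", "title", "Language Design"),
    ("book1", "author", "alice"),
    ("book2", "title", "Parsing Basics"),
    ("book2", "author", "bob"),
    ("alice", "name", "Alice Smith"),
]


def is_variable(term):
    """Terms starting with '?' are treated as query variables."""
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Match one triple pattern against one triple.

    Returns the bindings extended with any newly bound variables,
    or None if the triple does not fit the pattern.
    """
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_variable(term):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, graph):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Find books whose author has a recorded name.
results = query(
    [("?book", "author", "?person"), ("?person", "name", "?name")],
    GRAPH,
)
```

Each pattern narrows the set of candidate bindings, and a variable shared between patterns (here `?person`) joins them. A real SPARQL query expresses the same kind of pattern matching declaratively, which is the syntax the validator in this article has to recognize.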
