

Three Java Variants Extend the Language : Page 4

Learn about three of the most promising Java variants—how they work, what their features do, and how you can integrate them into your development environment.





EPP: The Extensible Preprocessor
The impressive EPP comes from the National Institute of Advanced Industrial Science and Technology in Japan. EPP takes the concept of a Java variant one step further. EPP isn't a modified form of Java; it's a toolkit for creating modified forms of Java. It allows you to create language extension plug-ins, each of which implements a particular new language feature. You can then pick and choose which new language features you want to use for any particular programming task.

EPP works by translation—that is, new features are implemented by translating them into regular Java. Thus, if you create a plug-in that adds a new keyword, this plug-in must identify any code that uses that keyword and translate the code into regular Java.

In addition to allowing you to create new language features, EPP also comes with a long list of example features. Enumerated types, assertions, associative arrays, macros, ifdefs, operator overloading, optional parameters, and multiple inheritance are just some of the interesting language features that have been successfully added to Java using EPP. (A complete discussion of EPP is beyond the scope of this article, but this section covers some of the highlights and provides a look at a working example.)

By now you've gotten the point that EPP isn't just a modified compiler like the other packages discussed in this article. In fact, EPP is an extensible compiler. It lets you intervene in the compilation process at a number of different points. These interventions are in the form of code that augments the code in the compiler.

EPP uses the object-oriented idea of a mixin. A mixin is a small fragment of code that can be added, or mixed into, one or more other classes. The following is an example of a mixin:

SystemMixin PRegex {
    class Epp {
        extend void initMacroTable() {
            original();
            defineMacro( :"=~", new ApplyPRegexMacro() );
        }
    }
}

The first line declares that this is a mixin called PRegex:

SystemMixin PRegex {

The next line declares that the mixin will be added to a class called Epp:

class Epp {

The extend keyword declares that a particular method should be overridden:

extend void initMacroTable() {
    // ...
}

If you would like to call the already-existing code from your new code, you can do this using the original() method. The following definition of initMacroTable calls original before defining a new macro with defineMacro:

extend void initMacroTable() {
    original();
    defineMacro( :"=~", new ApplyPRegexMacro() );
}

In some ways, this kind of programming is like programming by subclassing. The difference is that these mixins can be combined dynamically at run time. Thus, you don't have to create a specific subclass containing all the modifications you want. You can decide at run time which mixins to merge into your base class.
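Plain Java has no mixins, but the idea of stacking extensions at run time, each free to call the previous behavior, can be approximated. The following is a minimal sketch, not EPP's implementation: the class MacroTableSketch, the Extension interface, and the "builtin" entry are all hypothetical, and a Runnable stands in for original().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an EPP class whose method can be extended at run time.
final class MacroTableSketch {
    // Each extension receives the macro list plus a Runnable that plays the role of original().
    interface Extension {
        void apply(List<String> macros, Runnable original);
    }

    private final List<Extension> extensions = new ArrayList<>();

    // Mix in one more extension; later additions wrap earlier ones.
    void mixIn(Extension extension) {
        extensions.add(extension);
    }

    // Runs the outermost extension, which may call down the chain via original.
    List<String> initMacroTable() {
        List<String> macros = new ArrayList<>();
        invoke(extensions.size() - 1, macros);
        return macros;
    }

    private void invoke(int index, List<String> macros) {
        if (index < 0) {
            macros.add("builtin"); // the unextended base behavior (assumed for illustration)
            return;
        }
        extensions.get(index).apply(macros, () -> invoke(index - 1, macros));
    }

    public static void main(String[] args) {
        MacroTableSketch epp = new MacroTableSketch();
        // Decide at run time which extensions to combine.
        epp.mixIn((macros, original) -> { original.run(); macros.add("=~"); });
        epp.mixIn((macros, original) -> { original.run(); macros.add("assert"); });
        System.out.println(epp.initMacroTable()); // prints [builtin, =~, assert]
    }
}
```

Because the chain is just a list, the set of extensions can differ from one run to the next without defining a new subclass for each combination.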

Lexical Analysis
If the feature you would like to add requires a deep syntactic change, then you can modify the lexical analyzer, which is the first step in the parsing process. You can do this by overriding one of the methods of the lexical analyzer, such as readOperator():

extend Token readOperator(EppInputStream in) {
    int begin = in.pointer();
    char c = in.getc();
    if (c=='/') {
        String regex = "";
        // ...
}

The default readOperator method carries out the low-level lexical analysis required to distinguish operators in the Java source code. By creating a modified form of this method, you can alter the lexical analyzer to recognize a new operator, for example.

If you don't know how to write a lexical analyzer routine for Java, don't worry. The default lexical analyzer is already written for you. All you have to do is write code for your new features and use the original method to access the default implementation to handle all other cases.
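The "handle the new case, delegate everything else" pattern can be illustrated in ordinary Java, with super playing the role that original() plays in an EPP mixin. This is only a sketch under simplifying assumptions: BaseLexerSketch and RegexLexerSketch are hypothetical names, the lexer works on a String rather than an EppInputStream, and tokens are returned as plain strings.

```java
// Hypothetical base lexer, far simpler than EPP's real one.
class BaseLexerSketch {
    // Reads one operator starting at pos and returns its text.
    String readOperator(String src, int pos) {
        // Recognize a couple of two-character operators, else a single character.
        if (src.startsWith("==", pos) || src.startsWith("+=", pos)) {
            return src.substring(pos, pos + 2);
        }
        return src.substring(pos, pos + 1);
    }
}

// Adds slash-quoted text as a new token kind and defers to the base lexer otherwise.
class RegexLexerSketch extends BaseLexerSketch {
    @Override
    String readOperator(String src, int pos) {
        if (src.charAt(pos) == '/') {
            int end = src.indexOf('/', pos + 1);
            if (end > pos + 1) {
                // Treat the text between the slashes like a string literal.
                // Simplification: no escaping, and any later '/' is taken as the
                // closing slash, so "a / b / c" would be misread.
                return "\"" + src.substring(pos + 1, end) + "\"";
            }
        }
        // Everything else is handled by the default implementation.
        return super.readOperator(src, pos);
    }
}
```

For example, new RegexLexerSketch().readOperator("/ab+/", 0) yields the quoted text "ab+", while an input such as "a == b" at position 2 still reaches the base lexer and yields ==.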

You can also modify the language at the parsing level after the lexical analysis is complete. The following is an example of this, taken from the EPP implementation of assertions:

extend Tree statementTop() {
    if (lookahead() == :assert) {
        matchAny();
        Tree exp1 = expression();
        if (lookahead() == :":") {
            matchAny();
            Tree exp2 = expression();
            match(:";");
            return new Tree(:assert, exp1, exp2);
        } else {
            match(:";");
            return new Tree(:assert, exp1);
        }
    } else {
        return original();
    }
}

This mixin method modifies the statementTop() method, which is one of the pieces of EPP's Java parser. The code checks for the assert syntax and, if it finds it, returns a new Tree object, which is the class used to represent a parsed Java program. If it tries to parse something that isn't an assert statement, then it calls the original method to pass the buck to the regular Java parser, which will handle all other cases.
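The same control flow can be mimicked in plain Java to see how the lookahead test and the fallback fit together. The sketch below is an illustration only: ParserSketch and AssertParserSketch are hypothetical, tokens are pre-split strings, an expression is a single token, and parse trees are rendered as parenthesized strings instead of EPP Tree objects.

```java
import java.util.List;

// Hypothetical toy parser; only the method names echo the EPP excerpt above.
class ParserSketch {
    protected final List<String> tokens;
    protected int pos = 0;

    ParserSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    String lookahead() {
        return pos < tokens.size() ? tokens.get(pos) : "<eof>";
    }

    String matchAny() {
        return tokens.get(pos++);
    }

    void match(String expected) {
        String actual = matchAny();
        if (!actual.equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + actual);
        }
    }

    // Toy expression rule: exactly one token.
    String expression() {
        return matchAny();
    }

    // Base rule: an expression statement terminated by ";".
    String statementTop() {
        String e = expression();
        match(";");
        return "(expr " + e + ")";
    }
}

// Adds an assert statement and defers to the base rule for everything else.
class AssertParserSketch extends ParserSketch {
    AssertParserSketch(List<String> tokens) {
        super(tokens);
    }

    @Override
    String statementTop() {
        if (!lookahead().equals("assert")) {
            return super.statementTop(); // plays the role of original()
        }
        matchAny(); // consume "assert"
        String condition = expression();
        if (lookahead().equals(":")) {
            matchAny(); // consume ":"
            String message = expression();
            match(";");
            return "(assert " + condition + " " + message + ")";
        }
        match(";");
        return "(assert " + condition + ")";
    }
}
```

Given the tokens assert, ok, :, msg, ; the extended rule produces (assert ok msg), while the tokens x, ; fall through to the base rule and produce (expr x).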

The source code for the EPP compiler actually uses EPP itself. That is, if you look at the source code for EPP, you'll see syntactic constructs that you don't recognize as being regular Java. EPP actually has to process itself before it can be compiled. If this seems impossible, note that the original version of EPP was written not in Java but in Common Lisp.

A Full Example of an EPP Plug-in
Take a look at a working example of an EPP plug-in. The example is called PRegex. It implements a small subset of Perl's regular expression (regex) syntax:

if (s =~ /^[A-Z][a-z]+/) {
    // ...
}

This requires two syntactic constructs. First, you need the slash-quoted regex syntax, as in the /^[A-Z][a-z]+/ literal above.

You implement this by overriding the readOperator() method in the lexical analyzer. You scan the incoming characters and, if you find a pair of slash characters with something between them, you just treat that like a string. Of course, this isn't the full regex syntax by any stretch, but it serves to illustrate the technique.

You also need to implement the =~ operator, which of course doesn't exist in Java. Override the initMacroTable() method to respond to the presence of =~ by invoking the ApplyPRegexMacro class. This class has a call method that translates the operator into a small fragment of Java code that calls the Pregex.checkMatch() method, which is defined in a helper class.
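To make the translation target concrete, here is a sketch of what such a helper and the translated call might look like in regular Java. The article names checkMatch() but does not show its parameters, so the two-argument signature, the class name PregexSketch, and the use of java.util.regex are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the real one ships in the article's listings.
final class PregexSketch {
    private PregexSketch() {}

    // Assumed signature: the input string and the regex text taken from between the slashes.
    static boolean checkMatch(String input, String regex) {
        // find() rather than matches(): Perl's =~ succeeds on a match anywhere
        // in the string unless the pattern is anchored.
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "Hello world";
        // Hand-written equivalent of: if (s =~ /^[A-Z][a-z]+/) { ... }
        if (PregexSketch.checkMatch(s, "^[A-Z][a-z]+")) {
            System.out.println("matched"); // prints matched
        }
    }
}
```

Compiling the pattern on every call keeps the sketch short; a translator aiming for efficiency could instead emit code that compiles each regex literal once.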

As you can see, these techniques are extremely powerful. Since you are able to extend Java using Java itself, you can implement arbitrarily complex translations and transformations—limited only by your abilities as a programmer. EPP transforms Java into a truly programmable language.

Listing 5 (PlugIn.java) and Listing 6 (PRegex.java) show the full source for the PRegex example. Click here for complete instructions for running the EPP variant.

Extending Java's Conservative Design
The Java language has benefited from an extremely conservative design. In the interest of creating a powerful language with simple and predictable semantics, the original designers avoided a lot of the more controversial features that can be found in existing languages.

A number of projects extend Java in interesting ways, and this article has explored some of them. Of particular interest are those systems that are extensible, such as EPP. Such systems allow you to try out new ideas in language design without having to face the onerous burden of implementing a full compiler.

Greg Travis is a freelance Java programmer and technology writer living in New York City. After three years in the world of high-end PC games, he joined EarthWeb, where he developed new technologies with the then-new Java programming language. Since 1997, he has been a consultant in a variety of Web technologies. E-mail him at mito@panix.com.