Three Java Variants Extend the Language

Since the day Java was released in 1995, people have been clamoring for new features. Peruse the Java discussion groups on the Internet and you’ll find that developers are begging for tantalizing features like class and function templates, operator overloading, multiple inheritance, and closures.

However, read further and you’ll understand why many of these features haven’t been added yet. Sun Microsystems’ Java JDK design team makes a point of including only features that they could implement successfully. They left out any controversial or imperfect features. The result is a clean, well-defined language that is easy for the beginner and useful for the expert.

Still, people want their new features. In fact, some ambitious researchers have taken it upon themselves to augment Java with new features and release these new versions (or variants) of the language to the public. This article examines three of the more promising Java variants: Pizza, MultiJava, and EPP. Each of these has interesting features. For each of the features, the article demonstrates a complete program you can compile and run, allowing you to try the variants out for yourself.

What Is a Java Variant?
What exactly does the term Java variant mean anyway? For one, a library or package is not a variant. People create libraries for Java all the time; Sun is constantly adding new packages and libraries to the core Java distribution with every new release. But the language itself, including its basic syntax and semantics, has changed very little over the years.

A few notable syntax changes have occurred, and of course the core classes have undergone many changes. Although the distinction between a library change and a true language change isn’t clear-cut, the changes within the variants discussed in this article are. Each one involves a syntax change, and each is a true addition to the core semantics of the language.

The Pizza Variant
The Pizza package originated at the University of South Australia and is now an independent project. It provides three major features:

  • Generics
  • Function pointers
  • Class cases

    The first two are features that many developers have asked for in Java. They are included in a number of different Java variants.

    A generic is a method, function, or class that does not completely specify all of the types of its variables, because the types can be specified later. The result is multiple versions of the method, function, or class, each specialized to a particular set of types.

    Generics fall under the rubric of parametric polymorphism (PP). PP exists in a number of forms, in numerous languages, and with many different names. For example, the C++ version of generics is templates. So in the case of Java, generics basically add templates to the language, and they are quite easy to use. As an example, you can create a simple pair class (a pair is a data structure containing a head and a tail).

    In regular Java, you could represent the head and tail as Objects. To get access to either, you would use the head() or tail() methods, each of which returns an Object. Listing 1 implements a pair without using generics.
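Listing 1 is not reproduced here, so the following is only a minimal sketch of what such an Object-based pair might look like in plain Java. The class name ObjectPair and the sum helper are assumptions made for illustration, not the article's actual listing.

```java
// Hypothetical sketch: a pair that stores its head and tail as plain Objects.
class ObjectPair {
    private final Object head;
    private final Object tail;

    ObjectPair(Object head, Object tail) {
        this.head = head;
        this.tail = tail;
    }

    Object head() { return head; }
    Object tail() { return tail; }

    // Adds the two halves, assuming the caller stored Integers in both.
    static int sum(ObjectPair p) {
        // Every access needs an explicit cast from Object back to Integer.
        Integer h = (Integer) p.head();
        Integer t = (Integer) p.tail();
        return h + t;
    }

    public static void main(String[] args) {
        ObjectPair p = new ObjectPair(20, 30);
        System.out.println(sum(p)); // prints 50
    }
}
```

If a caller stores something other than an Integer, the casts compile anyway and fail only at run time with a ClassCastException.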

    The problem is that many developers find casting from Object to Integer annoying. Creating a generic class can improve this process. You can declare a generic class by specifying one or more type parameters, like so:

    public class MyPair<Head,Tail> {
      ...
    }

    The <Head,Tail> part of this declaration looks kind of like a function declaration, which is fine because it’s supposed to. Just as a function takes some values and returns an answer, a generic class is a kind of meta-function that takes some types and returns a class.

    For example, you can declare a pair whose head and tail are both integers, like this:

    MyPair<int,int> p = new MyPair<int,int>( 20, 30 );

    This declares a pair called p, which is actually a pair of integers. MyPair<int,int> is a class, while MyPair is really a class template (not to be confused with C++ templates!). MyPair isn’t really a type, so you can’t use it directly. You must instantiate it by supplying types. In this case, you supplied int and int, resulting in MyPair<int,int>, which you can use directly (as shown above).

    The nice thing about such a parameterized type is that you don’t have to use casts to access its members. For example, look at the declaration of the head() method from MyPair:

    public Head head() {
      return head;
    }

    Note the return type. It isn’t Object or Integer; it’s Head. That is, it’s whatever type was declared for the head value of the pair. Accessing the head thus doesn’t require a cast:

    int head = p.head();

    Listing 2 shows the complete source for MyPair.
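Listing 2 is written in Pizza. For comparison, standard Java gained its own generics in Java 5, so a similar pair can be sketched without a variant compiler. The sketch below is a rough analogue rather than the article's listing: the name GenericPair is hypothetical, and because standard Java does not accept primitive type arguments, it uses Integer where the Pizza example uses int.

```java
// Hypothetical analogue of the Pizza pair, written with standard Java generics.
class GenericPair<Head, Tail> {
    private final Head head;
    private final Tail tail;

    GenericPair(Head head, Tail tail) {
        this.head = head;
        this.tail = tail;
    }

    // The return types are the type parameters, so callers need no casts.
    Head head() { return head; }
    Tail tail() { return tail; }

    public static void main(String[] args) {
        GenericPair<Integer, String> p = new GenericPair<>(20, "thirty");
        int head = p.head();     // unboxed from Integer, no cast required
        String tail = p.tail();  // already typed as String
        System.out.println(head + " " + tail); // prints 20 thirty
    }
}
```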

    Function Pointers
    Pizza also adds a neat feature called a function pointer. A function pointer lets you refer to a function (method) as if it were a regular value. You can also create functions that aren’t really methods; they are just pure functions like you’d find in non-object-oriented languages. (If you are familiar with functional languages such as Scheme, you might have encountered these features under the name closure.)

    The following is an example of a function called isOdd, which tells you whether a number is odd:

    (int)->boolean isOdd =
      fun( int n ) -> boolean {
        return (n&1)==1;
      };

    The syntax is a little tricky. First, note the fun keyword. Instead of writing:

    boolean isOdd( int n ) { ... }

    you write:

    fun( int n ) -> boolean { ... }

    This function has no name. Instead of declaring the function as having the name isOdd, the code declares it anonymously. You then can assign it to a variable called isOdd:

    (int)->boolean isOdd =
      fun( int n ) -> boolean {
        return (n&1)==1;
      };

    The first line declares a variable called isOdd. The type of this variable is (int)->boolean, the type of a function that takes an integer as a parameter and returns a boolean. Using such a function is easy. You just use the variable name as if it were the function name:

    isOdd( 23 );

    So what do you gain from this? Well, for one thing, you easily can assign another function to the same variable, like this:

    isOdd = fun( int n ) -> boolean { ... };

    This can be a completely different anonymous function. Also, you can assign such functions from variable to variable:

    isOdd = reallyFastIsOdd;

    One useful application of this is passing a function as a parameter to another function. For example, you can implement a filter that takes an array of integers and use it with an anonymous function that returns a boolean. This kind of function is often called a predicate. It answers a true-or-false question about an integer. Your filter method would run through a list of integers and ask the true-or-false question about each one. It would collect the ones that give a true answer and return an array containing them.

    So, for example, if you have an array of integers called integers, you can filter out the even ones as follows:

    int odds[] = filter( integers, isOdd );

    The second argument to filter is the isOdd function, which returns true for odd numbers only.

    You can be even terser by just passing the predicate directly to filter without assigning it to isOdd first, like this:

    int odds[] = filter( integers,
      fun( int n ) -> boolean {
        return (n&1)==1;
      } );

    Listing 3 includes the complete implementation of filter.
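Pizza's fun syntax is not standard Java, but lambdas (added in Java 8) cover much of the same ground. The following sketch shows the predicate-and-filter idea in standard Java. It is not Listing 3: the class name FilterDemo and this particular filter implementation are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

class FilterDemo {
    // Returns a new array holding only the elements the predicate accepts.
    static int[] filter(int[] values, IntPredicate keep) {
        int[] result = new int[values.length];
        int count = 0;
        for (int v : values) {
            if (keep.test(v)) {
                result[count++] = v;
            }
        }
        // Trim the unused tail of the scratch array.
        return Arrays.copyOf(result, count);
    }

    public static void main(String[] args) {
        // A function stored in a variable, much like the Pizza isOdd example.
        IntPredicate isOdd = n -> (n & 1) == 1;
        int[] integers = {1, 2, 3, 4, 5, 6, 7};

        int[] odds = filter(integers, isOdd);
        System.out.println(Arrays.toString(odds)); // prints [1, 3, 5, 7]

        // Passing an anonymous predicate directly.
        int[] evens = filter(integers, n -> (n & 1) == 0);
        System.out.println(Arrays.toString(evens)); // prints [2, 4, 6]
    }
}
```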

    The MultiJava Variant
    The MultiJava package is from Iowa State University. MultiJava adds two features:

  • Symmetric multiple dispatch: Symmetric multiple dispatch lets you specify several different implementations for a particular method. Which implementation runs depends on the dynamic types of that method’s arguments.
  • Open classes: Open classes are classes that you can add methods to without actually modifying the original source code.

    Symmetric Multiple Dispatch
    Symmetric multiple dispatch allows you to provide multiple implementations of a method. The choice of which one to use depends on the run-time type of the arguments to that method. For example, consider two variants of a method called show(). The idea here is that when you show a string, you want to put quotes around it. So you declare a special show() method just for strings, which puts quotes around the object’s representation:

    public void show( Object o ) {
      System.out.println( o );
    }

    public void show( Object@String s ) {
      System.out.println( "\""+s+"\"" );
    }

    Note the syntax change: the second variant has an argument of type Object@String, which isn’t a regular Java type. This type specifier says: use this variant if the object that is being passed in is actually a String. Sounds a lot like method overloading, doesn’t it? For example, in regular Java you easily could write this:

    public void show( Object o ) {
      System.out.println( o );
    }

    public void show( String s ) {
      System.out.println( "\""+s+"\"" );
    }

    With MultiJava, you can use one or the other, depending on how you declare your object:

    Object o = new Object();
    show( o );  // uses show( Object )
    String s = "s";
    show( s );  // uses show( String )

    So how does symmetric multiple dispatch differ? Well, what if you declare your string argument as follows:

    Object o = "really a string";
    show( o );

    If you’re using regular method overloading, this will use show( Object ). If you’re using multiple dispatch, this uses show( String ), which is the tricky part. In the declaration above, the object you’re creating is a String object, but it’s being declared as an Object. This is valid, of course, since String is a subclass of Object (as are all classes in Java).

    But when you pass this object, which is really a string, to some method, which method do you use? Is it an Object or a String? If you use regular method overloading, then you make the choice based on the type of the variable, not the value. This is how overloading works in Java, as well as in C++ and many other languages with overloading. It’s kind of a subtle point, but suffice it to say that if you’ve done programming in a popular object-oriented language, this is what you’re used to.

    With symmetric multiple dispatch, the situation is reversed: the choice of method is made based on the type of the value, not the variable. Thus, the choice is necessarily made at run-time. More precisely, the argument type static@dynamic declares that the method is passed values of type dynamic contained in variables of type static. If no implementation exactly matches the run-time type of a particular value, then the one declared for the closest-matching superclass is used, if any.

    It should be clear why this is useful. It lets you specify multiple implementations of a method, and the system will choose the right one based on the run-time value of the variable. This is particularly useful when subclassing, because it lets you add special custom behavior for a particular set of arguments and you don’t have to modify the superclass. (See the MultiJava Web site for more examples of the value of Symmetric Multiple Dispatch.)

    The term multiple in symmetric multiple dispatch comes from the fact that the value of the argument is used to execute, or dispatch, the method call. Normally, the object that owns the method is the only one that gets to decide (at run-time) which method to use. With multiple dispatch, the arguments are used as well.
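The difference between the two selection rules can be observed in plain Java. The sketch below contrasts ordinary overloading, which is resolved from the declared type of the variable, with a hand-written instanceof check that imitates dispatching on the run-time type of the argument. It illustrates the behavior only; it says nothing about how MultiJava implements dispatch, and the names DispatchDemo and showDynamic are assumptions.

```java
class DispatchDemo {
    // Ordinary overloads: the compiler picks one from the declared (static) type.
    static String show(Object o) {
        return String.valueOf(o);
    }

    static String show(String s) {
        return "\"" + s + "\"";
    }

    // Hand-written imitation of choosing by the run-time type of the argument.
    static String showDynamic(Object o) {
        if (o instanceof String) {
            return show((String) o);
        }
        return show(o);
    }

    public static void main(String[] args) {
        Object o = "really a string";
        // Declared as Object, so overloading selects show( Object ): no quotes.
        System.out.println(show(o));        // prints really a string
        // The run-time check notices the String and adds quotes.
        System.out.println(showDynamic(o)); // prints "really a string"
    }
}
```

The instanceof approach requires editing showDynamic every time a new case is added, which is the maintenance burden that the Object@String syntax is meant to remove.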

    Open Classes
    Open classes are classes to which you can freely add new methods. Normally, if you want to add a method to a class, you have to modify the source code for that class. With open classes, you don’t. Adding a method to a class is as easy as this:

    public void SomeClass.method() {
      // ...
    }

    This line adds a method called method to the class SomeClass. The method can be used just like a regular method:

    SomeClass sc = new SomeClass();
    sc.method();

    In regular Java, you could not do this without actually modifying the code for SomeClass. MultiJava makes it possible to add such methods after the fact. You can use this facility to improve the Show example from the previous section. Remember you created a method called show, which displays an object on System.out. Instead of using a separate method, why not add the method to the object itself so it knows how to show itself? You can do it like this:

    public void Object.show() {
      Show.getInstance().show( this );
    }

    Note that you’ve added a method to Object. This is very powerful. Since every class is a subclass of Object, adding a method to Object is really adding a method to every class in the system.

    Now, you can show your objects like this:

    Object o = new Object();
    o.show();  // uses show( Object )
    String s = "s";
    s.show();  // uses show( String )

    You’ll find the complete listing in Listing 4.

    EPP: The Extensible Preprocessor
    The impressive EPP comes from the National Institute of Advanced Industrial Science and Technology in Japan. EPP takes the concept of a Java variant one step further. EPP isn’t a modified form of Java; it’s a toolkit for creating modified forms of Java. It allows you to create language extension plug-ins, which implement particular new features of the language. You can then pick and choose which new language features you want to use for any particular programming task.

    EPP works by translation. That is, new features are implemented by translating them into regular Java. Thus, if you create a plug-in that adds a new keyword, this plug-in must identify any code that uses that keyword and translate the code into regular Java.

    In addition to allowing you to create new language features, EPP also comes with a long list of example features. Enumerated types, assertions, associative arrays, macros, ifdefs, operator overloading, optional parameters, and multiple inheritance are just some of the interesting language features that have been successfully added to Java using EPP. (A complete discussion of EPP is beyond the scope of this article, but it does cover some of the highlights and provide a look at a working example.)

    By now you’ve gotten the point that EPP isn’t just a modified compiler like the other packages discussed in this article. In fact, EPP is an extensible compiler. It lets you intervene in the compilation process at a number of different points. These interventions are in the form of code that augments the code in the compiler.

    EPP uses the object-oriented idea of a mixin. A mixin is a small fragment of code that can be added, or mixed into, one or more other classes. The following is an example of a mixin:

    SystemMixin PRegex {
      class Epp {
        extend void initMacroTable() {
          original();
          defineMacro( :"=~", new ApplyPRegexMacro() );
        }
      }
    }

    The first line declares that this is a mixin called PRegex:

    SystemMixin PRegex {

    The next line declares that the mixin will be added to a class called Epp:

      class Epp {

    The extend keyword declares that a particular method should be overridden:

        extend void initMacroTable() {
          // ...
        }

    If you would like to call the already-existing code from your new code, you can do this using the original() method. The following definition of initMacroTable calls original before defining a new macro with defineMacro:

    extend void initMacroTable() {
      original();
      defineMacro( :"=~", new ApplyPRegexMacro() );
    }

    In some ways, this kind of programming is like programming by subclassing. The difference is that these mixins can be combined dynamically at run-time. Thus, you don’t have to create a specific subclass containing all the modifications you want. You can decide at run-time which mixins to merge into your base class.
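The idea of extending a method while still calling the original, and choosing the extensions at run-time, can be approximated in plain Java with function wrappers. This is a loose sketch of the concept and not how EPP implements mixins. Every name in it (MixinDemo, baseMacroTable, addMacro, compose) is hypothetical, and the macro table is reduced to a list of strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

class MixinDemo {
    // Stand-in for the base method: returns a table with one placeholder entry.
    static List<String> baseMacroTable() {
        List<String> table = new ArrayList<>();
        table.add("+");
        return table;
    }

    // A "mixin" receives the original method and returns an extended version.
    static UnaryOperator<Supplier<List<String>>> addMacro(String name) {
        return original -> () -> {
            List<String> table = original.get(); // plays the role of original()
            table.add(name);                     // plays the role of defineMacro(...)
            return table;
        };
    }

    // Applies the chosen mixins, in order, to the base method at run-time.
    static Supplier<List<String>> compose(
            Supplier<List<String>> base,
            List<UnaryOperator<Supplier<List<String>>>> mixins) {
        Supplier<List<String>> current = base;
        for (UnaryOperator<Supplier<List<String>>> mixin : mixins) {
            current = mixin.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // The set of mixins is ordinary data, so it can be decided at run-time.
        List<UnaryOperator<Supplier<List<String>>>> chosen = new ArrayList<>();
        chosen.add(addMacro("=~"));
        chosen.add(addMacro("assert"));

        Supplier<List<String>> initMacroTable =
                compose(MixinDemo::baseMacroTable, chosen);
        System.out.println(initMacroTable.get()); // prints [+, =~, assert]
    }
}
```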

    Lexical Analysis
    If the feature you would like to add requires a deep syntactic change, then you can modify the lexical analyzer, which is the first step in the parsing process. You can do this by overriding one of the methods of the lexical analyzer, such as readOperator():

    extend Token readOperator(EppInputStream in) {
      int begin = in.pointer();
      char c = in.getc();
      if (c=='/') {
        String regex = "";
        // ...
      }
      // ...
    }

    The default readOperator method carries out the low-level lexical analysis required to distinguish operators in the Java source code. By creating a modified form of this method, you can alter the lexical analyzer to recognize a new operator, for example.

    If you don’t know how to write a lexical analyzer routine for Java, don’t worry. The default lexical analyzer is already written for you. All you have to do is write code for your new features and use the original method to access the default implementation to handle all other cases.

    You can also modify the language at the parsing level after the lexical analysis is complete. The following is an example of this, taken from the EPP implementation of assertions:

    extend Tree statementTop(){
      if (lookahead() == :assert){
        matchAny();
        Tree exp1 = expression();
        if (lookahead() == :":"){
          matchAny();
          Tree exp2 = expression();
          match(:";");
          return new Tree(:assert, exp1, exp2);
        } else {
          match(:";");
          return new Tree(:assert, exp1);
        }
      } else {
        return original();
      }
    }

    This mixin method modifies the statementTop() method, which is one of the pieces of EPP’s Java parser. The code checks for the assert syntax and, if it finds it, returns a new Tree object, which is the class used to represent a parsed Java program. If it tries to parse something that isn’t an assert statement, then it calls the original method to pass the buck to the regular Java parser, which will handle all other cases.

    EPP Uses EPP
    The source code for the EPP compiler actually uses EPP itself. That is, if you look at the source code for EPP, you’ll see syntactic constructs that you don’t recognize as being regular Java. EPP actually has to process itself before it can be compiled. If this seems impossible, note that the original version of EPP was written not in Java but in Common Lisp.

    A Full Example of an EPP Plug-in
    Take a look at a working example of an EPP plug-in. The example is called Pregex. It implements a small subset of Perl’s regular expression (regex) syntax:

    if (s =~ /^[A-Z][a-z]+/) {
      // ...
    }

    This requires two syntactic constructs. First, you need the slash-quoted regex syntax:

    /^[A-Z][a-z]+/

    You implement this by overriding the readOperator() method in the lexical analyzer. You scan the incoming characters and if you find a pair of slash characters with something between them, you just treat that like a string. Of course, this isn’t the full regex syntax by any stretch, but it serves to illustrate the technique.
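The scanning step described above can be sketched in a few lines of plain Java. This is not the plug-in's readOperator() code, which works on EPP's own input stream. It is a simplified stand-in that operates on a String, and the names SlashScanner and readRegex are assumptions. Like the approach described, it ignores escaped slashes.

```java
class SlashScanner {
    // Returns the text between the slash at `start` and the next slash,
    // or null if `start` is not a slash or no closing slash follows it.
    static String readRegex(String source, int start) {
        if (start < 0 || start >= source.length() || source.charAt(start) != '/') {
            return null;
        }
        int end = source.indexOf('/', start + 1);
        if (end < 0) {
            return null;
        }
        // Treat everything between the two slashes as an ordinary string.
        return source.substring(start + 1, end);
    }

    public static void main(String[] args) {
        String code = "s =~ /^[A-Z][a-z]+/";
        int slash = code.indexOf('/');
        System.out.println(readRegex(code, slash)); // prints ^[A-Z][a-z]+
    }
}
```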

    You also need to implement the =~ operator, which of course doesn’t exist in Java. Override the initMacroTable() method to respond to the presence of =~ by invoking the ApplyPRegexMacro class. This class has a call method that translates the operator into a small fragment of Java code that calls the Pregex.checkMatch() method, which is defined in a helper class.

    As you can see, these techniques are extremely powerful. Since you are able to extend Java using Java itself, you can implement arbitrarily complex translations and transformations, limited only by your abilities as a programmer. EPP transforms Java into a truly programmable language.
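The helper's source appears only in the listings, so the following is a guess at its general shape rather than the real code. It assumes a static checkMatch method that takes the input string and the regex text and delegates to java.util.regex, and it assumes the plug-in rewrites s =~ /re/ into a call to that method. The class name PregexSketch and the parameter order are hypothetical.

```java
import java.util.regex.Pattern;

class PregexSketch {
    // Hypothetical helper: true if the regex matches anywhere in the input,
    // which is how an unanchored Perl match with =~ behaves.
    static boolean checkMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "Hello world";
        // Roughly what `s =~ /^[A-Z][a-z]+/` might become after translation.
        if (checkMatch(s, "^[A-Z][a-z]+")) {
            System.out.println("matched"); // prints matched
        }
        System.out.println(checkMatch("hello", "^[A-Z][a-z]+")); // prints false
    }
}
```

Using find() rather than matches() matters here: matches() would require the pattern to cover the entire input, which is not what a Perl-style match does.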

    Listing 5 and Listing 6 show the full source for the PRegex example.

    Extending Java’s Conservative Design
    The Java language has benefited from an extremely conservative design. In the interest of creating a powerful language with simple and predictable semantics, the original designers avoided a lot of the more controversial features that can be found in existing languages.

    A number of projects extend Java in interesting ways, and this article has explored some of them. Of particular interest are those systems that are extensible, such as EPP. Such systems allow you to try out new ideas in language design without having to face the onerous burden of implementing a full compiler.

