devxlogo

Write Efficient Java Apps Using Native Data Structures with JNI

Write Efficient Java Apps Using Native Data Structures with JNI

he Java Native Interface (JNI) is used to call code written in another language?such as C or C++?from a Java program. As such, it is often used when a Java program needs to call pre-existing code in another language or to call code that could be written in Java but needs to be written in C/C++ for some reason. Data structures implemented in C or C++ can often be faster?and use less space?than the equivalent data structures in Java.

This article investigates the use of the JNI for accessing a data structure implemented in C++. The data structure I’ll be implementing is a hashtable called JNativeHash. It’s implementation is based on the map class from the Standard Template Libraries (STL). Unlike Java, C++ does not emphasize the use of run-time type identification, so it’s not as easy to create a hashtable that maps arbitrary objects to arbitrary objects. Instead, I’ll be creating a hashtable that maps strings to strings.

What You Need
? A recent JDK from Sun Microsystems, or a compatible Java development kit.
? An Integrated Development Environment (IDE) for Java or a suitable command shell.

The Test Program
To test the code I’ll simply load up the data structure with a lot of data and see how it performs. The test program simply loads the hashtable with 1,000,000 entries, like this:

JNativeHash jnh = JNativeHash.create();for (int i=0; i<1000000; ++i) {  jnh.put( "jnh"+i, i+"jnh" );}

This code maps the string "jnh0" to the string "0jnh", "jnh1" to "1jnh", and so on.

There are two versions of the test program?JNHTest, which uses the native hashtable, and RegularTest, which uses a regular Java hashtable. These programs can be used to compare the space and time efficiency of the data structures.

In the next section, I'll review the overall structure of the code.

Code Structure
The code consists of the following source files:

  • NativeHash.cc ( Listing 1): implements the hashtable, using STL
  • NativeHash.h ( Listing 2): header file for NativeHash.cc
  • JNativeHash.cc ( Listing 3): JNI interface to NativeHash.cc
  • JNativeHash.h: header file for JNativeHash.cc?this is generated by the 'javah' tool, so it's not included here.
  • JNativeHash.java ( Listing 4): Java interface to NativeHash.cc
  • JNHTest.java ( Listing 5): test program for JNativeHash
  • RegularTest.java ( Listing 6): test program from regular Java hashtable
  • Err.cc ( Listing 7): support for C++ exceptions
  • Err.h ( Listing 8): support for C++ exceptions

The first two files, NativeHash.cc and NativeHash.h, implement the hashtable data structuring using STL. This code is entirely independent from Java and could be used in a C++-only program.

The next two files, JNativeHash.cc and JNativeHash.h, implement the JNI interface to NativeHash.cc. The code is relatively straightforward: Each method in NativeHash has a corresponding method in JNativeHash. The methods in JNativeHash do little more than gather the arguments from the Java data structures and pass them on to the C++ code.

JNativeHash.java is the Java interface to NativeHash. This is the only file that a user of JNativeHash really has to know about. It contains all the appropriate methods for storing and retrieving hashtable values. Many of its methods are marked 'native'; these methods are implemented in JNativeHash.cc.

The next two files, JNHTest.java and RegularTest.java, are used to test the space and time efficiency of the code, as described above.

Finally, the last two files, Err.cc and Err.h, contain support code to allow the native C++ code to throw exceptions to the Java code that called it.

Let's take a look at the implementation of the core data structure.

The Native Code
The core data structure is a hashtable implemented in C++, in the files NativeHash.cc and NativeHash.h. In fact, this code uses STL to implement the actual data structure; the NativeHash class is really just a wrapper around this data structure.

It is beyond the scope of this article to go into STL in detail, but here's how you implement the hashtable:

#include  include ...class NativeHash { protected:  map ssmap; ...}

'ssmap' is declared to be a map from string to string. This C++ is easy to use. To store a value, you simply assign it like this:

ssmap["key"] = "value";

To retrieve a value, access it like this:

string value = ssmap["key"];

This way, it's easy to create a wrapper class around this data structure, which you need to do because the interface to your native code must mimic the interface to your Java code.

In particular, there are methods to get and set a value:

const char *get( const char *key );void put( const char *key, const char *value ); 

There are also auxiliary methods to check for key existence and check the number of entries:

int size() const;int contains( const char *s ) const; 

There are two methods used to implement iteration:

const char *firstKey();const char *nextKey( const char *key ); 

There are methods to read from and write to a file:

void write( const char *filename );static NativeHash *read( const char *filename ); 

Your class includes these methods because reading and writing are so much faster when they are done directly from native code.

Finally, there is a dispose() method:

static void dispose( NativeHash *e ); 

This is used by the finalize method of the JNativeHash class to deallocate the NativeHash object. It is also necessary that the user be able to call this directly, rather than having it called during garbage collection; see the "Memory Management" section for details.

The Java Wrapper
JNativeHash.java contains the code that the client sees. Many of the methods are Java versions of the native methods described in the previous section:

public String get( String key );public void put( String key, String value );public int size();public boolean contains( String key );public void write( String filename );static public JNativeHash read( String filename ) throws IOException; 

These methods, in fact, come in pairs. For example, the get() method has a twin, called get_u():

public String get( String key ) {  synchronized( lock ) {    return get_u( key );  }}native public String get_u( String key ); 

The get() method simply calls the get_u() method, but it does it inside a synchronized block. The get_u() method is a native method, so its implementation is omitted. Rather than having a Java implementation in JNativeHash.java, it has a C++ implementation in JNativeHash.cc.

This doubling isn't strictly necessary?we could have simply made the get() method do the work of both:

synchronized native public String get( String key ); 

However, it is better to be explicit about the locking mechanism, because this can permit more fine-grained parallelism. Likewise, by using a static lock, we are locking all hashtables independently, which may be safer in the event that the native code is not thread-safe. For example, if two objects are called at the same time, there may be a race condition deep within a non-thread-safe memory allocator. A static lock is safest, but if the programmer knows that the underlying native libraries are thread-safe, then she can simply make the lock non-static:

/*static*/ public Object lock = new Object();

The JNI Interface
As described in the previous section, JNativeHash.java contains the declarations of the native code, which is itself implemented in JNativeHash.cc. Each function in this file is called by the JNI library and in turn must call code in NativeHash. Thus, the code in JNativeHash contains a series of stub functions that extract C++ values from Java values and pass them to C++ methods. If there is a return value, this value may need to be stuffed into a Java data structure before being returned.

Here is a typical function.

JNIEXPORT jstring JNICALL Java_JNativeHash_get_1u  (JNIEnv *env, jobject je, jstring s) {  try {    jboolean b;    const char *cfn = env->GetStringUTFChars( s, &b );    NativeHash *e = getNativeHash( env, je );    const char *value = e->get( cfn );    jstring ret = env->NewStringUTF( value );    env->ReleaseStringUTFChars( s, cfn );    return ret;  } catch( Err *e ) {    ERR(env,e);  }}

The name of this function is Java_JNativeHash_get_1u, which is a terrible name generated by the 'javah' tool that comes with the JDK. The arguments to this function are Java data structures, as seen from the C/C++ side.

First, we must turn the Java string into a C/C++ string, effectively 'locking' the Java string:

const char *cfn = env->GetStringUTFChars( s, &b ); 

Then, we grab our NativeHash data structure. This is stored in a Java field called 'eptr.' On the Java side, this field is actually an int, but that doesn't matter, because what it really contains is a pointer to a NativeHash. The helper function getNativeHash() extracts the NativeHash pointer and returns it:

NativeHash *e = getNativeHash( env, je ); 

Then, we actually *use* the NativeHash object and call its get() method:

const char *value = e->get( cfn ); 

Before we can return this value, we must turn it into a Java string:

jstring ret = env->NewStringUTF( value ); 

Finally, before returning, we must unlock the Java string that we locked at the start of the method:

env->ReleaseStringUTFChars( s, cfn ); 

Then, we can return the value:

return ret; 

You'll notice that all of this code is wrapped in a try .. catch block. This is because we want to be able to catch C++ exceptions and turn them into Java exceptions. See the "Exception Handling" section for more details.

Iteration
All collection classes in the standard Java libraries support iteration, so you should support it as well. From the Java end, use the standard technique of creating an inner class that implements the Iterator interface:

public class JNativeHash {  ...  protected class JNativeHashIterator implements Iterator  {    ...  }  ...}

You might consider creating a C++ class called NativeHashIterator and using JNI to call it from JNativeHashIterator, but there is a problem with this approach: NativeHashIterator would likely contain an STL iterator object, which would in turn refer to the NativeHash object. If the user code were to mistakenly dispose of the NativeHash but keep using the NativeHashIterator, the program could crash.

This problem is a regular occurrence when calling C/C++ from Java using JNI. Java is garbage-collected, while C/C++ is not. Java code relies on the fact that you can never have a pointer to an object that isn't there, while C/C++ code must be explicitly written to avoid dangling pointers. This can be tricky, especially when the code is called from a garbage-collected language.

The best solution?when possible?is to avoid using pointers between your JNI objects. This way, you don't have to deal with the possibility of a dangling pointer.

Thus, you do not want to use STL iterators. Or rather, you do not want to keep them around long enough for them to have dangling pointers. Here is how it works.

NativeHash has three methods:

const char *firstKey();const char *nextKey( const char *key ); 

firstKey() returns the first key in the hashtable. In fact, it uses an iterator to do this?it creates the iterator, gets the first key, and then disposes of the iterator:

const char *NativeHash::firstKey() {  map::iterator it = ssmap.begin();  if (it == ssmap.end()) {    return 0;  } else {    return it->first.c_str();  }}

The nextKey() method takes the previous key as an argument. It uses an iterator to find the previous key, advances to the next key, and returns it. Like firstKey(), it only keeps the iterator around long enough to do the job and deletes it before returning:

const char *NativeHash::nextKey( const char *key ) {  map::iterator it = ssmap.find( key );  if (it == ssmap.end()) {    return 0;  } else {    it++;    if (it == ssmap.end()) {      return 0;    } else {      return it->first.c_str();    }  }}

Thus, the code doesn't keep STL iterators around long enough for them to have dangling pointers.

On the Java side, we must implement the following three methods:

protected class JNativeHashIterator implements Iterator {  ...  public boolean hasNext() { ... }  public Object next() { ... }  public void remove() { ... }  ...}

This class maintains a variable called 'current', which contains the last string obtained from the native code. Each time the iterator is called, it uses this string to get the next string. In this way, the iterator's state?its cursor?is contained in this string, rather than being contained in an STL iterator. This way, we don't have to worry about dangling pointers.

Exception Handling
There are times when something goes wrong in the native code. In our implementation, there are two such situations?a null key or an I/O problem. In each case, we want to propagate the error to the Java code, rather than simply exiting or, even worse, crashing.

Luckily, there is enough similarity between C++ exceptions and Java exceptions to make this possible. When an error occurs in the C++ code, it calls a function called nh_error(). Here's an example from the I/O code in NativeHash.cc:

int r = fread( ptr, 1, size, stream );if (r != size) {  nh_error( "Could only read %d of %d bytes
", r, size ); }

Like printf(), nh_error() is a variadic function, which means it can take a variable number of arguments. As per tradition, this function calls vsnprintf() to print the string (and other arguments) to a memory buffer. But instead of printing this buffer out, it turns it into an exception and throws it:

vsnprintf( buffer, BUFFER_SIZE, f, va );throw new Err( type, strdup( buffer ) ); 

This C++ exception is then caught in JNativeHash.cc, in the read method:

JNIEXPORT void JNICALL Java_JNativeHash_read1_1u  (JNIEnv *env, jobject je, jstring filename) {  try {    ...  } catch( Err *e ) {    ERRV(env,e);  }}

The ERRV() macro just calls a function called error(). (For typechecking purposes, there are two variants of this macro: ERR() for stub functions that return a value and ERRV(), for stub functions that do not.)

The error() function, in turn, instantiates an exception object and throws it back to the Java code, using the JNI throwNew() method:

env->ThrowNew( jpc, err->message ); 

This causes the program to return from the native call entirely and to throw the Java exception up the Java call stack.

Memory Management
As alluded to earlier, it is sometimes prudent to call JNativeHash.dispose() before garbage collection. Of course, the finalize() method calls dispose(), to free up the native data structures, but sometimes it is better not to wait until garbage collection.

In particular, this happens when you have a lot of native data, and not very much Java data. This situation occurred during the development of a database application called FSS, which used an enormous amount of data stored in native data structures.

In FSS, it was often the case that the Java heap would be only a few megs, while the native heap was getting to be around a gigabyte. I wondered, at first, why the garbage collector wasn't running, and then I realized that the garbage collector wasn't counting the native memory. Naturally, the Java Virtual Machine (JVM) has no idea how much memory you have allocated in your native code, and so it does not know that it needs to garbage collect.

Even calling System.gc() won't force it to collect unused data structures if the JVM doesn't know that memory is low. Thus, I was forced to establish a simple reference counting system, so that I would know when a JNativeHash was no longer in use. At that point, I would call its dispose() method, freeing up the memory used by native data structures. Later, presumably, the JVM garbage collector would take care of the Java portion of the JNativeHash.

Efficiency
You can test the time and space efficiency of JNativeHash by running JNHTest and RegularTest, and seeing how much space and time they use. These programs allocate a lot of memory, so you should set the maximum heap size very high:

% time java -Xmx500000000 RegularTest% time java -Xmx500000000 JNHTest

Informal testing on my desktop showed an almost 20 percent speedup when using JNHTest. JNHTest only required 78MB of RAM, as compared with 190MB for RegularTest?a savings of 59 percent. This space savings is critical to FSS, which uses an enormous amount of memory.

Further Ideas
A number of issues were ignored in this article, and are worth looking into further.

For example, the I/O interface to JNativeHash only deals with individual files; it completely bypasses the Java I/O package. A better implementation would provide methods to read and write using Java streams.

JNativeHash also completely ignores the collection classes in the java.util pacakge. This is strange, especially since it would seem that JNativeHash ought to implement the java.util.Map interface.

The problem with implementing Map is that the get and put methods take objects, not strings. Maps in general need to be able to handle objects of any type, while this implementation of JNativeHash only deals with strings.

One solution might be to implement Map, but to throw an exception if anything other than a string is passed in. This would work, but it violates the contract implied by Map, and so is technically incorrect. Keeping it as a separate class is the best indication of what it can and cannot do. This isn't a good solution in general, but it was originally developed to replace a single data structure in a memory-intensive application, where it easily replaced the original, less-efficient Java data structure.

This article has demonstrated how to use the JNI to create and access a data structure implemented in native code. Using native code results in a substantial savings in both time and space.

This article includes full source code as well as a makefile to build the native code. It will work on any recent GCC installation.

devxblackblue

About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.

About Our Journalist