

Write Efficient Java Apps Using Native Data Structures with JNI : Page 3

Sometimes Java's data structures use too much memory for the data you need to store. In such situations, you can use JNI, Java's native code interface, to access native data structures. Find out how to use the STL in C++ to implement a space-efficient hashtable that works like a regular Java hashtable.





The JNI Interface
As described in the previous section, JNativeHash.java contains the declarations of the native methods, which are implemented in JNativeHash.cc. Each function in JNativeHash.cc is called by the JVM through JNI and in turn calls code in NativeHash. Thus, JNativeHash.cc consists of a series of stub functions that extract C++ values from Java values and pass them to C++ methods. If there is a return value, it may need to be converted into a Java data structure before being returned.

Here is a typical function.

JNIEXPORT jstring JNICALL Java_JNativeHash_get_1u
  (JNIEnv *env, jobject je, jstring s)
{
  try {
    jboolean b;
    const char *cfn = env->GetStringUTFChars( s, &b );
    NativeHash *e = getNativeHash( env, je );
    const char *value = e->get( cfn );
    jstring ret = env->NewStringUTF( value );
    env->ReleaseStringUTFChars( s, cfn );
    return ret;
  } catch( Err *e ) {
    ERR(env,e);
  }
}

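The name of this function is Java_JNativeHash_get_1u. It is a terrible name, but it is dictated by JNI's naming rules: a native method is named Java_, then the class name, an underscore, and the method name, with any underscore inside a name encoded as "_1". So this stub implements the Java method get_u. The 'javah' tool that comes with the JDK generates the matching declaration for you. The arguments to this function are Java data structures as seen from the C/C++ side: env is the JNI environment pointer, je is the JNativeHash object the method was invoked on, and s is the key.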

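First, we must turn the Java string into a C string. GetStringUTFChars() returns the characters in (modified) UTF-8, and the buffer must later be handed back to the JVM, so this effectively 'locks' the Java string: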

const char *cfn = env->GetStringUTFChars( s, &b );

Then, we grab our NativeHash data structure. Its address is stored in a Java field called 'eptr'. On the Java side, this field is declared as an int, but what it really contains is a pointer to a NativeHash. (An int can hold a pointer only on a 32-bit platform; on a 64-bit JVM, the field needs to be a long.) The helper function getNativeHash() extracts the NativeHash pointer and returns it:

NativeHash *e = getNativeHash( env, je );

Then, we actually *use* the NativeHash object and call its get() method:

const char *value = e->get( cfn );

Before we can return this value, we must turn it into a Java string:

jstring ret = env->NewStringUTF( value );

Finally, before returning, we must unlock the Java string that we locked at the start of the method:

env->ReleaseStringUTFChars( s, cfn );

Then, we can return the value:

return ret;

You'll notice that all of this code is wrapped in a try .. catch block. This is because we want to be able to catch C++ exceptions and turn them into Java exceptions. See the "Exception Handling" section for more details.

All collection classes in the standard Java libraries support iteration, so you should support it as well. From the Java end, use the standard technique of creating an inner class that implements the Iterator interface:

public class JNativeHash {
  ...
  protected class JNativeHashIterator implements Iterator {
    ...
  }
  ...
}

You might consider creating a C++ class called NativeHashIterator and using JNI to call it from JNativeHashIterator, but there is a problem with this approach: NativeHashIterator would likely contain an STL iterator object, which would in turn refer to the NativeHash object. If the user code were to mistakenly dispose of the NativeHash but keep using the NativeHashIterator, the program could crash.

This problem comes up regularly when calling C/C++ from Java using JNI. Java is garbage-collected, while C/C++ is not. Java code relies on the fact that a reference can never point to an object that isn't there, while C/C++ code must be explicitly written to avoid dangling pointers. This can be tricky, especially when the code is called from a garbage-collected language.

The best solution—when possible—is to avoid using pointers between your JNI objects. This way, you don't have to deal with the possibility of a dangling pointer.

Thus, you do not want to use STL iterators. Or rather, you do not want to keep them around long enough for them to become dangling pointers. Here is how it works.

NativeHash has two methods for iteration:

const char *firstKey();
const char *nextKey( const char *key );

firstKey() returns the first key in the table. (The keys are stored in an STL map, which keeps them sorted, so this is the smallest key.) In fact, it uses an iterator to do this: it creates the iterator, gets the first key, and then discards the iterator:

const char *NativeHash::firstKey()
{
  map<string,string>::iterator it = ssmap.begin();
  if (it == ssmap.end()) {
    return 0;
  } else {
    return it->first.c_str();
  }
}

The nextKey() method takes the previous key as an argument. It uses an iterator to find the previous key, advances to the next key, and returns it. Like firstKey(), it keeps the iterator only long enough to do the job and discards it before returning. (If the previous key is no longer in the table, find() fails and nextKey() returns 0, which ends the iteration.)

const char *NativeHash::nextKey( const char *key )
{
  map<string,string>::iterator it = ssmap.find( key );
  if (it == ssmap.end()) {
    return 0;
  } else {
    ++it;
    if (it == ssmap.end()) {
      return 0;
    } else {
      return it->first.c_str();
    }
  }
}

Thus, the code doesn't keep STL iterators around long enough for them to have dangling pointers.

On the Java side, we must implement the following three methods:

protected class JNativeHashIterator implements Iterator {
  ...
  public boolean hasNext() { ... }
  public Object next() { ... }
  public void remove() { ... }
  ...
}

This class maintains a variable called 'current', which contains the last string obtained from the native code. Each time the iterator is called, it uses this string to get the next string. In this way, the iterator's state (its cursor) is contained in this string rather than in an STL iterator, so there are no dangling pointers to worry about.

Exception Handling
There are times when something goes wrong in the native code. In our implementation, there are two such situations—a null key or an I/O problem. In each case, we want to propagate the error to the Java code, rather than simply exiting or, even worse, crashing.

Luckily, there is enough similarity between C++ exceptions and Java exceptions to make this possible. When an error occurs in the C++ code, it calls a function called nh_error(). Here's an example from the I/O code in NativeHash.cc:

int r = fread( ptr, 1, size, stream );
if (r != size) {
  nh_error( "Could only read %d of %d bytes\n", r, size );
}

Like printf(), nh_error() is a variadic function, which means it can take a variable number of arguments. As is customary for such functions, it calls vsnprintf() to format the message and its arguments into a memory buffer. But instead of printing this buffer, it turns it into an exception and throws it:

vsnprintf( buffer, BUFFER_SIZE, f, va );
throw new Err( type, strdup( buffer ) );

This C++ exception is then caught in JNativeHash.cc, in the read method:

JNIEXPORT void JNICALL Java_JNativeHash_read1_1u
  (JNIEnv *env, jobject je, jstring filename)
{
  try {
    ...
  } catch( Err *e ) {
    ERRV(env,e);
  }
}

The ERRV() macro just calls a function called error(). (For typechecking purposes, there are two variants of this macro: ERR() for stub functions that return a value, and ERRV() for stub functions that do not.)

The error() function, in turn, uses the JNI ThrowNew() function to create a Java exception (of the class referred to by jpc) carrying the error message:

env->ThrowNew( jpc, err->message );

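ThrowNew() does not transfer control by itself; it only marks the Java exception as pending. The native stub must then return, and as soon as control is back in the JVM, the pending exception is thrown up the Java call stack.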

Memory Management
As alluded to earlier, it is sometimes prudent to call JNativeHash.dispose() before garbage collection. Of course, the finalize() method calls dispose(), to free up the native data structures, but sometimes it is better not to wait until garbage collection.

This matters most when you have a lot of native data and not very much Java data. This situation arose during the development of a database application called FSS, which stored an enormous amount of data in native data structures.

In FSS, the Java heap was often only a few megabytes, while the native heap grew to around a gigabyte. At first I wondered why the garbage collector wasn't running; then I realized that it wasn't counting the native memory. Naturally, the Java Virtual Machine (JVM) has no idea how much memory you have allocated in your native code, so it does not know that it needs to collect garbage.

Even calling System.gc() can't be relied on: it is only a request to the JVM, and even when a collection does run, finalizers (and therefore dispose()) are not guaranteed to run promptly. Thus, I was forced to establish a simple reference counting system, so that I would know when a JNativeHash was no longer in use. At that point, I would call its dispose() method, freeing the memory used by the native data structures. Later, presumably, the JVM garbage collector would take care of the Java portion of the JNativeHash.

You can test the time and space efficiency of JNativeHash by running JNHTest and RegularTest, and seeing how much space and time they use. These programs allocate a lot of memory, so you should set the maximum heap size very high:

% time java -Xmx500000000 RegularTest
% time java -Xmx500000000 JNHTest

Informal testing on my desktop showed an almost 20 percent speedup when using JNHTest. JNHTest required only 78MB of RAM, compared with 190MB for RegularTest, a savings of 59 percent. This space savings is critical to FSS, which uses an enormous amount of memory.
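The 59 percent figure follows directly from the two memory measurements:

```latex
\[
1 - \frac{78\ \text{MB}}{190\ \text{MB}} = \frac{112}{190} \approx 0.589 \approx 59\%
\]
```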

Further Ideas
A number of issues were ignored in this article and are worth looking into further.

For example, the I/O interface to JNativeHash only deals with individual files; it completely bypasses the Java I/O package. A better implementation would provide methods to read and write using Java streams.

JNativeHash also completely ignores the collection classes in the java.util package. This is strange, especially since it would seem that JNativeHash ought to implement the java.util.Map interface.

The problem with implementing Map is that the get and put methods take objects, not strings. Maps in general need to be able to handle objects of any type, while this implementation of JNativeHash only deals with strings.

One solution might be to implement Map, but to throw an exception if anything other than a string is passed in. This would work, but it stretches the contract implied by Map: code that accepts a Map generally expects it to hold objects of any type. Keeping JNativeHash as a separate class is the clearest indication of what it can and cannot do. A string-only class isn't a good general-purpose solution, but JNativeHash was originally developed to replace a single data structure in a memory-intensive application, where it easily replaced the original, less-efficient Java data structure.

This article has demonstrated how to use JNI to create and access a data structure implemented in native code. In the informal tests above, the native version saved substantially in both time and space.

This article includes full source code as well as a makefile to build the native code. It will work on any recent GCC installation.

Greg Travis is a Java programmer and technology writer, living in New York City. After spending three years in the world of high-end PC games, he joined EarthWeb, where he developed new technologies with the then-new Java programming language. Since 1997, he has been a consultant in a variety of Web technologies.