The JNI Interface
As described in the previous section, JNativeHash.java contains the declarations of the native methods, which are implemented in JNativeHash.cc. Each function in that file is called through JNI and in turn calls code in NativeHash. Thus, JNativeHash.cc consists of a series of stub functions that extract C++ values from Java values and pass them to C++ methods. If there is a return value, it may need to be packed into a Java data structure before being returned.
Here is a typical function.
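JNIEXPORT jstring JNICALL Java_JNativeHash_get_1u
  (JNIEnv *env, jobject je, jstring s) {
  try {
    jboolean b;  // set by the JVM to say whether cfn is a copy
    // Convert the Java string to a C string.
    const char *cfn = env->GetStringUTFChars( s, &b );
    // Recover the C++ object behind this Java object.
    NativeHash *e = getNativeHash( env, je );
    const char *value = e->get( cfn );
    // Wrap the result in a Java string.
    jstring ret = env->NewStringUTF( value );
    // Release the C string obtained above.
    env->ReleaseStringUTFChars( s, cfn );
    return ret;
  } catch( Err *e ) {
    // Turn the C++ exception into a Java exception.
    ERR(env,e);
  }
}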
The name of this function is Java_JNativeHash_get_1u, which is a terrible name generated by the 'javah' tool that comes with the JDK. The arguments to this function are Java data structures, as seen from the C/C++ side.
First, we must turn the Java string into a C/C++ string, effectively 'locking' the Java string:
const char *cfn = env->GetStringUTFChars( s, &b );
Then, we grab our NativeHash data structure. This is stored in a Java field called 'eptr'. On the Java side, this field is an int, but what it really contains is a pointer to a NativeHash. (This only works where a pointer fits in 32 bits; on a 64-bit platform the field would need to be a long.) The helper function getNativeHash() extracts the NativeHash pointer and returns it:
NativeHash *e = getNativeHash( env, je );
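Storing a pointer in a Java int deserves a word of caution, since it depends on how wide a native pointer is. The following sketch is not part of JNativeHash; the class name and the address values are made up for illustration. It only shows, on the Java side, which values would survive being kept in an int field and which would need a long:

```java
// Illustration only: the 'address' values are invented stand-ins for native
// pointers; nothing here touches real native memory.
class PointerWidthDemo {
    // Returns true if the value is unchanged after being stored in a Java
    // int and read back as an unsigned 32-bit quantity.
    static boolean survivesIntField(long address) {
        int stored = (int) address;                // what an int field keeps
        return (stored & 0xFFFFFFFFL) == address;  // zero-extend and compare
    }

    public static void main(String[] args) {
        long fitsIn32Bits = 0xbfff0000L;           // needs at most 32 bits
        long needs64Bits = 0x00007f3a12345678L;    // needs more than 32 bits
        System.out.println(survivesIntField(fitsIn32Bits));
        System.out.println(survivesIntField(needs64Bits));
    }
}
```

If the native code may ever be built for a 64-bit platform, declaring the field as a long costs little and avoids the truncation.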
Then, we actually *use* the NativeHash object and call its get() method:
const char *value = e->get( cfn );
Before we can return this value, we must turn it into a Java string:
jstring ret = env->NewStringUTF( value );
Finally, before returning, we must unlock the Java string that we locked at the start of the method:
env->ReleaseStringUTFChars( s, cfn );
Then, we can return the value:
return ret;
You'll notice that all of this code is wrapped in a try .. catch block. This is because we want to be able to catch C++ exceptions and turn them into Java exceptions. See the "Exception Handling" section for more details.
Iteration
All collection classes in the standard Java libraries support iteration, so you should support it as well. From the Java end, use the standard technique of creating an inner class that implements the Iterator interface:
public class JNativeHash {
  ...
  protected class JNativeHashIterator implements Iterator
  {
    ...
  }
  ...
}
You might consider creating a C++ class called NativeHashIterator and using JNI to call it from JNativeHashIterator, but there is a problem with this approach: NativeHashIterator would likely contain an STL iterator object, which would in turn refer to the NativeHash object. If the user code were to mistakenly dispose of the NativeHash but keep using the NativeHashIterator, the program could crash.
This problem is a regular occurrence when calling C/C++ from Java using JNI. Java is garbage-collected, while C/C++ is not. Java code relies on the fact that you can never have a pointer to an object that isn't there, while C/C++ code must be explicitly written to avoid dangling pointers. This can be tricky, especially when the code is called from a garbage-collected language.
The best solution, when possible, is to avoid using pointers between your JNI objects. This way, you don't have to deal with the possibility of a dangling pointer.
Thus, you do not want to use STL iterators. Or rather, you do not want to keep them around long enough for them to have dangling pointers. Here is how it works.
NativeHash provides the following methods for iteration:
const char *firstKey();
const char *nextKey( const char *key );
firstKey() returns the first key in the hashtable. In fact, it uses an iterator to do this: it creates the iterator, gets the first key, and then disposes of the iterator:
const char *NativeHash::firstKey() {
  map<string,string>::iterator it = ssmap.begin();
  if (it == ssmap.end()) {
    return 0;
  } else {
    return it->first.c_str();
  }
}
The nextKey() method takes the previous key as an argument. It uses an iterator to find the previous key, advances to the next key, and returns it. Like firstKey(), it keeps the iterator around only long enough to do the job and discards it before returning:
const char *NativeHash::nextKey( const char *key ) {
  map<string,string>::iterator it = ssmap.find( key );
  if (it == ssmap.end()) {
    return 0;
  } else {
    it++;
    if (it == ssmap.end()) {
      return 0;
    } else {
      return it->first.c_str();
    }
  }
}
Thus, the code doesn't keep STL iterators around long enough for them to have dangling pointers.
On the Java side, we must implement the following three methods:
protected class JNativeHashIterator implements Iterator {
  ...
  public boolean hasNext() { ... }
  public Object next() { ... }
  public void remove() { ... }
  ...
}
This class maintains a variable called 'current', which contains the last string obtained from the native code. Each time the iterator is called, it uses this string to get the next string. In this way, the iterator's state (its cursor) is contained in this string, rather than in an STL iterator, and we don't have to worry about dangling pointers.
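To make the key-as-cursor idea concrete, here is a minimal sketch of how such an iterator might look. It is not the article's JNativeHashIterator: the class names are invented, a TreeMap stands in for the native firstKey() and nextKey() calls so that the example runs without native code, generics are used for brevity, and remove() is assumed to be unsupported.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Stand-in for JNativeHash. In the real class, firstKey() and nextKey()
// would be native methods; a TreeMap plays that role in this sketch.
class KeyCursorHash {
    private final TreeMap<String, String> ssmap = new TreeMap<>();

    void put(String key, String value) { ssmap.put(key, value); }

    // Mirrors NativeHash::firstKey(): null when the table is empty.
    String firstKey() { return ssmap.isEmpty() ? null : ssmap.firstKey(); }

    // Mirrors NativeHash::nextKey(): null when key is absent or is the last.
    String nextKey(String key) {
        return ssmap.containsKey(key) ? ssmap.higherKey(key) : null;
    }

    Iterator<String> iterator() { return new KeyCursorIterator(); }

    // The only state is 'current', the last key handed out. Nothing that
    // points into the table is held between calls.
    private class KeyCursorIterator implements Iterator<String> {
        private String current = null;
        private boolean started = false;

        // Looks up the key that next() would return, or null at the end.
        private String peek() {
            return started ? nextKey(current) : firstKey();
        }

        public boolean hasNext() { return peek() != null; }

        public String next() {
            String key = peek();
            if (key == null) throw new NoSuchElementException();
            current = key;
            started = true;
            return key;
        }

        // Assumption: removal through the iterator is not supported here.
        public void remove() { throw new UnsupportedOperationException(); }
    }

    public static void main(String[] args) {
        KeyCursorHash h = new KeyCursorHash();
        h.put("b", "2");
        h.put("a", "1");
        for (Iterator<String> it = h.iterator(); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}
```

The price of this design is a lookup on every step, since each call has to find the previous key again before advancing. In exchange, an iterator that outlives its table holds nothing more dangerous than a string.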
Exception Handling
There are times when something goes wrong in the native code. In our implementation, there are two such situations: a null key or an I/O problem. In each case, we want to propagate the error to the Java code, rather than simply exiting or, even worse, crashing.
Luckily, there is enough similarity between C++ exceptions and Java exceptions to make this possible. When an error occurs in the C++ code, it calls a function called nh_error(). Here's an example from the I/O code in NativeHash.cc:
int r = fread( ptr, 1, size, stream );
if (r != size) {
  nh_error( "Could only read %d of %d bytes\n", r, size );
}
Like printf(), nh_error() is a variadic function, which means it can take a variable number of arguments. As per tradition, this function calls vsnprintf() to print the string (and other arguments) to a memory buffer. But instead of printing this buffer out, it turns it into an exception and throws it:
vsnprintf( buffer, BUFFER_SIZE, f, va );
throw new Err( type, strdup( buffer ) );
This C++ exception is then caught in JNativeHash.cc, in the read method:
JNIEXPORT void JNICALL Java_JNativeHash_read1_1u
  (JNIEnv *env, jobject je, jstring filename) {
  try {
    ...
  } catch( Err *e ) {
    ERRV(env,e);
  }
}
The ERRV() macro just calls a function called error(). (For typechecking purposes, there are two variants of this macro: ERR() for stub functions that return a value, and ERRV() for stub functions that do not.)
The error() function, in turn, creates a Java exception and hands it to the Java code, using the JNI ThrowNew() method:
env->ThrowNew( jpc, err->message );
Note that ThrowNew() does not itself transfer control: it marks the exception as pending, and the stub must then return. Once the native call returns, the JVM throws the exception up the Java call stack.
Memory Management
As alluded to earlier, it is sometimes prudent to call JNativeHash.dispose() before garbage collection. Of course, the finalize() method calls dispose() to free up the native data structures, but sometimes it is better not to wait until garbage collection.
In particular, this happens when you have a lot of native data, and not very much Java data. This situation occurred during the development of a database application called FSS, which used an enormous amount of data stored in native data structures.
In FSS, it was often the case that the Java heap would be only a few megabytes, while the native heap had grown to around a gigabyte. I wondered, at first, why the garbage collector wasn't running, and then I realized that the garbage collector wasn't counting the native memory. Naturally, the Java Virtual Machine (JVM) has no idea how much memory you have allocated in your native code, and so it does not know that it needs to garbage collect.
Even calling System.gc() won't force it to collect unused data structures if the JVM doesn't know that memory is low. Thus, I was forced to establish a simple reference counting system, so that I would know when a JNativeHash was no longer in use. At that point, I would call its dispose() method, freeing up the memory used by native data structures. Later, presumably, the JVM garbage collector would take care of the Java portion of the JNativeHash.
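The reference counting system used in FSS is not shown in this article, so the following is only a sketch of the general idea. RefCountedHandle, retain() and release() are invented names, and the Runnable stands in for a call such as JNativeHash.dispose():

```java
// Hypothetical sketch: none of these names come from JNativeHash or FSS.
class RefCountedHandle {
    private final Runnable disposer;   // e.g. a call to dispose()
    private int count = 1;             // the creator holds the first reference
    private boolean disposed = false;

    RefCountedHandle(Runnable disposer) { this.disposer = disposer; }

    // Call when another part of the program starts using the object.
    synchronized void retain() {
        if (disposed) throw new IllegalStateException("already disposed");
        count++;
    }

    // Call when a user is done; the last release frees the native data.
    synchronized void release() {
        if (disposed) throw new IllegalStateException("already disposed");
        if (--count == 0) {
            disposed = true;
            disposer.run();
        }
    }

    synchronized boolean isDisposed() { return disposed; }
}
```

A caller would create the handle alongside the hash, for example with new RefCountedHandle(hash::dispose), and pair every retain() with a release(). The native memory is then freed as soon as the last user lets go, rather than whenever the garbage collector happens to run the finalizer.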
Efficiency
You can test the time and space efficiency of JNativeHash by running JNHTest and RegularTest, and seeing how much space and time they use. These programs allocate a lot of memory, so you should set the maximum heap size very high:
% time java -Xmx500000000 RegularTest
% time java -Xmx500000000 JNHTest
Informal testing on my desktop showed an almost 20 percent speedup when using JNHTest. JNHTest only required 78MB of RAM, as compared with 190MB for RegularTest, a savings of 59 percent. This space savings is critical to FSS, which uses an enormous amount of memory.
Further Ideas
A number of issues were ignored in this article, and are worth looking into further.
For example, the I/O interface to JNativeHash only deals with individual files; it completely bypasses the Java I/O package. A better implementation would provide methods to read and write using Java streams.
JNativeHash also completely ignores the collection classes in the java.util package. This is strange, especially since it would seem that JNativeHash ought to implement the java.util.Map interface.
The problem with implementing Map is that the get and put methods take objects, not strings. Maps in general need to be able to handle objects of any type, while this implementation of JNativeHash only deals with strings.
One solution might be to implement Map, but to throw an exception if anything other than a string is passed in. This would work, but it violates the contract implied by Map, and so is technically incorrect. Keeping it as a separate class is the best indication of what it can and cannot do. This isn't a good solution in general, but it was originally developed to replace a single data structure in a memory-intensive application, where it easily replaced the original, less-efficient Java data structure.
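To make the trade-off concrete, here is a rough sketch of the option that was set aside: a Map that accepts only strings and throws otherwise. StringOnlyMap is an invented name, a TreeMap stands in for the native table, and the choice of ClassCastException is an assumption made for the example.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of the rejected design; not part of JNativeHash.
class StringOnlyMap extends AbstractMap<Object, Object> {
    private final TreeMap<String, String> table = new TreeMap<>();

    // Rejects anything that is not a non-null String.
    private static String requireString(Object o) {
        if (o == null) {
            throw new NullPointerException("null is not allowed");
        }
        if (!(o instanceof String)) {
            throw new ClassCastException("only strings are supported");
        }
        return (String) o;
    }

    @Override public Object put(Object key, Object value) {
        return table.put(requireString(key), requireString(value));
    }

    @Override public Object get(Object key) {
        return table.get(requireString(key));
    }

    @Override public Set<Map.Entry<Object, Object>> entrySet() {
        // Read-only snapshot, copied so that the element types line up.
        Map<Object, Object> copy = new LinkedHashMap<Object, Object>(table);
        return Collections.unmodifiableMap(copy).entrySet();
    }
}
```

Code that receives this object as a plain Map has no way of knowing about the restriction until a call fails at run time, which is the drawback described above.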
This article has demonstrated how to use the JNI to create and access a data structure implemented in native code. Using native code results in a substantial savings in both time and space.
This article includes full source code as well as a makefile to build the native code. It will work on any recent GCC installation.