As the number of libraries and APIs grows, so does the number of jar files on developers' systems and on application servers. If you don't manage your system classpath and any additional classes or jar files that your application loads at runtime, insidious, hard-to-identify bugs may arise. Jar files that have the same name but different contents, or identically named classes that contain methods with different signatures, can leave you wondering why an application that once worked suddenly fails.
Knowing what’s on your system’s classpath and understanding Java’s classloading mechanism are essential to building reliable applications. Although you could use the Java command line jar utility to display the contents of any of your jar files, typing the command jar tvf [filename] on each and every jar file on your system would be an extremely tedious and error-prone process. This article provides a simple, more efficient utility that enables you to inventory jar files and identify duplicates of both the jar files themselves and the classes packaged within them.
This tutorial discusses how to use this utility, or "auditor" as I refer to it, which you run from the command line. It displays results in a GUI (as shown in Figure 1), or saves results as an XML file (shown in Internet Explorer in Figure 2).
Figure 1: Jar Auditor GUI Display
Figure 2: Jar Auditor XML Output
Undetected Redundancy in Jar Files
Before addressing how to implement the auditor, let's first clarify some classpath issues that can negatively impact your code. The classpath is an environment variable that tells the Java Virtual Machine (JVM) where to find the resources (.class files, .jar files, and certain properties files such as jndi.properties) that it needs to load. Unfortunately, the classpath can't warn you when you reference two jar files with the same name that might contain different versions of the same classes. Nor can it tell you whether two jars with different names on your classpath contain identically named classes. Even if the jar files are in different directories, this redundancy can be a problem, because the JVM will load only one version of the class and you won't know which one has loaded until you encounter a problem and start to debug.
Unfortunately, Java has no mechanism for identifying this problem or reliably tracking versioning in libraries. You could modify the source code to monitor the classloading process so you can get some idea of what is loading, but this solution is possible only if you control the source code. Alternatively, the Java Security Manager mechanism can help avoid some naming clashes by limiting the code bases that can execute in any given application. Unfortunately, many applications are delivered with wide open file permissions and no security manager set.
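One low-cost way to get some idea of what is loading, when you do control the source, is to ask a class where it came from. The following is a minimal sketch rather than part of the auditor; the class name ClassOrigin and the "bootstrap" label are my own assumptions. It relies on the standard getProtectionDomain() and getCodeSource() calls, which report no code source for classes loaded by the bootstrap classloader.

```java
import java.security.CodeSource;

// Hypothetical helper, not part of the auditor: reports where a class was loaded from.
final class ClassOrigin {

    /** Returns the code source location of a class, or "bootstrap" when none is reported. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Core classes loaded by the bootstrap classloader carry no code source.
            return "bootstrap";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(String.class.getName() + " <- " + locationOf(String.class));
        System.out.println(ClassOrigin.class.getName() + " <- " + locationOf(ClassOrigin.class));
    }
}
```

If you can't change the source, the JVM's -verbose:class option prints each class as it loads, which gives a similar view from the outside.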
Dangerous Misconceptions About Classloading
When the JVM boots up at runtime, the bootstrap classloader (also sometimes called the primordial classloader) loads Java's core API (rt.jar). It then calls the "extension" classloader, which loads all the classes that are in the /jre/lib/ext directory of your Java installation. After the extensions directory loads, the system classloader loads any classes on your system's classpath, as specified in your personal profile or in the operating system's CLASSPATH environment variable.
As the developer, you control the only other two ways in which additional classes can be loaded. One means of control is using the command line '-classpath' argument to specify additional jar files or directories that the program should consult for class resources. The other way is explicitly instantiating a classloader in your application and specifying the URL to the resources you wish to load. (I mention this last option only for completeness. It is outside the scope of this article. Consult the Related Resources section of this article for further reading.)
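To see this hierarchy on your own machine, you can walk the parent chain starting from the system classloader. This is only an illustrative sketch; LoaderChain and the "bootstrap" label are names I made up, and the bootstrap classloader is represented by a null parent in the standard API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the classloaders from a starting loader up to the bootstrap loader.
final class LoaderChain {

    static List<String> chainFor(ClassLoader start) {
        List<String> names = new ArrayList<>();
        for (ClassLoader loader = start; loader != null; loader = loader.getParent()) {
            names.add(loader.getClass().getName());
        }
        // A null parent means the bootstrap classloader, which has no Java object of its own.
        names.add("bootstrap");
        return names;
    }

    public static void main(String[] args) {
        for (String name : chainFor(ClassLoader.getSystemClassLoader())) {
            System.out.println(name);
        }
    }
}
```

The exact class names printed depend on the JVM version. Newer JVMs (Java 9 and later) report a platform classloader in place of the extension classloader.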
Java's classloading mechanism works on a delegation model. In practice, this means that before a classloader loads a class itself, it first hands the request to its parent classloader. If a parent has already loaded, or is able to load, a class of that name, the currently active classloader uses the parent's class. Otherwise, it loads the requested class itself. Developers with an incomplete understanding of this mechanism erroneously conclude that putting classes higher in the classloader delegation hierarchy is a sound practice, because they can be sure that a particular class will always load in a particular order.
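Delegation is easy to observe. In the sketch below (DelegationDemo is a hypothetical name), a child URLClassLoader with no URLs of its own is asked for java.lang.String. Because the request is delegated upward, the class it returns is the very same Class object the rest of the JVM already uses.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo of parent-first delegation; not part of the auditor.
final class DelegationDemo {

    /** Loads a class by name through a fresh child loader that has no URLs of its own. */
    static Class<?> loadThroughChild(String className) {
        try (URLClassLoader child =
                 new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader())) {
            return child.loadClass(className);
        } catch (ClassNotFoundException | IOException e) {
            throw new IllegalStateException("Could not load " + className, e);
        }
    }

    public static void main(String[] args) {
        Class<?> viaChild = loadThroughChild("java.lang.String");
        // The child never defines the class itself; a parent supplies it.
        System.out.println(viaChild == String.class); // true
    }
}
```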
Pushing classes higher in the hierarchy isn't a good practice at all, however. Taken to its logical conclusion, it suggests that you should put all classes in the /ext directory, but this only moves the location of the version problem. Moreover, by throwing everything into the /ext directory you undermine the granularity of the Java security mechanism, because the /ext directory typically has "all permissions" set in the java.policy file.
Don't Assume. Audit!
Unlike an IRS audit, a classpath audit will actually profit you. By applying the auditor utility, you’ll avoid lots of hard-to-discover class conflicts. The auditor is a command line utility that you invoke using the following commands:
GUI - java -jar jar_auditor-2.0.jar gui [root dir to audit, e.g. c:/audittest]
XML - java -jar jar_auditor-2.0.jar xml [root dir to audit, e.g. c:/audittest] [XML file output path]
The auditor recursively walks the directory tree that you specified in the second argument and filters out jars for auditing. It doesn’t audit class files that aren’t packaged in jar files because you could easily find an individual class file by executing a brute force search of the file system. Without a utility like the auditor, you’d have no simple way to interrogate jar files.
When the command line 'gui' argument specifies output to a GUI (see Listing 1), the constructor of the GUI builds the tree node, which will display in a split pane view. The createNodes() method invokes a recursive walk of the specified directory tree, filtering out the jar files for auditing.
As it walks the directory tree (see Listing 2), the auditor makes a callback to a WalkObserverImpl object, which essentially is the link between the walk of the file system and the work of the AuditMaster class that audits and inventories the jar files it encounters.
The call to the observer passes the file being evaluated. The observer discerns whether the file is a jar file or not. If the file is a jar file, an object of class AuditableJarFile, which extends java.util.jar.JarFile, is instantiated. An instance of the AuditMaster class then adds the file to the audit:
JarFile jar = new AuditableJarFile(root);
auditor.addJarFileToAudit(jar);
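The walk-and-callback flow described above can be sketched with the standard java.nio.file API. This is not the code from Listing 2. The names FileVisitCallback and JarWalk are hypothetical stand-ins for the article's observer and walker, and the jar test is a plain file extension check.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

/** Hypothetical callback; stands in for the article's observer. */
interface FileVisitCallback {
    void visit(Path file);
}

final class JarWalk {

    /** Assumption: a file counts as a jar if its name ends in ".jar", ignoring case. */
    static boolean isJar(Path file) {
        return file.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".jar");
    }

    /** Recursively walks root and hands every regular file to the callback. */
    static void walk(Path root, FileVisitCallback callback) {
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(callback::visit);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Collects the jar files found under root. */
    static List<Path> findJars(Path root) {
        List<Path> jars = new ArrayList<>();
        walk(root, file -> {
            if (isJar(file)) {
                jars.add(file);
            }
        });
        return jars;
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        findJars(root).forEach(System.out::println);
    }
}
```

Keeping the walk separate from the callback mirrors the design in the article: the walker knows nothing about jars, so the same traversal could feed a different kind of audit.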
The constructor of the AuditableJarFile class invokes the setExplodedContents() method to build an instance variable ArrayList of the exploded contents of the jar (see Listing 3).
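For readers without Listing 3 to hand, here is a rough sketch of the idea: a JarFile subclass that records its entry names when it is constructed. The class name ListedJarFile and its getExplodedContents() accessor are my own assumptions, and the real AuditableJarFile may differ in its details.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Sketch only: a JarFile that remembers the names of the files packaged inside it.
class ListedJarFile extends JarFile {

    private final List<String> explodedContents = new ArrayList<>();

    ListedJarFile(File file) throws IOException {
        super(file);
        Enumeration<JarEntry> entries = entries();
        while (entries.hasMoreElements()) {
            JarEntry entry = entries.nextElement();
            // Directory entries carry no content of their own, so skip them.
            if (!entry.isDirectory()) {
                explodedContents.add(entry.getName());
            }
        }
    }

    List<String> getExplodedContents() {
        return Collections.unmodifiableList(explodedContents);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ListedJarFile <path to jar>");
            return;
        }
        try (ListedJarFile jar = new ListedJarFile(new File(args[0]))) {
            jar.getExplodedContents().forEach(System.out::println);
        }
    }
}
```

The output is roughly the name column of jar tvf, gathered programmatically so it can be fed into an audit instead of read by eye.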
The AuditMaster uses a variety of TreeMap data structures to track the jar file names, file path locations of the jars, and the class names encountered in the exploded contents. The addJarFileToAudit method shown in Listing 4 administers all of these details.
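As a rough illustration of the bookkeeping involved (this is not Listing 4), a single TreeMap keyed by jar file name can collect every path at which that name was seen. Any name with more than one path is a duplicate. JarNameIndex and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index: jar file name -> every path where a jar of that name was found.
final class JarNameIndex {

    // TreeMap keeps the jar names sorted, which makes the report easy to scan.
    private final Map<String, List<String>> pathsByJarName = new TreeMap<>();

    void add(String jarName, String path) {
        pathsByJarName.computeIfAbsent(jarName, key -> new ArrayList<>()).add(path);
    }

    /** Returns only the jar names that were seen at more than one path. */
    Map<String, List<String>> duplicates() {
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : pathsByJarName.entrySet()) {
            if (entry.getValue().size() > 1) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        JarNameIndex index = new JarNameIndex();
        index.add("log4j.jar", "c:/audittest/app1/lib/log4j.jar");
        index.add("log4j.jar", "c:/audittest/app2/lib/log4j.jar");
        index.add("xerces.jar", "c:/audittest/app1/lib/xerces.jar");
        System.out.println(index.duplicates());
    }
}
```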
After the traversal of the specified directory tree is complete and the audit data structures are populated, the code returns to the JarGui or XMLBuilder to finish preparing the results for display in the appropriate format. The JarGui class iterates over the maps and builds the appropriate JTree nodes, and the XMLBuilder similarly creates an identical nested structure of tags for writing out to the specified file location.
End Class Name Conflicts
Getting a handle on the resources in your Java development platform or server is critical for avoiding insidious class conflicts in your applications. Understanding classloading will save you hours of debugging time spent trying to discern why an application that previously worked is suddenly failing. The auditor, available for download here, can provide you with valuable insights into your classpath environment and help you ferret out the cause of a class name conflict. If used proactively, it can also help you anticipate class conflicts before they become a problem.
Enhancements on the Way
If you've closely followed the discussion of classloading and how the auditor works, you might have noticed a hole in the algorithm. The thorniest, most difficult-to-find classloading conflicts occur between classes that share the same class name but live in differently named jar files or are packaged differently. Currently, the auditor identifies duplicates and conflicts only when the jar files have the same name. It won't catch identically named classes in different jar files. I'm working on an enhanced version of the auditor that patches this hole. Keep an eye out for a follow-up announcement on DevX.
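To make the gap concrete, here is one way such a check could look. It is purely my own illustration under stated assumptions, not the enhanced auditor: index every class by its fully qualified name, derived from the jar entry path, and report any class that turns up in more than one jar regardless of what the jars are called. ClassConflictIndex and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical index: fully qualified class name -> every jar that packages it.
final class ClassConflictIndex {

    private static final String SUFFIX = ".class";

    private final Map<String, Set<String>> jarsByClass = new TreeMap<>();

    /** Converts a jar entry path such as "com/example/Foo.class" to "com.example.Foo". */
    static String toClassName(String entryName) {
        return entryName.substring(0, entryName.length() - SUFFIX.length()).replace('/', '.');
    }

    /** Records one jar entry; entries that are not class files are ignored. */
    void addEntry(String jarPath, String entryName) {
        if (!entryName.endsWith(SUFFIX)) {
            return;
        }
        jarsByClass.computeIfAbsent(toClassName(entryName), key -> new TreeSet<>()).add(jarPath);
    }

    /** Returns the classes that appear in more than one jar. */
    Map<String, Set<String>> conflicts() {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : jarsByClass.entrySet()) {
            if (entry.getValue().size() > 1) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ClassConflictIndex index = new ClassConflictIndex();
        index.addEntry("lib/parser-1.0.jar", "com/example/Foo.class");
        index.addEntry("lib/tools.jar", "com/example/Foo.class");
        index.addEntry("lib/tools.jar", "com/example/Bar.class");
        System.out.println(index.conflicts());
    }
}
```

A check like this only tells you that two jars carry a class of the same name. Deciding whether the copies actually differ would need a further comparison, which this sketch does not attempt.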