Browse DevX
Sign up for e-mail newsletters from DevX


Python Boosts JAR File Auditor Functionality

A Python port of a Java jar file auditor is cleaner in its implementation and more complete in its auditing capabilities. This article presents the Python port, discusses its advantages, and highlights some of the great Python features that allow you to produce robust functionality with minimal code.

nderstanding the Java classpath and Java's classloading mechanism are essential for any proficient Java developer. In a previous DevX article (Put an End to Jar File and Class Name Conflicts), I discussed how duplicate jar files and classes can cause hard-to-detect naming conflicts, which produce errors that are difficult to debug. A simple jar file audit utility I wrote in Java can make identifying these problematic duplications much easier.

However, this Java auditor was my first pass at the tool. I've since discovered that Python enables a port of the utility that is cleaner in its implementation and more complete in its auditing capabilities than my original. In this article, I present the Python port (which I hereafter refer to as the auditor), discuss its advantages, and highlight some of the great Python features that allow you to produce robust functionality with minimal code.

Author's Note: To follow this article, you need a basic understanding of Python syntax and data structures. A basic knowledge of object-oriented design also is a prerequisite for understanding the code, particularly the use of Python XML binding. The audit returns results as an XML document and uses an XSL stylesheet for formatting. So some knowledge of XML and XSLT technology also would be advantageous if you wish to modify or enhance the stylesheet that's included with the source code for the utility.

Python's Advantages Over Java
The original Java version of the auditor walks a specified directory tree and audits it for duplicate jar files and duplicate class files within them. The problem with its algorithm is that the design relies on the existence of duplicate jar files in order to identify duplicate class files. For example, if class A were in JarFile1 and another instance of class A was in JarFile2, the Java auditor wouldn't catch the duplication because the jar file names are different.

This gap in the program logic was more a sin of commission than omission. I wanted the auditor to be a simple, lightweight utility that didn't require an embedded object data store or hooks to an RDBMS. So I decided to consciously scale down the utility, but I struggled to find an intuitive method to track the progress of the audit. I wanted to identify jar-file and class duplications and provide the necessary location data in the audit report by means of a multi-keyed data structure.

Unfortunately, no such structure exists in Java. I could've used some combination of objects, hash tables, and linked lists, but this seemed to be overkill for such a simple concept. So I opted for a less-than-optimal first attempt and hoped that the insight I gained from coding it would lead me to the kind of solution I desired.

While I was struggling with the audit-tracking problem, I started learning Python and discovered Python's dictionary data type. A dictionary is a key-value pair somewhat akin to a hashtable. The crucial difference between the two structures is that a dictionary supports the notion of multi-keyed data whereas a hashtable uses a single object as a key. Some workarounds could create a unique object for the needed keys in Java, but Python's dictionary makes the storage and access of the audit data far simpler and more intuitive.

Thanks for your registration, follow us on our social networks to keep up-to-date