Python Boosts JAR File Auditor Functionality

Understanding the Java classpath and Java’s classloading mechanism is essential for any proficient Java developer. In a previous DevX article (Put an End to Jar File and Class Name Conflicts), I discussed how duplicate jar files and classes can cause hard-to-detect naming conflicts, which produce errors that are difficult to debug. A simple jar file audit utility I wrote in Java can make identifying these problematic duplications much easier.

However, this Java auditor was my first pass at the tool. I’ve since discovered that Python enables a port of the utility that is cleaner in its implementation and more complete in its auditing capabilities than my original. In this article, I present the Python port (which I hereafter refer to as the auditor), discuss its advantages, and highlight some of the great Python features that allow you to produce robust functionality with minimal code.

Author’s Note: To follow this article, you need a basic understanding of Python syntax and data structures. A basic knowledge of object-oriented design also is a prerequisite for understanding the code, particularly the use of Python XML binding. The audit returns results as an XML document and uses an XSL stylesheet for formatting. So some knowledge of XML and XSLT technology also would be advantageous if you wish to modify or enhance the stylesheet that’s included with the source code for the utility.

Python’s Advantages Over Java
The original Java version of the auditor walks a specified directory tree and audits it for duplicate jar files and duplicate class files within them. The problem with its algorithm is that the design relies on the existence of duplicate jar files in order to identify duplicate class files. For example, if class A were in JarFile1 and another instance of class A was in JarFile2, the Java auditor wouldn’t catch the duplication because the jar file names are different.

This gap in the program logic was more a sin of commission than omission. I wanted the auditor to be a simple, lightweight utility that didn’t require an embedded object data store or hooks to an RDBMS. So I decided to consciously scale down the utility, but I struggled to find an intuitive method to track the progress of the audit. I wanted to identify jar-file and class duplications and provide the necessary location data in the audit report by means of a multi-keyed data structure.

Unfortunately, no such structure exists in Java. I could’ve used some combination of objects, hash tables, and linked lists, but this seemed to be overkill for such a simple concept. So I opted for a less-than-optimal first attempt and hoped that the insight I gained from coding it would lead me to the kind of solution I desired.

While I was struggling with the audit-tracking problem, I started learning Python and discovered Python’s dictionary data type. A dictionary is a collection of key-value pairs somewhat akin to a hashtable. The crucial difference between the two structures is that a dictionary readily supports the notion of multi-keyed data, because a tuple of several values can serve as a key, whereas a hashtable uses a single object as a key. Some workarounds could create a unique object for the needed keys in Java, but Python’s dictionary makes the storage and access of the audit data far simpler and more intuitive.
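To make the multi-keyed idea concrete, here is a minimal sketch of a dictionary keyed by a two-part tuple. The names `locations` and `record` and the sample jar paths are hypothetical and are not taken from the auditor's source.

```python
# Hypothetical sample data: (class name, package name) -> jar files containing it.
locations = {}


def record(class_name, package_name, jar_path):
    """File a jar path under a two-part (class, package) key."""
    key = (class_name, package_name)  # a tuple is hashable, so it can be a dict key
    locations.setdefault(key, []).append(jar_path)


record("Logger", "org.example.util", "lib/a.jar")
record("Logger", "org.example.util", "lib/b.jar")
record("Logger", "com.other", "lib/c.jar")

# Any key that maps to more than one location is a duplicate.
duplicates = {key: jars for key, jars in locations.items() if len(jars) > 1}
```

Because the package name is part of the key, the `com.other` copy of `Logger` is tracked separately from the two `org.example.util` copies.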

The Python Auditor
The auditor’s base programmatic operation is to walk a directory tree and scan for jar files. I didn’t have it look for plain old class files because they’re easy to find with a brute-force search on Windows or Unix. The command line arguments specify the directory tree and the file type that is being audited. I typically look only for jar files because including zip files casts too big a net on a Windows file system. So the command line would look like this so far:

    python C:\Auditor\auditor.py c:\AuditTestDir *.jar

The last piece of the command line is a redirection symbol to tell the auditor where it should print the XML results:

    python C:\Auditor\auditor.py c:\AuditTestDir *.jar > C:\Auditor\Results\myauditresults.txt

The first thing the main( ) method of the script does is invoke the listFiles( ) method, passing the root directory and the file extension pattern. The listFiles( ) method (much of which is directly borrowed from page 144 of O’Reilly’s Python Cookbook) creates an instance of the Bunch class called arg that collects the walk’s input and output arguments into a set of name/value pairs. The Bunch class’s constructor does this using a slick Pythonic construct shown in the following snippet:

    class Bunch:
        def __init__(self, **kwds):
            self.__dict__.update(kwds)

    arg = Bunch(recurse=recurse, pattern_list=pattern_list,
                return_folders=return_folders, results=[])

The **kwds argument collects all of the keyword arguments into a dictionary of name/value pairs, which the constructor then copies into the object’s own attribute dictionary. So you’ll see, for example, arg.pattern_list in the code as an attribute of the Bunch class object referenced by the variable arg.
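As a small illustration of how the keyword arguments end up as attributes, the sketch below builds a Bunch with made-up values (the literal arguments and the sample jar path are assumptions for demonstration only) and reads them back.

```python
class Bunch:
    def __init__(self, **kwds):
        # kwds arrives as a plain dict such as {"recurse": True, ...};
        # merging it into __dict__ turns every key into an attribute.
        self.__dict__.update(kwds)


# Hypothetical values; the auditor derives these from the command line.
arg = Bunch(recurse=True, pattern_list=["*.jar"], return_folders=False, results=[])

arg.results.append("lib/example.jar")  # attributes behave like ordinary fields
attribute_names = sorted(vars(arg))    # the names supplied as keywords
```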

To walk the directory, I used os.path.walk from Python’s standard os library. This is an amazing piece of built-in functionality, and I’m somewhat surprised that no comparable method exists in the vast collection of standard Java libraries. The Pythonic walk( ) does much of the heavy lifting for me. The Python documentation succinctly describes the operation of this method:

[os.path.walk() ] Calls the function visit with arguments (arg, dirname, names) for each directory in the directory tree rooted at path (including path itself, if it is a directory). The argument dirname specifies the visited directory, the argument names lists the files in the directory (gotten from os.listdir (dirname)).

The following is the auditor implementation of the visit( ) method:

    def visit(arg, dirname, files):
        """Called by walk to visit each file and assess its suitability
        for inclusion in the audit"""
        # Append to arg.results all relevant files (and perhaps folders)
        for name in files:
            fullname = os.path.normpath(os.path.join(dirname, name))
            if arg.return_folders or os.path.isfile(fullname):
                for pattern in arg.pattern_list:
                    # *.jar command line argument
                    if fnmatch.fnmatch(name, pattern):
                        arg.results.append(fullname)
                        addToAudit(fullname)  # added for audit
                        break
        # if recursion is blocked then set result list
        if not arg.recurse:
            files[:] = []

Functions are first-class objects in Python, so visit can be passed to os.path.walk as an argument just like any other object.
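Note that os.path.walk exists only in Python 2; it was removed in Python 3 in favor of os.walk. The following is a rough sketch of the same walk-and-filter idea on Python 3, not the auditor's actual code. The function name `list_files` and the `on_match` callback parameter are hypothetical; the callback stands in for the addToAudit hook and shows a function being passed as an argument.

```python
import fnmatch
import os


def list_files(root, patterns=("*.jar",), recurse=True, on_match=None):
    """Return paths under root whose file names match any of the patterns.

    on_match is an optional callback invoked with each matching path.
    """
    results = []
    for dirname, subdirs, files in os.walk(root):
        for name in files:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                fullname = os.path.normpath(os.path.join(dirname, name))
                results.append(fullname)
                if on_match is not None:
                    on_match(fullname)
        if not recurse:
            # Emptying the list in place stops os.walk from descending further.
            subdirs[:] = []
    return results
```

A call such as `list_files(root, on_match=some_function)` hands the function object itself to the walker, which calls it once per matching jar file.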

The critical auditing functionality I added was the invocation of the addToAudit() method. This method accepts the full name of the jar file, which it uses to perform auditing of that jar file based on its name. This is where the flexibility of Python’s dictionary data type comes into play.

Python’s Dictionary Data Type
To capture and interrelate all of the data elements that I determined, the four dictionaries shown in Table 1 were required.

Table 1. Auditor Dictionaries Used to Track Audit Progress and Report Results
| Variable Name | Key | Values | Purpose |
|---|---|---|---|
| master_jar_list | Jar file name | Tuple(Full jar file name, time modified) | Master inventory of all jar file names audited |
| dup_jar_list | Tuple(Full jar file path name, Jar file name) | master_jar_list[jarfile] – lookup to the master_jar_list to retrieve this value. | Listing of all duplicated jar file names |
| master_class_name_list | Class name | Tuple(Full filename, package_name, date modified) | Master inventory of all class names audited |
| dup_class_name_list | Tuple(Class name, package name) | Tuple(full jar name, date class modified, [List of all jar file locations that contain this class]) | Listing of all duplicate class names, whether or not the packaging matches |

These four data structures suffice to keep a running inventory of the audit process. In the addToAudit( ) method, the following code performs the necessary checks to add the jar file to either the master jar file inventory or the list of duplicate jars:

    if(master_jar_list.has_key(jarfile)):
        #duplicate jar dictionary uses the full file name as the key
        #the value is a tuple of the   *****ADD time modified  *********
        dup_jar_list[(full_jarname, jarfile)] = master_jar_list[jarfile]
    else:
        #master list can use just the X.jar name as the key whereas
        #the duplicate list must distinguish between
        #potentially multiple copies of the same file.
        master_jar_list[jarfile] = (full_jarname, time_modified)
    #check class files archived in the .jar file
    readZip(full_jarname, jarfile)

Once the jar file name is evaluated, the last line passes the jar file along to the readZip( ) method, which interrogates the contents of the jar and uses the class-name-specific dictionaries for appropriate auditing:

    for aFile in z.filelist:
        #Master class name list is keyed by class name only and Dup's show
        #packaging
        #this way classes of the same name but with different packaging will be identified
        #as duplicates. The file_locations list is an aggregate listing of all the
        #places that a class of this exact name and packaging occur. If just the class
        #name is the same but the packaging is different then the class will be listed as
        #a separate duplicate entry.
        file_locations = []
        if(master_class_list.has_key(class_name)):
            full_jarname, mod_date = master_jar_list[jar_name_only]
            if(dup_class_list.has_key(lookup_key)):
                file_locations = dup_class_list[lookup_key][2]
                file_locations.append(full_filename)
                dup_class_list[lookup_key] = (full_jarname, formatted_modified_date,
                                              file_locations)
            else:
                #the master_class_list call below returns a tuple of
                #which the first element is needed
                file_locations.append(master_class_list[class_name][0])
                dup_class_list[lookup_key] = (full_jarname, formatted_modified_date,
                                              file_locations)
        else:
            #add to the master class list for the first occurrence
            master_class_list[class_name]=(full_filename, package_name,
                                           formatted_modified_date)
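The excerpt above omits how the class name, package name, and lookup key are derived from each archive entry. The sketch below is one way that bookkeeping could look on Python 3 using the standard zipfile module. It is a simplified illustration rather than the auditor's readZip( ) method: the function name `audit_classes` and its return shape are assumptions, modification dates are left out, and it deliberately differs in one detail by seeding a duplicate entry with the first sighting only when the packaging matches.

```python
import zipfile


def audit_classes(jar_paths):
    """Return (master, dups) dictionaries for the .class entries in jar_paths."""
    master = {}  # class name -> (jar path, package name) of the first occurrence
    dups = {}    # (class name, package name) -> list of jar paths holding it
    for jar_path in jar_paths:
        with zipfile.ZipFile(jar_path) as archive:
            for entry in archive.namelist():
                if not entry.endswith(".class"):
                    continue
                # "org/example/Foo.class" -> package "org.example", class "Foo"
                package, _, filename = entry.rpartition("/")
                class_name = filename[:-len(".class")]
                package_name = package.replace("/", ".")
                if class_name not in master:
                    master[class_name] = (jar_path, package_name)
                    continue
                key = (class_name, package_name)
                if key not in dups:
                    first_jar, first_package = master[class_name]
                    # Seed with the first sighting only when its packaging matches.
                    dups[key] = [first_jar] if first_package == package_name else []
                dups[key].append(jar_path)
    return master, dups
```

Keying the master dictionary by class name alone and the duplicate dictionary by (class name, package name) mirrors the design described in the comments above: a same-named class in a different package still shows up, but as its own duplicate entry.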

After assembling all the necessary audit results into four dictionaries, the buildXML( ) method loads the dictionaries into an instance of the DictionaryHolder class. This simple manipulation allows an XML document to be generated from the DictionaryHolder object by means of the Gnosis Python XML binding libraries. The following line of code is all that is required to create a valid XML document from the contents of the dictionaries:

    #variable o is the reference to the DictionaryHolder object
    xml_string = xml_pickle.XML_Pickler(o).dumps(deepcopy=1)
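For readers without the Gnosis libraries installed, the general idea of turning audit dictionaries into XML can be approximated with the standard library's xml.etree.ElementTree. This is a swapped-in technique, not the Gnosis XML binding, and the element names (`audit`, `dictionary`, `entry`, `key`, `value`) and the function `dicts_to_xml` are invented for this sketch; the output does not match what xml_pickle produces.

```python
import xml.etree.ElementTree as ET


def _parts(item):
    """Treat a tuple as several parts and anything else as a single part."""
    return item if isinstance(item, tuple) else (item,)


def dicts_to_xml(dictionaries):
    """Serialize {dictionary name: {key: value}} into an XML string."""
    root = ET.Element("audit")
    for dict_name, contents in dictionaries.items():
        section = ET.SubElement(root, "dictionary", name=dict_name)
        for key, value in contents.items():
            entry = ET.SubElement(section, "entry")
            # Tuple keys and values become repeated child elements.
            for part in _parts(key):
                ET.SubElement(entry, "key").text = str(part)
            for part in _parts(value):
                ET.SubElement(entry, "value").text = str(part)
    return ET.tostring(root, encoding="unicode")
```

Unlike the generic tags emitted by a binding library, hand-chosen element names make the raw document easier to read, at the cost of writing the mapping yourself.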

XSL Stylesheets Dress Up the Output
As I mentioned previously, using Python’s print statement in conjunction with simple command-line redirection makes it easy to send the output XML document to wherever you’d like. Unfortunately, the following generic XML output is rather tough to read because the tags that the XML binding library implemented aren’t specifically named to correspond to the individual data elements in the dictionaries:

                         

Fear not, this can be overcome. An XSL stylesheet can format the raw output into tables. The tables (see Figure 1) may not be works of art, but they are easier to read than the raw XML. I’ll leave it to you to enhance the look and feel as you see fit.

Figure 1: XSL Stylesheet Formats Raw Output into Tables

Powerful Apps, Minimal Code
Python’s powerful libraries and minimalist syntax allow you to write readable, high-octane applications in fewer lines of code than with Java. In this case, the Python port of the auditor takes full advantage of the following features to deliver more robust functionality in far fewer lines of code than the Java auditor did:

  • Python’s flexible data structures make it easy to create complex data relationships.
  • Python is an object-oriented language, so you can make use of cutting-edge XML binding technology to reduce the coding time and effort it takes to produce XML data tailored to meet your needs.
  • A wealth of open-source code libraries are available to help bolster Python’s out-of-the-box offerings.