Browse DevX
Sign up for e-mail newsletters from DevX


Python Boosts JAR File Auditor Functionality : Page 2

A Python port of a Java jar file auditor is cleaner in its implementation and more complete in its auditing capabilities. This article presents the Python port, discusses its advantages, and highlights some of the great Python features that allow you to produce robust functionality with minimal code.




Building the Right Environment to Support AI, Machine Learning and Deep Learning

The Python Auditor
The auditor's base programmatic operation is to walk a directory tree and scan for jar files. I didn't have it look for plain old class files because they're easy to find with a brute-force search on Windows or Unix. The command line arguments specify the directory tree and the file type that is being audited. I typically look only for jar files because including zip files casts too big a net on a Windows file system. So the command line would look like this so far:

python C:\Auditor\auditor.py c:\AuditTestDir *.jar

The last piece of the command line is a redirection symbol to tell the auditor where it should print the XML results:

python C:\Auditor\auditor.py c:\AuditTestDir *.jar >

The first thing the main( ) method of the script does is invoke the listFiles( ) method passing the root directory and the file extension pattern. The listFiles( ) method (much of which is directly borrowed from page 144 of O'Reilly's Python Cookbook) creates an instance of the Bunch class called args that collects all of the command line arguments into a list of name value pairs. The Bunch class's constructor does this using a slick Pythonic construct shown in the following snippet:

class Bunch: def __init__(self, **kwds): self.__dict__.update(kwds) arg = Bunch(recurse=recurse, pattern_list=pattern_list,
return_folders=return_folders, results=[])

The **kwds argument collects all of the arguments into name/value pair maps. So you'll see, for example, arg.pattern_list in the code as an attribute of the Bunch class object referenced by the variable arg.

To walk the directory, I invoked the command os.path.walk and used the Python standard os library. This is an amazing piece of built-in functionality, and I'm somewhat surprised that no comparable method exists in the vast collection of standard Java libraries. The Pythonic walk( ) does much of the heavy lifting for me. The Python documentation succinctly describes the operation of this method:

[os.path.walk() ] Calls the function visit with arguments (arg, dirname, names) for each directory in the directory tree rooted at path (including path itself, if it is a directory). The argument dirname specifies the visited directory, the argument names lists the files in the directory (gotten from os.listdir (dirname)).

The following is the auditor implementation of the visit( ) method:

def visit(arg, dirname, files): """Called by walk to visit each file and assess its suitability
for inclusion in the audit""" #Append to arg.results all relevant files (and perhaps folders) for name in files: fullname = os.path.normpath(os.path.join(dirname, name)) if arg.return_folders or os.path.isfile(fullname): for pattern in arg.pattern_list:
#*.jar command line argument if fnmatch.fnmatch(name, pattern): arg.results.append(fullname) addToAudit(fullname)
#added for audit break # if recursion is blocked then set result list if not arg.recurse: files[:]=[]

Methods are first-class objects in Python, so they can be passed as arguments just like any other object or primitive value in other languages.

The critical auditing functionality I added was the invocation of the addToAudit() method. This method accepts the full name of the jar file, which it uses to perform auditing of that jar file based on its name. This is where the flexibility of Python's dictionary data type comes into play.

Thanks for your registration, follow us on our social networks to keep up-to-date