Utility Class Fills in Java’s Missing Functionality

A Java programmer seems to live in two worlds. One is the object-oriented world of classes, interfaces, methods, accessibility, and more advanced notions such as patterns, refactorings, aspects, and concerns. The other world, the “real world,” is absolutely independent from the first. It consists of files, folders, and copy, compare, and chmod operations. Anybody who regularly switches between Perl and Java can find Java, an otherwise versatile and beautiful language, strangely awkward by comparison. It can seem to lack Perl’s elementary operations: operations that have nothing to do with an object model but are essential for down-to-earth programs that deal with the contents of our hard drives.

The Files utility in the MyJavaTools project, a site where I publish freely available, general-purpose Java tools, bundles the file-handling methods and helper classes that Java is missing into one class. This article explores the methods this utility class offers and describes their functionality with regard to J2SE 5.0.

Under the Hood
The methods that the Files utility class provides are very simple:

  • For a filename string, two methods, dirname() and filename(), return the directory name and the file name, as in Perl.
  • getFullPath() returns the full path for a file or a string that contains a file path.
  • path(String dirname, String filepath) calculates the absolute path for a directory and a filepath. If filepath is absolute, it returns it unchanged. Otherwise, filepath is resolved relative to the directory specified by dirname.
  • getcwd() returns the current directory name (the contents of the user.dir system property).
  • deleteFile(File file) and deleteFile(String filename) remove a file or the whole tree if it is actually a directory.
  • find(File directory, Pattern pattern), find(String dirname, Pattern pattern), and find(String dirname, String pattern) look within the tree for files whose names satisfy the regexp pattern specified.
  • findLatest(String dirname, String pattern), findLatestDirectory(String directory, String pattern), and findLatestFile(String dirname, String pattern) are similar to find() above, but they return at most one file or directory, the one with the latest timestamp. I have found them unusually useful in discovering changes in real time.
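The helpers listed above can be approximated with plain java.io.File. The following is only a sketch of how such methods might look, not the library's source: the class name PathSketch is hypothetical, and details such as returning "." when a path has no directory part are assumptions modeled on Perl's behavior.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a few of the helpers listed above. */
final class PathSketch {
    private PathSketch() {}

    /** Directory part of a path; "." when the path has none (assumed, as in Perl). */
    static String dirname(String path) {
        String parent = new File(path).getParent();
        return parent == null ? "." : parent;
    }

    /** Last component of a path. */
    static String filename(String path) {
        return new File(path).getName();
    }

    /** Returns filepath unchanged if it is absolute; otherwise resolves it against dirname. */
    static String path(String dirname, String filepath) {
        File f = new File(filepath);
        return f.isAbsolute() ? f.getPath() : new File(dirname, filepath).getPath();
    }

    /** Collects every file or directory under root whose name matches the pattern. */
    static List<File> find(File root, Pattern pattern) {
        List<File> found = new ArrayList<File>();
        File[] children = root.listFiles(); // null if root is not a readable directory
        if (children == null) {
            return found;
        }
        for (File child : children) {
            if (pattern.matcher(child.getName()).matches()) {
                found.add(child);
            }
            if (child.isDirectory()) {
                found.addAll(find(child, pattern));
            }
        }
        return found;
    }

    /** Removes a file, or the whole tree if the argument is a directory. */
    static boolean deleteFile(File file) {
        File[] children = file.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteFile(child);
            }
        }
        return file.delete();
    }
}
```

With this sketch, PathSketch.find(new File("."), Pattern.compile(".*\\.java")) would list every Java source file below the current directory.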

I’d love to have more methods of this kind, but some fundamental problems in Java prevent any platform-independent implementation.

Missing or Broken Functionality
Since 1997, users have been trying to convince Sun to do something about chdir(), but many believe that globally changing the current directory of an application is a dangerous operation. For example, beans in an app server may inadvertently change global settings and destroy the functionality that other beans use. Yet other languages and other types of applications allow the behavior, and operating systems readily provide this functionality.

A naïve solution would be to set the system property, user.dir, which is used occasionally to resolve relative path problems. Unfortunately, public opinion about whether this solution is effective or not is split. As a result, Java still has a weird behavior. For instance, consider the following piece of code:

  // create a temp directory
  new File("C:\\tmp\\tmpdir").mkdirs();
  // try to chdir to C:\tmp
  System.setProperty("user.dir", "C:\\tmp");
  // subdirectory, relative path
  File subdir = new File("tmpdir");
  // check that its absolute path is what we want
  assertEquals("C:\\tmp\\tmpdir", subdir.getAbsolutePath());
  // check that it is our new directory
  assertTrue(subdir.getAbsoluteFile().isDirectory());
  // see that we were grossly misled
  assertFalse(subdir.isDirectory());

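Here, java.io and the native methods resolve the file’s relative path to an absolute path in different ways: getAbsolutePath() consults the user.dir property, while the native check behind isDirectory() still uses the working directory the process started with.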

Another example is File.setLastModified(): this seemingly elementary operation does not always work. I had to add a static setLastModified(File file) to the library to work around this bug. It applies the trick recommended in the bug report: call System.gc() if the operation did not work on the first attempt.
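The retry described above might look roughly like the sketch below. The class name TouchSketch and the two-argument overload are assumptions added for illustration, and whether the one-argument form stamps the file with the current time is a guess; the library's actual code may differ.

```java
import java.io.File;

/** Hypothetical sketch of the "retry after System.gc()" workaround. */
final class TouchSketch {
    private TouchSketch() {}

    /** Tries to set the timestamp; on failure, requests a garbage collection and tries once more. */
    static boolean setLastModified(File file, long time) {
        if (file.setLastModified(time)) {
            return true;
        }
        // A leaked, unclosed stream may still hold the file; a GC run can finalize and release it.
        System.gc();
        return file.setLastModified(time);
    }

    /** Assumed convenience form: stamp the file with the current time. */
    static boolean setLastModified(File file) {
        return setLastModified(file, System.currentTimeMillis());
    }
}
```

System.gc() is only a request to the JVM, so the second attempt can still fail; the return value tells the caller whether it did.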

Two other methods, chmod and chown, will probably never be implemented, since they are perceived as platform-dependent. I believe that the Java community could at least tolerate POSIX as a platform. The chmod method is represented in File.setReadOnly() (with no “Back” button); chown is just missing.

Convenient Read/Write
Often, all a Java developer needs to do with a file is read its entire contents as a byte array, a char array, or a string, or write an array or a string to a file (or, as a variation, append it to the end of the file). The Files utility class includes a whole bunch of useful methods for this:

  • readBytesFromFile(String filename) (This one uses memory-mapped files introduced with java.nio.)
  • readBytesFromStream(InputStream is)
  • readStringFromFile(File file)
  • readStringFromFile(File file, String encoding)
  • readStringFromFile(String filename)
  • writeBytesToFile, a.k.a. writeToFile(byte[] data, String fileTo)
  • writeToFile(char[] data, String fileTo)
  • writeToFile(CharSequence data, String fileTo)
  • writeToFile(InputStream is, String fileTo) (This one pipes bytes from an input stream to a file.)
  • appendBytesToFile(byte[] data, String fileTo), a.k.a. appendToFile
  • appendBytesToFile(char[] data, String fileTo) (This one takes the lower bytes from char[] data.)
  • appendToFile(char[] data, String fileTo)
  • appendToFile(CharSequence data, String fileTo)
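To make the list above concrete, here is a minimal sketch of three of these operations written against a current JDK (it uses try-with-resources, which is newer than J2SE 5.0). The class name ReadWriteSketch and the private write() helper are hypothetical; only the public method names come from the list.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical sketch of a few whole-file read/write helpers. */
final class ReadWriteSketch {
    private ReadWriteSketch() {}

    /** Reads a whole file through a read-only memory mapping (files under 2 GB only). */
    static byte[] readBytesFromFile(String filename) throws IOException {
        try (FileInputStream in = new FileInputStream(filename);
             FileChannel channel = in.getChannel()) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] data = new byte[buffer.remaining()];
            buffer.get(data);
            return data;
        }
    }

    /** Reads a whole file and decodes it with the named encoding. */
    static String readStringFromFile(File file, String encoding) throws IOException {
        return new String(readBytesFromFile(file.getPath()), encoding);
    }

    /** Replaces the file's contents with data. */
    static void writeToFile(byte[] data, String fileTo) throws IOException {
        write(data, fileTo, false);
    }

    /** Adds data to the end of the file, creating it if necessary. */
    static void appendToFile(byte[] data, String fileTo) throws IOException {
        write(data, fileTo, true);
    }

    private static void write(byte[] data, String fileTo, boolean append) throws IOException {
        try (FileOutputStream out = new FileOutputStream(fileTo, append)) {
            out.write(data);
        }
    }
}
```

The char[] and CharSequence variants would follow the same pattern, with an encoding step in front of write().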

Memory-mapped File Copy
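The java.nio package offers an interesting method for copying files. Instead of reading each byte and writing it to another file, you can hand the transfer to the file channels, which lets the operating system move the data directly, for example through memory mapping. The whole copy operation may look like this: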

  FileInputStream  is = new FileInputStream(from);
  FileOutputStream os = makeFile(to);
  os.getChannel().transferFrom(is.getChannel(), 0, from.length());
  is.close();
  os.close();

I tried to check whether one gains any advantage besides saving memory from using this, but benchmarking file copy is not easy. On a machine with 1GB of memory, I created eight 100MB files. I then copied them, one after another, to ensure that there was no trace of the first file in any cache. I timed only the first of these files and repeated the operation ten times, both with memory mapping and without. I spent two hours running it all and produced the following results:

  • It took 16.897558069 seconds to copy with NIO.
  • It took 17.638485729 seconds to copy without NIO.
  • NIO is 740.92766 milliseconds faster.

Directory Copy
Two methods, copy(File from, File to) and copy(String from, String to), copy either files or directories, depending on the type of File from. Two other methods, copy(File from, File to, String what) and copy(String from, String to, String what), help copy a file or a subdirectory (what) from a directory (from) to a directory (to). The Java.net forum posted a cool suggestion for this operation.

Several other methods help deal with the results of copying and/or help decide what to copy where. The method equal(File left, File right) returns true if the two files are equal (same contents, same timestamps) or, in the case of directories, if the entire directory trees are identical. The methods synchronize(File left, File right) and synchronize(File left, File right, String what) help eliminate the difference; they copy missing files over and replace older files with newer versions, in both directions.
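As a rough illustration of copy() and equal() as described above, the sketch below copies a tree with channel transfers and then compares two trees. CopySketch is a hypothetical name, and several choices are assumptions: timestamps are copied so that the comparison can succeed, directory timestamps are ignored, and file contents are compared in memory via java.nio.file.Files (the JDK class, not the utility discussed in this article).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.util.Arrays;

/** Hypothetical sketch of recursive copy and tree comparison. */
final class CopySketch {
    private CopySketch() {}

    /** Copies a file, or the whole tree if from is a directory. */
    static void copy(File from, File to) throws IOException {
        if (from.isDirectory()) {
            if (!to.isDirectory() && !to.mkdirs()) {
                throw new IOException("Cannot create " + to);
            }
            String[] names = from.list();
            if (names == null) {
                throw new IOException("Cannot list " + from);
            }
            for (String name : names) {
                copy(new File(from, name), new File(to, name));
            }
        } else {
            try (FileChannel in = new FileInputStream(from).getChannel();
                 FileChannel out = new FileOutputStream(to).getChannel()) {
                long size = in.size();
                long pos = 0;
                while (pos < size) {
                    long moved = out.transferFrom(in, pos, size - pos);
                    if (moved <= 0) {
                        break; // source ended early, e.g. it was truncated meanwhile
                    }
                    pos += moved;
                }
            }
        }
        to.setLastModified(from.lastModified()); // keep timestamps comparable
    }

    /** True if both files, or both trees, have the same names, contents, and file timestamps. */
    static boolean equal(File left, File right) throws IOException {
        if (left.isDirectory() != right.isDirectory()) {
            return false;
        }
        if (left.isDirectory()) {
            String[] l = left.list();
            String[] r = right.list();
            if (l == null || r == null) {
                return false;
            }
            Arrays.sort(l);
            Arrays.sort(r);
            if (!Arrays.equals(l, r)) {
                return false;
            }
            for (String name : l) {
                if (!equal(new File(left, name), new File(right, name))) {
                    return false;
                }
            }
            return true;
        }
        // Loads both files into memory; fine for a sketch, not for very large files.
        return left.length() == right.length()
                && left.lastModified() == right.lastModified()
                && Arrays.equals(Files.readAllBytes(left.toPath()), Files.readAllBytes(right.toPath()));
    }
}
```

A synchronize() built on these two pieces would walk both trees, copy whatever is missing on either side, and let the newer timestamp win where both sides have the file.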

Scanning Directory Trees: Inversion of Control
Scanning a directory tree and doing something with the contents of each node is a common task. School teaches that this is where recursion kicks in. But if you open a zip file and start browsing through its contents, you find no tree there, just files with compound names. Why not do the same with ordinary directories? Must you create separate recursive methods for every operation you do on directories? Of course not. Simply move all tree browsing into an iterable. If necessary, choose preorder or postorder traversal.

For breadth-first traversal, you need an additional queue, but who traverses a directory tree horizontally? It is totally unnatural. The natural way is depth first, which is why you need recursion. With preorder, the current node goes first and its subnodes follow. With postorder, the subnodes go first and the current node completes the recursion.

If you hide all the recursion and complexity in an iterator of iterators, that would be a typical Inversion of Control (IoC) paradigm (or is it a pattern already?). The client controls the move to the next node. Since J2SE 5.0 has a beautiful new feature, the foreach loop, the whole picture now looks like this:

    for (File dir : tree(new File("."))) {
      System.out.println(dir);
      for (File f : files(dir)) {
        System.out.println("  " + f);
      }
    }

Note two methods, tree(File directory) and files(File directory). The former returns all the subdirectories, preorder; the latter returns only files in the specified directory. Why use two loops where one would probably suffice? My practice showed that for a tree operation you would need to do something on the directory itself, not only on its contents. For instance, you scan all the files and create a summary or stats file in the same directory.
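One way to build such a pair of methods is sketched below. It is not the library's implementation: TreeSketch is a hypothetical name, the recursion is replaced by an explicit stack rather than an iterator of iterators, and including the root directory itself in the result of tree() is an assumption.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of tree() and files(). */
final class TreeSketch {
    private TreeSketch() {}

    /** Directories under root (root included), each parent before its children. */
    static Iterable<File> tree(final File root) {
        return () -> {
            final Deque<File> pending = new ArrayDeque<File>();
            if (root.isDirectory()) {
                pending.push(root);
            }
            return new Iterator<File>() {
                public boolean hasNext() {
                    return !pending.isEmpty();
                }

                public File next() {
                    if (pending.isEmpty()) {
                        throw new NoSuchElementException();
                    }
                    File dir = pending.pop();
                    File[] subdirs = dir.listFiles(File::isDirectory);
                    if (subdirs != null) {
                        Arrays.sort(subdirs);
                        // Push in reverse so the first subdirectory is visited next.
                        for (int i = subdirs.length - 1; i >= 0; i--) {
                            pending.push(subdirs[i]);
                        }
                    }
                    return dir;
                }
            };
        };
    }

    /** Plain files directly inside dir, sorted by name; no recursion. */
    static List<File> files(File dir) {
        File[] found = dir.listFiles(File::isFile);
        if (found == null) {
            return Collections.emptyList();
        }
        Arrays.sort(found);
        return Arrays.asList(found);
    }
}
```

Because the stack lives inside the iterator, the client loop stays flat and can stop at any directory without unwinding a recursive call chain.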

The treePostOrder(File file) method serves as an alternative to preorder. Postorder is a relatively rare choice, hence the longer name.
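For comparison, a postorder variant can be sketched as follows. This version is a simplification: PostOrderSketch is a hypothetical name, and the method collects the whole list eagerly with ordinary recursion instead of producing directories lazily.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a postorder directory listing. */
final class PostOrderSketch {
    private PostOrderSketch() {}

    /** Directories under root (root included), children before their parent. */
    static List<File> treePostOrder(File root) {
        List<File> out = new ArrayList<File>();
        if (root.isDirectory()) {
            collect(root, out);
        }
        return out;
    }

    private static void collect(File dir, List<File> out) {
        File[] subdirs = dir.listFiles(File::isDirectory);
        if (subdirs != null) {
            Arrays.sort(subdirs);
            for (File sub : subdirs) {
                collect(sub, out);
            }
        }
        out.add(dir); // the node itself completes the recursion
    }
}
```

Postorder is the order to pick when a directory can only be processed after its children, for example when removing empty directories bottom-up.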

If you don’t need all files or all subdirectories, you can use two other forms: tree(File file, FileFilter filter) and files(File file, FileFilter filter), which return only “good” files or subdirectories.
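The filtered forms could work along the lines of the sketch below. The semantics are assumptions, since the text does not spell them out: here a rejected subdirectory is pruned together with everything beneath it, the root is always returned, and FilteredTreeSketch is a hypothetical name.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the filtered tree() and files() forms. */
final class FilteredTreeSketch {
    private FilteredTreeSketch() {}

    /** Plain files directly inside dir that the filter accepts. */
    static List<File> files(File dir, FileFilter filter) {
        List<File> out = new ArrayList<File>();
        File[] children = dir.listFiles();
        if (children == null) {
            return out;
        }
        Arrays.sort(children);
        for (File child : children) {
            if (child.isFile() && filter.accept(child)) {
                out.add(child);
            }
        }
        return out;
    }

    /** Preorder list of directories; a rejected subdirectory is skipped with its whole subtree. */
    static List<File> tree(File root, FileFilter filter) {
        List<File> out = new ArrayList<File>();
        if (root.isDirectory()) {
            collect(root, filter, out);
        }
        return out;
    }

    private static void collect(File dir, FileFilter filter, List<File> out) {
        out.add(dir);
        File[] children = dir.listFiles();
        if (children == null) {
            return;
        }
        Arrays.sort(children);
        for (File child : children) {
            if (child.isDirectory() && filter.accept(child)) {
                collect(child, filter, out);
            }
        }
    }
}
```

A typical filter for tree() would reject version-control folders, so that a scan never descends into them.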

Iterating Through Bytes, Characters, and Lines
Even though J2SE 5.0 has the Iterable interface, autoboxing, and the foreach loop, InputStream and Reader do not provide iterators that return bytes and chars, respectively. You can fill this gap with two byte-producing iterables: bytes(InputStream is) and bytes(File file). The following is a typical usage of this construction (known in the 20th century as the Iterator pattern):

    for (byte b : bytes(new File("c:\\windows\\zapotec.bmp"))) {
      // mask to 0..255; a negative byte would otherwise print as ffffffxx
      System.out.println(Integer.toHexString(b & 0xFF));
    }
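A byte-producing iterable of this kind can be sketched with one byte of lookahead. This is an illustrative simplification rather than the library's code: BytesSketch is a hypothetical name, the iterable can be traversed only once, and, unlike the library version, it leaves closing the stream to the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch of bytes(InputStream). */
final class BytesSketch {
    private BytesSketch() {}

    /** Wraps a stream as a single-use Iterable of bytes; the caller still owns and closes the stream. */
    static Iterable<Byte> bytes(final InputStream is) {
        return () -> new Iterator<Byte>() {
            private int lookahead = read(); // -1 marks the end of the stream

            private int read() {
                try {
                    return is.read();
                } catch (IOException e) {
                    // Iterator methods cannot throw checked exceptions.
                    throw new UncheckedIOException(e);
                }
            }

            public boolean hasNext() {
                return lookahead >= 0;
            }

            public Byte next() {
                if (lookahead < 0) {
                    throw new NoSuchElementException();
                }
                byte current = (byte) lookahead; // autoboxed to Byte on return
                lookahead = read();
                return current;
            }
        };
    }
}
```

Passing a BufferedInputStream avoids one system call per byte. Java bytes are signed, so values of 0x80 and above arrive as negative numbers.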

You can use similar methods, chars(File file) and chars(Reader reader), to scan through characters in a file:

    for (char c : chars(new File("c:\\windows\\win.ini"))) {
      System.out.println("'" + c + "' (" + Integer.toHexString(c) + ")");
    }

The two remaining methods, lines(File file) and lines(Reader reader), scan through a text file’s lines:

    for (String line : lines(new File("c:\\windows\\win.ini"))) {
      System.out.println(">" + line);
    }

Note that these operations have a hidden problem. What happens to the input stream that is open behind the scenes? The private iterator that lists input data does its best to close the stream after it finds no more data there, but other unfortunate events can happen. For instance, an exception can be thrown, and control then never returns to the loop. For this case, all these internal iterators implement finalize(), which closes the stream or reader unless it was already closed. Since the iterables and iterators produced by bytes(), chars(), or lines() are not referenced outside the scope of the for loop, the first garbage collection closes the remaining streams. Note that the setLastModified() workaround described above calls System.gc() after a failed first attempt.
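The close-on-exhaustion behavior described above can be sketched for lines(Reader) as follows. LinesSketch is a hypothetical name, and the finalize() safety net is deliberately left out because finalization is deprecated in current JDKs; the sketch closes the reader only when the input runs out or a read fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch of lines(Reader) that closes the reader when it is exhausted. */
final class LinesSketch {
    private LinesSketch() {}

    static Iterable<String> lines(Reader reader) {
        final BufferedReader in = new BufferedReader(reader);
        return () -> new Iterator<String>() {
            private String lookahead = advance(); // null marks the end of input

            private String advance() {
                try {
                    String line = in.readLine();
                    if (line == null) {
                        in.close(); // no more data: release the reader right away
                    }
                    return line;
                } catch (IOException e) {
                    try {
                        in.close();
                    } catch (IOException ignored) {
                        // the original failure is the one worth reporting
                    }
                    throw new UncheckedIOException(e);
                }
            }

            public boolean hasNext() {
                return lookahead != null;
            }

            public String next() {
                if (lookahead == null) {
                    throw new NoSuchElementException();
                }
                String current = lookahead;
                lookahead = advance();
                return current;
            }
        };
    }
}
```

A loop that exits early through break or an exception in its body still leaves the reader open, which is exactly the gap the finalize() approach tries to cover.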

Library Sugar
Two methods, install(Class c, String resourceName, File directory) and install(Class c, String resourceName, String directoryName), may be all you need to implement a simple software installer. These methods extract a resource from the application jar (or wherever it is deployed). Expecting the resource to be a zip archive, they unzip the archive and store its contents in the specified directory. Often, this is all that an installer needs, is it not?
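An installer of this kind fits in a few lines on top of java.util.zip. The sketch below is an assumption about how install() might work, not the library's code: InstallSketch and the separate unzip() helper are hypothetical, and the check against entries that point outside the target directory is an addition.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch of a resource-based installer. */
final class InstallSketch {
    private InstallSketch() {}

    /** Looks up a zip resource relative to class c and unpacks it into directory. */
    static void install(Class<?> c, String resourceName, File directory) throws IOException {
        InputStream resource = c.getResourceAsStream(resourceName);
        if (resource == null) {
            throw new IOException("Resource not found: " + resourceName);
        }
        unzip(resource, directory);
    }

    /** Unpacks a zip stream into directory and closes the stream. */
    static void unzip(InputStream zipped, File directory) throws IOException {
        String root = directory.getCanonicalPath() + File.separator;
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                File target = new File(directory, entry.getName());
                // Refuse entries such as "../x" that would land outside the target directory.
                if (!target.getCanonicalPath().startsWith(root)) {
                    throw new IOException("Bad entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    target.mkdirs();
                    continue;
                }
                target.getParentFile().mkdirs();
                try (OutputStream out = new FileOutputStream(target)) {
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
            }
        }
    }
}
```

A call such as InstallSketch.install(Main.class, "/payload.zip", new File("app")) would then unpack a bundled archive, where Main and payload.zip are placeholder names.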

The Tiger Takeover
Files is just one utility class in the MyJavaTools.com project. The other classes in the library are currently being converted to J2SE 5.0. This latest version of Java is gradually gaining popularity, but the Java community may take years to really appreciate all its features.
