
Extend the JDK Classes with Jakarta Commons, Part II

This second installment of a three-part series further explores components in Jakarta Commons and presents real world examples to demonstrate how you can use them in your projects.


Jakarta Commons, the set of reusable classes that various Jakarta projects use, is available as separate components that you can use in your own Java projects. This article is the second in a three-part series that explores various Jakarta Commons components and demonstrates them in real-world sample applications. The examples do more than illustrate the Commons components: they are complete, modular applications that highlight features you can reuse in your own Java projects.

In particular, this installment explores the following components:

  • Codec
  • DBCP
  • DBUtils
  • Email
  • i18n

The article also includes complete source code. Extract the source code zip file to a local drive and run each example by launching its JUnit test cases.



Author Note: A basic knowledge of object-oriented programming (OOP), the Gang of Four design patterns (Strategy and Decorator), and a few J2EE patterns (DAO) will help you understand the architecture of the Commons components and the examples presented here.

Codec

Commons Codec contains general-purpose encoding/decoding algorithms, including phonetic encoders, Hex and Base64 encoders, and a URL encoder. The phonetic encoders are language encoders, which are useful in applications such as search engines, spell checkers, and digital dictionaries. Hex and Base64 encoders are useful in applications that represent binary data as characters. The URL encoder offers more features than the JDK classes URLEncoder and URLDecoder and can serve as a replacement for them.

This component also contains the DigestUtils class, which is useful for creating SHA and MD5 digests. The next sections show how to use these classes in real-world examples.
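DigestUtils is a convenience layer over the JDK's java.security.MessageDigest that returns digests as lowercase hex strings (for example, DigestUtils.md5Hex). The following is a minimal sketch of the equivalent computation using the JDK alone, so it runs without Commons Codec on the classpath. The class and method names are illustrative, and the sketch assumes strings are hashed as UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative JDK-only counterpart to DigestUtils' hex helpers (not part of Commons Codec).
final class DigestSketch {
    // Hashes the UTF-8 bytes of text with the named algorithm ("MD5", "SHA-1", ...)
    // and returns the digest as a lowercase hex string.
    static String hexDigest(String algorithm, String text) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xFF)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 and SHA-1 are required on every Java SE platform, so this is unexpected
            throw new IllegalStateException(algorithm + " is not available", e);
        }
    }
}
```

For example, hexDigest("MD5", "abc") yields the well-known test vector 900150983cd24fb0d6963f7d28e17f72.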

Language Encoders
Phonetic algorithms determine which words sound similar. A good example is a word processing application that suggests alternatives for a misspelled word. Commons Codec contains four classes for this purpose: Soundex, Metaphone, RefinedSoundex, and DoubleMetaphone. Each class uses a different algorithm to determine whether one word sounds like another. According to their algorithm descriptions, Metaphone is more accurate than Soundex.

The first example application uses the Soundex class to determine similar words for a misspelled word. It uses the Strategy design pattern to choose among the algorithms, which also enables you to modify the application to support algorithms in the other three classes. (The application classes can be found in the package in.co.narayanan.commons.codec in the src folder of the source code.)

The words.txt file contains a small list of words. The Words class abstracts the loading of the word list from the file and implements the IWords interface. WordsAssistant is the entry-point class for the application. It determines similar words using one of the Soundex algorithms and depends on the IWords interface to access the words. Listing 1 shows the implementation of the getSimilarWords method in the WordsAssistant class. It picks a strategy from the SoundexStrategy class, iterates over the words, and determines each match by calling the isSimilar method of the ISimilarWordStrategy interface. It adds the matching words to a list, which it returns to the caller.

Listing 1. Iterating Words Database to Determine Similar Words
ISimilarWordStrategy strategy = SoundexStrategy.getStrategy(type);
List<String> similarWords = new ArrayList<String>();

// Iterate the words and append similar words to the list
// and return
Iterator<String> wordsList = words.getWords().iterator();
String fileWord;
while (wordsList.hasNext()) {
    fileWord = wordsList.next();
    try {
        if (strategy.isSimilar(searchWord, fileWord)) {
            similarWords.add(fileWord);
        }
    } catch (WordsAssistantException e) {
        throw new WordsAssistantException("Unable to determine similar words", e);
    }
}
return similarWords;

The classes SoundexStrategy and CharDiffStrategy implement the ISimilarWordStrategy interface and use the Commons Soundex class. Listing 2 is the definition of the ISimilarWordStrategy interface. You can plug new strategies into the sample application by implementing this interface's isSimilar method.

Listing 2. Interface to Enable Deciding Between the Algorithms
public interface ISimilarWordStrategy {
    boolean isSimilar(String word1, String word2) throws WordsAssistantException;
}
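To show how a new strategy plugs in, here is a minimal sketch that redeclares ISimilarWordStrategy and WordsAssistantException locally so it compiles on its own. It adds a hypothetical PrefixStrategy that treats two words as similar when they share their first three letters, ignoring case, and a hypothetical getSimilarWords helper that mirrors the loop in Listing 1 but receives the strategy as a parameter. Neither PrefixStrategy nor StrategyDemo is part of the article's source code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Local stand-ins for the article's types, redeclared so this sketch compiles alone.
class WordsAssistantException extends Exception {
    WordsAssistantException(String message, Throwable cause) {
        super(message, cause);
    }
}

interface ISimilarWordStrategy {
    boolean isSimilar(String word1, String word2) throws WordsAssistantException;
}

// Hypothetical plug-in strategy: similar if both words share a three-letter prefix (case-insensitive).
class PrefixStrategy implements ISimilarWordStrategy {
    public boolean isSimilar(String word1, String word2) {
        String a = word1.toLowerCase(Locale.ROOT);
        String b = word2.toLowerCase(Locale.ROOT);
        return a.length() >= 3 && b.length() >= 3 && a.regionMatches(0, b, 0, 3);
    }
}

// Hypothetical helper with the same loop shape as Listing 1; the caller chooses the strategy.
final class StrategyDemo {
    static List<String> getSimilarWords(ISimilarWordStrategy strategy, String searchWord,
                                        List<String> words) throws WordsAssistantException {
        List<String> similarWords = new ArrayList<>();
        for (String fileWord : words) {
            if (strategy.isSimilar(searchWord, fileWord)) {
                similarWords.add(fileWord);
            }
        }
        return similarWords;
    }
}
```

Because getSimilarWords depends only on the interface, swapping PrefixStrategy for a Soundex-based strategy requires no change to the loop, which is the point of using the Strategy pattern here.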

In Listing 3, the soundex method of class org.apache.commons.codec.language.Soundex computes a phonetic code for each word. Similar-sounding words produce the same code, so the strategy compares the two codes to decide whether the words are similar. For instance, the code is C515 for the words 'compont', 'component', and 'compenent'.

Listing 3. Soundex Algorithm
private static class SoundexStrategy extends SimilarWordStrategy {
    public boolean isSimilar(String word1, String word2) throws WordsAssistantException {
        return soundex.soundex(word1).equals(soundex.soundex(word2));
    }
}
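To make the codes less mysterious, here is a minimal, JDK-only sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, skip vowels, collapse adjacent letters with the same digit, and pad or truncate to four characters. It is an illustration, not the Commons Codec implementation, which may differ in edge cases (for example, in how H and W are treated in different library versions). The class name is hypothetical.

```java
import java.util.Locale;

// Illustrative American Soundex encoder (not the Commons Codec class).
final class SoundexSketch {
    // Digit for each letter A..Z; '0' marks letters that are not coded (vowels, H, W, Y).
    private static final String MAPPING = "01230120022455012623010202";

    static String encode(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        out.append(s.charAt(0)); // the first letter is kept as-is
        char last = MAPPING.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char code = MAPPING.charAt(c - 'A');
            if (code != '0' && code != last) {
                out.append(code); // new consonant sound
            }
            // Vowels reset 'last', so repeated codes separated by a vowel both count;
            // H and W do not, so same-coded letters around them collapse into one.
            if (c != 'H' && c != 'W') {
                last = code;
            }
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes, e.g. "Lee" -> "L000"
        }
        return out.toString();
    }
}
```

Running encode on 'compont', 'component', and 'compenent' gives the same code for all three, which is exactly what the SoundexStrategy comparison relies on.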

In Listing 4, the difference method of class org.apache.commons.codec.language.Soundex returns a number between 0 and 4, where 4 is the best match and 0 the worst. This example uses 2 as the pivot: words whose score is greater than 2 (that is, 3 or 4) are treated as similar. The JUnit test case class TestLanguageEncoders invokes the main class's getSimilarWords method to demonstrate the application.

Listing 4. Alternate Way to Determine the Similarity
private static class CharDiffStrategy extends SimilarWordStrategy {
    private static final int DIFF_RANGE = 2;

    public boolean isSimilar(String word1, String word2) throws WordsAssistantException {
        try {
            // difference() scores 0 (worst) to 4 (best); scores above DIFF_RANGE count as similar
            return soundex.difference(word1, word2) > DIFF_RANGE;
        } catch (EncoderException e) {
            throw new WordsAssistantException("Unable to determine the similarity", e);
        }
    }
}
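The threshold in Listing 4 is easier to reason about with the scoring spelled out. The Soundex.difference javadoc describes the result as the number of characters in the two encoded strings that are the same; the minimal sketch below assumes a position-by-position comparison of two already-computed four-character codes. The class name is hypothetical, and the sketch is an illustration rather than the library's code.

```java
// Illustrative scoring of two precomputed Soundex codes (not the Commons Codec implementation).
final class SoundexDifferenceSketch {
    // Counts positions where the codes agree: 0 (no match) to 4 for four-character codes.
    static int difference(String code1, String code2) {
        int length = Math.min(code1.length(), code2.length());
        int matches = 0;
        for (int i = 0; i < length; i++) {
            if (code1.charAt(i) == code2.charAt(i)) {
                matches++;
            }
        }
        return matches;
    }

    // Same rule as CharDiffStrategy: similar only when the score exceeds 2.
    static boolean isSimilar(String code1, String code2) {
        return difference(code1, code2) > 2;
    }
}
```

With this rule, codes that differ only in their last digit (score 3) still count as similar, while codes sharing just the first letter (score 1) do not.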

Binary Encoders
Binary encoders are useful for transmitting binary data in ASCII form. For instance, if an image needs to be attached to a digital business card stored in XML, a binary encoder can encode the image binary data using one of the algorithms and add it to the XML file in a separate tag.

The package org.apache.commons.codec.binary contains classes Base64, BinaryCodec, and Hex, each representing a unique way of encoding binary data. The sample application demonstrates using the Base64 algorithm to encode a binary file and store it in XML. The XML file contains metadata that describes the data in name/value pair form, which makes it searchable.

The class in.co.narayanan.commons.codec.WrapIt is the only class this example uses. It encodes the binary file and creates XML content along with the metadata details. Listing 5 shows a sample XML generated by this class.

Listing 5. Sample XML Content Generated by WrapIt Class
<data>
  <meta-data>
    <entry name='keywords' value='Image, Personal, Face, Profile'/>
    <entry name='filename' value='test.bmp'/>
    <entry name='author' value='Narayanan A R'/>
  </meta-data>
  <binary>
Qk0+QwAAAAAAADYAAAAoAAAASQAAAE4AAAABABgAAAAAAAhDAAAAAAAAAAAAAADLMjC7MhCLEZB7AWBK8QAa0LAX0HAUUDla6Y7
vLw7vLw7vLw7vHx7vHx7vHx7vHx7vHx7vHx7fHw7fHw7fHw7fHx7fHx7fHx7vHw7vHw7vHw7fLw7fLw7fLw7fHx7fHx7fHxAO7y
8e7y8e7y8e/xPv+WPv+WPv+WPv+WPv+WPv+WPv+WPv+WPv+WPv+WPgA=
  </binary>
</data>

The encoded binary data is enclosed in the <binary> tag, and the metadata is represented as a series of <entry> tags. Encoding the binary content and storing it in an XML file simplifies its transmission over the Internet. For instance, a visa application form in XML, sent to the Web service layer for processing, can carry an image along with other binary content such as résumés, scanned experience letters, and degree certificates.
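The layout of Listing 5 can be reproduced in a few lines. The following minimal sketch is not the article's WrapIt class: it uses the JDK's java.util.Base64 MIME encoder (Java 8 and later) instead of Commons Codec, and the wrap method and its Map-based metadata parameter are hypothetical. It escapes the characters that cannot appear verbatim inside a single-quoted XML attribute.

```java
import java.util.Base64;
import java.util.Map;

// Hypothetical WrapIt-style helper: metadata entries plus Base64 content in one XML document.
final class XmlWrapSketch {
    // Escape characters that are not allowed verbatim inside a single-quoted attribute value.
    private static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace("'", "&apos;").replace("\"", "&quot;");
    }

    static String wrap(Map<String, String> metadata, byte[] binary) {
        StringBuilder xml = new StringBuilder("<data>\n  <meta-data>\n");
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            xml.append("    <entry name='").append(escapeAttr(entry.getKey()))
               .append("' value='").append(escapeAttr(entry.getValue())).append("'/>\n");
        }
        // The MIME encoder breaks the Base64 text into lines of at most 76 characters
        xml.append("  </meta-data>\n  <binary>\n")
           .append(Base64.getMimeEncoder().encodeToString(binary))
           .append("\n  </binary>\n</data>\n");
        return xml.toString();
    }
}
```

Passing a LinkedHashMap keeps the <entry> tags in insertion order, which makes the output predictable for tests and for human readers.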

Listing 6 is a code snippet from the WrapIt class that does the actual encoding, reading 1,024 bytes at a time. The code calls the truncateBytes method only when a read returns fewer than 1,024 bytes, which for a file normally happens just once, on the last read. The static encodeBase64Chunked method breaks the encoded output into 76-character lines, the maximum line length that RFC 2045 (MIME) allows, which also makes the output easier to read. The metadata makes the XML-formatted data searchable.

Listing 6. Code Snippet Using Base64 for Encoding the Binary Content
while ((bytesRead = inputStream.read(binaryData)) != -1) {
    if (bytesRead < 1024) {
        // Short read: encode only the bytes actually read
        encodedBinaryData = Base64.encodeBase64Chunked(truncateBytes(binaryData, bytesRead));
    } else {
        encodedBinaryData = Base64.encodeBase64Chunked(binaryData);
    }
    encodedData.write(encodedBinaryData);
}
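One caveat with encoding each buffer independently: 1,024 is not a multiple of 3, so every full 1,024-byte buffer encoded on its own ends with "==" padding, and padding characters then appear in the middle of the combined output, which many decoders do not accept. A streaming encoder avoids this by carrying leftover bytes from one write to the next. The minimal sketch below uses the JDK's java.util.Base64 MIME encoder (Java 8 and later) rather than Commons Codec; it also produces 76-character lines, though details such as the trailing line separator may differ from encodeBase64Chunked. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

// Hypothetical streaming alternative to Listing 6, built on the JDK's Base64 MIME encoder.
final class StreamingBase64Sketch {
    // Copies 'in' to 'out' as MIME Base64. Note: closing the encoder also closes 'out'.
    static void encode(InputStream in, OutputStream out) throws IOException {
        // wrap() buffers incomplete 3-byte groups between writes,
        // so padding is emitted only once, when the stream is closed
        try (OutputStream encoder = Base64.getMimeEncoder().wrap(out)) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                encoder.write(buffer, 0, bytesRead); // short reads need no special handling
            }
        }
    }
}
```

Because the encoder tracks partial groups itself, the loop no longer needs a truncateBytes-style helper or a special case for the last read.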

URL Encoder
The class org.apache.commons.codec.net.URLCodec implements the 'www-form-urlencoded' encoding scheme for a string, object, or array of bytes. This class is different from the JDK URLEncoder class for the following reasons:
  • It can perform encoding and decoding for a given character set.
  • In addition to strings, it works for objects and arrays of bytes as well.

This article does not include a URLCodec example because the class's javadoc is self-explanatory and its usage is straightforward.
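The 'www-form-urlencoded' rules themselves are short: letters, digits, and the characters '-', '_', '.', and '*' pass through unchanged, a space becomes '+', and every other byte becomes '%' followed by two uppercase hex digits (the same safe set the JDK's URLEncoder documents). The minimal sketch below applies those rules at the byte level, which shows why the character set matters: the same character yields different output under UTF-8 and ISO-8859-1. It uses only the JDK; the class and method names are hypothetical, and it does not validate malformed '%' escapes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical byte-level www-form-urlencoded codec (illustration, not URLCodec itself).
final class FormUrlEncodingSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private static boolean isSafe(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
                || b == '-' || b == '_' || b == '.' || b == '*';
    }

    static String encode(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte raw : bytes) {
            int b = raw & 0xFF; // treat the byte as unsigned 0..255
            if (b == ' ') {
                out.append('+');
            } else if (isSafe(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }

    // The charset decides which bytes a string becomes before encoding.
    static String encode(String text, Charset charset) {
        return encode(text.getBytes(charset));
    }

    // Returns raw bytes; the caller chooses the charset used to turn them back into text.
    static byte[] decode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '+') {
                out.write(' ');
            } else if (c == '%') {
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }
}
```

For example, "a b&c" encodes to a+b%26c, while the character é encodes to %C3%A9 under UTF-8 but to %E9 under ISO-8859-1.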


