
As more and more information fills our lives and clutters our inboxes, our ability to read, filter, and process it manually declines. There is only so much time that we can spend on it. The trend shows no signs of abating, despite the best efforts of many individuals and companies in the industry. By all accounts, things are going to get worse.
Enter intelligent filters, ones that not only look for certain keywords that don't need to be reprinted here, but that also attempt to determine the sentiment of text. In other words, filters that can read an e-mail and statistically figure out what it is about and whether it interests you or not based on a set of parameters that you define. Many modern spam filters do this, training themselves on the mail that you specify is or isn't spam. These tools are getting better by the day but they aren't foolproof. For example, false positives are a frequent problem.
Classifier4J is an open source Java library designed for exactly this purpose: classifying text. (It is available from SourceForge at http://classifier4j.sourceforge.net.) It includes an implementation of a Bayesian classifier, a statistical method for calculating the probability that a given hypothesis is true (based on Bayes' theorem; see http://www.paulgraham.com/better.html for a good implementation outline). A Bayesian classifier is typically used to evaluate whether a piece of text belongs to a given subject matter. The classic example is determining whether an e-mail is spam.
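
To make the idea concrete, here is a minimal sketch of Bayesian scoring in plain Java. It is not Classifier4J's implementation: the class name `BayesSketch`, its methods, and the clamping bounds of 0.01 and 0.99 are assumptions chosen for illustration, and the per-word estimate is a simple count ratio rather than the more careful weighting a production filter would use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy Bayesian text scorer, not Classifier4J's API.
class BayesSketch {
    // How often each word appeared in text marked as a match (e.g. spam)
    // and in text marked as a non-match.
    private final Map<String, Integer> matchCounts = new HashMap<>();
    private final Map<String, Integer> nonMatchCounts = new HashMap<>();

    // Split text into lowercase words, dropping empty tokens.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Record the words of one training text under the given label.
    void teach(String text, boolean isMatch) {
        Map<String, Integer> counts = isMatch ? matchCounts : nonMatchCounts;
        for (String w : tokenize(text)) {
            counts.merge(w, 1, Integer::sum);
        }
    }

    // Estimated probability that a text containing this word is a match.
    // Unknown words are neutral (0.5); the result is clamped (assumed bounds)
    // so that a single word can never make the score exactly 0 or 1.
    double wordProbability(String word) {
        int m = matchCounts.getOrDefault(word, 0);
        int n = nonMatchCounts.getOrDefault(word, 0);
        if (m + n == 0) {
            return 0.5;
        }
        double p = (double) m / (m + n);
        return Math.max(0.01, Math.min(0.99, p));
    }

    // Combine per-word probabilities, treating words as independent:
    // prod(p) / (prod(p) + prod(1 - p)).
    // Note: very long inputs could underflow; real code would work in log space.
    static double combine(List<Double> probs) {
        double match = 1.0;
        double nonMatch = 1.0;
        for (double p : probs) {
            match *= p;
            nonMatch *= (1.0 - p);
        }
        return match / (match + nonMatch);
    }

    // Score a whole text between 0 (non-match) and 1 (match).
    double score(String text) {
        List<Double> probs = new ArrayList<>();
        for (String w : tokenize(text)) {
            probs.add(wordProbability(w));
        }
        return combine(probs);
    }

    public static void main(String[] args) {
        BayesSketch filter = new BayesSketch();
        filter.teach("cheap pills special offer", true);
        filter.teach("project meeting agenda and notes", false);
        System.out.println(filter.score("special offer on cheap pills"));
        System.out.println(filter.score("notes from the project meeting"));
    }
}
```

The more labeled text the scorer is taught, the more its word counts reflect your own mail, which is why trained filters tend to improve over time. Words it has never seen contribute a neutral 0.5 and leave the combined score unchanged.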
In this article I will build a simple POP3 client using the JavaMail API, which has lots of very cool features that allow you to build your own mail applications that use IMAP, POP3, and SMTP. Check the Sun documentation for in-depth details. This client will pull e-mails from your POP3 mailbox and pass them through the Classifier4J library to classify their contents, determine their spam relevance, and even produce an automatic summary of their contents!
To get started, you first need to get the JavaMail API, which is available from Sun. (The source code in this article uses version 1.3.1.) You will also need the JavaBeans Activation Framework (JAF), which is a dependency of JavaMail.
Once you have downloaded and installed these packages, you are ready to build your first e-mail client. You will need a POP3 e-mail account, along with the username, password, and server name associated with that account.