

Create Intelligent E-mail Filters with JavaMail and Classifier4J

Tired of the limitations and annoying false positives with commercial spam filters? Classifier4J is an open source Java library that will let you build custom applications that read e-mails and other types of text documents, separating the wheat from the chaff exactly the way you intend.





As more and more information fills our lives and clutters our inboxes, our ability to read, filter, and process it manually declines. There is only so much time that we can spend on it. The trend shows no signs of abating, despite the best efforts of many individuals and companies in the industry, and by all accounts things are going to get worse. Enter intelligent filters: ones that not only look for certain keywords (which don't need to be reprinted here) but also attempt to determine the sentiment of text. In other words, filters that can read an e-mail and work out statistically what it is about and whether it interests you, based on a set of parameters that you define. Many modern spam filters do this, training themselves on the mail that you mark as spam or not spam. These tools are getting better by the day, but they aren't foolproof; false positives, for example, are a frequent problem.

Classifier4J is an open source Java library designed for exactly this purpose: classifying text. (It is available from SourceForge at http://classifier4j.sourceforge.net.) It includes an implementation of a Bayesian classifier, a statistical method for calculating the probability that a given hypothesis is true (based on Bayes' theorem; see http://www.paulgraham.com/better.html for a good implementation outline). A Bayesian classifier is typically used to evaluate whether a piece of text belongs to a given subject matter. The classic example is determining whether an e-mail is spam. In this article I will build a simple POP3 client using the JavaMail API, which has many features that let you build your own mail applications over IMAP, POP3, and SMTP. Check the Sun documentation for in-depth details. This client will pull e-mails from your POP3 mailbox and pass them through the Classifier4J libraries to classify their contents, determine their spam relevance, and even produce an automatic summary of their contents.
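To make the Bayesian idea concrete, here is a minimal sketch of the kind of calculation such a classifier performs. It is not Classifier4J's API: the class name, method names, and parameters are hypothetical, and the constants (0.4 for unseen words, clamping to the range 0.01 to 0.99) are assumptions borrowed from the style of filter outlined in the Paul Graham essays. Each word gets a spam probability from how often it appeared in spam versus legitimate mail, and the per-word probabilities are then combined into a single score for the message.

```java
/** Hypothetical sketch of Bayesian spam scoring; not part of Classifier4J. */
class SpamProbabilitySketch {
    // Assumed constants for illustration only.
    static final double UNKNOWN_WORD = 0.4; // score for a word never seen in training
    static final double MIN = 0.01;         // clamp so no single word is decisive
    static final double MAX = 0.99;

    /** Estimates the probability that a message containing this word is spam. */
    static double wordProbability(int spamHits, int hamHits, int spamTotal, int hamTotal) {
        if (spamHits == 0 && hamHits == 0) {
            return UNKNOWN_WORD;
        }
        // Relative frequency of the word in each corpus, capped at 1.0.
        double b = Math.min(1.0, (double) spamHits / Math.max(1, spamTotal));
        double g = Math.min(1.0, (double) hamHits / Math.max(1, hamTotal));
        double p = b / (b + g);
        return Math.max(MIN, Math.min(MAX, p));
    }

    /** Combines per-word probabilities into one score for the whole message. */
    static double combine(double[] probs) {
        double spam = 1.0;
        double ham = 1.0;
        for (double p : probs) {
            spam *= p;
            ham *= (1.0 - p);
        }
        return spam / (spam + ham);
    }

    public static void main(String[] args) {
        // Three words: one strongly spammy, one neutral, one never seen before.
        double[] probs = {
            wordProbability(40, 1, 100, 100),
            wordProbability(5, 5, 100, 100),
            wordProbability(0, 0, 100, 100)
        };
        System.out.println("Combined spam score: " + combine(probs));
    }
}
```

A score close to 1 suggests spam and a score close to 0 suggests legitimate mail; where to draw the line is a tuning decision, and a stricter threshold trades missed spam for fewer false positives.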

To get started, you first need to download the JavaMail API, which is available from Sun. (The source code in this article uses version 1.3.1.) You will also need the JavaBeans Activation Framework (JAF), which is a dependency of JavaMail. Once you have downloaded and installed these packages, you are ready to build your first e-mail client. You will need a POP3 e-mail account, along with the username, password, and server name associated with that account.
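As a small illustration of how those account details are used, the sketch below gathers them into a `java.util.Properties` object using JavaMail's standard `mail.*` property names. The class and method names are hypothetical, and the host and user values are placeholders; the code uses only the JDK, so it compiles without the JavaMail JAR on the classpath.

```java
import java.util.Properties;

/** Hypothetical helper that collects POP3 account details for JavaMail. */
class Pop3SettingsSketch {
    /** Builds the settings a JavaMail session would read for a POP3 store. */
    static Properties pop3Properties(String host, int port, String user) {
        Properties props = new Properties();
        props.setProperty("mail.store.protocol", "pop3");
        props.setProperty("mail.pop3.host", host);
        props.setProperty("mail.pop3.port", Integer.toString(port));
        props.setProperty("mail.pop3.user", user);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; 110 is the standard POP3 port.
        Properties props = pop3Properties("pop.example.com", 110, "yourname");
        props.list(System.out);
    }
}
```

With JavaMail installed, an object like this would typically be passed to `Session.getInstance(props)`, after which `session.getStore("pop3")` and `store.connect(host, user, password)` open the mailbox. The password is deliberately kept out of the properties and supplied only at connection time.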
