Create Intelligent E-mail Filters with JavaMail and Classifier4J
Tired of the limitations and annoying false positives with commercial spam filters? Classifier4J is an open source Java library that will let you build custom applications that read e-mails and other types of text documents, separating the wheat from the chaff exactly the way you intend.  

As more and more information fills our lives and clutters our inboxes, our ability to read, filter, and process it manually declines. There is only so much time that we can spend on it. The trend shows no sign of abating, despite the best efforts of many individuals and companies in the industry. By all accounts, things are going to get worse.


Enter intelligent filters: ones that not only look for certain keywords (which don't need to be reprinted here) but also attempt to determine the sentiment of the text. In other words, filters that can read an e-mail and work out statistically what it is about and whether it interests you, based on a set of parameters that you define. Many modern spam filters do this, training themselves on the mail that you mark as spam or not spam. These tools are getting better by the day, but they aren't foolproof. False positives, for example, are a frequent problem.

Classifier4J is an open source Java library designed for exactly this purpose: classifying text. (It is available from SourceForge at http://classifier4j.sourceforge.net.) It includes an implementation of a Bayesian classifier, a statistical method for calculating the probability that a given hypothesis is true, based on Bayes' theorem (see http://www.paulgraham.com/better.html for a good implementation outline). A Bayesian classifier is typically used to evaluate whether a piece of text matches a given subject. The classic example is determining whether an e-mail is spam.
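To make the idea concrete, here is a minimal sketch of the kind of calculation a Bayesian text classifier performs. It does not use Classifier4J and is not its implementation. The class name `BayesSketch`, its methods, the neutral score of 0.5 for unseen words, and the 0.01/0.99 clamp are all assumptions chosen for illustration. Each word gets a spam probability from the training counts, and the per-word probabilities are combined under the usual simplifying assumptions of independent words and equal priors.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: a toy Bayesian classifier, not the Classifier4J API.
final class BayesSketch {
    // Number of training messages of each kind that contained each word.
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private int spamMessages = 0;
    private int hamMessages = 0;

    // Split text into distinct lowercase alphanumeric tokens.
    private static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Record one training message as spam or not spam.
    void train(String text, boolean isSpam) {
        if (isSpam) {
            spamMessages++;
        } else {
            hamMessages++;
        }
        Map<String, Integer> counts = isSpam ? spamCounts : hamCounts;
        for (String word : tokens(text)) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    // Estimated probability that a message containing this word is spam.
    double wordProbability(String word) {
        int s = spamCounts.getOrDefault(word, 0);
        int h = hamCounts.getOrDefault(word, 0);
        if (s + h == 0) {
            return 0.5; // assumption: unseen words are treated as neutral
        }
        double spamRate = (double) s / Math.max(1, spamMessages);
        double hamRate = (double) h / Math.max(1, hamMessages);
        double p = spamRate / (spamRate + hamRate);
        // Clamp so that a single word can never force a score of 0 or 1.
        return Math.min(0.99, Math.max(0.01, p));
    }

    // Combine the per-word probabilities into one score between 0 and 1.
    double classify(String text) {
        double spamProduct = 1.0;
        double hamProduct = 1.0;
        for (String word : tokens(text)) {
            double p = wordProbability(word);
            spamProduct *= p;
            hamProduct *= (1.0 - p);
        }
        return spamProduct / (spamProduct + hamProduct);
    }

    public static void main(String[] args) {
        BayesSketch classifier = new BayesSketch();
        classifier.train("cheap pills buy now", true);
        classifier.train("meeting agenda for monday", false);
        System.out.println(classifier.classify("cheap pills"));
        System.out.println(classifier.classify("monday meeting"));
    }
}
```

A score close to 1 suggests spam and a score close to 0 suggests legitimate mail. A real classifier would train on far more messages and would have to deal with very long texts, where multiplying many small probabilities can underflow.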

In this article I will build a simple POP3 client using the JavaMail API, which has lots of very cool features that let you build your own mail applications on top of IMAP, POP3, and SMTP. Check the Sun documentation for in-depth details. The client will pull e-mails from your POP3 mailbox and pass them through the Classifier4J libraries to classify their contents, rate how likely they are to be spam, and even produce an automatic summary of their contents.

To get started, you first need to download the JavaMail API, which is available from Sun. (The source code in this article uses version 1.3.1.) You will also need the JavaBeans Activation Framework (JAF), which is a dependency of JavaMail.

Once you have downloaded and installed these packages, you are ready to build your first e-mail client. You will need a POP3 e-mail account, along with the username, password, and server name for that account.
