Many of us have found ways to forget about grammar since finishing school and entering the "real world" of business and commerce. Unfortunately, the division between academia and the business world isn't quite so cut and dried. The way we use language and words in day-to-day life greatly influences how people perceive and treat us. The same is true of software applications; people have increasingly high expectations for computer software. More and more, they expect human-computer interactions to be steeped in the fluid language and thought processes of humans rather than the linear, rigid procedures of computers. For example, people now expect search engines to understand English words, resolve spelling errors, and deal with plurality. They expect word processors to catch and even correct their grammar errors, and call-center systems to understand their speech.
This article introduces several types of linguistic processing techniques and tools that you can use in your applications to help bridge the gap between the literal world of computers and the fluid logic of humans. The landscape of linguistic processing is so broad that one article can't possibly cover everything, but this article may prompt you to notice areas in your own applications where applying a little linguistic expertise can have a big impact on users' experience.
These techniques stem from an area of research called computational linguistics, which seeks to apply computational (primarily statistically based) processing techniques to natural language. Computational linguistics is a broad field, with many subfields that can benefit business-oriented applications today, including part-of-speech tagging, parsing, sentence detection, phrase chunking, and pluralization. The remainder of this article shows you how to build three examples, each illustrating what you can do with one of three Java language technologies:
- Text classification (using LingPipe)
- Sentence identification (using OpenNLP)
- Pluralization (using Inflector)
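
To give a feel for what sentence identification involves before any of these libraries enter the picture, here is a minimal sketch that uses only the JDK's `java.text.BreakIterator`. It is not OpenNLP and makes no claim about how OpenNLP works; `BreakIterator` is rule-based rather than statistical, and the class name `SentenceSketch` and method `splitSentences` are hypothetical names chosen for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSketch {

    // Splits text into sentences using the JDK's rule-based sentence
    // boundary detector. This is a stand-in for illustration only; it is
    // not a statistical model.
    static List<String> splitSentences(String text) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.US);
        boundaries.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next();
             end != BreakIterator.DONE;
             start = end, end = boundaries.next()) {
            // Each segment includes trailing whitespace, so trim it off.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Search engines handle plurals. "
                + "Word processors catch grammar errors! "
                + "Can call centers understand speech?";
        // Print one detected sentence per line.
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

Simple fixed rules like these tend to struggle with abbreviations, initials, and other periods that do not end a sentence, which is one reason to reach for a trained model instead.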