Use BreakIterator to Parse Text

Parsing text is a common but surprisingly complex operation. For example, your application might need to let users enter text and then break that text into separate words or sentences for processing. On the surface, this task seems easy. In the case of sentence parsing, for instance, it may appear that you can separate sentences simply by searching for the period (.) character. One problem with this approach is that characters other than the period can end a sentence, such as question marks (?) or exclamation marks (!). In addition, periods have other uses, such as representing decimal points. To make matters worse, a different language may use an entirely different set of characters for sentence termination, or may use the same characters in a different way.

Fortunately, the java.text.BreakIterator class finds these boundaries for you in a locale-sensitive manner, so the same code works across languages. This sample code illustrates how you can use BreakIterator to parse a string on a per-sentence basis:

    import java.text.*;

    public class parseit {
        public static void main(String[] args) {
            String sentence;
            String text = "John Smith stopped by earlier " +
                          "to say 'Happy birthday!' Aren't " +
                          "you and he the same age? He and " +
                          "his wife have 2.5 children.";
            BreakIterator bi = BreakIterator.getSentenceInstance();
            bi.setText(text);
            int index = 0;
            while (bi.next() != BreakIterator.DONE) {
                sentence = text.substring(index, bi.current());
                System.out.println("Sentence: " + sentence);
                index = bi.current();
            }  //  while (bi.next() != BreakIterator.DONE)
        }  //  public static void main()
    }  //  public class parseit

Running this program produces this output:

    Sentence: John Smith stopped by earlier to say 'Happy birthday!'
    Sentence: Aren't you and he the same age?
    Sentence: He and his wife have 2.5 children.
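For contrast, the naive approach of splitting on the period character can be sketched next to the BreakIterator approach. This is a minimal illustration and not part of the original sample: the class name NaiveSplitDemo and the helper methods naiveSplit() and sentences() are hypothetical, and Locale.US is passed explicitly only so that the result does not depend on the machine's default locale.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class NaiveSplitDemo {

    // Naive approach: treat every period as a sentence terminator.
    static String[] naiveSplit(String text) {
        return text.split("\\.");
    }

    // BreakIterator approach: collect the text between successive boundaries.
    static List<String> sentences(String text) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.US);
        bi.setText(text);
        List<String> result = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            // trim() drops the whitespace that trails each sentence
            result.add(text.substring(start, end).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "He and his wife have 2.5 children.";
        for (String piece : naiveSplit(text)) {
            System.out.println("Naive: " + piece);
        }
        for (String sentence : sentences(text)) {
            System.out.println("BreakIterator: " + sentence);
        }
    }
}
```

The naive split cuts the text at the decimal point in "2.5", which is exactly the kind of false boundary the sentence iterator is designed to avoid.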

The BreakIterator class also provides static getCharacterInstance(), getWordInstance(), and getLineInstance() methods. These methods return BreakIterator instances that allow you to parse at the character, word, and line level, respectively.
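As a rough sketch of word-level parsing, the example below uses getWordInstance() with the same first()/next() loop. The class name WordBreakDemo and the words() helper are hypothetical names chosen for illustration. One assumption worth stating: the word iterator reports boundaries around spaces and punctuation as well as words, so this sketch keeps only tokens that begin with a letter or digit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreakDemo {

    // Returns the word tokens in the text, skipping whitespace and punctuation.
    static List<String> words(String text) {
        BreakIterator bi = BreakIterator.getWordInstance(Locale.US);
        bi.setText(text);
        List<String> result = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String token = text.substring(start, end);
            // Boundaries also delimit spaces and punctuation; keep only real words.
            if (Character.isLetterOrDigit(token.charAt(0))) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String word : words("John Smith stopped by earlier.")) {
            System.out.println("Word: " + word);
        }
    }
}
```

The same loop shape should work with getCharacterInstance() and getLineInstance(); only the factory method changes.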
