Tip of the Day
Language: Java
Expertise: Intermediate
Mar 29, 1999

Use BreakIterator to Parse Text

Parsing text is a common operation that is more complex than it looks. For example, your application might need to let users enter text and then break that text into separate words or sentences for processing. On the surface, this task seems easy. For sentence parsing, it may appear that you can separate sentences simply by searching for the period (.) character. One problem with this approach is that other characters, such as question marks (?) and exclamation marks (!), can also end a sentence. In addition, periods have other uses, such as representing decimal points. To make matters worse, another language may use an entirely different set of characters for sentence termination, or may use these same characters in a different way.

Fortunately, the java.text.BreakIterator class finds these boundaries for you using locale-sensitive rules, so the same code can handle text in different languages. This sample code illustrates how you can use BreakIterator to parse a string on a per-sentence basis:
 
import java.text.BreakIterator;

public class ParseIt {
	public static void main(String[] args) {
		String text = "John Smith stopped by earlier " +
					"to say 'Happy birthday!' Aren't " +
					"you and he the same age? He and " +
					"his wife have 2.5 children.";
		// Obtain a sentence iterator for the default locale
		BreakIterator bi = BreakIterator.getSentenceInstance();
		bi.setText(text);
		// Each boundary returned by next() marks the end of one sentence
		int start = bi.first();
		for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
			System.out.println("Sentence: " + text.substring(start, end));
		}
	}
}
Running this program produces this output:
 
Sentence: John Smith stopped by earlier to say 'Happy birthday!'
Sentence: Aren't you and he the same age?
Sentence: He and his wife have 2.5 children.
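Because sentence rules can differ between languages, it may help to pass a locale explicitly instead of relying on the platform default. The following is a minimal sketch of that idea, not a definitive implementation: the SentenceSplitter class and its split() method are hypothetical names introduced for illustration, and which rules actually apply for a given locale depends on the JDK in use. It also trims the trailing whitespace that each raw segment would otherwise include.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the JDK.
class SentenceSplitter {
	// Returns the sentences found in text, using the rules for the given locale.
	static List<String> split(String text, Locale locale) {
		BreakIterator bi = BreakIterator.getSentenceInstance(locale);
		bi.setText(text);
		List<String> sentences = new ArrayList<>();
		int start = bi.first();
		for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
			// A raw segment includes the whitespace that follows the sentence
			String sentence = text.substring(start, end).trim();
			if (!sentence.isEmpty()) {
				sentences.add(sentence);
			}
		}
		return sentences;
	}

	public static void main(String[] args) {
		for (String s : split("Hello world. How are you? Fine!", Locale.US)) {
			System.out.println("Sentence: " + s);
		}
	}
}
```

Returning a list rather than printing inside the loop keeps the boundary logic reusable from other code.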
The BreakIterator class also provides the static getCharacterInstance(), getWordInstance(), and getLineInstance() methods. These methods return BreakIterator instances that find boundaries at the character, word, and line-break level, respectively.
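Word-level parsing follows the same pattern as the sentence example, with one wrinkle: the word iterator reports runs of whitespace and punctuation as segments too. The sketch below is one possible way to keep only the actual words; the WordSplitter class and its words() method are hypothetical names, and the filter (keep segments that start with a letter or digit) is a simplifying assumption rather than a rule defined by BreakIterator.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the JDK.
class WordSplitter {
	// Returns the words in text, skipping whitespace and punctuation segments.
	static List<String> words(String text) {
		BreakIterator bi = BreakIterator.getWordInstance();
		bi.setText(text);
		List<String> words = new ArrayList<>();
		int start = bi.first();
		for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
			// Assumption: a segment is a word if it starts with a letter or digit
			if (Character.isLetterOrDigit(text.charAt(start))) {
				words.add(text.substring(start, end));
			}
		}
		return words;
	}

	public static void main(String[] args) {
		for (String w : words("Hello, world! How are you?")) {
			System.out.println("Word: " + w);
		}
	}
}
```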
Brett Spell
 