Tip of the Day
Language: Java Language
Expertise: Beginner
Apr 5, 1999

StringTokenizer: Multiple Delimiter Characters

Question:
I am writing a text search applet that searches through a text file loaded from the server. I am using StringTokenizer to isolate the individual words. The problem is that the words in the file are not necessarily separated by spaces or just one specific delimiter. How do I make StringTokenizer ignore extra characters such as quotes and just tokenize the single word?

Answer:
The documentation for StringTokenizer can lead you to believe that it recognizes only a single delimiter at a time. But if you read it carefully, you will find that StringTokenizer can recognize any number of delimiters. The delimiter argument of the StringTokenizer constructor is a string in which every character is interpreted as a delimiter. The string as a whole is not the delimiter; each of its constituent characters is. To skip characters such as quotes, include them in that string. For example, to use spaces, commas, and colons as delimiters, you would create a StringTokenizer with:

StringTokenizer tokenizer = new StringTokenizer(input, " ,:");
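
For the word-search case in the question, the same idea can be extended to punctuation and quote characters. The following is only a sketch, assuming a recent Java version; the class name WordTokens, the method words, and the particular delimiter set are illustrative choices, not part of the original tip:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class WordTokens {
    // Assumed delimiter set: every character in this string is treated
    // as a separate delimiter (whitespace, punctuation, and quotes).
    static final String DELIMITERS = " \t\n\r\f,.:;?!\"'()";

    // Returns the words of input, skipping any run of delimiter characters.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, DELIMITERS);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [He, said, Hello, world, twice]
        System.out.println(words("He said, \"Hello, world!\" (twice)."));
    }
}
```

Because the apostrophe is in the delimiter set, a contraction such as don't would be split into two tokens; drop it from the string if that is not what you want.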

Using StringTokenizer to parse a file is generally not efficient if you read the file a line at a time, because you have to create a new tokenizer for each line. In addition, the parsing ability of StringTokenizer is minimal. For example, it cannot treat a multicharacter sequence as a single delimiter. For more complicated tokenization, you may want to look into a regular expression library or a lexer generator.
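
As one illustration of the regular expression route, a multicharacter delimiter can be handled with String.split. This sketch assumes a Java version that includes java.util.regex (1.4 or later), which postdates this tip; the class name MultiCharSplit and the method splitOn are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MultiCharSplit {
    // Splits input on the whole delimiter string rather than on each
    // of its characters. Pattern.quote makes the delimiter match
    // literally, so regex metacharacters in it have no special meaning.
    static List<String> splitOn(String input, String delimiter) {
        return Arrays.asList(input.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        // A single ':' is kept; only the two-character "::" separates fields.
        // prints [a:b, c]
        System.out.println(splitOn("a:b::c", "::"));
    }
}
```

Unlike StringTokenizer, split returns an empty string between two adjacent delimiters, so filter those out if you only want non-empty tokens.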

DevX Pro
 