Question:
I am writing a text search applet that searches through a text file loaded from the server. I am using StringTokenizer to isolate the individual words. The problem is that the words in the file are not necessarily separated by spaces or just one specific delimiter. How do I make StringTokenizer ignore extra characters such as quotes and just tokenize the single word?
Answer:
The documentation for StringTokenizer can lead you to believe that it is capable of recognizing only a single delimiter at a time. But if you read it carefully, you will find that StringTokenizer can recognize any number of delimiters. The delimiter argument of the StringTokenizer constructor is a string whose every character is interpreted as a delimiter. The string as a whole is not the delimiter; rather, each of its constituent characters is a delimiter. For example, to use spaces, commas, and colons as delimiters, you would create a StringTokenizer with:
StringTokenizer tokenizer = new StringTokenizer(input, " ,:");
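Applied to the word-search case in the question, the same idea might look like the sketch below. The class name WordTokens, the method words, and the particular delimiter set are assumptions made for illustration; adjust the set to whatever characters actually surround words in your file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class WordTokens {
    // Assumed delimiter set: whitespace, double quotes, and common punctuation.
    // Every character in this string acts as a separate delimiter.
    static final String DELIMITERS = " \t\n\r\f\".,;:!?()";

    // Returns the words in text, with all delimiter characters discarded.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, DELIMITERS);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [He, said, hello, world, twice]
        System.out.println(words("He said, \"hello world\" (twice)."));
    }
}
```

Runs of adjacent delimiters, such as the comma followed by a space above, produce no empty tokens. Adding the apostrophe to the set would also strip single quotes, but it would split contractions such as "don't" into two tokens.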
Using StringTokenizer to parse a file is generally not efficient if you read the file a line at a time because you have to create a new tokenizer for each line. In addition, the parsing ability of StringTokenizer is minimal. Imagine that you wanted to use a multicharacter delimiter; you can’t do this with StringTokenizer. For more complicated tokenization, you may want to look into a regular expression library or a lexer generator.
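As one possible illustration of the regular expression route, the sketch below uses the java.util.regex package, which has been part of the standard library since JDK 1.4. The class name MultiCharSplit, the method fields, and the "::" delimiter are hypothetical choices, not anything prescribed above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MultiCharSplit {
    // Assumed two-character delimiter. Pattern.quote makes the regex engine
    // treat the text literally rather than as regex syntax.
    static final Pattern DELIMITER = Pattern.compile(Pattern.quote("::"));

    // Splits line on the whole "::" sequence; a lone ':' is left intact.
    static List<String> fields(String line) {
        return Arrays.asList(DELIMITER.split(line));
    }

    public static void main(String[] args) {
        // Prints [alpha, beta:gamma, delta]
        System.out.println(fields("alpha::beta:gamma::delta"));
    }
}
```

A StringTokenizer given "::" as its delimiter argument would instead break on every single colon and yield four tokens. Because the Pattern is compiled once and reused, the same object can be applied to each line of a file.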