
StringTokenizer: Multiple Delimiter Characters


Question:
I am writing a text search applet that searches through a text file loaded from the server. I am using StringTokenizer to isolate the individual words. The problem is that the words in the file are not necessarily separated by spaces or by any single delimiter. How do I make StringTokenizer skip extra characters, such as quotes, and return just the words themselves?

Answer:
The documentation for StringTokenizer can lead you to believe that it recognizes only a single delimiter at a time. But if you read it carefully, you will find that StringTokenizer can recognize any number of delimiters. The delimiter argument of the StringTokenizer constructor is a string in which every character is interpreted as a delimiter. The string as a whole is not the delimiter; each of its constituent characters is. For example, to use spaces, commas, and colons as delimiters, you would create a StringTokenizer with:

StringTokenizer tokenizer = new StringTokenizer(input, " ,:");
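Applied to the original question, the same idea means listing the quote character alongside the other separators. The following is a minimal sketch, not a definitive implementation: the class name WordTokens, the method words, and the exact delimiter set in DELIMITERS are illustrative assumptions you would adjust for your own text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class WordTokens {
    // Hypothetical delimiter set: every character in this string is a separate
    // delimiter (whitespace, double quote, and common punctuation).
    static final String DELIMITERS = " \t\n\r\f\".,;:!?()";

    // Collects every token StringTokenizer produces for the given input.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, DELIMITERS);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "He said, \"search: the text file\"";
        System.out.println(words(line)); // [He, said, search, the, text, file]
    }
}
```

Because StringTokenizer skips runs of consecutive delimiters, a sequence such as `, "` yields no empty tokens. Adding the single quote to the set would also strip single quotes, but it would split a contraction like "don't" into "don" and "t".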

Using StringTokenizer to parse a file is generally not efficient if you read the file a line at a time, because you have to create a new tokenizer for each line. In addition, StringTokenizer's parsing ability is minimal. Suppose you wanted a multicharacter delimiter such as "::". You can't express this with StringTokenizer, which would treat each colon as a separate delimiter. For more complicated tokenization, you may want to look into a regular expression library (such as java.util.regex) or a lexer generator.
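As a rough illustration of the regular expression route, the sketch below uses the standard java.util.regex package and String.split. The class RegexTokens, its methods, and the word pattern (letters and digits, optionally joined by inner apostrophes) are hypothetical choices made for this example, not a prescribed approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokens {
    // Multicharacter delimiter: the whole string "::" is the separator,
    // so a single colon stays inside the token.
    static List<String> splitOnDoubleColon(String input) {
        return Arrays.asList(input.split("::"));
    }

    // Hypothetical word definition: a run of letters/digits, optionally
    // joined by inner apostrophes (so "Don't" stays one word).
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}]+(?:'[\\p{L}\\p{N}]+)*");

    // Describes what a word looks like and collects every match,
    // instead of enumerating all possible delimiters.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitOnDoubleColon("a::b:c::d")); // [a, b:c, d]
        System.out.println(words("\"Don't\" stop -- 'quoted' words!")); // [Don't, stop, quoted, words]
    }
}
```

Matching words rather than splitting on delimiters tends to suit a search applet, since quotes and other stray punctuation are simply never part of a match. Note that String.split drops trailing empty strings but keeps a leading empty string when the input begins with the delimiter.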

devx-admin
