I am writing a text search applet that searches through a text file loaded from the server. I am using StringTokenizer to isolate the individual words. The problem is that the words in the file are not necessarily separated by spaces or just one specific delimiter. How do I make StringTokenizer ignore extra characters such as quotes and just tokenize the single word?
The documentation for StringTokenizer can lead you to believe that it is capable of recognizing only a single delimiter at a time. But if you read it carefully, you will find that StringTokenizer can recognize any number of delimiters. The delimiter argument of the StringTokenizer constructor is a string whose every character is interpreted as a delimiter. The string as a whole is not the delimiter; rather, each of its constituent characters is a delimiter. For example, to use spaces, commas, and colons as delimiters, you would create a StringTokenizer like this:
StringTokenizer tokenizer = new StringTokenizer(input, " ,:");
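To answer the original question about quotes, here is a minimal sketch. It simply adds the double-quote character to the delimiter string, so quotes are skipped along with whitespace and punctuation. The class name MultiDelimiterDemo, the method name words, and the exact delimiter set are illustrative assumptions. Adjust the set to match the text in your file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MultiDelimiterDemo {
    // Each character in this string is a separate delimiter: space, tab, newline,
    // carriage return, double quote, comma, colon, semicolon, period, '!' and '?'.
    // This particular set is an illustrative assumption, not a required one.
    static final String DELIMITERS = " \t\n\r\",:;.!?";

    /** Returns the words in input, treating every delimiter character as a separator. */
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, DELIMITERS);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Runs of adjacent delimiters (such as a comma, a quote, and a space) yield no empty tokens.
        System.out.println(words("\"Hello,\" she said: \"stop!\""));
    }
}
```

By default StringTokenizer treats a run of consecutive delimiters as a single separator, so a quote followed by a space does not produce an empty token. Running main prints [Hello, she, said, stop].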
Using StringTokenizer to parse a file is generally not efficient if you read the file a line at a time, because you have to create a new tokenizer for each line. In addition, the parsing ability of StringTokenizer is minimal. Suppose you wanted to use a multicharacter delimiter: you can't do this with StringTokenizer. For more complicated tokenization, you may want to look into a regular expression library or a lexer generator.
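If you take the regular expression route, the standard java.util.regex package (part of the JDK since version 1.4) addresses both limitations mentioned above. The sketch below is one possible approach, not the only one. It extracts words by matching what a word is instead of listing every delimiter, and it splits on the two-character delimiter "::". The class name RegexTokenizeDemo, its method names, and the choice to keep apostrophes inside words are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenizeDemo {
    // A "word" is assumed to be a run of Unicode letters, digits, or apostrophes.
    // Everything else (quotes, punctuation, whitespace) is skipped.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}']+");

    /** Returns every word in text by matching words directly. */
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    /** Splits on the literal multicharacter delimiter "::", which StringTokenizer cannot express. */
    static List<String> splitOnDoubleColon(String text) {
        // Pattern.quote makes the delimiter literal, so a single ':' is not a separator.
        return Arrays.asList(text.split(Pattern.quote("::")));
    }

    public static void main(String[] args) {
        System.out.println(words("\"Hello,\" she said: don't   stop!"));
        System.out.println(splitOnDoubleColon("a::b:c::d"));
    }
}
```

Running main prints [Hello, she, said, don't, stop] and then [a, b:c, d]. Matching words directly is often simpler than enumerating delimiters, because any character outside the word pattern, including quotes and unusual punctuation, is skipped automatically.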