DevX

StringTokenizer: Multiple Delimiter Characters


Question:
I am writing a text search applet that searches through a text file loaded from the server. I am using StringTokenizer to isolate the individual words. The problem is that the words in the file are not necessarily separated by spaces or just one specific delimiter. How do I make StringTokenizer ignore extra characters such as quotes and just tokenize the single word?

Answer:
The documentation for StringTokenizer can lead you to believe that it recognizes only a single delimiter at a time. But if you read it carefully, you will find that StringTokenizer can recognize any number of delimiters. The delimiter argument of the StringTokenizer constructor is a string in which every character is interpreted as a delimiter. The string as a whole is not the delimiter; rather, each of its constituent characters is a delimiter. For example, to use spaces, commas, and colons as delimiters, you would create a StringTokenizer with:

StringTokenizer tokenizer = new StringTokenizer(input, " ,:");
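Applied to the question above, the same idea handles quotes: put the double-quote character in the delimiter string along with the whitespace and punctuation you want to skip. The following is a minimal sketch rather than a definitive implementation. The class name MultiDelimiterDemo, the helper method words, and the exact delimiter set are illustrative assumptions, and the code assumes Java 7 or later for the generic <> syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MultiDelimiterDemo {
    // Each character in this string is a separate delimiter: space, tab,
    // newline, carriage return, comma, colon, semicolon, period,
    // question mark, exclamation mark, and double quote.
    // (Illustrative choice; adjust to match your text files.)
    static final String DELIMITERS = " \t\n\r,:;.?!\"";

    // Returns the words of one line, skipping every delimiter character.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, DELIMITERS);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "He said, \"search: this text\"; then stopped.";
        // Consecutive delimiters (such as ", \"") produce no empty tokens.
        System.out.println(words(line)); // [He, said, search, this, text, then, stopped]
    }
}
```

The single quote is deliberately left out of the delimiter set, since adding it would split contractions such as "don't" into two tokens.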

Using StringTokenizer to parse a file is generally not efficient if you read the file a line at a time, because you have to create a new tokenizer for each line. In addition, the parsing ability of StringTokenizer is minimal. For example, you cannot use a multicharacter delimiter such as "::" with StringTokenizer, because each character of the delimiter string is treated as a separate delimiter. For more complicated tokenization, you may want to look into a regular expression library or a lexer generator.
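To illustrate the multicharacter limitation, the sketch below compares StringTokenizer with String.split, which interprets its argument as a regular expression, and adds a word extractor built on java.util.regex.Pattern and Matcher (part of the JDK since Java 1.4). These standard-library regex classes stand in for the "regular expression library" mentioned above; they are not the only option. The class name MultiCharDelimiterDemo, its method names, and the word pattern [A-Za-z0-9']+ are illustrative assumptions, and the code assumes Java 7 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiCharDelimiterDemo {
    // StringTokenizer treats "::" as the delimiter set {':'},
    // so a single ':' also splits the input.
    static List<String> withTokenizer(String input) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, "::");
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    // String.split takes a regular expression, so "::" must match as a unit.
    static List<String> withRegexSplit(String input) {
        return Arrays.asList(input.split("::"));
    }

    // Compiled once and reused: a word is a run of ASCII letters, digits,
    // or apostrophes; everything else (quotes, dashes, brackets) is skipped.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9']+");

    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "alpha::beta:gamma::delta";
        System.out.println(withTokenizer(data));  // [alpha, beta, gamma, delta]
        System.out.println(withRegexSplit(data)); // [alpha, beta:gamma, delta]
        System.out.println(words("\"Don't\" stop -- (searching)!")); // [Don't, stop, searching]
    }
}
```

Because the Pattern is compiled once and stored in a static field, repeated calls reuse it instead of recompiling the expression for every line. The word pattern covers only ASCII letters; text in other scripts would need a broader character class such as \p{L}.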

