devxlogo

Using String.split(String) vs. Using a StringTokenizer

Using String.split(String) vs. Using a StringTokenizer

Most programmers use the String.split(String) method to convert a String to a String array specifying a delimiter. However, I feel it’s unsafe to rely on the split() method in some cases, because it doesn’t always work properly. For example, sometimes after calling split() the first array index holds a space character even though the string contains no leading space. Here’s an example where split() fails:

public class StringTest {   public static void main(String[] args) {      final String SPLIT_STR = "^";      final String mainStr = "Token-1^Token-2^Token-3";      final String[] splitStr = mainStr.split(SPLIT_STR);      System.out.println("First Index Of ^ : " +          mainStr.indexOf(SPLIT_STR));      for(int index=0; index < splitStr.length; index++) {         System.out.println("Split : " + splitStr[index]);      }   }}

This program outputs:

  First Index Of ^ : 7  Split : Token-1^Token-2^Token-3

But the expected output would be:

  First Index Of ^ : 7  Split : Token-1  Split : Token-2  Split : Token-3

In this case, the split doesn't work because the caret character delimiter needs to be escaped. The workaround in this case is to declare SPLIT_STR = "\^". With that change, the output matches the expected output.

A safer way to split the string would be by using the StringTokenizer API. Here's an example:

import java.util.StringTokenizer;public class StringTest {   public static void main(String[] args) {      final String SPLIT_STR = "^";      final String mainStr = "Token-1^Token-2^Token-3";      final StringTokenizer stToken = new StringTokenizer(         mainStr, SPLIT_STR);      final String[] splitStr = new String[stToken.countTokens()];      int index = 0;      while(stToken.hasMoreElements()) {         splitStr[index++] = stToken.nextToken();      }      for(index=0; index < splitStr.length; index++) {         System.out.println("Tokenizer : " + splitStr[index]);      }   }}

The output of the preceding program is:

Tokenizer : Token-1Tokenizer : Token-2Tokenizer : Token-3
See also  Professionalism Starts in Your Inbox: Keys to Presenting Your Best Self in Email
devxblackblue

About Our Editorial Process

At DevX, we’re dedicated to tech entrepreneurship. Our team closely follows industry shifts, new products, AI breakthroughs, technology trends, and funding announcements. Articles undergo thorough editing to ensure accuracy and clarity, reflecting DevX’s style and supporting entrepreneurs in the tech sphere.

See our full editorial policy.

About Our Journalist