Most programmers use the String.split(String) method to convert a String into a String array around a specified delimiter. However, I feel it's unsafe to rely on split() in some cases, because it treats its argument as a regular expression rather than a literal delimiter, so the result is not always what you expect. For example, the first array element is sometimes an empty string (easily mistaken for a stray space when printed) even though the string contains no leading space; this happens when the string begins with the delimiter. Here's an example where split() fails:
public class StringTest {
    public static void main(String[] args) {
        final String SPLIT_STR = "^";
        final String mainStr = "Token-1^Token-2^Token-3";
        final String[] splitStr = mainStr.split(SPLIT_STR);
        System.out.println("First Index Of ^ : " + mainStr.indexOf(SPLIT_STR));
        for (int index = 0; index < splitStr.length; index++) {
            System.out.println("Split : " + splitStr[index]);
        }
    }
}
This program outputs:
First Index Of ^ : 7
Split : Token-1^Token-2^Token-3
But the expected output would be:
First Index Of ^ : 7
Split : Token-1
Split : Token-2
Split : Token-3
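In this case, the split doesn't work because split() interprets its argument as a regular expression, in which the caret is a metacharacter that anchors a match to the start of the input. To be matched as an ordinary character, the caret must be escaped. The workaround is to pass an escaped delimiter to split(), for example mainStr.split("\\^") (the backslash is doubled because it is itself an escape character in a Java string literal) or mainStr.split(java.util.regex.Pattern.quote(SPLIT_STR)). Leave SPLIT_STR itself unescaped, because indexOf() takes its argument literally and would return -1 for the escaped form. With that change, the output matches the expected output.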
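The corrected call might look like the following sketch. The class name EscapedSplitDemo and the helper splitLiteral are illustrative names that do not appear in the original program; Pattern.quote is the standard java.util.regex method that wraps a string so that it is matched literally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class EscapedSplitDemo {
    // Hypothetical helper: splits on the delimiter as literal text, not as a regex.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        final String SPLIT_STR = "^";
        final String mainStr = "Token-1^Token-2^Token-3";

        // indexOf() takes the delimiter literally, so SPLIT_STR stays unescaped.
        System.out.println("First Index Of ^ : " + mainStr.indexOf(SPLIT_STR));

        for (String token : splitLiteral(mainStr, SPLIT_STR)) {
            System.out.println("Split : " + token);
        }

        // Escaping the caret by hand gives the same three tokens.
        System.out.println(Arrays.toString(mainStr.split("\\^")));
    }
}
```

Quoting the delimiter in one helper keeps the escaping out of the calling code, which matters when the delimiter comes from configuration or user input and may contain other regex metacharacters.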
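A safer way to split the string in this respect is to use the StringTokenizer API, which treats its delimiter characters literally rather than as a regular expression, so nothing needs to be escaped. Here's an example: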
import java.util.StringTokenizer;

public class StringTest {
    public static void main(String[] args) {
        final String SPLIT_STR = "^";
        final String mainStr = "Token-1^Token-2^Token-3";
        // The delimiter is taken literally, so the caret needs no escaping.
        final StringTokenizer stToken = new StringTokenizer(mainStr, SPLIT_STR);
        // Size the array from the number of tokens still to be read.
        final String[] splitStr = new String[stToken.countTokens()];
        int index = 0;
        while (stToken.hasMoreTokens()) {
            splitStr[index++] = stToken.nextToken();
        }
        for (index = 0; index < splitStr.length; index++) {
            System.out.println("Tokenizer : " + splitStr[index]);
        }
    }
}
The output of the preceding program is:
Tokenizer : Token-1
Tokenizer : Token-2
Tokenizer : Token-3
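One difference is worth keeping in mind when choosing between the two approaches: StringTokenizer skips empty tokens and treats every character of its delimiter string as a separate delimiter, whereas split() keeps the empty strings that fall before a leading delimiter or between adjacent delimiters. The following is a minimal sketch of that difference; the class name TokenizerVsSplit, the helper tokenize, and the sample input are chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class TokenizerVsSplit {
    // Hypothetical helper: collects the StringTokenizer tokens into a list.
    static List<String> tokenize(String text, String delimiters) {
        final StringTokenizer st = new StringTokenizer(text, delimiters);
        final List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Illustrative input with a leading and a doubled delimiter.
        final String input = "^Token-1^^Token-2";

        // StringTokenizer drops the empty tokens and returns two elements.
        System.out.println("Tokenizer : " + tokenize(input, "^"));

        // split() on the literal caret keeps them: "", "Token-1", "", "Token-2".
        final String[] parts = input.split(Pattern.quote("^"));
        System.out.println("Split     : " + parts.length + " elements");
    }
}
```

If empty fields are meaningful in your data, for example blank columns in a delimited record, split() with a quoted delimiter preserves them, while StringTokenizer silently discards them.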