Thursday, 7 April 2011

Comparing StringTokenizer and String.split(...)

People often say that String.split(...) and the older StringTokenizer are interchangeable, and that these days you should use String.split(...) in preference to StringTokenizer. But are they as interchangeable as they’re made out to be, or are there some inputs for which their results differ? The code below tests this theory using four input strings:

  1. "A,B,C,D,E" A normal or fully populated set of characters.
  2. "A,B,,,E" A String with characters missing from its middle.
  3. "A,B,C,,," A String with characters missing from the end.
  4. ",,,D,E" A String with characters missing from the beginning.

Running these through the code below, we get the following results:

Split using: "," character.
 
SPLIT Normal String Input String: A,B,C,D,E
Split Array Size is = 5
Split Data: A : B : C : D : E : 
String Tokenized Array Size is = 5
StringTokenized Array Data: A : B : C : D : E : 

SPLIT String with Missing Values Input String: A,B,,,E
Split Array Size is = 5
Split Data: A : B : {blank} : {blank} : E : 
String Tokenized Array Size is = 5
StringTokenized Array Data: A : B : E : null : null : 

SPLIT String with Missing End Values Input String: A,B,C,,,
Split Array Size is = 3
Split Data: A : B : C : 
String Tokenized Array Size is = 5
StringTokenized Array Data: A : B : C : null : null : 

SPLIT Missing Start Values Input String: ,,,D,E
Split Array Size is = 5
Split Data: {blank} : {blank} : {blank} : D : E : 
String Tokenized Array Size is = 5
StringTokenized Array Data: D : E : null : null : null : 
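The tokenizer results above are copied into a fixed five-element array, so their size is always reported as 5. The following minimal sketch collects the tokens into a growable list instead, so each size reflects how many tokens StringTokenizer actually returns. The class name TokenCount and the helper tokenize are hypothetical, and the sketch assumes Java 7 or later for the diamond operator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenCount {

    // Collects every token StringTokenizer returns into a growable list,
    // so the size is the real token count (no pre-sized array, no null padding).
    public static List<String> tokenize(String input, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] inputs = {"A,B,C,D,E", "A,B,,,E", "A,B,C,,,", ",,,D,E"};
        for (String input : inputs) {
            int splitCount = input.split(",").length;
            int tokenCount = tokenize(input, ",").size();
            System.out.println(input + " -> split: " + splitCount + ", tokens: " + tokenCount);
        }
    }
}
```

For the four sample inputs it reports split/token counts of 5/5, 5/3, 3/3 and 5/2. These token counts match the number of non-null tokenizer entries in the output above.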

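From this, you can see that the first input string, A,B,C,D,E, produces the same results in both cases. However, when values are missing from the input (empty fields between delimiters), the results differ. split(...) keeps an empty string for each missing value at the start or in the middle, but drops trailing empty strings. StringTokenizer skips empty fields altogether. Note that the tokenized array size is always 5 only because the test code pre-allocates a five-element array; the null entries are unused slots, not tokens. These differences could affect your code should you switch between split(...) and StringTokenizer.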
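If you do switch, String.split(regex, limit) and a simple filter can bring the two behaviours closer together. The sketch below shows one way to do it. It assumes a single literal delimiter such as "," and Java 7 or later. The class SplitVariants and its two helper methods are hypothetical names. A negative limit makes split(...) keep trailing empty strings, while removing empty strings from split's result mimics StringTokenizer, which never returns empty tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitVariants {

    // A negative limit tells split(...) not to discard trailing empty strings,
    // so every field between delimiters is preserved, including the last ones.
    public static String[] splitKeepAll(String input, String regex) {
        return input.split(regex, -1);
    }

    // Drops every empty string from split's result; for a single literal
    // delimiter this gives the same tokens StringTokenizer would return.
    public static String[] splitLikeTokenizer(String input, String regex) {
        List<String> nonEmpty = new ArrayList<>();
        for (String part : input.split(regex)) {
            if (!part.isEmpty()) {
                nonEmpty.add(part);
            }
        }
        return nonEmpty.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "A,B,C,,,";
        System.out.println(Arrays.toString(input.split(",")));                   // [A, B, C]
        System.out.println(Arrays.toString(splitKeepAll(input, ",")));          // [A, B, C, , , ]
        System.out.println(Arrays.toString(splitLikeTokenizer(",,,D,E", ","))); // [D, E]
    }
}
```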

import java.util.StringTokenizer;

public class Split {

    /**
     * Runs four sample inputs through both String.split(...) and
     * StringTokenizer and prints the results side by side.
     *
     * @param args not used
     */
    public static void main(String[] args) {
        String splitChar = ",";

        split("A,B,C,D,E", splitChar, "SPLIT Normal String");
        split("A,B,,,E", splitChar, "SPLIT String with Missing Values");
        split("A,B,C,,,", splitChar, "SPLIT String with Missing End Values");
        split(",,,D,E", splitChar, "SPLIT Missing Start Values");
    }

    // Splits the input both ways and prints the two results.
    private static void split(String num1, String splitChar, String msg) {
        // split(...) treats its argument as a regular expression; "," has no
        // special meaning in a regex, so here it matches a literal comma.
        String[] result = num1.split(splitChar);
        String[] result2 = stringTokenizedToList(num1, splitChar);
        print(msg, num1, result, result2);
    }

    // Copies StringTokenizer's tokens into an array of at least 5 slots
    // (5 is the expected field count); slots that are not filled stay null.
    // Sizing with countTokens() avoids an ArrayIndexOutOfBoundsException
    // if an input ever has more than five tokens.
    private static String[] stringTokenizedToList(String val, String splitChar) {
        StringTokenizer st = new StringTokenizer(val, splitChar);
        String[] result = new String[Math.max(5, st.countTokens())];
        int i = 0;
        while (st.hasMoreTokens()) {
            String str = st.nextToken();
            result[i++] = str;
        }
        return result;
    }

    private static void print(String msg, String input, String[] splitResult,
            String[] tokenizedResult) {
        System.out.println(msg + " Input String: " + input);
        printSplitResult(splitResult);
        printTokenisedAsList(tokenizedResult);
        System.out.print("\n");
    }

    private static void printSplitResult(String[] splitResult) {
        System.out.println("Split Array Size is = " + splitResult.length);
        System.out.print("Split Data: ");
        for (String str : splitResult) {
            str = checkStr(str);
            System.out.print(str + " : ");
        }
        System.out.print("\n");
    }

    // Makes null and empty entries visible in the printed output.
    private static String checkStr(String str) {
        if (isNull(str))
            str = "null";
        else if (isEmptyString(str))
            str = "{blank}";
        return str;
    }

    private static void printTokenisedAsList(String[] tokenizedResult) {
        System.out.println("String Tokenized Array Size is = " + tokenizedResult.length);
        System.out.print("StringTokenized Array Data: ");
        for (String str : tokenizedResult) {
            str = checkStr(str);
            System.out.print(str + " : ");
        }
        System.out.print("\n");
    }

    private static boolean isNull(Object obj) {
        return obj == null;
    }

    private static boolean isEmptyString(String str) {
        return str.equals("");
    }

}
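The test above passes the same splitChar to both APIs. That works here because "," is not special, but the two interpret their delimiter argument differently. split(...) takes a regular expression, while StringTokenizer takes a set of delimiter characters, each used literally. A minimal sketch illustrating this (DelimiterSemantics and tokenizerCount are hypothetical names):

```java
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class DelimiterSemantics {

    // Counts tokens the way StringTokenizer sees them: every character in
    // delims is a separate, literal delimiter.
    public static int tokenizerCount(String input, String delims) {
        return new StringTokenizer(input, delims).countTokens();
    }

    public static void main(String[] args) {
        String dotted = "A.B.C";
        // "." is a regex meta-character matching any character, so every
        // character is a delimiter, every field is empty and all are dropped.
        System.out.println(dotted.split(".").length);                // 0
        // Pattern.quote(...) turns the argument into a literal pattern.
        System.out.println(dotted.split(Pattern.quote(".")).length); // 3
        // StringTokenizer treats "." literally.
        System.out.println(tokenizerCount(dotted, "."));             // 3

        String mixed = "A,B;C";
        // StringTokenizer: ',' and ';' are each delimiters.
        System.out.println(tokenizerCount(mixed, ",;"));             // 3
        // split: ",;" only matches the two-character sequence ",;", so no split happens.
        System.out.println(mixed.split(",;").length);                // 1
        // A character class gives split(...) the same delimiter set.
        System.out.println(mixed.split("[,;]").length);              // 3
    }
}
```

Pattern.quote(...) or an escaped "\\." gives the literal behaviour for a dot. When there are several delimiter characters, a character class such as "[,;]" is the closest split(...) equivalent.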
