A quite common concept in many programming langiages, though perhaps strongest in Java, is the String Tokenizer. This is a utility class or function that takes a String as input, and optionally one or more delimiters, and breaks up the string in individual elements, returning those as an appropriate collection. As an example: a String Tokenizer could split the string "red,blue,yellow" into a collection of the strings red, blue and yellow – if the comma had been specified as delimiter or no delimiter had been defined and the comma was the default. With the "e" as delimiter, the String Tokenizer would have produced: "r", "d,blu",",y" and "llow".

PL/SQL has something akin to the String Tokenizer, in the DBMS_UTILITY.COMMA_TO_TABLE function. By the way, my colleague Alex discussed this function in quite a bit of detail in his recent article Never Assume, Always Test

While preparing for our session "SQL Puzzles" (next Tuesday, 13th February) I needed a String Tokenizer in SQL, ideally nothing but SQL. Again, with a different objective in mind, I wrote a SQL Query that indeed acts as a Tokenizer. I am quite sure it can be done much better – Anton: please share with me your thoughts on this – but I will share my attempt with you anyway.....

The query is this one:

with numbers as(  select rownum rn   from   ( select count(*) rn            from   dual            group            by     cube(1,2,3,4)          )), input as( select 'red,blue,yellow' string_to_tokenize  ,      'e' delimiter  from   dual), positions as  (  select rn pos   from   numbers          join          input   on     instr(string_to_tokenize, delimiter, 1, rn) > 0   union   select 0 pos   from   input   where  instr(string_to_tokenize, delimiter) > 0)select  substr( string_to_tokenize        , 1 + case pos when 0 then 0 else instr(string_to_tokenize, delimiter, 1, pos) end        , case instr(string_to_tokenize, delimiter, 1, pos+1) when 0 then length(string_to_tokenize)+2 else instr(string_to_tokenize, delimiter, 1, pos+1)+1 end          - ( 2 + case pos when 0 then 0 else instr(string_to_tokenize, delimiter, 1, pos) end)) tokenfrom     positions,        input 

When I run the query on , the results are:

TOKEN------------------redblueyellow 

And when I use "e" as the delimiter, the results are:

TOKEN-------------rd,blu,yllow

 

Note that the inline-view numbers is somewhat limited as right now it will return no more than 16 rows. There are of course several ways to increase or tune that number, one would be to have numbers adapt to the inline-view input, that should probably be defined first in the query.

Also note that this query could be run using SQL*Plus parameters, like this:

SQL> with numbers as  2  (  select rownum rn  3     from   ( select count(*) rn  4              from   dual  5              group  6              by     cube(1,2,3,4)  7            )  8  )  9  , input as 10  ( select '&String_To_Tokenizer' string_to_tokenize 11    ,      '&Delimiter' delimiter 12    from   dual 13  ) 14  , positions as 15    (  select rn pos 16     from   numbers 17            join 18            input 19     on     instr(string_to_tokenize, delimiter, 1, rn) > 0 20     union 21     select 0 pos 22     from   input 23     where  instr(string_to_tokenize, delimiter) > 0 24  ) 25  select  substr( string_to_tokenize 26          , 1 + case pos when 0 then 0 else instr(string_to_tokenize, delimiter, 1, pos) end 27          , case instr(string_to_tokenize, delimiter, 1, pos+1) when 0 then length(string_to_tokenize)+2 else instr(string_to_tokenize, delimiter, 1, pos+1)+1 end 28            - ( 2 + case pos when 0 then 0 else instr(string_to_tokenize, delimiter, 1, pos) end)) token 29  from     positions 30  ,        input 31  /Enter value for string_to_tokenizer: in the land of the pigs, the butcher is king (by Meat Loaf)old  10: ( select '&String_To_Tokenizer' string_to_tokenizenew  10: ( select 'in the land of the pigs, the butcher is king (by Meat Loaf)' string_to_tokenizeEnter value for delimiter:  old  11:   ,      '&Delimiter' delimiternew  11:   ,      ' ' delimiter

TOKEN--------------------------------------------------------------------------------inthelandofthepigs,thebutcherisking(by

TOKEN--------------------------------------------------------------------------------MeatLoaf)

13 rows selected. 

Finally, the query should be extended to support multiple delimiters.