With the regular expression functionality in Orcale database 10g some handy build-ins of the UNIX pattern matching language AWK are easily implemented. I myself use AWK split() function quite often in AWK scripts. AWK split() splits a given string into chunks separated by substrings which are described by a regular expression parameter. The substrings are returned in hash table (an associative array like datatype). The function returns the number of chunks collected.
A typical split call could be something like t = split(‘oracle123@mysql’,tab,’^a-z +’) putting ‘oracle’ and ‘mysql’ in the hash table tab and returning t = 2.
The following piece of code is for my own little recursive PL/SQL split procedure with some added functionality. The first 3 parameters do what their AWK counterparts do. The parameter "mode" if set to < 0 makes split collect substrings described by regexp into tab rather than filtering out undesirable ones. "ml" tells split the length a chunk should have in order te be potentially interesting.
I wrote jsplit() (my version of split as a procedure, instead of as a function, mainly because the number of chunks returned can easily be queried using the ".count" method of the tab variable if necessary.
create or replace type chartab_t as table of varchar2(4000);
The procedure itself:
CREATE OR REPLACE
procedure jsplit
(str IN varchar2,tab IN OUT chartab_t,fs varchar2,md IN number default 1, ml IN number default 2) is
t number;
v_sub varchar2(4000);
v_str varchar2(4000) := str;
begin
if str is null then
return;
end if;
t := regexp_instr(v_str,fs);
if md > 0 THEN –- AWK split(); t == first index fieldsep;
case t
when 0 then — no fieldsep
if (length(v_str) >= ml) then
tab.extend;
tab(tab.last) := v_str;
v_str := ‘’;
jsplit(v_str,tab,fs,md,ml);
else
v_str := ‘’;
jsplit(v_str,tab,fs,md,ml);
end if;
when 1 then –
v_str := regexp_replace(v_str,fs,'’,1,1,’i');
jsplit(v_str,tab,fs,md,ml);
else — t > 1
v_sub := substr(v_str,1,t-1);
if (length(v_sub) >= ml) then
tab.extend;
tab(tab.last) := v_sub;
end if;
v_str := substr(v_str,t);
jsplit(v_str,tab,fs,md,ml);
end case;
else – mode < 0
case t
when 0 then
return;
when 1 then
v_sub := regexp_substr(v_str,fs);
if length(v_sub) >= ml then
tab.extend;
tab(tab.last) := v_sub;
end if;
v_str := regexp_replace(v_str,fs,'’,1,1,’i');
jsplit(v_str,tab,fs,md,ml);
else — t > 1
v_str := substr(v_str,t);
jsplit(v_str,tab,fs,md,ml);
end case;
end if;
end;