An Oracle10g PL/SQL implementation of the AWK split() built-in
With the regular expression functionality in Orcale database 10g some handy build-ins of the UNIX pattern matching language AWK are easily implemented. I myself use AWK split() function quite often in AWK scripts. AWK split() splits a given string into chunks separated by substrings which are described by a regular expression parameter. The substrings are returned in hash table (an associative array like datatype). The function returns the number of chunks collected.
A typical split call could be something like t = split(‘oracle123@mysql’,tab,’^a-z +’) putting ‘oracle’ and ‘mysql’ in the hash table tab and returning t = 2.
The following piece of code is for my own little recursive PL/SQL split procedure with some added functionality. The first 3 parameters do what their AWK counterparts do. The parameter "mode" if set to < 0 makes split collect substrings described by regexp into tab rather than filtering out undesirable ones. "ml" tells split the length a chunk should have in order te be potentially interesting.
I wrote jsplit() (my version of split
as a procedure, instead of as a function, mainly because the number of chunks returned can easily be queried using the ".count" method of the tab variable if necessary.
create or replace type chartab_t as table of varchar2(4000);
The procedure itself:
CREATE OR REPLACE<br />procedure jsplit<br />(str IN varchar2,tab IN OUT chartab_t,fs varchar2,md IN number default 1, ml IN number default 2) is<br /> t number;<br /> v_sub varchar2(4000);<br /> v_str varchar2(4000) := str;<br />begin<br /> if str is null then<br /> return;<br /> end if;<br /> t := regexp_instr(v_str,fs);<br /> if md > 0 THEN –- AWK split(); t == first index fieldsep; <br /> case t<br /> when 0 then — no fieldsep<br /> if (length(v_str) >= ml) then<br /> tab.extend;<br /> tab(tab.last) := v_str;<br /> v_str := ‘’;<br /> jsplit(v_str,tab,fs,md,ml);<br /> else<br /> v_str := ‘’;<br /> jsplit(v_str,tab,fs,md,ml);<br /> end if;<br /> when 1 then –<br /> v_str := regexp_replace(v_str,fs,'’,1,1,’i');<br /> jsplit(v_str,tab,fs,md,ml);<br /> else — t > 1<br /> v_sub := substr(v_str,1,t-1);<br /> if (length(v_sub) >= ml) then<br /> tab.extend;<br /> tab(tab.last) := v_sub;<br /> end if;<br /> v_str := substr(v_str,t);<br /> jsplit(v_str,tab,fs,md,ml);<br /> end case;<br /> else – mode < 0<br /> case t<br /> when 0 then<br /> return;<br /> when 1 then<br /> v_sub := regexp_substr(v_str,fs);<br /> if length(v_sub) >= ml then<br /> tab.extend;<br /> tab(tab.last) := v_sub;<br /> end if;<br /> v_str := regexp_replace(v_str,fs,'’,1,1,’i');<br /> jsplit(v_str,tab,fs,md,ml);<br /> else — t > 1<br /> v_str := substr(v_str,t);<br /> jsplit(v_str,tab,fs,md,ml);<br /> end case;<br /> end if;<br />end;
Recent Comments