An Oracle10g PL/SQL implementation of the AWK split() built-in

With the regular expression functionality in Orcale database 10g some handy build-ins of the UNIX pattern matching language AWK are easily implemented. I myself use AWK split() function quite often in AWK scripts. AWK split() splits a given string into chunks separated by substrings which are described by a regular expression parameter. The substrings are returned in hash table (an associative array like datatype). The function returns the number of chunks collected.

A typical split call could be something like  t = split(‘oracle123@mysql’,tab,’^a-z +’) putting  ‘oracle’ and ‘mysql’ in the hash table tab and returning t = 2.....

The following piece of code is for my own little recursive PL/SQL split procedure with some added functionality. The first 3 parameters do what their AWK counterparts do. The parameter "mode" if set to < 0 makes split collect substrings described by regexp into tab rather than filtering out undesirable ones. "ml" tells split the length a chunk should have in order te be potentially interesting.

I wrote jsplit() (my version of split An Oracle10g PL/SQL implementation of the AWK split() built-in smile as a procedure, instead of as a function, mainly because the number of chunks returned can easily be queried using the ".count" method of the tab variable if necessary.

create or replace type chartab_t as table of varchar2(4000);

The procedure itself:

CREATE OR REPLACE
procedure jsplit
(str IN varchar2,tab IN OUT chartab_t,fs varchar2,md IN number default 1, ml IN number default 2)  is
  t       number;
  v_sub   varchar2(4000);
  v_str   varchar2(4000) := str;
begin
  if str is null then
    return;
  end if;
  t := regexp_instr(v_str,fs);
  if md > 0 THEN –-  AWK split(); t == first index fieldsep;
    case t
      when 0 then — no fieldsep
        if (length(v_str) >= ml) then
          tab.extend;
          tab(tab.last) := v_str;
          v_str := ‘’;
          jsplit(v_str,tab,fs,md,ml);
        else
          v_str := ‘’;
          jsplit(v_str,tab,fs,md,ml);
        end if;
      when 1 then –
        v_str := regexp_replace(v_str,fs,'’,1,1,’i');
        jsplit(v_str,tab,fs,md,ml);
      else — t > 1
        v_sub := substr(v_str,1,t-1);
        if (length(v_sub) >= ml) then
          tab.extend;
          tab(tab.last) := v_sub;
        end if;
          v_str := substr(v_str,t);
          jsplit(v_str,tab,fs,md,ml);
    end case;
  else –  mode < 0
    case t
      when 0 then
        return;
      when 1 then
        v_sub := regexp_substr(v_str,fs);
        if length(v_sub) >= ml then
          tab.extend;
          tab(tab.last) := v_sub;
        end if;
        v_str := regexp_replace(v_str,fs,'’,1,1,’i');
        jsplit(v_str,tab,fs,md,ml);
      else — t > 1
        v_str := substr(v_str,t);
        jsplit(v_str,tab,fs,md,ml);
    end case;
  end if;
end;