An Oracle10g PL/SQL implementation of the AWK split() built-in

Jan Jansen

With the regular expression functionality in Orcale database 10g some handy build-ins of the UNIX pattern matching language AWK are easily implemented. I myself use AWK split() function quite often in AWK scripts. AWK split() splits a given string into chunks separated by substrings which are described by a regular expression parameter. The substrings are returned in hash table (an associative array like datatype). The function returns the number of chunks collected.

A typical split call could be something like  t = split(‘oracle123@mysql’,tab,’^a-z +’) putting  ‘oracle’ and ‘mysql’ in the hash table tab and returning t = 2.....

The following piece of code is for my own little recursive PL/SQL split procedure with some added functionality. The first 3 parameters do what their AWK counterparts do. The parameter "mode" if set to < 0 makes split collect substrings described by regexp into tab rather than filtering out undesirable ones. "ml" tells split the length a chunk should have in order te be potentially interesting.

I wrote jsplit() (my version of split as a procedure, instead of as a function, mainly because the number of chunks returned can easily be queried using the ".count" method of the tab variable if necessary.

create or replace type chartab_t as table of varchar2(4000);

The procedure itself:

CREATE OR REPLACE
procedure jsplit
(str IN varchar2,tab IN OUT chartab_t,fs varchar2,md IN number default 1, ml IN number default 2)  is
  t       number;
  v_sub   varchar2(4000);
  v_str   varchar2(4000) := str;
begin
  if str is null then
    return;
  end if;
  t := regexp_instr(v_str,fs);
  if md > 0 THEN –-  AWK split(); t == first index fieldsep;
    case t
      when 0 then — no fieldsep
        if (length(v_str) >= ml) then
          tab.extend;
          tab(tab.last) := v_str;
          v_str := ‘’;
          jsplit(v_str,tab,fs,md,ml);
        else
          v_str := ‘’;
          jsplit(v_str,tab,fs,md,ml);
        end if;
      when 1 then –
        v_str := regexp_replace(v_str,fs,'’,1,1,’i');
        jsplit(v_str,tab,fs,md,ml);
      else — t > 1
        v_sub := substr(v_str,1,t-1);
        if (length(v_sub) >= ml) then
          tab.extend;
          tab(tab.last) := v_sub;
        end if;
          v_str := substr(v_str,t);
          jsplit(v_str,tab,fs,md,ml);
    end case;
  else –  mode < 0
    case t
      when 0 then
        return;
      when 1 then
        v_sub := regexp_substr(v_str,fs);
        if length(v_sub) >= ml then
          tab.extend;
          tab(tab.last) := v_sub;
        end if;
        v_str := regexp_replace(v_str,fs,'’,1,1,’i');
        jsplit(v_str,tab,fs,md,ml);
      else — t > 1
        v_str := substr(v_str,t);
        jsplit(v_str,tab,fs,md,ml);
    end case;
  end if;
end;
 
Next Post

ODTUG 2006 Conference Handouts/Presentations available

Facebook0TwitterLinkedinLast night, the ODTUG 2006 Conference Committee announced that all handouts and presentations from this year’s (well, in fact that is last week’s) conference are available for download from the ODTUG website at http://www.odtug.com. Until August 1st, all material can be downloaded by the general public. After that date, you […]