Using an aggregation function to query a JSON string straight from SQL

Last week I read this blogpost by Scott Wesley. In that post he describes how he uses a custom aggregate function to create large JSON strings.
For that he used a solution described in this post by Carsten Czarski. Scott’s post reminded me of a post by my colleague Lucas Jellema, in which he uses the “normal” listagg aggregation function. When Lucas wrote his post I thought that I could beat the 4000-character limit of his approach with a custom aggregate function.

I started out with Tom Kyte’s stragg_type, see here, and just changed the type of the stragg_type string attribute to clob and the return type of the functions to clob.
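In other words, something roughly like this (a sketch of the modified type specification only, not the complete code; the member bodies and the standalone aggregate function are not shown):

create or replace type stragg_type as object
( string clob   -- was a varchar2(4000) attribute
, static function odciaggregateinitialize( sctx in out stragg_type )
  return number
, member function odciaggregateiterate( self in out stragg_type, a_val varchar2 )
  return number
, member function odciaggregateterminate( self in out stragg_type, returnvalue out clob, flags in number )
  return number
, member function odciaggregatemerge( self in out stragg_type, ctx2 in out stragg_type )
  return number
)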

That worked: no problem aggregating strings of 4000 bytes, no problem at 10000 either, but for larger strings it became slower and slower.
Too slow: aggregating 15000 bytes took 15 seconds.

So I changed the attribute back to varchar2, but now with a size of varchar2(32767).
That worked, and fast. But only for strings shorter than 29989 bytes. For larger strings I got an
ORA-22813: operand value exceeds system limits

To solve that I added a clob attribute, just as Carsten Czarski does in his listagg.
I used the varchar2 string for speed and, as soon as the result became too large, the clob for size.
And that worked too. But as soon as the aggregated string size exceeded 58894 bytes the ORA-22813 popped up again.
And as soon as the odciaggregatemerge function used the clob there was another error: ORA-22922: nonexistent LOB value.
So I gave up: 4000 bytes is a nice limit for JSON, and if you want something bigger you have to use PL/SQL. Or so I thought.

But after reading Scott’s post I compared my code with Carsten Czarski’s to see how he had solved my problems.
It turned out that the first one was easy to solve: just limit the varchar2 string to 4000 again.
And Carsten’s odciaggregatemerge function will raise an ORA-22922 too. I expect that it is an Oracle bug 🙂
But, because the odciaggregatemerge function is only executed if the optimizer decides to run the aggregating query in parallel, you can aggregate very large strings without ever seeing that error.
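The buffering idea behind that fix can be sketched in a standalone block like this (a simplified illustration under my own assumptions, not the actual type body): append to a varchar2 buffer for speed and move it into the clob before it grows past roughly 4000 characters.

declare
  l_buffer varchar2(32767);
  l_clob   clob;
  --
  procedure append( p_piece varchar2 )
  is
  begin
    -- flush the fast varchar2 buffer into the clob before it passes ~4000 characters
    if length( l_buffer ) + length( p_piece ) > 4000
    then
      l_clob   := l_clob || l_buffer;
      l_buffer := null;
    end if;
    l_buffer := l_buffer || p_piece;
  end;
begin
  for i in 1 .. 1000
  loop
    append( '{"id":' || i || '}' );
  end loop;
  l_clob := l_clob || l_buffer;   -- final flush, like odciaggregateterminate would do
  dbms_output.put_line( 'aggregated length: ' || dbms_lob.getlength( l_clob ) );
end;
/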

So, now it’s time to introduce my JSON aggregator. It’s a custom aggregate function that aggregates the rows of a query into a JSON array. The elements of this array are JSON objects.

create or replace type agg_json as object
( t_varchar2 varchar2(32767)
, t_clob clob
, static function odciaggregateinitialize( sctx in out agg_json )
  return number
, member function odciaggregateiterate
    ( self in out agg_json
    , a_val dbmsoutput_linesarray
    )
  return number
, member function odciaggregateterminate
    ( self in out agg_json
    , returnvalue out clob
    , flags in number
    )
  return number
, member function odciaggregatemerge
    ( self in out agg_json
    , ctx2 in out agg_json
    )
  return number
, static function json_obj( p_obj dbmsoutput_linesarray )
  return varchar2
, static function escape_json( p_val varchar2 )
  return varchar2
)

Just a type with two attributes, the standard functions for implementing a custom aggregation function, and two supporting static functions.
But notice the a_val parameter of odciaggregateiterate. dbmsoutput_linesarray is a varray of varchar2(32767).
Every name-value pair in the JSON object is formed by three entries in that varray.
The first entry is the name of the name-value pair.
The second entry is the value of the name-value pair.
And the third is an indicator for the value: is it a string or not. In the examples below an empty string is used for a string value, 'n' for a number, 'a' for an array and 'o' for an object.
The fourth entry is the name of the second name-value pair, the fifth its value, the sixth its indicator, and so on.
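To make that layout concrete, here is a standalone sketch of how such a varray could be turned into one JSON object (hypothetical illustration code with a made-up name, not the actual agg_json.json_obj implementation):

create or replace function demo_json_obj( p_obj dbmsoutput_linesarray )
return varchar2
is
  l_json varchar2(32767);
begin
  -- walk the varray in (name, value, indicator) triples
  for i in 1 .. trunc( p_obj.count / 3 )
  loop
    l_json := l_json
           || case when i > 1 then ',' end
           || '"' || p_obj( i * 3 - 2 ) || '":'        -- the name
           || case
                when p_obj( i * 3 ) is null             -- empty indicator: a string value, so quote it
                then '"' || p_obj( i * 3 - 1 ) || '"'   -- (the real code also escapes it, see escape_json)
                else p_obj( i * 3 - 1 )                 -- 'n', 'a' or 'o': emit the value as-is
              end;
  end loop;
  return '{' || l_json || '}';
end;
/

Under these assumptions demo_json_obj( dbmsoutput_linesarray( 'id', 1, 'n', 'name', 'Scott', '' ) ) returns {"id":1,"name":"Scott"}.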

After creating the type you create the actual aggregation function, and with that you can create JSON

create or replace function json_agg( agg dbmsoutput_linesarray )
return clob
parallel_enable aggregate using agg_json;

For example, this query

select json_agg( dbmsoutput_linesarray( 'id', level, 'n'
                                      , 'name', level, '' 
                                      , 'test', 'test' || level, ''
                                      )
               ) 
from dual
connect by level <= 3

produces this JSON

 [{"id":1,"name":"1","test":"test1"}
,{"id":2,"name":"2","test":"test2"}
,{"id":3,"name":"3","test":"test3"}]

And to get the JSON from Lucas’ example, nest two calls to this new aggregation function

select agg_json.json_obj
         ( dbmsoutput_linesarray( 'company'
                                , json_agg( dbmsoutput_linesarray( 'name', d.dname, ''
                                                                 , 'identifier', d.deptno, '' 
                                                                 , 'location', d.loc, '' 
                                                                 , 'employees', json_agg( dbmsoutput_linesarray( 'name', e.ename, ''
                                                                                                               , 'job', e.job, ''
                                                                                                               , 'salary', e.sal, 'n'
                                                                                                               , 'manager', nvl( ( select agg_json.json_obj( dbmsoutput_linesarray( 'name', m.ename, ''
                                                                                                                                                                                  , 'salary', m.sal, 'n'
                                                                                                                                                                                  , 'hiredate', to_char( m.hiredate, 'DD-MM-YYYY' ), ''
                                                                                                                                                                                  )
                                                                                                                                                           )
                                                                                                                                   from emp m
                                                                                                                                   where m.empno = e.mgr
                                                                                                                                 ), '{}' ), 'o'
                                                                                                               , 'hiredate', to_char( e.hiredate, 'DD-MM-YYYY' ), '' 
                                                                                                               ) 
                                                                                        ), 'a'
                                          )
                                          )
                                , 'a'
                                )
         )
from dept d
   , emp e
where d.deptno = e.deptno
group by d.dname, d.deptno, d.loc

Here is the code.
