On our blog, we have been discussing the new hierarchical query functionality in Oracle Database 11g Release 2, using Recursive Suquery Factoring. Instead of using CONNECT BY and its close associates such as START WITH, PRIOR, LEVEL and more exotic comrades like SYS_CONNECT_BY_PATH, CONNECT_BY_ROOT and NOCYCLE this release gave us a new, less proprietary and eventually probably more intuitive and functionally rich approach. We have also written how though we have no straightforward alternatives for LEVEL, SYS_CONNECT_BY_PATH and CONNECT_BY_ROOT – in the new recursive approach they are fantastically easy to emulate.
Until recently I have been quite happy with the new hierarchical querying and telling the world how I felt. Then an esteemed colleague – a far more experienced SQL programmer than I am – came up to me and remined me how the recursive sub query syntax at the present does not have a replacement for the CONNECT_BY_ISLEAF function – the SQL function that tells us whether a node produced in an hierarchical query has any children or is at the bottom of the chain – i.e. a leaf node. For leaf nodes (child-less), the function returns a value of one and for parent nodes the value is zero.
Anton (my colleague) was right and unfortunately I did not have a quick retort. However, after giving it some thought I believe I have found a way of emulating the CONNECT_BY_ISLEAF as well, using the new DEPTH FIRST ordering capabilities of the recursive subquery. I hope this will satisfy Anton as well.
First a quick example of using CONNECT_BY_ISLEAF to illustrate what I am trying to regain. This next query brings back the organizational hierarchy from table EMP, starting with employees without a manager and following the (hierarchical) chain of command downward by linking EMPNO (manager) to MGR (in subordinate):
SELECT empno empid , ename name , mgr mgrid , CONNECT_BY_ISLEAF leaf_node FROM emp CONNECT BY PRIOR empno = mgr start with mgr is null
The CONNECT_BY_ISLEAF function indicates which employees are at the bottom of the pyramid.
and the result
Now the recursive subquery does not use the CONNECT BY clause. Instead it has a two-part subquery in the WITH clause of the query; both parts are combined through a UNION ALL . The first part selects the root-node(s) for the tree. The second part recursively joins with the subquery itself, to iteratively find the children for the previously selected round of nodes (in the first iteration that round contains the root-nodes produced by the first query, in later rounds these (candidate) parent nodes are returned by the second query. For example:
WITH employees (empid, name, mgrid, lev) AS ( SELECT empno, ename, mgr , 1 FROM emp WHERE mgr is null UNION ALL SELECT empno, ename, mgr, lev+1 FROM employees r, emp e WHERE r.empid = e.mgr ) select e.empid , e.name , e.mgrid from employees e /
My reasoning for the alternative solution with the is based on the SEARCH DEPTH FIRST option we have for specifying a sequence number to the result of the recursive query. This way of determining the order of the nodes is based on first going down the tree (root, first child of root, first grandchild of root, up again – down again to second grandchild of root, up again, up again, second child of root, down again …… ) instead of first visiting the root nodes, then the first level of children etc. Something like
1 2 5 8 3 4 6 7 9 10
1 2 3 4 5 6 7 8 9 10
Now I believe we can make use of this specific way of searching to locate the leafs. Every leaf node does not have a child (duh!!) and therefore the next node in the ‘depth first’ sequence HAS TO BE AT THE SAME LEVEL OR HIGHER. By looking at the ‘next node’s hierarchical level when the nodes are ordered by the depth-first-sequence-number; and checking whether that level is either lower (no leaf) or the same/higher (IS LEAF!) than the node under scrutiny, we can find the leaf nodes.
The recursive query will return all nodes and will this depth-first-sequence number. A second query can then use the analytical LEAD function to find the sequence number assigned to the next node:
lead(lev) over (order by seq))
and compare it with the current node, to declare either is leaf (1) or is no leaf (0).
case when (lev - lead(lev) over (order by seq)) < 0 then 0 else 1 end is_leaf
The complete query now would be something like this:
WITH each_level (empid, name, mgrid, lev) AS ( SELECT empno, ename, mgr , 1 FROM emp WHERE mgr is null UNION ALL SELECT empno, ename, mgr, lev+1 FROM each_level r, emp e WHERE r.empid = e.mgr ) SEARCH DEPTH FIRST BY mgrid SET seq select e.* , case when (lev - lead(lev) over (order by seq)) < 0 then 0 else 1 end is_leaf from each_level e /
The result is as follows: