On our blog, we have been discussing the new hierarchical query functionality in Oracle Database 11g Release 2, using Recursive Suquery Factoring. Instead of using CONNECT BY and its close associates such as START WITH, PRIOR, LEVEL and more exotic comrades like SYS_CONNECT_BY_PATH, CONNECT_BY_ROOT and NOCYCLE this release gave us a new, less proprietary and eventually probably more intuitive and functionally rich approach. We have also written how though we have no straightforward alternatives for LEVEL, SYS_CONNECT_BY_PATH and CONNECT_BY_ROOT – in the new recursive approach they are fantastically easy to emulate.
Until recently I have been quite happy with the new hierarchical querying and telling the world how I felt. Then an esteemed colleague – a far more experienced SQL programmer than I am – came up to me and remined me how the recursive sub query syntax at the present does not have a replacement for the CONNECT_BY_ISLEAF function – the SQL function that tells us whether a node produced in an hierarchical query has any children or is at the bottom of the chain – i.e. a leaf node. For leaf nodes (child-less), the function returns a value of one and for parent nodes the value is zero.
Anton (my colleague) was right and unfortunately I did not have a quick retort. However, after giving it some thought I believe I have found a way of emulating the CONNECT_BY_ISLEAF as well, using the new DEPTH FIRST ordering capabilities of the recursive subquery. I hope this will satisfy Anton as well.
First a quick example of using CONNECT_BY_ISLEAF to illustrate what I am trying to regain. This next query brings back the organizational hierarchy from table EMP, starting with employees without a manager and following the (hierarchical) chain of command downward by linking EMPNO (manager) to MGR (in subordinate):
SELECT empno empid , ename name , mgr mgrid , CONNECT_BY_ISLEAF leaf_node FROM emp CONNECT BY PRIOR empno = mgr start with mgr is null
The CONNECT_BY_ISLEAF function indicates which employees are at the bottom of the pyramid.
and the result
Now the recursive subquery does not use the CONNECT BY clause. Instead it has a two-part subquery in the WITH clause of the query; both parts are combined through a UNION ALL . The first part selects the root-node(s) for the tree. The second part recursively joins with the subquery itself, to iteratively find the children for the previously selected round of nodes (in the first iteration that round contains the root-nodes produced by the first query, in later rounds these (candidate) parent nodes are returned by the second query. For example:
WITH employees (empid, name, mgrid, lev) AS ( SELECT empno, ename, mgr , 1 FROM emp WHERE mgr is null UNION ALL SELECT empno, ename, mgr, lev+1 FROM employees r, emp e WHERE r.empid = e.mgr ) select e.empid , e.name , e.mgrid from employees e /
My reasoning for the alternative solution with the is based on the SEARCH DEPTH FIRST option we have for specifying a sequence number to the result of the recursive query. This way of determining the order of the nodes is based on first going down the tree (root, first child of root, first grandchild of root, up again – down again to second grandchild of root, up again, up again, second child of root, down again …… ) instead of first visiting the root nodes, then the first level of children etc. Something like
1 2 5 8 3 4 6 7 9 10
instead of
1 2 3 4 5 6 7 8 9 10
Now I believe we can make use of this specific way of searching to locate the leafs. Every leaf node does not have a child (duh!!) and therefore the next node in the ‘depth first’ sequence HAS TO BE AT THE SAME LEVEL OR HIGHER. By looking at the ‘next node’s hierarchical level when the nodes are ordered by the depth-first-sequence-number; and checking whether that level is either lower (no leaf) or the same/higher (IS LEAF!) than the node under scrutiny, we can find the leaf nodes.
The recursive query will return all nodes and will this depth-first-sequence number. A second query can then use the analytical LEAD function to find the sequence number assigned to the next node:
lead(lev) over (order by seq))
and compare it with the current node, to declare either is leaf (1) or is no leaf (0).
case when (lev - lead(lev) over (order by seq)) < 0 then 0 else 1 end is_leaf
The complete query now would be something like this:
WITH each_level (empid, name, mgrid, lev) AS ( SELECT empno, ename, mgr , 1 FROM emp WHERE mgr is null UNION ALL SELECT empno, ename, mgr, lev+1 FROM each_level r, emp e WHERE r.empid = e.mgr ) SEARCH DEPTH FIRST BY mgrid SET seq select e.* , case when (lev - lead(lev) over (order by seq)) < 0 then 0 else 1 end is_leaf from each_level e /
The result is as follows:
I guess what you’re looking for in first place could be achieved more securely with a scalar subquery like so:
with hierarchie (lvl, ename, empno, is_leaf) as (
select 1 lvl, ename, empno,
(select decode(count(*), 0, 1, 0)
from emp
where mgr = m.empno) is_leaf
from emp m
where mgr is null
union all
select h.lvl + 1, e.ename, e.empno,
(select decode(count(*), 0, 1, 0)
from emp
where mgr = e.empno) is_leaf
from hierarchie h, emp e
where h.empno = e.mgr)
search depth first by ename set sort_seq
select lvl, ename, is_leaf
from hierarchy;
To my understanding, this is somewhat easier to use. Whether this on the other hand is still performant is another question.
Best regards,
Jürgen
Hi there.
Very nice post 🙂
How would I get the is_leaf if the ordering is not by depth, but by breadth ?
Regards
Mette
Is what you are asking for just the root node for every leaf-node – it seems that way at least. Of course it is easy to add the root to every node returned from the query and then filter on is_leaf = 1. Something like:
WITH
each_level (empid, name, mgrid, lev, root) AS
(
SELECT empno, ename, mgr , 1, ename
FROM emp
WHERE mgr is null
UNION ALL
SELECT empno, ename, mgr, lev+1, root
FROM each_level r, emp e
WHERE r.empid = e.mgr
)
SEARCH DEPTH FIRST BY mgrid SET seq
, leafNodes as
( select e.*
from each_level e
where case
when (lev – lead(lev) over (order by seq)) < 0
then 0
else 1
end = 1
)
select ename
, root top_manager
from leafNodes
Or are you looking for something else?
Of course I’m satisfied! But can you use it to find the topmanager from every employee? Something like:
select emp.empid
, emp.name
, ( select mgr.name
from emp mgr
where connect_by_isleaf = 1
connect by mgr.empid = prior mgr.mgr
start with mgr.empid = emp.mgr
) top_manager
from emp emp
Hi Lucas,
Brilliant idea! I just finished a session telling my audience there is no alternative for connect_by_isleaf. I’ll have to revise that statement …
By the way, cycle detection is still different from the connect-by syntax. Maybe you can even call it wrong but that’s subjective. And did you compare execution plans and performance? If you do, you’ll stick with connect-by. At least for some cases.
Regards,
Rob.