Oracle RDBMS 11gR2 – goodbye Connect By or: the end of hierarchical querying as we know it

22

Many years ago, Oracle basically set the standard in SQL. Whatever was Oracle SQL could be seen as the standard. This has never been absolutely true – ANSI SQL was different from Oracle SQL. In some respects the differences can be traced back to lack of functional richness in the standard. However, in certain areas, Oracle has walked its own path with certain functions and features with the rest of the RDBMS pack following another route. It seems to be as if starting with Oracle RDBMS 9iR2, Oracle has made several important steps towards embracing the ANSI SQL standard syntax, usually while maintaining its own specific flavor of those same functions.

Some examples of Oracle specific syntax that are also available through their ANSI SQL counterpart – that in most instances has even more functionality – and that is almost always the preferred approach going forward:

  • outer join syntax: in addition to Oracle’s (+) notation, 9iR2 introduced the left outer join, right outer join and full outer join (like (+) on both ends of the join condition, something that Oracle does not support)
  • the Decode function that with 9iR2 can (and should) be replace by the CASE expression
  • the to_char function that in many instances can be replace by extract {year|month|day|HOUR | MINUTE | SECOND} from date-value
  • the ltrim and rtrim functions that can be replaced by the trim function with leading and trailing settings

The 11gR2 release has another example of Oracle specific SQL that is now complemented with the ANSI SQL standard approach:the CONNECT BY syntax for hierarchical querying that can now be replaced by the ANSI SQL standard Recursive Subquery Factoring clause, which is an extension to the WITH clause introduced as long ago as Oracle7 and still relatively underused by Oracle SQL developers.

The table I will use for demonstrating this new syntax is EMP. The records in EMP refer via their MGR column to other records in EMP, allowing us to create an organization hierarchy starting with KING – an employee without manager – and traversing the org-chart all the way down.

The classic way of performing the hierarchical query looks like this:

with employees (empno, name, level) as
( select empno, ename, job, mgr, level
  from   emp
  start with mgr is null
  connect by prior empno = mgr
)
select *
from   employees

This syntax has been available – in this very Oracle proprietary way – from the very first releases of the Oracle RDBMS (release 2 or 3). Later releases add some niceties to hierarchical queries, most notably the Oracle RDBMS 9iR2 and 10g releases.

  • order siblings by
  • sys_connect_path
  • connect_by_root, connect_by_is_leaf,
  • nocycle, connect_by_is_cycle

However, ANSI SQL defines an alternative approach to Hierarchical Queries. It is called Recursive Subquery Factoring (or Recursive With Clause). Apparently it is supported (to some extend) in DB2, SQL Server and PostgreSQL. It extends the WITH clause to enable formulation of recursive queries:  “The query in the WITH clause (subquery) can refer to itself”

This Recursive WITH clause is ANSI. SQL The introduction of this syntax in 11gR2 makes Oracle ANSI-SQL-compatible for recursive queries.

So what is the equivalent of the standard hierarchical query shown above in the new syntax:

with employees (empno, name, job, mgr, hierlevel) as
( select empno, ename, job, mgr, 1
  from   emp
  where  mgr is null
  union all
  select e.empno, e.ename, e.job
  ,      e.mgr, m.hierlevel + 1
  from   emp e
         join
         employees m
         on (m.empno = e.mgr)
)
select *
from   employees

It looks a little complex at first. And it is more lines of code – so how can it be better? Well, give it a moment or two. When we look at more complex hierarchical queries and the use of the specials like sys_connect_path and the connect_by_root some advantages of this syntax will start to shine through.

For the moment, let’s disect this query:

The with clause consists of two queries that are combined the UNION ALL. Both queries return the same list of columns – as is required by UNION ALL. The first query is used to retrieve the root-node(s), the starting point(s) of the tree. In this case, this query only returns KING – the employee with MGR is null. The second query will add all other tree nodes to the result set returned by the subquery. This is where the magic lives: the second query refers to the subquery as a whole. The query that returns the other-than-the-root tree nodes joins with the subquery itself. What this is saying is something to the effect of:

  • the second query in the recursive subuquery is recursively processed, in subsequent iterations
  • in every iteration, we enter a deeper level in the tree
  • each iteration uses the the collection of all nodes retrieved by the first subquery (the root nodes) and all previous iterations. that means for example that the first iteration of   select e.empno, e.ename from   emp e cross join employees m on (m.empno = e.mgr) has the root-node KING to join with from the employees set. The second iteration will have the other managers serving under KING to join with.

With this understanding in our heads, could we then come up with the syntax we need for retrieving what we can do with the Oracle goodie SYS_CONNECT_PATH, the path from every tree node all the way up to its root with node values added to a string concattenated together for all nodes in the path?

It is actually surprisingly simple. To return the alternative to SYS_CONNECT_PATH(ename, ‘/’) all we need is the following:

with employees (empno, name, mgr, hierlevel, path) as
( select empno, ename, job, mgr, 1, ename
  from   emp
  where  mgr is null
  union all
  select e.empno, e.ename, e.job
  ,      e.mgr, m.hierlevel + 1
  ,      m.path||'/'||e.ename
  from   emp e
         join
         employees m
         on (m.empno = e.mgr)
)
select *
from   employees

This will select ‘KING’ as path for the first node and then for each node deeper in the tree take the path value from the parent node and add ‘/’||ename to it.

In answer to questions that arose as to how multiple subqueries can be combined with the recursive subquery, I would like to present the next query that makes a few combinations:

with departments as
( select deptno dept_id
  ,      dname  dept_name
  from   dept
  where  loc in ('NEW YORK' ,'DALLAS')
)
, employees (empno, name, job, mgr, hierlevel) as
( select empno, ename, job, mgr, 1
  from   emp
  where  mgr is null
  union all
  select e.empno, e.ename, e.job
  ,      e.mgr, m.hierlevel + 1
  from   emp e
         join
         employees m
         on (m.empno = e.mgr)
         join
         departments
         on (e.deptno = departments.dept_id)
)
, subordinate_numbers as
( select mgr emp_id
  ,      count(*) number_of_subordinates
  from   employees
  group by mgr
)
select *
from   employees e
       left outer join
       subordinate_numbers s
       on (e.empno = s.emp_id)

In later articles we will look at the more advanced aspects of the new recursive subquery factoring syntax and functionality.

Resources

Oracle 11gR2 SQL Language reference – Recursive With Clause

Share.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

22 Comments

  1. Just for community update. Issue raised by me in comment #19 has been accepted by Oracle as a defect.
    Oracle has provided a patch for the same for Linux environment.
    Below are the defect details:
    Defect Id: 3-5078219271
    Those who need it can access the patch via https://support.oracle.com/

  2. Hi Lucas
    Some more info. In this query, CONNECT BY PRIOR is used at two places. It’s the second occurrence that is reporting an error. Only difference between the two is that later is part of an INNERT JOIN condition. Is this valid?
    Abhilesh

  3. Hi Lucas/Abhliesh,
    With a small investigation what I found out is there is not problem with the ‘connect by prior’ in the second argument of ‘Inner Join’, but the problem mainly is with the ‘connect by prior’ in the inner join condition. It seems like with the latest oracle version ‘LEVEL, PRIOR, ROWNUM’ are not allowed in join conndition

  4.  
    Hi Lucas,
    Indeed you are right. Query mentioned above does work. However, following query which was working earlier on Oracle 11.2.0.1 doesn’t work on 11.2.0.3. My apologies for generalizing it with my assumption.
    My table structure is as shown below.
    CREATE TABLE APP_BRANCH
    (
    BRANCH_ID varchar2 (50) NOT NULL ,
    PARENT_ID varchar2 (50) DEFAULT ‘ ‘,
    REGISTERED_REVISION varchar2 (50) NOT NULL ,
    NAME nvarchar2 (500) NOT NULL ,
    STACK_LEVEL number (4, 0) NOT NULL,
    CONSTRAINT PK_SPACE PRIMARY KEY (BRANCH_ID)
    );
    CREATE TABLE APP_DOC
    (
    REVISION varchar2 (50) NOT NULL ,
    DOCUMENT_ID varchar2 (50) NOT NULL ,
    NAME nvarchar2 (255) NOT NULL ,
    TYPE_ID varchar2 (50) NOT NULL ,
    TYPE_REVISION varchar2 (50) NOT NULL ,
    PARENT_ID varchar2 (50) DEFAULT ‘ ‘,
    CONTENT CLOB   NULL ,
    STACK_LEVEL number (4, 0) NOT NULL,
    CONSTRAINT PK_DOC PRIMARY KEY (REVISION, DOCUMENT_ID)
    );
    Here goes the query that fails.
    SELECT /*+ FIRST_ROWS(10) PARALLEL (APP_DOC, 4) INDEX(APP_DOC PK_DOC) */ APP_DOC.REVISION, APP_DOC.DOCUMENT_ID, APP_DOC.NAME, APP_DOC.STACK_LEVEL , APP_DOC.CONTENT FROM APP_DOC INNER JOIN (SELECT D2.DOCUMENT_ID , MAX(STACK_LEVEL) AS BLEVEL FROM APP_DOC D2 WHERE D2.REVISION IN (SELECT REGISTERED_REVISION FROM APP_BRANCH START WITH BRANCH_ID = :REVISION CONNECT BY PRIOR PARENT_ID = BRANCH_ID) AND D2.DOCUMENT_ID = :DOCUMENT_ID AND D2.TYPE_ID = :TYPE_ID GROUP BY D2.DOCUMENT_ID) SUB_RESULT ON APP_DOC.REVISION IN(SELECT REGISTERED_REVISION FROM APP_BRANCH START WITH BRANCH_ID = :REVISION CONNECT BY PRIOR PARENT_ID = BRANCH_ID) AND APP_DOC.STACK_LEVEL = SUB_RESULT.BLEVEL AND APP_DOC.DOCUMENT_ID=SUB_RESULT.DOCUMENT_ID
    Reports the same error. Error conveys following
    Cause: LEVEL, PRIOR, or ROWNUM is being specified at illegal location.
    Action: Remove LEVEL, PRIOR, or ROWNUM.
    In above query, only suspect is PRIOR keyword. Concern here is that the same query was working fine with previous Oracle releases but failed for Oracle 11.2.0.3. Why?
    Abhilesh
     

  5. Hi Abhilesh,

     

    I do not think Oracle will ever desupport CONNECT BY PRIOR, do not fear.

     

    Could you describe the exact table APP_BRANCH that you are running this query against? I can see no apparent reason why your query should not work.

    The Recursive Sub Query Equivalent would be something like:

    with branches(id, parent_id, branch_label) as
    ( select branch_id id
    ,      parent_id parent_id
    ,      branch_name branch_label
    from   app_branch
    union all
    select a.branch_id id
    ,      a.parent_id parent_id
    ,      a.branch_name branch_label
    from   app_branch a
    join
    branches b
    on (a.parent_id = b.id)
    )
    select *
    from   branches

    (assuming a table definition such as:

    create table app_branch
    ( parent_id number(5)
    , branch_id number(5) not null primary key
    , branch_name varchar2(100)
    )

    )

     

    Lucas

  6. Hi Lucas,
    Does it mean that CONNECT BY PRIOR will not be supported by next versions of Oracle11g?
    I put this question because I’ve already got following error after upgrading Oracle server from 11.2.0.1 to 11.2.0.3. Till now I could not find any official documentation from Oracle that conveys the same.
    Query is:

    SELECT BRANCH_NAME FROM APP_BRANCH START WITH BRANCH_ID = ? CONNECT BY PRIOR PARENT_ID = BRANCH_ID

    Error: ORA-00976: Specified pseudocolumn or operator not allowed here
    Any pointers or help appreciated. Also, what could be an alternative ANSI query?

    Abhilesh

  7. Hi Cecil New,

    If you take a look at my response to Adamp, that may already give you your answer. Nodes will not be revisited. However, that may mean you will not have the data you need. Could you please send me DDL scripts for your table structure and the query you would be using with CONNECT BY if that would work? Then I can take a closer look at what you are exactly asking for.

    Lucas

  8. Hi Adamp,

    It does have a similar protection against endless loops – through the construction:

     

    CYCLE name SET is_cycle TO ‘Y’ DEFAULT ‘N’

    When a node is encountered for the second time, it will be included  in the result set, its column value IS_CYCLE (a new column I am introducing with the statement above) is set to Y and the recursion stops then and there – so there will not be a second iteration starting at this node. Note however that any node where a cycle starts is included in the result set twice.

     

    Lucas

  9. A common problem with product structures (or bills-of-material) in product management tools, is a query that will return all parts, past and present.
    By past, I mean “revisions” of the assembly structure.  Each assembly node at each level may have many revisions.  As a practical matter, the most of the data is repeated revision to revision.  This means that CONNECT BY wastes a lot of time revisiting nodes in the tree – actually, most cases it never finishes.  So to handle this problem I have resorted to doing the recursion in logic, keeping track of visited nodes and skipping them if encountered again.
    Are there any feature in 11g that implement a “don’t revisit node” approach?  Or can the new WITH syntax implement this approach?

  10. Alessandro Maioli on

    I was wondering about perfomance of  hierarchical query using  “WITH” and aliasing column vs the same query made using the “CONNECT BY” with high volume of data:
    It will be fast as the use of  “CONNECT BY” on Oracle DB of previous version (11gR1 and below) with the constetual use of :
    ALTER SESSION SET “_OLD_CONNECT_BY_ENABLED” = TRUE;
    I found only measure of WITH clause vs “CONNECT BY” on Oracle 11gR2, but on that version the “CONNECT BY” performance become poor since _OLD_CONNECT_BY_ENABLED cannot be used…

  11. If you want the Oracle CONNECT_BY_ROOT functionality as well, you can use this query (root column):
    with employees (empno, name, mgr, hierlevel, path, root) as
    ( select empno, ename, job, mgr, 1, ename, ename
    from   emp
    where  mgr is null
    union all
    select e.empno, e.ename, e.job
    ,      e.mgr, m.hierlevel + 1
    ,      m.path||’/’||e.ename,  m.root
    from   emp e
    join
    employees m
    on (m.empno = e.mgr)
    )
    select * from   employees

  12. Is there any way to protect against loops like the NOCYCLE clause does for the CONNECT BY syntax, or does this already protect against loops somehow?

  13. “I assume it is equally fast – though it may be a little bit faster (I have not tried it out)” – can you try it out ? Normally you would have an index on MGR, connect by would have used it to good effect. Also, try with walking up and down the tree and not just starting at root level.

  14. Very good article! I was wondering if I could run past you a problem that requires a recursive SQL such as this. Would this be ok?

  15. I assume it is equally fast – though it may be a little bit faster (I have not tried it out) – especially with functions like SYS_CONNECT_BY_PATH and the likes. More importantly: it is functionally richer. And no, it will not work on 10g – it is an 11gR2 feature that is not available in previous releases of the database.

    Lucas

  16. Hi Joaquin, You are quite correct: the cross join syntax is incorrect. Cross Join are cross joins and do not need/work with the ON condition. I have updated the article likewise. Thanks for pointing it out to me.

    Lucas

  17. Hi Mladen: the syntax will grow on you, trust me! I had that same initial difficulty with this syntax. And I remember feeling similar about Analytical Functions. And boy, have I gotten used to them later on! I trust that this recursive subquery clause will in the end prove more intuitive, especially when it comes to extras like sys_connect_path and connect_by_root. However, the good old connect by syntax will be supported for a very long time (just like the (+) legacy outer join notation or the decode statement.

    Lucas

  18. The query fails using “cross join”, I need to replace it with “inner join”:

    SQL> with employees (empno, name, job, mgr, hierlevel) as
    2 ( select empno, ename, job, mgr, 1
    3 from emp
    4 where mgr is null
    5 union all
    6 select e.empno, e.ename, e.job
    7 , e.mgr, m.hierlevel + 1
    8 from emp e
    9 cross join
    10 employees m
    11 on (m.empno = e.mgr)
    12 )
    13 select *
    14 from employees
    15 ;
    on (m.empno = e.mgr)
    *
    ERROR at line 11:
    ORA-00907: missing right parenthesis

    SQL> ed
    Escrito file tmp\8592.buf

    1 with employees (empno, name, job, mgr, hierlevel) as
    2 ( select empno, ename, job, mgr, 1
    3 from emp
    4 where mgr is null
    5 union all
    6 select e.empno, e.ename, e.job
    7 , e.mgr, m.hierlevel + 1
    8 from emp e
    9 inner join
    10 employees m
    11 on (m.empno = e.mgr)
    12 )
    13 select *
    14* from employees
    15 /
    EMPNO NAME JOB MGR HIERLEVEL
    ———- ———- ——— ———- ———-
    7839 KING PRESIDENT 1
    7566 JONES MANAGER 7839 2
    7698 BLAKE MANAGER 7839 2
    7782 CLARK MANAGER 7839 2
    7499 ALLEN SALESMAN 7698 3
    7521 WARD SALESMAN 7698 3
    7654 MARTIN SALESMAN 7698 3
    7788 SCOTT ANALYST 7566 3
    7844 TURNER SALESMAN 7698 3
    7900 JAMES CLERK 7698 3
    7902 FORD ANALYST 7566 3
    7934 MILLER CLERK 7782 3
    7369 SMITH CLERK 7902 4
    7876 ADAMS CLERK 7788 4

    14 filas seleccionadas.