Hierarchical queries return data in a tree like structure. The query is performed by walking the tree that is made up of parents and children, each non-root-node linked to a parent node. A common example of a hierarchical query is the one involving Employees who are linked to each other through the Manager reference.
I was recently facing a situation where I wanted to retrieve a number of nodes from a tree-like data structure, based on certain criteria. I wanted to present the resulting nodes again in a tree like fashion. However, if a child node was selected and its parent or other ancestors were not, I could not set up the tree structure to present the selected nodes in. So I have to make sure that along with certain specific nodes, also all their ancestors nodes are returned in order to restore the tree.
In this article, I will demonstrate several ways of creating such a query, one that returns selected nodes and their ancestors, given a specific hierarchical relation between records. The solutions I show make use of Oracle 9i features – the sys_connect_by_path operator and the combination of hierarchical queries and joins in a single query, which was not allowed prior to 9i.
What do we want to achieve?
We start from the classical EMP table. This table has three columns of interest: EMPNO (primary key), ENAME (display label) and MGR (self referencing foreign key). We use the connection between MGR and EMPNO to build the hierarchy. The simplest query to retrieve the EMP “tree” is this one:
select lpad('+',3*(level-1))||ename||'('||empno||')' Employee from emp connect by prior empno = mgr start with mgr is null /
Note how we added the lpad function to create indentation for the non-root nodes in our tree. We use the pseudo-column LEVEL that indicates the level of nesting (or the number of ancestors) for any node in the tree. The result looks like this:
EMPLOYEE ---------------------------------- KING(7839) +JONES(7566) +SCOTT(7788) +ADAMS(7876) +FORD(7902) +SMITH(7369) +TURNER(7844) +MARTIN(7654) +BLAKE(7698) +ALLEN(7499) +WARD(7521) +JAMES(7900) +CLARK(7782) +MILLER(7934) 14 rows selected.
This query could be executed as far back as Oracle 7 (or perhaps even earlier). In Oracle 9i, two knew features were introduced for hierarchical queries: the ability to join with other tables inside an hierarchical query and the operator SYS_CONNECT_BY_PATH. The latter returns for any node in the tree the concattenation of a certain expression for all nodes from the current node through all its ancestors all the way up to the root node. For example:
select lpad('+',3*(level-1))||ename||' ('||dname||') '||sys_connect_by_path(job,'/') Employee from emp , dept where emp.deptno = dept.deptno connect by prior empno = mgr start with mgr is null
Here we request the JOB value for each node and all its ancestors. The Job values are separated by the ‘/’ sign:
EMPLOYEE ------------------------------------------------------------------------------------- KING (ACCOUNTING) /PRESIDENT +JONES (RESEARCH) /PRESIDENT/ANALYST +SCOTT (RESEARCH) /PRESIDENT/ANALYST/ANALYST +ADAMS (RESEARCH) /PRESIDENT/ANALYST/ANALYST/CLERK +FORD (RESEARCH) /PRESIDENT/ANALYST/ANALYST +SMITH (RESEARCH) /PRESIDENT/ANALYST/ANALYST/CLERK +TURNER (SALES) /PRESIDENT/ANALYST/ANALYST/CLERK/SALESMAN +MARTIN (SALES) /PRESIDENT/ANALYST/ANALYST/CLERK/SALESMAN/ACCOUNTNT +BLAKE (SALES) /PRESIDENT/MANAGER +ALLEN (SALES) /PRESIDENT/MANAGER/SALESMAN +WARD (SALES) /PRESIDENT/MANAGER/SALESMAN +JAMES (SALES) /PRESIDENT/MANAGER/CLERK +CLARK (ACCOUNTING) /PRESIDENT/MANAGER +MILLER (ACCOUNTING) /PRESIDENT/MANAGER/CLERK
Now returning to the job at hand. I want to be able to select from the tree-like structure all nodes that satisfy certain conditions, for example the requirement that the SAL is greater than 3000 or the ENAME contains an ‘I’. However, I want to represent the result in a tree, with all manager-ancestors for each selected employee. Clearly simply adding a where clause of where ename like '%I%'
will not do the trick here:
select lpad('+',3*(level-1))||ename||'('||empno||')' Employee from emp where ename like '%I%' connect by prior empno = mgr start with mgr is null /
The result is clearly not what we intended:
EMPLOYEE ---------------------------- KING(7839) +SMITH(7369) +MARTIN(7654) +MILLER(7934)
We lack the managers of SMITH, MARTIN and MILLER – because obviously their names do not include an ‘I’. So what to do instead?
We need a way to include not only the nodes that directly satisfy the search condition, but also their ancestors. Here we can make use of the SYS_CONNECT_BY_PATH function. We can for example ask each selected node for its SYS_CONNECT_BY_PATH for the EMPNO column. This gives us not only the Employees that satisfy the criteria, but also a list of the EMPNO values for their ancestors in the tree:
select ename , empno , sys_connect_by_path(empno,'.')||'.' scbp from emp connect by prior empno = mgr start with mgr is null / ENAME EMPNO SCBP ---------- ---------- -------------------------------- KING 7839 .7839. JONES 7566 .7839.7566. SCOTT 7788 .7839.7566.7788. ADAMS 7876 .7839.7566.7788.7876. FORD 7902 .7839.7566.7902. SMITH 7369 .7839.7566.7902.7369. TURNER 7844 .7839.7566.7902.7369.7844. MARTIN 7654 .7839.7566.7902.7369.7844.7654. BLAKE 7698 .7839.7698. ALLEN 7499 .7839.7698.7499. WARD 7521 .7839.7698.7521. JAMES 7900 .7839.7698.7900. CLARK 7782 .7839.7782. MILLER 7934 .7839.7782.7934.
We can make use of this approach in the following way:
with tree as ( select ename , empno , sys_connect_by_path(empno,'.')||'.' scbp from emp connect by prior empno = mgr start with mgr is null ) select distinct emp.empno , emp.ename , emp.mgr from tree , emp where instr('.'||tree.scbp,emp.empno||'.') > 0 -- select any employee whose empno is part of the path from the selected tree-nodes all the way to the top and tree.ename like '%I%' -- from the entire tree, only select those nodes that satisfy the search requirements /
First we build the tree – with all the nodes – in the inline view ‘tree’. Then we select from the tree only the nodes that satisfy the search condition. Last we join these selected tree-nodes with table EMP using the condition instr('.'||tree.scbp,emp.empno||'.') > 0
. This specifies that if the SYS_CONNECT_BY_PATH on EMPNO for one of the selected tree-nodes includes the primary key EMPNO of a record in EMP, that record should be includes, as it is either a directly selected tree node (with an I in the ENAME) or one of the ancestors of such a node. The result:
EMPNO ENAME MGR ---------- ---------- ---------- 7369 SMITH 7902 7566 JONES 7839 7654 MARTIN 7844 7782 CLARK 7839 7839 KING 7844 TURNER 7369 7902 FORD 7566 7934 MILLER 7782 8 rows selected.
Now we would like to present this search result in a tree-structure. Using this record set, which we know to include all ancestor nodes from the selected nodes to the root, it should be simple to create the tree again:
with tree as ( select ename , empno , sys_connect_by_path(empno,'.')||'.' scbp from emp connect by prior empno = mgr start with mgr is null ) , selected_tree_nodes as ( select distinct emp.empno , emp.ename , emp.mgr from tree , emp where instr('.'||tree.scbp,emp.empno||'.') > 0 -- select any employee whose empno is part of the path from the selected tree-nodes all the way to the top and tree.ename like '%I%' -- from the entire tree, only select those nodes that satisfy the search requirements ) select lpad(ename,level*3+10)||' ('||empno||')' emp_node -- finally build a tree from the subset of nodes that were returned from selected_tree_nodes connect by PRIOR empno = mgr start with mgr is null /
The resulting tree looks like this:
EMP_NODE ----------------------------------- KING (7839) JONES (7566) FORD (7902) SMITH (7369) TURNER (7844) MARTIN (7654) CLARK (7782) MILLER (7934) 8 rows selected.
Alternative approach: Bottom Up Tree
Instead of building the entire tree of employees, starting from all root-nodes and traversing down through all nodes, including nodes and entire sub-branches that do not qualify, is perhaps somewhat overdoing it. Could there not be a more direct approach? What if we start by selecting all nodes that qualify and then build the tree from these nodes? If we need only a few nodes from just a few branches of the tree, would this not be much cheaper? Well, it probably would be. Let’s see how to do this.
select e1.* , e2.empno marker from emp e1 left outer join ( select empno from emp where ename like '%I%' ) e2 -- find all empnos of employees that satisfy the search requirement on (e1.empno = e2.empno) / EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO MARKER ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- ---------- 7369 SMITH CLERK 7902 17-DEC-80 800 20 7369 7654 MARTIN ACCOUNTNT 7844 28-SEP-81 1250 1400 30 7654 7839 KING PRESIDENT 17-NOV-81 5000 10 7839 7934 MILLER CLERK 7782 23-JAN-82 1300 10 7934 7844 TURNER SALESMAN 7369 08-SEP-81 1500 0 30 7782 CLARK MANAGER 7839 09-JUN-81 2450 10 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 7788 SCOTT ANALYST 7566 09-DEC-82 3000 20 7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 7566 JONES ANALYST 7839 02-APR-81 2975 20 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 7902 FORD ANALYST 7566 03-DEC-81 3000 20 7876 ADAMS CLERK 7788 12-JAN-83 1100 20 7900 JAMES CLERK 7698 03-DEC-81 950 30
Here we selected all EMP records, and for each record we have determined whether or not it is one of the selected nodes; this is indicated through the MARKER column. Next we are going to use this set to build a tree, starting from all the nodes that have a value for their marker:
with emps as -- all employees with a marker column for those that satisfy the search requirement (select e1.* , e2.empno marker from emp e1 left outer join ( select empno from emp where ename like '%I%' ) e2 -- find all empnos of employees that satisfy the search requirement on (e1.empno = e2.empno) ) select distinct ename , empno from emps connect by PRIOR mgr=empno -- note that the connect by condition is exactly the reverse of the one we used earlier; it reflects the fact that we build the tree from the bottom upwards, linking records to the MGR reference of the prior node start with marker is not null /
The result looks familiar, as it should:
ENAME EMPNO ---------- ---------- CLARK 7782 FORD 7902 JONES 7566 KING 7839 MARTIN 7654 MILLER 7934 SMITH 7369 TURNER 7844 8 rows selected.
A slightly more compact alternative for this query is the following – it can be used whenever the condition to select the nodes is relatively simple and does not require a subquery:
with emps as -- all employees with a marker column for those that satisfy the search requirement (select e1.* , case when ename like '%I%' then 'X' end marker from emp e1 ) select distinct ename , empno from emps connect by PRIOR mgr=empno start with marker is not null /
Finally, building a tree from the query result - showing all employees whose names contain an 'I' with all their managerial burden, is done like this:
with emps as -- all employees with a marker column for those that satisfy the search requirement (select e1.* , case when ename like '%I%' then 'X' end marker from emp e1 ) , tree_nodes as (select distinct ename , empno , mgr from emps connect by PRIOR mgr=empno start with marker is not null ) select lpad(ename, level * 4) from tree_nodes connect by prior empno = mgr start with mgr is null /
The result, again, is familiar:
LPAD(ENAME,LEVEL*4) ------------------------------ KING CLARK MILLER JONES FORD SMITH TURNER MARTIN 8 rows selected.
Note: if for some reason the WITH clause cannot be used – and it seems that for example Oracle ADF Business Components does not like it – you can rewrite the above queries using plain In Line views:
select distinct ename , empno from ( select e1.* , e2.empno marker from emp e1 left outer join ( select empno from emp where ename like '%I%' ) e2 -- find all empnos of employees that satisfy the search requirement on (e1.empno = e2.empno) ) connect by PRIOR mgr=empno start with marker is not null
and the last one:
select lpad(ename, level * 4) from ( select distinct ename , empno , mgr from ( select e1.* , e2.empno marker from emp e1 left outer join ( select empno from emp where ename like '%I%' ) e2 -- find all empnos of employees that satisfy the search requirement on (e1.empno = e2.empno) ) connect by PRIOR mgr=empno start with marker is not null ) connect by prior empno = mgr start with mgr is null
I tried your initial query on SQL PLus under Oracle 9i, I create a table called emp with some test data.
select lpad(‘+’,3*(level-1))||ename||'(‘||empno||’)’ Employee
from emp
connect
by prior empno = mgr
start
with mgr is null
The result set shows the emp up to the last level, meaning it repeats those employees who have a mgr, while your result set above only do this one time, under the main tree. Any advise?
Donna,
There are no 10g features used in these queries. SYS_CONNECT_BY_PATH and the WITH clause are the two most recent features and both should be in your 9.2 database.
Can you show the SQL that you are trying to execute and that does not work? By the way: if you run the SQL directly in SQL*Plus, does it then fail too?
best regards,
Lucas
We are currently running forms 6.0 (c/s) on an Oracle 9.2. I’m trying to “prune” a tree but can’t get it to work right. Is the above logic only available with 10?
Thanks.