Is this a bug in Analytical Functions? NULL is supposed to be UNEQUAL to NULL?

Lucas Jellema October 13, 2006 Database, IT, Oracle 13 Comments

While presenting the 7Up Training – upgrading experienced Oracle developers from their Oracle 7 base of knowledge and experience ("still writing 7 backwards compatible code against a 10g database") to 10g, we ran into a funny little example with the Oracle Analytical SQL Functions – that also applies to standard GROUP BY operations by the way – that suggests that NULL == NULL in some circumstances. To be fair: it feels (intuitively) correct. but theoretically (what do you say Toon??) it is a little fishy I think.

The situation is very simple: I have updated my EMP table to have several employees without department. Then I queried to find out for each employee his name, department, salary and the percentage of the total salary sum within the department allocated to the employee. The query is straightforward:

select ename
,      deptno
,      sal
,      ratio_to_report(sal) 
           over ( partition by deptno) 
       sal_percentage
from   emp
order
by 2, 4

The data in EMP shows two employees without Department (i.e. their DEPTNO is NULL). The results of this query show the two employees without department – MILLER and KING – as having 20 respectively 80 percent of the total salary sum in their department (the partition of all employees with the same DEPTNO). However, their DEPTNO is NULL and since NULL is never equal to NULL, there are no other employees with the same DEPTNO. Or are there?

ENAME          DEPTNO        SAL SAL_PERCENTAGE
---------- ---------- ---------- --------------
CLARK              10       2695              1
SMITH              20        800     .071604386
ADAMS              20       1100      .09845603
SCOTT              20       3000     .268516447
FORD               20       3000     .268516447
JONES              20     3272.5     .292906691
JAMES              30        950      .09808983
MARTIN             30       1250     .129065565
WARD               30       1250     .129065565
TURNER             30       1500     .154878678
ALLEN              30       1600     .165203924
BLAKE              30       3135     .323696438
MILLER                      1300     .200896307
KING                        5171     .799103693

On further analysis – something I apparently never actively realized – the classic GROUP BY does the exact same thing:

 select deptno
 ,      count(*) 
 from   emp 
 group  
 by     deptno

The results also suggest an equality between NULL…

  DEPTNO   COUNT(*)
-------- ----------
      30          6
                  2   <<<<-------
      20          5
      10          1

About The Author

Lucas Jellema

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director and Oracle Developer Champion. Solution architect and developer on diverse areas including SQL, JavaScript, Kubernetes & Docker, Machine Learning, Java, SOA and microservices, events in various shapes and forms and many other things. Author of the Oracle Press book Oracle SOA Suite 12c Handbook. Frequent presenter on user groups and community events and conferences such as JavaOne, Oracle Code, CodeOne, NLJUG JFall and Oracle OpenWorld.

13 Comments

Marco Gralike October 14, 2006
Oracle is just following the rules layed out by ANSI SQL.
Patrick Sinke October 14, 2006
I think, though, that you alway should see the result of those queries from the point of view from the end user. And what an end user wants to see is not
DEPTNO COUNT(*)
===== =======
30 6
â€œnullâ€ 1
â€œnullâ€ 1
â€œnullâ€ 1
â€œnullâ€ 1
â€œnullâ€ 1
â€œnullâ€ 1
20 5
10 1
That’s just not convenient. Any query I would write for an end user that yields this result would not be accepted by that user. He would want me to rewrite the query so that all the ‘null’ or ‘unknown’ departments to be put together in one result, because separate records with count=1 have no meaning at all.
So probably Oracle has made the choice to handle nulls like that, and my opinion is that they made the most logical (or call it: intuitive) choice 🙂
Mr. Ed October 14, 2006
Giving each NULL value its own group seems unwieldy, especially if you have an N-column GROUP BY, and then all-but-1, all-but-2, …, or all-but-(N-1) of those columns have NULL values.
And what about:
select null from dual union select null from dual;
It only returns 1 row. Should GROUP BY be any different?
Marco Gralike October 14, 2006
Hmmmm, even wordpress has problems with (or NOT?)
😉
Second try…
Who argues for outcome:
DEPTNO COUNT(*)
===== =======
30 6
“null” 1
“null” 1
20 5
10 1
(hopefully it will now print correctly, otherwise “Lucas Help?!!!”)
Marco Gralike October 14, 2006
Who argues for outcome:
DEPTNO COUNT(*)
——— ———–
30 6
1
Marco Gralike October 14, 2006
From “An introduction to Database Systems” (International Editon – Eighth Edition) by C.J.Date:
Possible meanings / reasons why there is a absence of a value (page 577):
– value unknown
– value not applicable
– value does not exist
– value undefined
– value not supplied
– …
(page 595): Aggregate Operators:
The SQL aggregate operators (SUM,AVG,etc.) do not behave in accordance with the rules for scalar operators explained in Section 19.2, but instead simply ignore any nulls in their argument (except for count(*), where nulls are treated as if they were regular values). Also if the argument to such an operator happens to evaluate to an empty set, COUNT returns zero; the other operators all return NULL. (As noted in Chapter 8, this bahaviour is logically incorrect, but it is the way SQL is defined.)
MG
Marco Gralike October 14, 2006
@Patrick: “…and the employees that are not in any department are summed”, but this is the problem using NULL Patrick, you are assuming that they “are not in any department”. A NULL coudl also imply “we KNOW the belong to a department but it is UNKNOWN to us in which…”
@Jurgen: you don’t want to get into that one… It could be the difference between a salary check with a VALUE on it at the end of the month or a salary check without a VALUE… (you could think of more worse-case real life example of course…you did a 911 call, which is registered in a database, which triggers…).
Marco Gralike October 14, 2006
If think your
select deptno
, count(*)
from emp
group
by deptno
is a great example of Oracle desperatly wanting to give answer where it should not give one.
A department with value NULL is not equal to a department with value NULL. Also a nice example why one should avoid NULL’s
Marco Gralike October 14, 2006
“The results also suggest an equality between NULLâ€¦”
From the viewpoint of Oracle’s implementation of NULL’s, it does. Your “count(*)” spoils the soup. See the examples (nice post) of beter nullogy on http://www.databasedesign-resource.com/null-values-in-a-database.html. I think we are dealing with a “this is an error in the COUNT function: Any column would give the same COUNT result…” problem here.
Nice links found regarding nullogy:
*) Fabian Pascal: http://www.dbdebunk.com/page/page/1396241.htm
*) Hugh Darwen: http://web.onetel.com/~hughdarwen/TheThirdManifesto/Missing-info-without-nulls.pdf
*) http://www.dbdebunk.com/about.html
PaweÅ‚ Barut October 13, 2006
This is rather natural as its defined in ANSI SQL. Similar effect will be with DISTINCT clause, NULL is returned only once.
Jurgen Kemmelings October 13, 2006
It feels indeed intuitively correct. But what could theoretically be against having a group of undefined objects?
Patrick Sinke October 13, 2006
I don’t think you can say that this function suggests that null = null. What you show here is the count of employees per department. The function sums op the employees that are in dept 30 (6), in dept 20 (5), in dept 10 (1), and the employees that are not in any department are summed: 2 employees apply to that condition.
So in my opinion it’s not a bug. On the contrary, it is a very correct behaviour againts null values in these functions!
Harm Verschuren October 13, 2006
NULL is undetermined
Ergo: NULL != NULL

Share this:

Like this:

Related Posts

About The Author

Lucas Jellema

13 Comments