While presenting the 7Up Training – upgrading experienced Oracle developers from their Oracle 7 base of knowledge and experience ("still writing 7 backwards compatible code against a 10g database") to 10g, we ran into a funny little example with the Oracle Analytical SQL Functions – that also applies to standard GROUP BY operations by the way – that suggests that NULL == NULL in some circumstances. To be fair: it feels (intuitively) correct. but theoretically (what do you say Toon??) it is a little fishy I think.
The situation is very simple: I have updated my EMP table to have several employees without department. Then I queried to find out for each employee his name, department, salary and the percentage of the total salary sum within the department allocated to the employee. The query is straightforward:
select ename
, deptno
, sal
, ratio_to_report(sal)
over ( partition by deptno)
sal_percentage
from emp
order
by 2, 4
The data in EMP shows two employees without Department (i.e. their DEPTNO is NULL). The results of this query show the two employees without department – MILLER and KING – as having 20 respectively 80 percent of the total salary sum in their department (the partition of all employees with the same DEPTNO). However, their DEPTNO is NULL and since NULL is never equal to NULL, there are no other employees with the same DEPTNO. Or are there?
ENAME DEPTNO SAL SAL_PERCENTAGE
---------- ---------- ---------- --------------
CLARK 10 2695 1
SMITH 20 800 .071604386
ADAMS 20 1100 .09845603
SCOTT 20 3000 .268516447
FORD 20 3000 .268516447
JONES 20 3272.5 .292906691
JAMES 30 950 .09808983
MARTIN 30 1250 .129065565
WARD 30 1250 .129065565
TURNER 30 1500 .154878678
ALLEN 30 1600 .165203924
BLAKE 30 3135 .323696438
MILLER 1300 .200896307
KING 5171 .799103693
On further analysis – something I apparently never actively realized – the classic GROUP BY does the exact same thing:
select deptno
, count(*)
from emp
group
by deptno
The results also suggest an equality between NULL…
DEPTNO COUNT(*)
-------- ----------
30 6
2 <<<<-------
20 5
10 1
Oracle is just following the rules layed out by ANSI SQL.
I think, though, that you alway should see the result of those queries from the point of view from the end user. And what an end user wants to see is not
DEPTNO COUNT(*)
===== =======
30 6
“null†1
“null†1
“null†1
“null†1
“null†1
“null†1
20 5
10 1
That’s just not convenient. Any query I would write for an end user that yields this result would not be accepted by that user. He would want me to rewrite the query so that all the ‘null’ or ‘unknown’ departments to be put together in one result, because separate records with count=1 have no meaning at all.
So probably Oracle has made the choice to handle nulls like that, and my opinion is that they made the most logical (or call it: intuitive) choice 🙂
Giving each NULL value its own group seems unwieldy, especially if you have an N-column GROUP BY, and then all-but-1, all-but-2, …, or all-but-(N-1) of those columns have NULL values.
And what about:
select null from dual union select null from dual;
It only returns 1 row. Should GROUP BY be any different?
Hmmmm, even wordpress has problems with (or NOT?)
😉
Second try…
Who argues for outcome:
DEPTNO COUNT(*)
===== =======
30 6
“null” 1
“null” 1
20 5
10 1
(hopefully it will now print correctly, otherwise “Lucas Help?!!!”)
Who argues for outcome:
DEPTNO COUNT(*)
——— ———–
30 6
1
From “An introduction to Database Systems” (International Editon – Eighth Edition) by C.J.Date:
Possible meanings / reasons why there is a absence of a value (page 577):
– value unknown
– value not applicable
– value does not exist
– value undefined
– value not supplied
– …
(page 595): Aggregate Operators:
The SQL aggregate operators (SUM,AVG,etc.) do not behave in accordance with the rules for scalar operators explained in Section 19.2, but instead simply ignore any nulls in their argument (except for count(*), where nulls are treated as if they were regular values). Also if the argument to such an operator happens to evaluate to an empty set, COUNT returns zero; the other operators all return NULL. (As noted in Chapter 8, this bahaviour is logically incorrect, but it is the way SQL is defined.)
MG
@Patrick: “…and the employees that are not in any department are summed”, but this is the problem using NULL Patrick, you are assuming that they “are not in any department”. A NULL coudl also imply “we KNOW the belong to a department but it is UNKNOWN to us in which…”
@Jurgen: you don’t want to get into that one… It could be the difference between a salary check with a VALUE on it at the end of the month or a salary check without a VALUE… (you could think of more worse-case real life example of course…you did a 911 call, which is registered in a database, which triggers…).
If think your
select deptno
, count(*)
from emp
group
by deptno
is a great example of Oracle desperatly wanting to give answer where it should not give one.
A department with value NULL is not equal to a department with value NULL. Also a nice example why one should avoid NULL’s
“The results also suggest an equality between NULL…”
From the viewpoint of Oracle’s implementation of NULL’s, it does. Your “count(*)” spoils the soup. See the examples (nice post) of beter nullogy on http://www.databasedesign-resource.com/null-values-in-a-database.html. I think we are dealing with a “this is an error in the COUNT function: Any column would give the same COUNT result…” problem here.
Nice links found regarding nullogy:
*) Fabian Pascal: http://www.dbdebunk.com/page/page/1396241.htm
*) Hugh Darwen: http://web.onetel.com/~hughdarwen/TheThirdManifesto/Missing-info-without-nulls.pdf
*) http://www.dbdebunk.com/about.html
This is rather natural as its defined in ANSI SQL. Similar effect will be with DISTINCT clause, NULL is returned only once.
It feels indeed intuitively correct. But what could theoretically be against having a group of undefined objects?
I don’t think you can say that this function suggests that null = null. What you show here is the count of employees per department. The function sums op the employees that are in dept 30 (6), in dept 20 (5), in dept 10 (1), and the employees that are not in any department are summed: 2 employees apply to that condition.
So in my opinion it’s not a bug. On the contrary, it is a very correct behaviour againts null values in these functions!
NULL is undetermined
Ergo: NULL != NULL