Is this a bug in Analytical Functions? NULL is supposed to be UNEQUAL to NULL?

13

While presenting the 7Up Training – upgrading experienced Oracle developers from their Oracle 7 base of knowledge and experience ("still writing 7 backwards compatible code against a 10g database") to 10g, we ran into a funny little example with the Oracle Analytical SQL Functions – that also applies to standard GROUP BY operations by the way – that suggests that NULL == NULL in some circumstances. To be fair: it feels (intuitively) correct. but theoretically (what do you say Toon??) it is a little fishy I think.

....
 

The situation is very simple: I have updated my EMP table to have several employees without department. Then I queried to find out for each employee his name, department, salary and the percentage of the total salary sum within the department allocated to the employee. The query is straightforward:

select ename<br />,      deptno<br />,      sal<br />,      ratio_to_report(sal) <br />           over ( partition by deptno) <br />       sal_percentage<br />from   emp<br />order<br />by 2, 4 <br />

The data in EMP shows two employees without Department (i.e. their DEPTNO is NULL). The results of this query show the two employees without department – MILLER and KING – as having 20 respectively 80 percent of the total salary sum in their department (the partition of all employees with the same DEPTNO). However, their DEPTNO is NULL and since NULL is never equal to NULL, there are no other employees with the same DEPTNO. Or are there?

ENAME          DEPTNO        SAL SAL_PERCENTAGE<br />---------- ---------- ---------- --------------<br />CLARK              10       2695              1<br />SMITH              20        800     .071604386<br />ADAMS              20       1100      .09845603<br />SCOTT              20       3000     .268516447<br />FORD               20       3000     .268516447<br />JONES              20     3272.5     .292906691<br />JAMES              30        950      .09808983<br />MARTIN             30       1250     .129065565<br />WARD               30       1250     .129065565<br />TURNER             30       1500     .154878678<br />ALLEN              30       1600     .165203924<br />BLAKE              30       3135     .323696438<br />MILLER                      1300     .200896307<br />KING                        5171     .799103693 <br />

On further analysis – something I apparently never actively realized – the classic GROUP BY does the exact same thing:

 select deptno<br /> ,      count(*) <br /> from   emp <br /> group  <br /> by     deptno<br />

The results also suggest an equality between NULL…

<p>  DEPTNO   COUNT(*)<br />-------- ----------<br />      30          6<br />                  2   &lt;&lt;&lt;&lt;-------<br />      20          5<br />      10          1<br />&nbsp;</p>
Share.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

13 Comments

  1. I think, though, that you alway should see the result of those queries from the point of view from the end user. And what an end user wants to see is not

    DEPTNO COUNT(*)
    ===== =======
    30 6
    “null” 1
    “null” 1
    “null” 1
    “null” 1
    “null” 1
    “null” 1

    20 5
    10 1

    That’s just not convenient. Any query I would write for an end user that yields this result would not be accepted by that user. He would want me to rewrite the query so that all the ‘null’ or ‘unknown’ departments to be put together in one result, because separate records with count=1 have no meaning at all.
    So probably Oracle has made the choice to handle nulls like that, and my opinion is that they made the most logical (or call it: intuitive) choice :)

  2. Giving each NULL value its own group seems unwieldy, especially if you have an N-column GROUP BY, and then all-but-1, all-but-2, …, or all-but-(N-1) of those columns have NULL values.

    And what about:

    select null from dual union select null from dual;

    It only returns 1 row. Should GROUP BY be any different?

  3. Hmmmm, even wordpress has problems with (or NOT?)

    ;-)

    Second try…

    Who argues for outcome:

    DEPTNO COUNT(*)
    ===== =======
    30 6
    “null” 1
    “null” 1
    20 5
    10 1

    (hopefully it will now print correctly, otherwise “Lucas Help?!!!”)

  4. From “An introduction to Database Systems” (International Editon – Eighth Edition) by C.J.Date:

    Possible meanings / reasons why there is a absence of a value (page 577):

    – value unknown
    – value not applicable
    – value does not exist
    – value undefined
    – value not supplied
    – …

    (page 595): Aggregate Operators:

    The SQL aggregate operators (SUM,AVG,etc.) do not behave in accordance with the rules for scalar operators explained in Section 19.2, but instead simply ignore any nulls in their argument (except for count(*), where nulls are treated as if they were regular values). Also if the argument to such an operator happens to evaluate to an empty set, COUNT returns zero; the other operators all return NULL. (As noted in Chapter 8, this bahaviour is logically incorrect, but it is the way SQL is defined.)

    MG

  5. @Patrick: “…and the employees that are not in any department are summed”, but this is the problem using NULL Patrick, you are assuming that they “are not in any department”. A NULL coudl also imply “we KNOW the belong to a department but it is UNKNOWN to us in which…”

    @Jurgen: you don’t want to get into that one… It could be the difference between a salary check with a VALUE on it at the end of the month or a salary check without a VALUE… (you could think of more worse-case real life example of course…you did a 911 call, which is registered in a database, which triggers…).

  6. If think your

    select deptno
    , count(*)
    from emp
    group
    by deptno

    is a great example of Oracle desperatly wanting to give answer where it should not give one.

    A department with value NULL is not equal to a department with value NULL. Also a nice example why one should avoid NULL’s

  7. “The results also suggest an equality between NULL…”

    From the viewpoint of Oracle’s implementation of NULL’s, it does. Your “count(*)” spoils the soup. See the examples (nice post) of beter nullogy on http://www.databasedesign-resource.com/null-values-in-a-database.html. I think we are dealing with a “this is an error in the COUNT function: Any column would give the same COUNT result…” problem here.

    Nice links found regarding nullogy:

    *) Fabian Pascal: http://www.dbdebunk.com/page/page/1396241.htm
    *) Hugh Darwen: http://web.onetel.com/~hughdarwen/TheThirdManifesto/Missing-info-without-nulls.pdf
    *) http://www.dbdebunk.com/about.html

  8. Jurgen Kemmelings on

    It feels indeed intuitively correct. But what could theoretically be against having a group of undefined objects?

  9. I don’t think you can say that this function suggests that null = null. What you show here is the count of employees per department. The function sums op the employees that are in dept 30 (6), in dept 20 (5), in dept 10 (1), and the employees that are not in any department are summed: 2 employees apply to that condition.
    So in my opinion it’s not a bug. On the contrary, it is a very correct behaviour againts null values in these functions!