Mastering XML DB – COUNT and SUM

5

Working for years with Oracle’s relational environment, XML DB is a completely different kind of beast. You’re confronted with all the languages and techniques the XML environment has to offer – XML, XML Schema, XSLT, Xpath, DTD, DOM, SAX, etc,etc,etc – plus the ones you know from Oracle’s (object) relational (database) world – SQL, PL/SQL, java, OCI, java, etc.

I am currently working for a customer, who implemented the XML DB functionality in a new system. The total amount of XML data is roundabout 200 Gb. The XML data is split-up in 4 tables, 2 tables containing an XMLType column (XML Schema based, CLOB storage) and 2 XMLType tables (XML Schema based, Object Relational storage).

Some XML documents are more than 200 pages printed data and fairly complex in their structure , sometimes as big as 3 to 4 Mb. So it’s a real challenge to get the data you want, in a decent time period. Even simple questions like “give me a count of…�? or “summerize the following numbers…�? can be tricky if you are dealing with complextype constructions/elements. Usage of COUNT and SUM can be tricky in an XML environment.

The following will give you more insight in Oracle SQL/XPath solutions and the need for a clear understanding of your XML Schema’s and the solutions presented by the Oracle database.

You can use Oracle’s XMLType functions to query and/or extract data. From Oracle’s XML DB developers Guide it states:

<em>You can query XMLType data and extract portions of it using the existsNode(), extract(), or extractValue() functions. These functions use a subset of the W3C XPath recommendation to navigate the document.</em>

<blockquote>
<li>existsNode()</li>
<li>extract()</li>
<li>extractValue()</li>
</blockquote>

<em>The <strong>extract()</strong> function returns the node or nodes that match the XPath expression. Nodes are returned as an instance of XMLType. The results of extract() can be either a document or documentFragment.
</em>

<blockquote><pre>
SELECT extract(object_value,’/PurchaseOrder/Reference’) "Reference"
FROM PURCHASEORDER;
Reference
———————————————————————–
<reference>SBELL-2002100912333601PDT</reference>

1 row selected.
</pre></blockquote>

<em>The <strong>extractValue()</strong> function returns the value of the text node or attribute value that matches the supplied XPath expression. The value is returned as a SQL scalar datatype. This means that the XPath expression passed to extractValue() must uniquely identify a single text node or attribute value within the document.</em>

<blockquote><pre>
SELECT extractValue(object_value,’/PurchaseOrder/Reference’) "Reference"
FROM PURCHASEORDER;
Reference
———————————————————————–
SBELL-2002100912333601PDT

1 row selected.
</pre></blockquote>

<em>The <strong>existsNode</strong> function evaluates whether or not a given document contains a node which matches a W3C XPath expression. The existsNode() function returns true (1) if the document contains the node specified by the XPath expression supplied to the function and false (0) if it does not.</em>

<blockquote><pre>
SELECT COUNT(*)
FROM PURCHASEORDER
WHERE existsNode(object_value,’/PurchaseOrder/Reference’) = 1;
COUNT(*)
———-
132
1 row selected.

SELECT count(*)
FROM PURCHASEORDER
WHERE existsNode(object_value, ‘/PurchaseOrder[Reference="SBELL-2002100912333601PDT"]‘) = 1;
COUNT(*)
———-
1
1 row selected.
</pre></blockquote>

These examples (see <a href="http://oraclesvca2.oracle.com/docs/cd/B14117_01/appdev.101/b10790/xdb03usg.htm#BABBBJBE" target="_blank"> "Example 3-23 Searching XML Content Using the existsNode() Function"</a> ) from the XML DB Developers Guide can be misleading if you focus to much on the <strong>count(*)</strong> expression.

The count is only correct (in the last example) because there is an explicit “where Reference = {value}�? XPath declaration in it. If you would base your count, as in the statement above, in XPath terms on “where there exists a node called reference�?, then the Xpath expression will probably not give you the expected result.

This is because the existsNode only validates the existance off the node – not how many nodes there are of this node in this branch. If it finds a node – it validates the condition as valid – and exits the branch…

This would give you the "expected" result for the count(*), in the example below = 2 records, if the structure would be a structure like:
 
<blockquote><pre>
<purchaseorder>
<reference>SBELL-2002100912333601PDT</reference>
</purchaseorder>
<purchaseorder>
<reference>SBELL-2004100912389972PDT</reference>
</purchaseorder>
</pre></blockquote>

The statement would give you an "unexpected" result for the count(*), in the example below = 1 record, containting repeating elements like a structure as described below:

 
<blockquote><pre>
<purchaseorder>
<reference>SBELL-2004100912333601PDT</reference>
<reference>SBELL-2004100912389972PDT</reference>
</purchaseorder>
</pre></blockquote>

Look at the following XML Schema structure:

<img src="http://technology.amis.nl/wp-content/uploads/images/payment.jpg" alt="Payments" />

Based on this schema we now create a table and insert some rows.

<code>
<pre>

set lines 50
set long 10000
set trimspool on

drop table amis_xml;

create table amis_xml (data xmltype);

insert into amis_xml
values
(xmltype(‘<PAYMENT><PERSON><SURNAME>Marco</SURNAME><NAME>Gralike</NAME><SALARY>2345</SALARY></PERSON></PAYMENT>’));

insert into amis_xml
values
(xmltype(‘<PAYMENT><PERSON><SURNAME>BertJan</SURNAME><NAME>Meinders</NAME><SALARY>3456</SALARY><SALARY>125</SALARY></PERSON></PAYMENT>’));

insert into amis_xml
values
(xmltype(‘<PAYMENT><PERSON><SURNAME>Chris</SURNAME><NAME>Gralike</NAME><SALARY>4567</SALARY></PERSON></PAYMENT>’));

insert into amis_xml
values
(xmltype(‘<PAYMENT><PERSON><SURNAME>Karin</SURNAME><NAME>Kriebisch</NAME></PERSON></PAYMENT>’));

commit;

desc AMIS_XML
Name Null? Type
———————– ——– —————-
DATA XMLTYPE

select t.data from AMIS_XML t;

DATA
————————————————–
<PAYMENT>
<PERSON>
<SURNAME>Marco</SURNAME>
<NAME>Gralike</NAME>
<SALARY>2345</SALARY>
</PERSON>
</PAYMENT>
<PAYMENT>
<PERSON>
<SURNAME>BertJan</SURNAME>
<NAME>Meinders</NAME>
<SALARY>3456</SALARY>
<SALARY>125</SALARY>
</PERSON>
</PAYMENT>
<PAYMENT>
<PERSON>
<SURNAME>Chris</SURNAME>
<NAME>Gralike</NAME>
<SALARY>4567</SALARY>
</PERSON>
</PAYMENT>
<PAYMENT>
<PERSON>
<SURNAME>Karin</SURNAME>
<NAME>Kriebisch</NAME>
</PERSON>
</PAYMENT>

4 rows selected.

select * from amis_xml t
where existsnode(t.data,’/PAYMENT/PERSON/SALARY’)=1;

DATA
————————————————–
<PAYMENT>
<PERSON>
<SURNAME>Marco</SURNAME>
<NAME>Gralike</NAME>
<SALARY>2345</SALARY>
</PERSON>
</PAYMENT>
<PAYMENT>
<PERSON>
<SURNAME>BertJan</SURNAME>
<NAME>Meinders</NAME>
<SALARY>3456</SALARY>
<SALARY>125</SALARY>
</PERSON>
</PAYMENT>
<PAYMENT>
<PERSON>
<SURNAME>Chris</SURNAME>
<NAME>Gralike</NAME>
<SALARY>4567</SALARY>
</PERSON>
</PAYMENT>

3 rows selected.

select count(*) from amis_xml t
where existsnode(t.data,’/PAYMENT/PERSON/SALARY’)=1;

COUNT(*)
———
3

1 row selected.

</pre></code>

Instead off the (maybe) expected 4 rows selected, existsnode only returns 3. The two salary nodes of Mr. Meinders are counted as one.

The solution to this problem is presented by the <strong>XMLSequence</strong> function.

<em>The <strong>XMLSequence()</strong> function makes it possible to take an XMLType containing a fragment and perform SQL operations on it. It generates a collection of XMLType objects from an XMLType containing a fragment. The collection contains one XMLType for each of the root elements in the fragment. This collection of XMLType objects can then be converted into a virtual table using the SQL table() function. Converting the fragment into a virtual table makes it easier to use SQL to process the results of an extract() function that returned multiple nodes.</em>

The <a href="http://oraclesvca2.oracle.com/docs/cd/B14117_01/appdev.101/b10790/xdb03usg.htm#BABFEHBD" target="_blank">examples</a> as shown in the XML DB Developers Guide, show us examples why and when we should use XMLSequence (mark the XXX records selected).

The described structure:

<blockquote><pre>
<purchaseorder>
<reference>SBELL-2004100912333601PDT</reference>
<reference>SBELL-2004100912389972PDT</reference>
</purchaseorder>
</pre></blockquote>

Would give with the statement:

<blockquote><pre>
SELECT extract(object_value,’/PurchaseOrder/Reference’) "Reference"
FROM PURCHASEORDER
;
Reference
———————————————————————–
<reference>SBELL-2004100912333601PDT</reference>
<reference>SBELL-2004100912389972PDT</reference>
.
1 record selected
</pre>
</blockquote>

<blockquote><pre>
SELECT extract(value(v),’/Reference’) "Reference"
FROM PURCHASEORDER t
, TABLE(XMLSequence(extract(object_value, ‘/PurchaseOrder/Reference’))) v
;
Reference
———————————————————————–
<reference>SBELL-2004100912333601PDT</reference>
<reference>SBELL-2004100912389972PDT</reference>
.
2 records selected
</pre>
</blockquote>

Knowing this, we now can re-write our SQL/XPath statements.

<blockquote><pre>
select extract(value(v),’/SALARY’) "Salary"
from amis_xml t
, TABLE(XMLSequence(extract(t.data,’/PAYMENT/PERSON/SALARY’))) v;
.
Salary
————————————————–
<SALARY>2345</SALARY>
<SALARY>3456</SALARY>
<SALARY>125</SALARY>
<SALARY>4567</SALARY>
.
4 rows selected.

select count(*)
from amis_xml t
, TABLE(XMLSequence(extract(t.data,’/PAYMENT/PERSON/SALARY’))) v;
.
COUNT(*)
———
4
.
1 row selected.

select extractvalue(value(v),’/SALARY’) "Salary"
from amis_xml t
, TABLE(XMLSequence(extract(t.data,’/PAYMENT/PERSON/SALARY’))) v;
.
Salary
——
2345
3456
125
4567
.
4 rows selected.

select sum(extractvalue(value(v),’/SALARY’)) "Total"
from amis_xml t
, TABLE(XMLSequence(extract(t.data,’/PAYMENT/PERSON/SALARY’))) v;

Total
———
10493
1 row selected.
</pre>
</blockquote>

The next release of the Oracle database, version 10g version 2, will hopefully present more flexible tools like XQuery and or better XPath possibilities like:

<blockquote><pre>
select extract(t.data,’count(/PAYMENT/PERSON/SALARY)’)
from amis_xml t ;
ERROR:
ORA-31012: Given XPATH expression not supported
</pre>
</blockquote>

Share.

About Author

Marco Gralike, working for AMIS Services BV as a Principal Oracle Database Consultant in the Netherlands, has experience as a DBA since 1994 (Oracle 6). Marco is also eager and skillful in other fields, like Operating System Administration and Application Servers, mainly to find working, performing solutions. Marco has been specializing in Oracle XMLDB, since 2003, focusing on his old love, database administration and performance. He is an Oracle XMLDB enthusiast ever since. He is also a dedicated contributor of the Oracle User Group community, helping people with their steep XMLDB learning curve. To this purpose, Marco also devoted his personal blog site to XMLDB and other Oracle issues. Marco is a member of the OakTable network and an Oracle ACE Director (specialization Oracle XMLDB).

5 Comments

  1. Marco Gralike on

    Sean Dillon on asktom (http://asktom.oracle.com) has always very good examples. I looked it up a little bit over there (right now i am into “storage” performacne problems not really into select, update, deletes etc), but i think i found an artikel that displains how to do this. See artikel http://asktom.oracle.com/pls/ask/f?p=4950:8:13085855774406036280::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:6634140285004
    (i used search for “update xml sean dillon”). A search there on “sean dillon” is a good read anyway.

  2. Marco, how would one insert a new reference element into the xml block below using Oracle’s XML funcs(ie: extract, updateXml). Is it possible to remove a reference element also ? It seems updateXMl w/ NULL just makes the reference element become NULL.

    SBELL-2004100912333601PDT
    SBELL-2004100912389972PDT
  3. Marco, do you know how I could insert a new element into an xml block like:

    SBELL-2004100912333601PDT
    SBELL-2004100912389972PDT

    How would I insert another reference element using Oracle XML DB funcs ? Is it possible also to remove a reference element completely ? deleteXML() doesnt exist in 10g

  4. Hi Marco, can you explain the difference in node in your XSD and node in your data. To me it looks like your data is not being validated by your XSD, or is it?

    CU Harm