Mastering XML DB - COUNT and SUM startup 594127 1280

Mastering XML DB – COUNT and SUM

Working for years with Oracle’s relational environment, XML DB is a completely different kind of beast. You’re confronted with all the languages and techniques the XML environment has to offer – XML, XML Schema, XSLT, Xpath, DTD, DOM, SAX, etc,etc,etc – plus the ones you know from Oracle’s (object) relational (database) world – SQL, PL/SQL, java, OCI, java, etc.

I am currently working for a customer, who implemented the XML DB functionality in a new system. The total amount of XML data is roundabout 200 Gb. The XML data is split-up in 4 tables, 2 tables containing an XMLType column (XML Schema based, CLOB storage) and 2 XMLType tables (XML Schema based, Object Relational storage).

Some XML documents are more than 200 pages printed data and fairly complex in their structure , sometimes as big as 3 to 4 Mb. So it’s a real challenge to get the data you want, in a decent time period. Even simple questions like “give me a count of…�? or “summerize the following numbers…�? can be tricky if you are dealing with complextype constructions/elements. Usage of COUNT and SUM can be tricky in an XML environment.

The following will give you more insight in Oracle SQL/XPath solutions and the need for a clear understanding of your XML Schema’s and the solutions presented by the Oracle database.

You can use Oracle’s XMLType functions to query and/or extract data. From Oracle’s XML DB developers Guide it states:

<em>You can query XMLType data and extract portions of it using the existsNode(), extract(), or extractValue() functions. These functions use a subset of the W3C XPath recommendation to navigate the document.</em>

<blockquote>
<li>existsNode()</li>
<li>extract()</li>
<li>extractValue()</li>
</blockquote>

<em>The <strong>extract()</strong> function returns the node or nodes that match the XPath expression. Nodes are returned as an instance of XMLType. The results of extract() can be either a document or documentFragment.
</em>

<blockquote><pre>
SELECT extract(object_value,’/PurchaseOrder/Reference’) "Reference"
FROM PURCHASEORDER;
Reference
———————————————————————–
<reference>SBELL-2002100912333601PDT</reference>

1 row selected.
</pre></blockquote>

<em>The <strong>extractValue()</strong> function returns the value of the text node or attribute value that matches the supplied XPath expression. The value is returned as a SQL scalar datatype. This means that the XPath expression passed to extractValue() must uniquely identify a single text node or attribute value within the document.</em>

<blockquote><pre>
SELECT extractValue(object_value,’/PurchaseOrder/Reference’) "Reference"
FROM PURCHASEORDER;
Reference
———————————————————————–
SBELL-2002100912333601PDT

1 row selected.
</pre></blockquote>

<em>The <strong>existsNode</strong> function evaluates whether or not a given document contains a node which matches a W3C XPath expression. The existsNode() function returns true (1) if the document contains the node specified by the XPath expression supplied to the function and false (0) if it does not.</em>

<blockquote><pre>
SELECT COUNT(*)
FROM PURCHASEORDER
WHERE existsNode(object_value,’/PurchaseOrder/Reference’) = 1;
COUNT(*)
———-
132
1 row selected.

SELECT count(*)
FROM PURCHASEORDER
WHERE existsNode(object_value, ‘/PurchaseOrder[Reference="SBELL-2002100912333601PDT"]’) = 1;
COUNT(*)
———-
1
1 row selected.
</pre></blockquote>

These examples (see <a href="http://oraclesvca2.oracle.com/docs/cd/B14117_01/appdev.101/b10790/xdb03usg.htm#BABBBJBE" target="_blank"> "Example 3-23 Searching XML Content Using the existsNode() Function"</a> ) from the XML DB Developers Guide can be misleading if you focus to much on the <strong>count(*)</strong> expression.

The count is only correct (in the last example) because there is an explicit “where Reference = {value}�? XPath declaration in it. If you would base your count, as in the statement above, in XPath terms on “where there exists a node called reference�?, then the Xpath expression will probably not give you the expected result.

This is because the existsNode only validates the existance off the node – not how many nodes there are of this node in this branch. If it finds a node – it validates the condition as valid – and exits the branch…

This would give you the "expected" result for the count(*), in the example below = 2 records, if the structure would be a structure like:
 
<blockquote><pre>
<purchaseorder>
<reference>SBELL-2002100912333601PDT</reference>
</purchaseorder>
<purchaseorder>
<reference>SBELL-2004100912389972PDT</reference>
</purchaseorder>
</pre></blockquote>

The statement would give you an "unexpected" result for the count(*), in the example below = 1 record, containting repeating elements like a structure as described below:

 
<blockquote><pre>
<purchaseorder>
<reference>SBELL-2004100912333601PDT</reference>
<reference>SBELL-2004100912389972PDT</reference>
</purchaseorder>
</pre></blockquote>

Look at the following XML Schema structure:

<img src="https://technology.amis.nl/wp-content/uploads/images/payment.jpg" alt="Payments" />

Based on this schema we now create a table and insert some rows.

<code>
<pre>

set lines 50
set long 10000
set trimspool on

drop table amis_xml;

create table amis_xml (data xmltype);

insert into amis_xml
values
(xmltype(‘<PAYMENT><PERSON><SURNAME>Marco</SURNAME><NAME>Gralike</NAME><SALARY>2345</SALARY></PERSON></PAYMENT>’));

insert into amis_xml
values
(xmltype(‘<PAYMENT><PERSON><SURNAME>BertJan</SURNAME><NAME>Meinders</NAME><SALARY>3456</SALARY><SALARY>125</SALARY></PERSON></PAYMENT>’));

insert into amis_xml
values
(xmltype(‘<PAYMENT><PERSON><SURNAME>Chris</SURNAME><NAME>Gralike</NAME><SALARY>4567</SALARY></PERSON></PAYMENT>’));

insert into amis_xml
values
(xmltype(‘<PAYMENT><PERSON><SURNAME>Karin</SURNAME><NAME>Kriebisch</NAME></PERSON></PAYMENT>’));

commit;

desc AMIS_XML
Name Null? Type
———————– ——– —————-
DATA XMLTYPE

select t.data from AMIS_XML t;

DATA
————————————————–
<PAYMENT>
<PERSON>
<SURNAME>Marco</SURNAME>
<NAME>Gralike</NAME>
<SALARY>2345</SALARY>
</PERSON>
</PAYMENT>
<PAYMENT>
<PERSON>
<SURNAME>BertJan</SURNAME>
<NAME>Meinders</NAME>
<SALARY>3456</SALARY>
<SALARY>125</SALARY>
</PERSON>
</PAYMENT>
<PAYMENT>
<PERSON>
<SURNAME>Chris</SURNAME>
<NAME>Gralike</NAME>
<SALARY>4567</SALARY>
</PERSON>
</PAYMENT>
<PAYMENT>
<PERSON>
<SURNAME>Karin</SURNAME>
<NAME>Kriebisch</NAME>
</PERSON>
</PAYMENT>

4 rows selected.

select * from amis_xml t
where existsnode(t.data,’/PAYMENT/PERSON/SALARY’)=1;

DATA
————————————————–
<PAYMENT>
<PERSON>
<SURNAME>Marco</SURNAME>
<NAME>Gralike</NAME>
<SALARY>2345</SALARY>
</PERSON>
</PAYMENT>
<PAYMENT>
<PERSON>
<SURNAME>BertJan</SURNAME>
<NAME>Meinders</NAME>
<SALARY>3456</SALARY>
<SALARY>125</SALARY>
</PERSON>
</PAYMENT>
<PAYMENT>
<PERSON>
<SURNAME>Chris</SURNAME>
<NAME>Gralike</NAME>
<SALARY>4567</SALARY>
</PERSON>
</PAYMENT>

3 rows selected.

select count(*) from amis_xml t
where existsnode(t.data,’/PAYMENT/PERSON/SALARY’)=1;

COUNT(*)
———
3

1 row selected.

</pre></code>

Instead off the (maybe) expected 4 rows selected, existsnode only returns 3. The two salary nodes of Mr. Meinders are counted as one.

The solution to this problem is presented by the <strong>XMLSequence</strong> function.

<em>The <strong>XMLSequence()</strong> function makes it possible to take an XMLType containing a fragment and perform SQL operations on it. It generates a collection of XMLType objects from an XMLType containing a fragment. The collection contains one XMLType for each of the root elements in the fragment. This collection of XMLType objects can then be converted into a virtual table using the SQL table() function. Converting the fragment into a virtual table makes it easier to use SQL to process the results of an extract() function that returned multiple nodes.</em>

The <a href="http://oraclesvca2.oracle.com/docs/cd/B14117_01/appdev.101/b10790/xdb03usg.htm#BABFEHBD" target="_blank">examples</a> as shown in the XML DB Developers Guide, show us examples why and when we should use XMLSequence (mark the XXX records selected).

The described structure:

<blockquote><pre>
<purchaseorder>
<reference>SBELL-2004100912333601PDT</reference>
<reference>SBELL-2004100912389972PDT</reference>
</purchaseorder>
</pre></blockquote>

Would give with the statement:

<blockquote><pre>
SELECT extract(object_value,’/PurchaseOrder/Reference’) "Reference"
FROM PURCHASEORDER
;
Reference
———————————————————————–
<reference>SBELL-2004100912333601PDT</reference>
<reference>SBELL-2004100912389972PDT</reference>
.
1 record selected
</pre>
</blockquote>

<blockquote><pre>
SELECT extract(value(v),’/Reference’) "Reference"
FROM PURCHASEORDER t
, TABLE(XMLSequence(extract(object_value, ‘/PurchaseOrder/Reference’))) v
;
Reference
———————————————————————–
<reference>SBELL-2004100912333601PDT</reference>
<reference>SBELL-2004100912389972PDT</reference>
.
2 records selected
</pre>
</blockquote>

Knowing this, we now can re-write our SQL/XPath statements.

<blockquote><pre>
select extract(value(v),’/SALARY’) "Salary"
from amis_xml t
, TABLE(XMLSequence(extract(t.data,’/PAYMENT/PERSON/SALARY’))) v;
.
Salary
————————————————–
<SALARY>2345</SALARY>
<SALARY>3456</SALARY>
<SALARY>125</SALARY>
<SALARY>4567</SALARY>
.
4 rows selected.

select count(*)
from amis_xml t
, TABLE(XMLSequence(extract(t.data,’/PAYMENT/PERSON/SALARY’))) v;
.
COUNT(*)
———
4
.
1 row selected.

select extractvalue(value(v),’/SALARY’) "Salary"
from amis_xml t
, TABLE(XMLSequence(extract(t.data,’/PAYMENT/PERSON/SALARY’))) v;
.
Salary
——
2345
3456
125
4567
.
4 rows selected.

select sum(extractvalue(value(v),’/SALARY’)) "Total"
from amis_xml t
, TABLE(XMLSequence(extract(t.data,’/PAYMENT/PERSON/SALARY’))) v;

Total
———
10493
1 row selected.
</pre>
</blockquote>

The next release of the Oracle database, version 10g version 2, will hopefully present more flexible tools like XQuery and or better XPath possibilities like:

<blockquote><pre>
select extract(t.data,’count(/PAYMENT/PERSON/SALARY)’)
from amis_xml t ;
ERROR:
ORA-31012: Given XPATH expression not supported
</pre>
</blockquote>

5 Comments

  1. Marco Gralike November 17, 2005
  2. Foo November 14, 2005
  3. Foo November 14, 2005
  4. Marco May 23, 2005
  5. harm May 21, 2005