Extracting Master-Detail data from an XMLType with a single SQL statement

15

Probably this is old news for all of you who have lots of experience with XMLDB, but for me this is something new. In order for me to remember this sort of stuff, I need to write it down (I am getting older, you know) hence this blog. What I needed was a way to extract the data from an XML and insert all the values into two relational tables with a Master-Detail relation between them. The XML was initially stored in a table with a column of XMLType. This is the structure of the XML that needed to be processed:

<SinterklaasWishlists>
   <child>
      <name>Tim</name>
      <wishlist>
         <article>
            <artno>21491269</artno>
            <description>Crane</description>
            <price>12.50</price>
         </article>
         <article>
            <artno>21499517</artno>
            <description>Keyboard</description>
            <price>10</price>
         </article>
         <article>
            <artno>21521591</artno>
            <description>Crime Investigation Game</description>
            <price>9.95</price>
         </article>
      </wishlist>
   </child>
&lt;/SinterklaasWishlists&gt;<span id="more-1389"></span>
The Master data in the XML is the Child for whom this is the Wishlist. The Detail data is the actual wishlist, the toys that the child would like to get from Santa Clause. Let assume for now only one child can be in the XML. And these are the relational tables in which the extracted data is going to be stored:
create table kids
(name varchar2(25) PRIMARY key
)
/
create table wishlist
(kid_name VARCHAR2(25) references kids
,artno number
,description varchar2(50)
,price number (6,2)
)
/

Using XPath expressions it is easy to retrieve the value from a node in the XML. When you want to retrieve the Childs name from the XML file, this is the syntax:

SQL&gt; SELECT extractvalue (wishlist, '*/child/name')
  2    FROM t
  3*  WHERE id = 1
SQL&gt; /

EXTRACTVALUE(WISHLIST,'*/CHILD/NAME')
---------------------------------------------------
Tim

SQL&gt;

Retrieving the detail information, the actual toys, posed more of a challenge. When you want to retrieve the article description with a similar syntax as before you get a exception:

SQL&gt; SELECT extractvalue (wishlist, '*/child/wishlist/article/description')
  2    FROM t
  3  /
SELECT extractvalue (wishlist, '*/child/wishlist/article/description')
*
ERROR at line 1:
ORA-19025: EXTRACTVALUE returns value of only one node

Extractvalue can only retrieve the value of a single node, it reminded me of a TOO-MANY-ROWS exception. And it makes sense, the Wishlist contains multiple articles in a single record so to speak. Which of the articles should be shown when you only have a single record? Should it be the Keyboard, the Crane or the Crime Investigation Game? There are just too many rows to pick from. What is needed is a way to create multiple records from this single XML. Creating a cartesian product would generate lots of rows and this is technique is used in the below example:

SQL&gt; SELECT extractvalue (Value (wl), '*/description')
  2    FROM t
  3       , TABLE (xmlsequence (extract (wishlist, '*/child/wishlist/article'))) wl
  4  /

EXTRACTVALUE(VALUE(WL),'*/DESCRIPTION')
-----------------------------------------------------------------------------------
Crane
Keyboard
Crime Investigation Game

XMLSequence returns a VARRAY of XMLType. With the TABLE function, you can query a VARRAY as if it was a regular table. How this TABLE function works is beyond the scope of this Blog. Here is the final query to extract all the values from the XMLType .

SQL&gt; SELECT extractvalue (wishlist, '*/child/name') childname
  2       , extractvalue (Value (wl), '*/artno') Article_number
  3       , extractvalue (Value (wl), '*/description') description
  4       , extractvalue (Value (wl), '*/price') price
  5    FROM t
  6       , TABLE (xmlsequence (extract (wishlist, '*/child/wishlist/article'))) wl
  7   where t.id = 1
  8  /

CHILDNAME       ARTICLE_NUMBER       DESCRIPTION                         PRICE
--------------- -------------------- ----------------------------------- -----------
Tim             21491269             Crane                               12.50
Tim             21499517             Keyboard                            10
Tim             21521591             Crime Investigation Game            9.95

Multiple Kids

And just when you think you are done, things change. What happens when there are multiple wishlists in a single XML? The query that we just created will fail, with the same exception that we saw earlier: “ORA-19025: EXTRACTVALUE returns value of only one node”. The thing to do is use another XMLSequence to retrieve the wishlist for each of the kids:

SQL&gt; SELECT extractvalue (value (kids), '*/name') childname
  2       , extractvalue (value (wl), '*/artno') Article_number
  3       , extractvalue (value (wl), '*/description') description
  4       , extractvalue (value (wl), '*/price') price
  5    FROM t
  6       , TABLE (xmlsequence (extract (wishlist, '*/child'))) kids
  7       , TABLE (xmlsequence (extract (value (kids), '*/wishlist/article'))) wl
  8   WHERE id = 2
  9  /

CHILDNAME       ARTICLE_NUMBER  DESCRIPTION                         PRICE
--------------- --------------- ----------------------------------- -------------
Tim             21491269        Crane                               12.50
Tim             21499517        Keyboard                            10
Tim             21521591        Crime Investigation Game            9.95
Lara            21539350        Chalk Board                         12
Lara            21271168        Winnie the Pooh Bingo               10
Lara            21517846        Washing Machine                     10

Multi Table Insert

Since all elements are falling in place, the last step was to insert all the values into the relational tables. This could be done using a Multi Table Insert, just like:

SQL&gt; INSERT ALL
  2    WHEN rn = 1
  3    THEN INTO kids
  4    VALUES (childname)
  5    WHEN article_number IS NOT NULL
  6    THEN INTO wishlist
  7    VALUES (childname,article_number, description, price)
  8  SELECT Row_Number() over (PARTITION BY childname
  9                                ORDER BY null) rn
 10       , childname
 11       , article_number
 12       , description
 13       , price
 14    from (select extractvalue (value (kids), '*/name') childname
 15               , extractvalue (value (wl), '*/artno') Article_number
 16               , extractvalue (value (wl), '*/description') description
 17               , extractvalue (value (wl), '*/price') price
 18            FROM t
 19               , TABLE (xmlsequence (extract (wishlist, '*/child'))) kids
 20               , TABLE (xmlsequence (extract (value (kids), '*/wishlist/article'))) wl
 21           WHERE id = 2
 22         )
 23  /

8 rows created.

And there you have it, going from a single XML to filling two tables with a Master-Detail relation (sometimes called a Parent-Child relation, where the Parent table in this example is the KIDS table.) in a single SQL statement.

Using XMLTABLE syntax

As I am using a Oracle 10r2 database on my laptop, there is also a different kind of syntax you can use.

“Older” Syntax
SELECT extractvalue (wishlist, '*/child/name') childname
     , extractvalue (Value (wl), '*/artno') Article_number
     , extractvalue (Value (wl), '*/description') description
     , extractvalue (Value (wl), '*/price') price
  FROM t
     , TABLE (xmlsequence (extract (wishlist, '*/child/wishlist/article'))) wl
WHERE id = 1
SELECT extractvalue (value (kids), '*/name') childname
     , extractvalue (value (wl), '*/artno') Article_number
     , extractvalue (value (wl), '*/description') description
     , extractvalue (value (wl), '*/price') price
  FROM t
     , TABLE (xmlsequence (extract (wishlist, '*/child'))) kids
     , TABLE (xmlsequence (extract (value (kids), '*/wishlist/article'))) wl
 WHERE id = 2
“Newer” Syntax
SELECT extractvalue (wishlist, '*/child/name') childname
     , wl.*
  FROM t
     , xmltable ('*/child/wishlist/article'
       passing t.wishlist
       columns articlenumber number path 'artno'
             , description varchar2(25) path 'description'
             , price varchar2(10) path 'price'
       ) wl
WHERE id = 1
SELECT kids.childname
     , wl.*
  FROM t
     , xmltable ('*/child'
       passing t.wishlist
       columns childname varchar2(25) path 'name'
             , articles xmltype path 'wishlist'
       ) kids
     , xmltable ('*/article'
       passing kids.articles
       columns articlenumber number path 'artno'
             , description varchar2(25) path 'description'
             , price varchar2(10) path 'price'
       ) wl
 WHERE id = 2

Sinterklaas

In the Netherlands, where AMIS is located, we celebrate Sinterklaas. As the legend goes, he lives in Spain and arives in the Netherlands by Steamboat. December fifth is his birthday and on that day children get presents. Our Sinterklaas is probably derived from the same legend as Santa Clause. The wishlists in this blog belong to my children. They are eagerly anticipating the arrival of Sinterklaas. More on Sinterklaas: http://www-astro.physics.ox.ac.uk/~erik/sint/sint.html

Update: How could I forget? Like I said before.. I am getting older ;-) Here is the Demo file I used in this blog.

For Steve (from the comments below)

Because it is not possible to properly format code in the comments section, I put it up here. What you want to do when a kid doesn’t have a wishlist list (indeed hard to imagine) is do an outerjoin. Yes, it is a simple as that…
First lets add an extra kid to our XML who doesn’t have a wishlist:

&lt;child&gt;
   &lt;name&gt;Alex&lt;/name&gt;
&lt;/child&gt;

Then add the outerjoin to the query:

SELECT extractvalue (value (kids), '*/name') childname
     , extractvalue (value (wl), '*/artno') Article_number
     , extractvalue (value (wl), '*/description') description
     , extractvalue (value (wl), '*/price') price
  FROM t
     , TABLE (xmlsequence (extract (wishlist, '*/child'))) kids
     , TABLE (xmlsequence (extract (value (kids), '*/wishlist/article')))<strong>(+)</strong> wl
 WHERE id = 2

And that’s all there is to it…

 

For Val (from the comments below)

You can refer to attributes using an xpath-expression. To learn more about xpath expressions, I highly recommend www.w3schools.com.
First I’ll create another entry in the T-table with a thing in the article tag which contains the attributes we’re interested in. Using the @-sign, you can refer to attribute values.

INSERT INTO t
VALUES (3, xmltype (
'&lt;sinterklaaswishlists&gt;
   &lt;child&gt;
      &lt;name&gt;Tim&lt;/name&gt;
      &lt;wishlist&gt;
         &lt;article&gt;
         &lt;thing price="12.50" description="Crane" artno="21491269"&gt;
            first
         &lt;/thing&gt;
         &lt;/article&gt;
         &lt;article&gt;
           &lt;thing price="10" description="Keyboard" artno="21499517"&gt;
            second
         &lt;/thing&gt;
         &lt;/article&gt;
         &lt;article&gt;
            &lt;thing price="9.95" description="Crime Investigation Game" artno="21521591"&gt;
            third
         &lt;/thing&gt;
         &lt;/article&gt;
      &lt;/wishlist&gt;
   &lt;/child&gt;
&lt;/sinterklaaswishlists&gt;
'))
/
COMMIT
/
SELECT extractvalue (value (kids), '*/name') childname
     , extractvalue (value (wl), '*/thing/@artno') Article_number
     , extractvalue (value (wl), '*/thing/@description') description
     , extractvalue (value (wl), '*/thing/@price') price
  FROM t
     , TABLE (xmlsequence (extract (wishlist, '*/child'))) kids
     , TABLE (xmlsequence (extract (value (kids), '*/wishlist/article'))) wl
 WHERE id = 3
/
CHILDNAME                 ARTICLE_NUMBER            DESCRIPTION               PRICE
------------------------- ------------------------- ------------------------- -----
Tim                       21491269                  Crane                     12.50
Tim                       21499517                  Keyboard                  10
Tim                       21521591                  Crime Investigation Game  9.95

Hope this helps.

Share.

About Author

15 Comments

  1. Have a look at the following XMLTABLE with FOR ORIDINALITY example, where in this case the “GrossDividentRate” represents a repeating group of elements in the XML document and is passed as an XML fragment into the next XMLTABLE function:

    select xt2.gross_div_rate_cur_code   as "CURRENCY_CODE"
    ,      xt2.gross_div_rate_cur_amount as "CURRENCY_AMOUNT"
    ,      xt2.position                  as "ELEMENT_POSITION"
    from   my_xmltable_store  cmt
    ,      xmltable(xmlnamespaces(default 'http://www.website.nl/ns/1.0'),
                    '/ApplicationMessage'
                    passing cmt.object_value
                      GrossDividendRate         XMLTYPE      path 'CorporateActionDetails'
                    ) xt1
    ,      xmltable(xmlnamespaces(default 'http://www.website.nl/ns/1.0'),
                    '/CorporateActionDetails/GrossDividendRate/*'
                    passing xt1.GrossDividendRate
                    columns
                      POSITION FOR ORDINALITY,
                      gross_div_rate_cur_code    number(5)   path 'CurrencyCode',
                      gross_div_rate_cur_amount  varchar2(8) path 'Amount'
                    ) xt2
    ;
    

    See for more info and howto’s the Oracle OTN XMLDB forum and or my dedicated XMLDB blog, besides my posts on XML DB on the technology.amis.nl blog.

    HTH

    Marco

     

  2. Some good advice to all who read this post.

    Be aware that the propriety XSQL functions are getting deprecated from 11gR2 and onwards.

    The table(xmlsequence(extract())) syntax (XPath V1) should only be used in Oracle 1ogR1 and below.

    The optimized XMLTABLE function (XPath V2 and XQuery 1.0 compatible) should be used from Oracle 10gR2 and onwards. This functionality is (performance wise) optimized in 10gR2 and onwards while the older propriety functionality is maintained only.  XMLTABLE makes use of the XQuery engine introduced in Oracle 1ogR2.

    To answer KILE’s question. Have a look at the FOR ORDINALITY syntax in XMLTABLE to sort out repeating groups/elements. In  principal you can pass on the repeating group to a SECOND XMLTABLE function as a XML Fragment by choosing XMLTYPE as its datatype. In the second XMLTABLE function you refer in the PASSING clause to defined XMLTYPE column in the XMLTABLE function.

    You could do the same (passing on XML fragments) with the table(xmlsequence(extract())) construct via the “position” XPath function as an alternative for the FOR ORDINALITY, but I wouldn’t advice this to be used in Oracle 10gR2. To make this more interesting. Officially, despite that it works, the “position” xpath function is not supported by Oracle in 10g and below…

    …AND ALWAYS, ALWAYS try to use the full XPATH.

    Avoid, but better, NEVER USE WILDCARD’s like “*” or “//”, if you favor performance. By giving the XML parser (or in some cases the Oracle Cost based Optimizer) only such hints to work with, the whole XML document will be searched. In effect this is very very very inefficient regarding memory and CPU use… You have been warned by someone who nowadays knows…

    HTH

    Marco
     

  3. Hi,
    I’m using the following select to get the values of the items1 field:
    SELECT EXTRACT (VALUE (xmltable0), ‘//value/text()’).getstringval () AS nombre_completo
    FROM TABLE i, TABLE (XMLSEQUENCE (EXTRACT (i.xml_informe, ‘//main/items2/value’))) xmltable0
    Where a record from this table is:
    <main>
    <items1>
    <value>3</value>
    <value>5</value>
    <value>6</value>
    <value>7</value>
    <value>8</value>
    </items1>
    <items2>
    <value>3</value>
    <value>5</value>
    <value>
    <value>1-7</value>
    <value>1-8</value>
    </value>
    <value>7</value>
    <value>8</value>
    </items2>
    <main>
    So I get 5 rows with values: 3 5 6 7 8.
    My question is that sometimes I get more than 1 level of hierarchy, like in items2. How could be a good way to get it? I’m a little lost about how to make it, so any help will be more than welcome.
     
    Thank u very much in advance
     
     

  4. Arthur Johnson on

    Resolved my question with SQL:
    SELECT extractvalue(Value(a1), ‘*/ProductReference’) productreference,
    extractvalue(Value(a1), ‘*/SerialNumber’) serialnumber,
    extractvalue(Value(a1), ‘*/Model’) “model”,
    extractvalue(Value(a1), ‘*/Coverage’) coverage,
    extractvalue(Value(a1), ‘*/CoverageStatus’) coveragestatus,
    extractvalue(Value(s1), ‘*/CountryCode’) countrycode,
    extractvalue(Value(s1), ‘*/CountryName’) countryname,
    extractvalue(Value(s1), ‘*/CustomerName’) customername,
    extractvalue(Value(s1), ‘*/CustomerNumber’) customernumber,
    extractvalue(Value(s1), ‘*/SiteNumber’) sitenumber,
    extractvalue(Value(s1), ‘*/SiteShortName’) siteshortname,
    extractvalue(Value(s1), ‘*/Address1′) address1,
    extractvalue(Value(s1), ‘*/Address2′) address2,
    extractvalue(Value(s1), ‘*/City’) city,
    extractvalue(Value(s1), ‘*/State’) state,
    extractvalue(Value(s1), ‘*/PostalCode’) postalcode,
    extractvalue(Value(s1), ‘*/Location’) sitelocation,
    extractvalue(Value(s1), ‘*/SitePhoneNumber’) sitephonenumber,
    extractvalue(Value(s1), ‘*/ContactName’) contactname,
    extractvalue(Value(s1), ‘*/Contactphone’) contactphone,
    extractvalue(Value(s1), ‘*/CSRCode’) csrcode,
    extractvalue(Value(s1), ‘*/CSRName’) csrname,
    extractvalue(Value(s1), ‘*/BranchCode’) branchcode,
    extractvalue(Value(s1), ‘*/TerritoryCode’) territorycode,
    sdi.sdiid as reqid
    FROM aradmin.sdi_xml_tab sdi,
    TABLE (xmlsequence (extract (sdidocxml, ‘*/Assets/Asset’))) a1,
    TABLE (xmlsequence (extract (value(a1), ‘*/Site’))) s1

    Thanks,
    Arthur

  5. Arthur Johnson on

    I have a similar structured XML where I would like to display each Asset’s tags and the Site tags for each Asset. when I run the query below, I get error- ORA-00904: “ASSET”: invalid identifier. What’s wrong?

    XML:
    ===

    +






















    .
    .
    .

    200
    84

    QUERY:
    =====
    SELECT extractvalue(Value(a1), ‘*/ProductReference’) productreference,
    extractvalue(Value(a1), ‘*/SerialNumber’) serialnumber,
    extractvalue(Value(a1), ‘*/Model’) “model”,
    extractvalue(Value(a1), ‘*/Coverage’) coverage,
    extractvalue(Value(a1), ‘*/CoverageStatus’) coveragestatus,
    extractvalue(Value(s1), ‘*/CountryCode’) countrycode,
    extractvalue(Value(s1), ‘*/CountryName’) countryname,
    extractvalue(Value(s1), ‘*/CustomerName’) customername,
    extractvalue(Value(s1), ‘*/CustomerNumber’) customernumber,
    extractvalue(Value(s1), ‘*/SiteNumber’) sitenumber,
    extractvalue(Value(s1), ‘*/SiteShortName’) siteshortname,
    extractvalue(Value(s1), ‘*/Address1′) address1,
    extractvalue(Value(s1), ‘*/Address2′) address2,
    extractvalue(Value(s1), ‘*/City’) city,
    extractvalue(Value(s1), ‘*/State’) state,
    extractvalue(Value(s1), ‘*/PostalCode’) postalcode,
    extractvalue(Value(s1), ‘*/Location’) sitelocation,
    extractvalue(Value(s1), ‘*/SitePhoneNumber’) sitephonenumber,
    extractvalue(Value(s1), ‘*/ContactName’) contactname,
    extractvalue(Value(s1), ‘*/Contactphone’) contactphone,
    extractvalue(Value(s1), ‘*/CSRCode’) csrcode,
    extractvalue(Value(s1), ‘*/CSRName’) csrname,
    extractvalue(Value(s1), ‘*/BranchCode’) branchcode,
    extractvalue(Value(s1), ‘*/TerritoryCode’) territorycode,
    sdi.sdiid as reqid
    FROM aradmin.sdi_xml_tab sdi,
    TABLE (xmlsequence (extract (Asset, ‘*/Assets’))) a1,
    TABLE (xmlsequence (extract (value(a1), ‘*/Asset/Site’))) s1

  6. Alex Nuijten on

    Thank you, Val for your comments. You can use xpath expressions to refer to attributes.
    I’ll put up an example in the body of the post (formatting issue in the comments section)

  7. Thank you for writing this. I do have a question – what if my XML file has only three tags and all the values are in attributes? What’s the syntax of selecting attribute values?

  8. Alex Nuijten on

    Thank you for your comments Steve. I put an answer to your question in the posting (it is not possible to properly format code in this section). Now that I reread your question I think I interpreted your question wrong. Maybe you don’t want an outerjoin after all. If you don’t want to see the kids without wishlist, but only the kids who do have a wishlist, the queries as the were posted originally do that.

  9. I’ve been looking at a similar query, but always wondered how to do an inner join if the kids had no wishlist (hard to believe!), i.e. I don’t want to see aa row at all if there is no wishlist for the child.

  10. Alex Nuijten on

    Thank you, Jaromir. Good point. This is something I probably should have explained more or better (if at all) in this blog.

  11. Hi,

    > How you got the cartesian product is hard to determine..

    not for my as I made the same error:)

    the wrong way is to do:

    5 FROM t
    6 , TABLE (xmlsequence (extract (wishlist, ‘*/child’))) kids
    7 , TABLE (xmlsequence (extract (wishlist, ‘*/wishlist/article’))) wl
    8 WHERE id = 2

    Note the usage of whishlist instead of value (kids) in the second TABLE() line

    regards,

    Jaromir D.B. Nemec

  12. Alex Nuijten on

    Hi Alan,

    Thank you for your comment.

    Table T is the table that looks like this:
    create table t
    (id number
    ,wishlist xmltype
    )
    In this table there are two records, the first one (with ID 1) has a single child’s wishlist. The second one (with ID 2) contains the wishlist of
    two children.
    The demo-script, included with the blog, contains both the table definition and the contents of this table.

    How you got the cartesian product is hard to determine as you didn’t include the query that you used.

    When I create another entry in the T-table (with ID 3) like:
    INSERT INTO T
    VALUES (3, xmltype (‘
    Tim
    21491269
    Crane
    12.50
    21499517
    Keyboard
    10
    21521591
    Crime Investigation Game
    9.95
    Lara
    21539350
    Chalk Board
    12
    21517846
    Washing Machine
    10
    21271168
    Winnie the Pooh Bingo
    10
    Rachel
    21539341
    Barbie Doll
    11.50
    ‘));
    commit;
    And issue this query:
    SELECT Row_Number() over (PARTITION BY childname
    ORDER BY null) rn
    , childname
    , article_number
    , description
    , price
    from (
    select extractvalue (value (kids), ‘*/name’) childname
    , extractvalue (value (wl), ‘*/artno’) Article_number
    , extractvalue (value (wl), ‘*/description’) description
    , extractvalue (value (wl), ‘*/price’) price
    FROM t
    , TABLE (xmlsequence (extract (wishlist, ‘*/child’))) kids
    , TABLE (xmlsequence (extract (value (kids), ‘*/wishlist/article’))) wl
    WHERE id = 3
    );
    This is the result:
    1 1 Lara 21539350 Chalk Board 12
    2 2 Lara 21517846 Washing Machine 10
    3 3 Lara 21271168 Winnie the Pooh Bingo 10
    4 1 Rachel 21539341 Barbie Doll 11.50
    5 1 Tim 21491269 Crane 12.50
    6 2 Tim 21499517 Keyboard 10
    7 3 Tim 21521591 Crime Investigation Game 9.95
    Hope this answers your questions. And for those of you curious about how got what? Tim got the keyboard and Lara the chalkboard. ;-)

  13. I tried your suggested solutions but had a few problems.

    I presume the table with an XMLtype column was called t.

    Where is the column id defined?

    With multiple names and multiple Article_number, description and price my result set produced

    CHILDNAME ARTICLE_NUMBER DESCRIPTION PRICE
    Tim 21491269 Crane 12.5
    Tim 21499517 Keyboard 10
    Tim 21521591 Crime Investigation Game 9.95
    Tim 21491255 doll 52.5
    Tim 21499517 Keyboard 10
    Tim 21521599 Kitchen 29.95
    Rachel 21491269 Crane 12.5
    Rachel 21499517 Keyboard 10
    Rachel 21521591 Crime Investigation Game 9.95
    Rachel 21491255 doll 52.5
    Rachel 21499517 Keyboard 10
    Rachel 21521599 Kitchen 29.95

    i.e. a cartesian join between children and Article_number.

    Any suggestions?