Using the Oracle XMLDB Repository to Automatically Shred Windows Office Documents (Part 1) startup 594127 1280

Using the Oracle XMLDB Repository to Automatically Shred Windows Office Documents (Part 1)

People who have attended the UKOUG presentation this year where Mark Drake, Sr. Product Manager XML Technologies / XMLDB, Oracle HQ, and I demonstrated the first principles of the XDB Repository, might have been impressed with its (GEO/KML Spatial, Image EXIF info) capabilities combined with Google Earth. This post will zoom in on how to consume automatically content of Windows Office document (docx).

Most (APEX) people know the PL/SQL Gateway functionality of the XDB Protocol Listener, but this is only one very small part of the XDB Repository functionality. To be precise only one “servlet” part of it. Those “servlets” can be based on Java, C or PL/SQL. The PL/SQL Gateway, as it’s name suggests, is based on the PL/SQL part. Another “servlet”, the Native Database Web Service (NDWS), which enables you to create a database SOA endpoint service and more, is based on C code. Beside demonstrating the WebDAV ACL driven security features and database extensibility/interfacing facilities based on the database (no cost option) XMLDB functionality, it also explain one of the coolest features, IMHO, introduced in Oracle 11gR1 called: XDB Repository Events.

There is one big problem in all of this. It is very, very sparsely documented and although there is some, or was some, code out there on the worldwide web, it took me a while to get a feeling of all the specifics at hand. This post will extend on some of the posts I already created to give to a head start of what is possible. This post will demonstrate what you can do with Windows Office documents, nowadays embedded (zipped) XML content, with extentions called: docx, xlsx or pptx…

The code here is an alternative, rewritten in PL/SQL (because we can…) out there on the internet, which is based on Java called “Oracle OpenWorld 2010 Oracle XML DB Repository Demo: MS Office Integration” (Oct. 2010 – Thomas Baby, Principal Developer, Oracle HQ). It demonstrates the use of the XDB Repository, XDB Repository Events and methods that can be used to query, via XML, the content of some Microsoft Word example docx documents. To make this “simple” example work in PL/SQL we need, once again:

  1. An Event Listener
  2. An Event Handler
  3. A Resource Configuration file

An event listener, is a PL/SQL package or Java Class, that handles the XDB “events”. An Event handler is the, in PL/SQL, procedure that handles a single event (e.g. “pre create”, “post create”, “pre delete”, etc). A Resource configuration file is an XML file, that acts as the glue between the XDB Repository file and folders (called “resources” in WebDAV terminology), so for example it determines which actions happen based on which XDB Repository Events, on which pre conditions (if defined, e.g. only for “jpg” files or which specific folder), under which (security) conditions, etc. As stated during the presentation mentioned, your imagination is the limit on what you would like to do.

Besides needing the XDB Repository Event strategy setup, we need for this example also:

  1. A folder structure
  2. Anton‘s Zip/Unzip package: AS_ZIP (with some small alterations)
  3. The “Event Handler” PL/SQL package, called: HANDLE_XDB_EVENTS

Setting up the Environment

For ease, let’s create a simple user account called EVENTS and create via this account the used XDB Repository folder structure.

SQL> create user events identified by events account unlock;
SQL> grant connect, resource, xdbadmin, xdb_set_invoker to events;

Granted roles, CONNECT and RESOURCE will be used, among others, to be able to connect to the database, to create the needed PL/SQL code and table(s). The XDBADMIN role is needed if we would like to alter the xdbconfig.xml XDB Protocol listener configuration file that controls its behavior. The XDB_SET_INVOKER role will not be explicitly be used in this example, but could be used to further tighten security regarding call of code, via the resource configurartion file, of the defined event listener.

So now we have created the EVENTS user account, lets connect to the database with it and create the XDB Repository Folder/Resource structures. The following will create a /xdb main root folder, with underneath defined folders “upload“, “download” and a “res” folder where we will create our resource configuration XML file (and secure it if needed against public access).

SQL> connect events/events

DECLARE
   --
   TYPE v_ntt IS TABLE OF VARCHAR2(1024);
   v_res v_ntt := v_ntt( '/xdb/'
                       , '/xdb/upload/'
                       , '/xdb/res/'
                       , '/xdb/download/'
                       );
   --
   v_boolean BOOLEAN;
   v_exist   EXCEPTION;
   --
   PRAGMA EXCEPTION_INIT(v_exist, -31003);
   --
BEGIN
   --
   FOR i IN v_res.FIRST .. v_res.LAST
   LOOP
   --
      BEGIN
         v_boolean := DBMS_XDB.CREATEFOLDER(v_res(i));
      EXCEPTION
         WHEN v_exist THEN NULL;
      END;
   --
   END LOOP;
   --
   COMMIT;
   --
END;
/

Be aware that whatever you do now via the EVENTS database account, the default WebDAV ACL security will assign EVENTS as the owner. I mention this explicitly because sometimes you will be confused why, for example, a DBA role privileged user might not be able to see certain XDB Repository resources. This is because the security implementation of XMLDB functionality in the database will be enforced via Oracle’s Virtual Database functionality (DBMS_RLS), which might prevent you to see or access things, based on database USER, ROLE, or WebDAV access. It is possible to create some structures which can not be seen or dropped afterwards due to your own implemented restrictions (or enforced via the environment).

As you might know, its very easy to create files and folders in the XDB Repository via for example PL/SQL packages DBMS_XDB, DBMS_RESOURCE or via FTP or WebDAV access (enabled in all Microsoft Office documents or via Windows Explorer). Those “resources”, file and folder, can then be accessed also via HTTP(s) if needed. The WebDAV access to the database can also be opened up via using the opensource DAVFS code or via a piece of software in Windows that can be used to mount the database, via WebDAV, directly as a file share, if needed.

So…now we have those directory structures and we can query these structures from within the database.

SQL> SELECT path(1) as "Relative Path"
   2 FROM RESOURCE_VIEW
   3 WHERE under_path(RES, 1, '/xdb', 1) = 1;

Relative Path
----------------
download
res
upload

3 rows selected.

SQL> SELECT any_path as "Absolute Path"
   2 FROM RESOURCE_VIEW
   3 WHERE under_path(RES, '/xdb') = 1;

Absolute Path
----------------
/xdb/download
/xdb/res
/xdb/upload

3 rows selected.

Beside methods like using XDBURITYPE, you are also access the XDB Repository via one of the by Oracle defined views, PATH_VIEW or RESOURCE_VIEW, directly via SQL or PL/SQL.

XDB Repository Events

Let’s say we want all files that are be created in the folder /xdb/upload or be handled via a Event Listener package. This Event Listener, for now called HANDLE_XDB_EVENTS, will do specific stuff for us with those resources created in /xdb/upload like picking up the content it and inserting the needed content for further use automatically in a table. We have to instruct the XDB Repository Event framework to act on those specific needs by defining them in a resource configuration file. This configuration file will be created in the /xdb/res directory folder.

DECLARE
  b BOOLEAN := FALSE;
BEGIN
 b := DBMS_XDB.createResource(
 '/xdb/res/UploadHandling.xml',
 '<resconfig xmlns="<a href="http://xmlns.oracle.com/xdb/XDBResConfig.xsd">http://xmlns.oracle.com/xdb/XDBResConfig.xsd"
xmlns:xsi="<a href="http://www.w3.org/2001/XMLSchema-instance">http://www.w3.org/2001/XMLSchema-instance</a>"
xsi:schemaLocation="<a href="http://xmlns.oracle.com/xdb/XDBResConfig.xsd">http://xmlns.oracle.com/xdb/XDBResConfig.xsd</a>
<a href="http://xmlns.oracle.com/xdb/XDBResConfig.xsd">http://xmlns.oracle.com/xdb/XDBResConfig.xsd</a>">
   <event -listeners>
     <listener>
       <description>OOXML Application</description>
       <schema>EVENTS</schema>
       <source>HANDLE_XDB_EVENTS</source>
       <language>PL/SQL</language>
       <events>
         <post -LinkIn/>
         <post -Update/>
         <render />
       </events>
     </listener>
   </event>
   <defaultchildconfig>
     <configuration>
       <path>/xdb/res/UploadHandling.xml</path>
     </configuration>
   </defaultchildconfig>
 </resconfig>
 ');
END;
/

The package used here DBMS_XDB creates a resource (file) and needs as input the absolute path plus the content that will be in this file. So in this case our file will be created in /xdb/res and will have as filename: UploadHandling.xml (/xdb/res/UploadHandling.xml). The needed content of this resource configuration XML file defines in this example:

  • The event listener name in use: HANDLE_XDB_EVENTS PL/SQL Package
  • The package owner: EVENTS
  • The source flavor: PL/SQL
  • A description
  • The events that will trigger the package to be executed: Post-LinkedIn, Post-Update, Render
  • All resources, childs, will be associated with this resource file automatically via being defined in the “defaultChildConfig” element section

Be aware of the references, in the XML content of the resource configuration file, to the XDBResConfig.xsd XML Schema. The content MUST follow the outlines defined in this XML schema. Just like a lot of other XMLDB configuration and structures, this and other important XML Schema’s are already registered in the XMLDB environment. No extra effort is needed. You read about its contents in the appendix of the XMLDB Developers Guide for your database version or can access it directly via a standard method like using the XDBURIFILE function.

In my situation I already created a “dummy” package with the name HANDLE_XDB_EVENTS. I come back later on the specifics regarding content needed…

The next step is to associate this resource configuration to the file or folder were we need it. In this example we want the HANDLE_XDB_EVENT package also ONLY to be triggered/called via XDB Repository Events when files or folders are created in the /xdb/upload folder. No where else. This is achieved by associating the resource configuration file to the actual folder via DBMS_RESCONFIG.

BEGIN
  DBMS_RESCONFIG.appendResConfig(
    '/xdb/upload',
    '/xdb/res/UploadHandling.xml', DBMS_RESCONFIG.APPEND_RESOURCE);
END;
/

We “append” our event listener configuration needs, because there are already event listeners in place (in Oracle 11gR2 XE at least) that handle extended security methods.

SQL> SELECT any_path as "Absolute Path"
   2 FROM RESOURCE_VIEW
   3 WHERE under_path(RES, 1, '/xdb/res', 1) = 1;

Absolute Path
----------------
/xdb/res/UploadHandling.xml

1 row selected.

SQL> select xdburitype('/xdb/res/UploadHandling.xml').getCLOB() as "ResConfig" from dual;

ResConfig
-----------------------------------------------------------------
<resconfig xmlns="<a href="http://xmlns.oracle.com/xdb/XDBResConfig.xsd">http://xmlns.oracle.com/xdb/XDBResConfig.xsd" xmlns:xsi="<a href="http://www.w3.org/2001/XMLSchema-instance">http://www.w3.org/2001/XMLSchema-instance</a>" xsi:schemaLocation="<a href="http://xmlns.oracle.com/xdb/XDBResConfig.xsd">http://xmlns.oracle.com/xdb/XDBResConfig.xsd</a>  <a href="http://xmlns.oracle.com/xdb/XDBResConfig.xsd">http://xmlns.oracle.com/xdb/XDBResConfig.xsd</a>" enable="true">
  <event -listeners set-invoker="false">
    <listener>
      <description>OOXML Application</description>
      <schema>EVENTS</schema>
      <source>HANDLE_XDB_EVENTS</source>
      <language>PL/SQL</language>
      <events>
        <post -LinkIn/>
        <post -Update/>
        <render />
      </events>
    </listener>
  </event>
  <defaultchildconfig>
    <configuration />
  </defaultchildconfig>
</resconfig>

To get some info on which Event Handlers are defined on a resource you can use the GETLISTENER function of package DBMS_RESCONFIG. This will output the result in XMLTYPE which is very convenient because that enables use to feed the result directly into functions like XMLTABLE so we can query and format it via XPATH to get the needed info.

SQL> select xt.schema
  2       , xt.source
  3       , xt.description
  4  FROM   XMLTABLE (xmlnamespaces(default 'http://xmlns.oracle.com/xdb/XDBResConfig.xsd')
  5                   , '/event-listeners/listener'
  6                   PASSING (DBMS_RESCONFIG.GETLISTENERS('/xdb/upload'))
  7                   COLUMNS
  8                     schema      VARCHAR2(30) PATH 'schema'
  9                   , source      VARCHAR2(30) PATH 'source'
 10                   , description VARCHAR2(100) PATH 'description'
 11                 ) xt
 12  WHERE xt.schema = user;

SCHEMA  SOURCE            DESCRIPTION
------- ----------------- -----------------
EVENTS	HANDLE_XDB_EVENTS OOXML Application

1 row selected.

or even a bit more complex…

SQL> select xt.schema
  2       , xt.eventlistener
  3       , xt.language
  4       , xe.roworder
  5       , xe.event
  6  FROM   XMLTABLE (xmlnamespaces(default 'http://xmlns.oracle.com/xdb/XDBResConfig.xsd')
  7                  , '/event-listeners/listener'
  8                  PASSING (DBMS_RESCONFIG.GETLISTENERS('/xdb/upload'))
  9                  COLUMNS
 10                    schema        VARCHAR2(30) PATH 'schema'
 11                  , eventlistener VARCHAR2(30) PATH 'source'
 12                  , language      VARCHAR2(10) PATH 'language'
 13                  , childevents        XMLTYPE PATH 'events'
 14                  ) xt
 15     ,   XMLTABLE (xmlnamespaces(default 'http://xmlns.oracle.com/xdb/XDBResConfig.xsd')
 16                  , '/events/*'
 17                  PASSING xt.childevents
 18                  COLUMNS
 19                     roworder FOR ORDINALITY
 20                   , event VARCHAR2(25) PATH 'fn:local-name(.)'
 21                  ) xe
 22  WHERE xt.schema = user;

SCHEMA  EVENTLISTENER     LANGUAGE ROWNUMBER EVENT
------- ----------------- -------- --------- -----------
EVENTS	HANDLE_XDB_EVENTS    PL/SQL        1 Post-LinkIn
EVENTS	HANDLE_XDB_EVENTS    PL/SQL        2 Post-Update
EVENTS	HANDLE_XDB_EVENTS    PL/SQL        3 Render

3 rows selected.

So now we created the folder structure, a dummy package that will act as the event listener with the name HANDLE_XDB_EVENTS, a user account that owns the structures, and associated a resource configuration XML file that enforces now the execution of the HANDLE_XDB_EVENTS when events “Post-LinkedIn, Post-Update, Render” are detected by the XDB Event framework.

Accessing Microsoft Word Content

To be able to do everything in PL/SQL, I needed a ZIP/UNZIP method. So I asked my colleague Anton Scheffer, if he knew how to do this. Anton was busy at the time with his own PDF PL/SQL generator package, and this small section (zip/unzip) was one of the things he needed himself, so guess who the “colleague” is mentioned in Anton’s post: “Parsing a Microsoft Word docx, and unzip zipfiles, with PL/SQL“. Anton’s example takes a Windows Word/docx document from the fileserver as input, but we (Mark Drake and I) do our database work, as long as possible, in the database, so in our case we need this package (AS_ZIP) to work with the XDB Repository. Mark overloaded the ZIP and UNZIP procedures with pipelined table functionality, with the needed extra TYPE creation.

create or replace type ZIP_CONTENTS_LIST
    as TABLE of VARCHAR2(4000)
/
create or replace type ZIP_ENTRY force as OBJECT
(
  FILENAME VARCHAR2(4000),
  CONTENT  BLOB
)
/
create or replace type ZIP_ENTRY_TABLE as table of ZIP_ENTRY
/
--
CREATE OR REPLACE package as_zip
is
/*
** Written by Anton Scheffer
** 3 April 2011
*/
--
  function get_file_list(
    p_dir in varchar2
  , p_zip_file in varchar2
  , p_encoding in varchar2 := null
  )
    return zip_contents_list;
--
  function get_file_list(
    p_zipped_blob in blob
  , p_encoding in varchar2 := null /* Use CP850 for zip files created with a German Winzip to see umlauts, etc */
  )
    return zip_contents_list;
--
  function get_file(
    p_dir in varchar2
  , p_zip_file in varchar2
  , p_file_name in varchar2
  , p_encoding in varchar2 := null
  )
    return blob;
--
  function get_file(
    p_zipped_blob in blob
  , p_file_name in varchar2
  , p_encoding in varchar2 := null
  )
    return blob;
--
  function unzip(
    p_zipped_blob in blob
  , p_encoding in varchar2 := null
  )
    return zip_entry_table pipelined;
--
  function unzip(
    p_dir in varchar2
  , p_zip_file in varchar2
  , p_encoding in varchar2 := null
  )
    return zip_entry_table pipelined;
--
  procedure add1file(
    p_zipped_blob in out blob
  , p_name in varchar2
  , p_content in blob
  );
--
  procedure finish_zip(
    p_zipped_blob in out blob
  );
--
  procedure save_zip(
    p_zipped_blob in blob
  , p_dir in varchar2 := 'MY_DIR'
  , p_filename in varchar2 := 'my.zip'
  );

So now we can use Anton’s unzip package and get the needed info returned via a pipelined table function, defined on the user type definition ZIP_ENTRY which will return the FILENAME and the CONTENT.

user type definition
------------------------------
type ZIP_ENTRY force as OBJECT
(
  FILENAME VARCHAR2(4000),
  CONTENT  BLOB
)

Knowing that an XMLTYPE datatype can handle a lot of different datatypes (BFILE, BLOB, AnyData, CLOB, Ref Cursor, etc) , the rest is fairly “easy”. The following will give us the filename (VARCHAR2(4000)) and the content (BLOB) of the file in the XDB Repository.

select filename, content
from table(as_zip.unzip((xdburitype('/xdb/download/BM123Manual.docx').getBLOB())));

To make “content”, currently a BLOB, an XMLTYPE datatype, we have to feed it also NLS_CHARACTERSET info. This can be done via, for example:

select filename
     , xmltype(content,NLS_CHARSET_ID('AL32UTF8')) as DOCX
from table(as_zip.unzip((xdburitype('/xdb/download/BM123Manual.docx').getBLOB())))
where filename = 'word/document.xml';

So now we have as a result, a filename, an XMLType based content with as the source the document.xml file which is located in the (zipped) Word document in the XDB Repository called BM123Manual.docx which resides in the directory folder /xdb/download. If you drag and drop, for example via WebDAV and Windows Explorer, this example Word document from the original OTN Java based source in this repository folder and execute the query via SQL*Plus or SQL*Developer, then the result is:

select filename
     , xmltype(content,NLS_CHARSET_ID('AL32UTF8')) as DOCX
from table(as_zip.unzip((xdburitype('/xdb/download/BM123Manual.docx').getBLOB())))
where filename = 'word/document.xml';

FILENAME
----------------
word/document.xml

DOCX
-------------------------------------------------------------------------------------
< ?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:ve="<a href="http://schemas.openxmlformats.org/markup-compatibility/2006">http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="<a href="http://schemas.openxmlformats.org/officeDocument/2006/relationships">http://schemas.openxmlformats.org/officeDocument/2006/relationships</a>" xmlns:m="<a href="http://schemas.openxmlformats.org/officeDocument/2006/math">http://schemas.openxmlformats.org/officeDocument/2006/math</a>" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="<a href="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing</a>" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="<a href="http://schemas.openxmlformats.org/wordprocessingml/2006/main">http://schemas.openxmlformats.org/wordprocessingml/2006/main</a>" xmlns:wne="<a href="http://schemas.microsoft.com/office/word/2006/wordml%22%3E%3Cw:body%3E%3Cw:sdt%3E%3Cw:sdtPr%3E%3Cw:alias">http://schemas.microsoft.com/office/word/2006/wordml"><w:body><w:sdt><w:sdtpr><w:alias </a> w:val="Title"/><w:tag w:val="Title"/><w:id w:val="9892861"/><w:placeholder><w:docpart w:val="DefaultPlaceholder_22675703"/></w:placeholder><w:text /></w:alias></w:sdtpr><w:sdtcontent><w:p w:rsidR="002E72A7" w:rsidRDefault="009B3F53" w:rsidP="00961E62"><w:ppr><w:jc w:val="center"/></w:ppr><w:r><w:rpr><w:b /><w:sz w:val="40"/></w:rpr><w:t>Operating Manual for BM123</w:t></w:r></w:p></w:sdtcontent></w:sdt><w:p w:rsidR="00324D04" w:rsidRDefault="00324D04" w:rsidP="00961E62"><w:ppr><w:jc w:val="center"/></w:ppr><w:r><w:t xml:space="preserve">By </w:t></w:r><w:sdt><w:sdtpr><w:alias w:val="Author"/><w:tag w:val="Author"/><w:id w:val="9274235"/><w:placeholder><w:docpart w:val="DefaultPlaceholder_22675703"/></w:placeholder><w:text /></w:sdtpr><w:sdtcontent><w:r w:rsidR="009B3F53"><w:t>Bob, Tom</w:t></w:r></w:sdtcontent></w:sdt></w:p><w:sdt><w:sdtpr><w:alias w:val="Category"/><w:tag w:val="Category"/><w:id w:val="9892860"/><w:placeholder><w:docpart w:val="DefaultPlaceholder_22675703"/></w:placeholder><w:text /></w:sdtpr><w:sdtcontent><w:p w:rsidR="00121F2C" w:rsidRDefault="009B3F53" w:rsidP="00961E62"><w:ppr><w:jc w:val="center"/></w:ppr><w:r><w:t>Technical Spec</w:t></w:r></w:p></w:sdtcontent></w:sdt><w:p w:rsidR="00121F2C" w:rsidRDefault="00121F2C"/><w:p w:rsidR="00121F2C" w:rsidRDefault="00B275F7"><w:sdt><w:sdtpr><w:alias w:val="Abstract"/><w:tag w:val="Abstract"/><w:id w:val="9892868"/><w:placeholder><w:docpart w:val="DefaultPlaceholder_22675703"/></w:placeholder><w:text /></w:sdtpr><w:sdtcontent><w:r w:rsidR="00121F2C"><w:t>On the Insert tab, the galleries include items that are designed to coordinate with the overall look of your document. You can use these galleries to insert tables, headers, footers, lists, cover pages, and other document building blocks. When you create pictures, charts, or diagrams, they also coordinate with your current document look</w:t></w:r></w:sdtcontent></w:sdt></w:p><w:p w:rsidR="00121F2C" w:rsidRDefault="00121F2C"><w:r><w:t>You can easily change the formatting of selected text in the document text by choosing a look for the selected text from the Quick Styles gallery on the Home tab. You can also format text directly by using the other controls on the Home tab. Most controls offer a choice of using the look from the current theme or using a format that you specify directly.</w:t></w:r></w:p><w:p w:rsidR="00121F2C" w:rsidRDefault="00121F2C"><w:r><w:t>To change the overall look of your document, choose new Theme elements on the Page Layout tab. To change the looks available in the Quick Style gallery, use the Change Current Quick Style Set command. Both the Themes gallery and the Quick Styles gallery provide reset commands so that you can always restore the look of your document to the original contained in your current template.</w:t></w:r></w:p><w:p w:rsidR="00121F2C" w:rsidRDefault="00121F2C"/><w:sectpr w:rsidR="00121F2C" w:rsidSect="002E72A7"><w:pgsz w:w="12240" w:h="15840"/><w:pgmar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/><w:cols w:space="720"/><w:docgrid w:linePitch="360"/></w:sectpr></w:body></a></w:document>

Not that very easy to read, useful, yet…but it becomes a bit more interesting if you query it (the CONTENT) via XQuery / XPath. The following gives you already a bit more (although concatinated still) info:

select extract((xmltype(content,NLS_CHARSET_ID('AL32UTF8')))
              , '//w:t/text()'
              , 'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
              ) as "docx"
from table(as_zip.unzip((xdburitype('/xdb/download/BM123Manual.docx').getBLOB())))
where filename = 'word/document.xml';

DOCX
-------------------------------------------------------------------------------------
Operating Manual for BM123By Bob, TomTechnical SpecOn the Insert tab, the galleries include items that are designed to coordinate with the overall look of your document. You can use these galleries to insert tables, headers, footers, lists, cover pages, and other document building blocks. When you create pictures, charts, or diagrams, they also coordinate with your current document lookYou can easily change the formatting of selected text in the document text by choosing a look for the selected text from the Quick Styles gallery on the Home tab. You can also format text directly by using the other controls on the Home tab. Most controls offer a choice of using the look from the current theme or using a format that you specify directly.To change the overall look of your document, choose new Theme elements on the Page Layout tab. To change the looks available in the Quick Style gallery, use the Change Current Quick Style Set command. Both the Themes gallery and the Quick Styles gallery provide reset commands so that you can always restore the look of your document to the original contained in your current template.

This content could, for example, already be used for a Oracle Text Index search, combined with Oracle XMLIndex indexes to enable a quicker search on the XML element and attributes (in this case the “w:t” section).

The usefulness becomes even more evident, I think, when you do stuff like

--
-- PRINT ALL AVAILABLE TAGS
--

SELECT DISTINCT(BOOK_INFO.TAG)
FROM (select xmltype(content,NLS_CHARSET_ID('AL32UTF8')) as OBJECT_VALUE
      from table(as_zip.unzip((xdburitype('/xdb/download/BM123Manual.docx').getBLOB())))
      -- where filename = 'word/document.xml'
     ) BOOKS
   , XMLTABLE(XMLNAMESPACES(
              'http://schemas.openxmlformats.org/wordprocessingml/2006/main' AS "w"), '/w:document/w:body//w:sdt'
        PASSING BOOKS.OBJECT_VALUE
        COLUMNS TAG VARCHAR2(100) PATH '/w:sdt/w:sdtPr/w:tag/@w:val'
) BOOK_INFO;

TAG
--------------
Abstract
Title
Author
Category

4 rows selected.
--
-- FIND TITLE OF ALL BOOKS IN THE "Technical Spec" CATEGORY
--

VAR      CATEGORY VARCHAR2(20);
EXECUTE :CATEGORY := 'Technical Spec';

SELECT XMLQUERY('declare namespace w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"; /w:document/w:body//w:sdt[w:sdtPr/w:tag/@w:val="Title"]/w:sdtContent//w:r//w:t//text()'
                 PASSING BOOKS.OBJECT_VALUE RETURNING CONTENT) TITLE
FROM (select xmltype(content,NLS_CHARSET_ID('AL32UTF8')) as OBJECT_VALUE
      from table(as_zip.unzip((xdburitype('/xdb/download/BM123Manual.docx').getBLOB())))
      -- where filename = 'word/document.xml'
     ) BOOKS
WHERE  XMLEXISTS('declare namespace w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"; /w:document/w:body//w:sdt[w:sdtPr/w:tag/@w:val="Category" and w:sdtContent//w:r//w:t//text()=$CATEGORY]'
                PASSING BOOKS.OBJECT_VALUE,
                        :CATEGORY AS "CATEGORY");
/

TITLE
---------------------------
Operating Manual for BM123

1 row selected.
--
-- RETURN BOOK TITLE, AUTHORS AND CATEGORY IN XML FORMAT
--

SELECT XMLQUERY('declare namespace w="<a href="http://schemas.openxmlformats.org/wordprocessingml/2006/main">http://schemas.openxmlformats.org/wordprocessingml/2006/main</a>"; <result>
                  <title>{$BOOK//w:document/w:body//w:sdt[w:sdtPr/w:tag/@w:val="Title"]/w:sdtContent//w:r//w:t//text()}</title>
                  <authors>{$BOOK//w:document/w:body//w:sdt[w:sdtPr/w:tag/@w:val="Author"]/w:sdtContent//w:r//w:t//text()}</authors>
                  <category>{$BOOK//w:document/w:body//w:sdt[w:sdtPr/w:tag/@w:val="Category"]/w:sdtContent//w:r//w:t//text()}</category>
               </result>'
               PASSING BOOKS.OBJECT_VALUE AS "BOOK" RETURNING CONTENT) RESULT
FROM (select xmltype(content,NLS_CHARSET_ID('AL32UTF8')) as OBJECT_VALUE
      from table(as_zip.unzip((xdburitype('/xdb/download/BM123Manual.docx').getBLOB())))
      where filename = 'word/document.xml'
     ) BOOKS
;

RESULT
------------------------
<result><title>Operating Manual for BM123</title><authors>Bob, Tom</authors><category>Technical Spec</category></result>

Missing Links

So where we? We have the XDB Repository set-up. We know now how to extract information from a Windows Office document. We enabled the XDB Repository Event framework for folder /xdb/upload. We now need to specify the actions needed in package (Event Listener) HANDLE_XDB_EVENTS.

The events specified in the Resource Configuration file in /xdb/res/UploadHandling.xml, Post-LinkIn, Post-Update, Render, have to be handled by Event Handlers, that is procedures defined in the package, handling the specified events. The procedure naming is very specific and Case Sensitive. It took me a while to figure out those procedure names, until I saw the following Java document describing the “XDBRepositoryEventListener“. Your procedure names have to match these names. When you do, in the most extended form, you will get something like the following for the events defined in 11gR2 XE environment (derived from the XDBResConfig.xml content): “Intro into XDB Repository Events (OTN XMLDB Forum)“.

After creating a DOCSTORE table like the following and index the XML content via…

CREATE TABLE DOCSTORE
( filename     VARCHAR2(4000)
, zipfilename VARCHAR2(4000)
, xmlcontent XMLTYPE)
TABLESPACE USERS
XMLTYPE xmlcontent STORE AS SECUREFILE BINARY XML
/
CREATE INDEX DOCSTORE_XMLINDEX ON DOCSTORE(xmlcontent)
INDEXTYPE IS XDB.XMLINDEX
PARAMETERS ('PATHS( INCLUDE(  /w:document/w:body//w:sdt //w:tbl  )
             NAMESPACE MAPPING
             (xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"))'
           )
/

You can now create/define the appropriate procedures in the package that handle the defined events. So, while I kept for demonstration reasons all the procedures in tact in the package, the package body contains a procedure called “handlePostLinkIn” (one of the triggered events) that does nothing else to call the main procedure “handleMain”.

PROCEDURE handlePostLinkIn(
    P_EVENT dbms_xevent.XDBRepositoryEvent)
AS
BEGIN
  --
  handleMain(P_EVENT);
  --
END handlePostLinkIn;

This “handleMain” procedure in the package body, executes the needed actions, when, via the DBMS_XEVENT package, Oracle recognizes that the “Post-LinkedIn” event has been triggered. The handleMain procedure has a case statement that does just that…

...
...
  -- ------------------------------------------
  -- Associate fired event with action
  -- ------------------------------------------
  CASE V_EVENT_TYPE
  WHEN dbms_xevent.RENDER_EVENT THEN
    V_EVENT_NAME := 'Render';
  WHEN dbms_xevent.PRE_CREATE_EVENT THEN
    V_EVENT_NAME := 'PreCreate';
  WHEN dbms_xevent.POST_CREATE_EVENT THEN
    V_EVENT_NAME := 'PostCreate';
  WHEN dbms_xevent.PRE_DELETE_EVENT THEN
    V_EVENT_NAME := 'PreDelete';
  WHEN dbms_xevent.POST_DELETE_EVENT THEN
    V_EVENT_NAME := 'PostDelete';
  WHEN dbms_xevent.PRE_UPDATE_EVENT THEN
    V_EVENT_NAME := 'PreUpdate';
  WHEN dbms_xevent.POST_UPDATE_EVENT THEN
    V_EVENT_NAME := 'PostUpdate';
  WHEN dbms_xevent.PRE_LOCK_EVENT THEN
    V_EVENT_NAME := 'PreLock';
  WHEN dbms_xevent.POST_LOCK_EVENT THEN
    V_EVENT_NAME := 'PostLock';
  WHEN dbms_xevent.PRE_UNLOCK_EVENT THEN
    V_EVENT_NAME := 'PreUnlock';
  WHEN dbms_xevent.POST_UNLOCK_EVENT THEN
    V_EVENT_NAME := 'PostUnlock';
  WHEN dbms_xevent.PRE_LINKIN_EVENT THEN
    V_EVENT_NAME := 'PreLinkIn';
  WHEN dbms_xevent.POST_LINKIN_EVENT THEN
    --
    V_EVENT_NAME := 'PostLinkIn';
    --
    insert into docstore
    (filename, zipfilename, xmlcontent)
    select v_resource_path
         , filename
         , xmltype(content,NLS_CHARSET_ID('AL32UTF8')) as xmlcontent
      from table(as_zip.unzip((xdburitype(v_resource_path).getBLOB())))
      where filename = 'word/document.xml';
    --
  WHEN dbms_xevent.PRE_LINKTO_EVENT THEN
    V_EVENT_NAME := 'PreLinkTo';
  WHEN dbms_xevent.POST_LINKTO_EVENT THEN
    V_EVENT_NAME := 'PostLinkTo';
  WHEN dbms_xevent.PRE_UNLINKIN_EVENT THEN
    V_EVENT_NAME := 'PreUnlinkIn';
  WHEN dbms_xevent.POST_UNLINKIN_EVENT THEN
    V_EVENT_NAME := 'PostUnlinkIn';
  WHEN dbms_xevent.PRE_UNLINKFROM_EVENT THEN
    V_EVENT_NAME := 'PreUnlinkFrom';
  WHEN dbms_xevent.POST_UNLINKFROM_EVENT THEN
    V_EVENT_NAME := 'PostUnlinkFrom';
  WHEN dbms_xevent.PRE_CHECKIN_EVENT THEN
    V_EVENT_NAME := 'PreCheckIn';
  WHEN dbms_xevent.POST_CHECKIN_EVENT THEN
    V_EVENT_NAME := 'PostCheckIn';
  WHEN dbms_xevent.PRE_CHECKOUT_EVENT THEN
    V_EVENT_NAME := 'PreCheckOut';
  WHEN dbms_xevent.POST_CHECKOUT_EVENT THEN
    V_EVENT_NAME := 'PostCheckOut';
  WHEN dbms_xevent.PRE_UNCHECKOUT_EVENT THEN
    V_EVENT_NAME := 'PreUncheckOut';
  WHEN dbms_xevent.POST_UNCHECKOUT_EVENT THEN
    V_EVENT_NAME := 'PostUncheckOut';
  WHEN dbms_xevent.PRE_VERSIONCONTROL_EVENT THEN
    V_EVENT_NAME := 'PreVersionControl';
  WHEN dbms_xevent.POST_VERSIONCONTROL_EVENT THEN
    V_EVENT_NAME := 'PostVersionControl';
  WHEN dbms_xevent.PRE_OPEN_EVENT THEN
    V_EVENT_NAME := 'PreOpen';
  WHEN dbms_xevent.POST_OPEN_EVENT THEN
    V_EVENT_NAME := 'PostOpen';
  WHEN dbms_xevent.PRE_INCONSISTENTUPDATE_EVENT THEN
    V_EVENT_NAME := 'PreInconsistentUpdate';
  WHEN dbms_xevent.POST_INCONSISTENTUPDATE_EVENT THEN
    V_EVENT_NAME := 'PostInconsistentUpdate';
  ELSE
    V_EVENT_NAME := 'UnknownEvent';
  END CASE;
...
...

Now, everything kicks into effect. If you drag and drop a docx word document into /xdb/upload, the content will automatically of the embedded word/document.xml file, via the Post-LinkedIn event, inserted into the DOCSTORE table. So if you drag and drop, for example, 2Word documents into /xdb/upload, the DOCSTORE table will contain/show the following:

describe docstore

Name        Null Type
----------- ---- --------------
FILENAME         VARCHAR2(4000)
ZIPFILENAME      VARCHAR2(4000)
XMLCONTENT       XMLTYPE()

SQL> select * from docstore

FILENAME                       ZIPFILENAME       XMLCONTENT
------------------------------ ----------------- ------------------------------------
/xdb/download/test.docx        word/document.xml < ?xml version = '1.0' encoding = ...
/xdb/download/BM123Manual.docx word/document.xml <?xml version = '1.0' encoding = ...

2 rows selected.

So why the “part 1″…?

You might have noticed, what if…? What if it is not a docx document? What will happen then? Will it all fail? There are a lot of things you still can do, via those XDB Repository architecture methods, automatically without extra coding. Also, as you might have noticed, I didn’t talk about security yet or other mechanisms like extending the method in such a way that it is possible to make a, let say, resume document anonymous automatically regarding mentioned surnames and lastnames. I mean, nothing holds us back now to also do something exciting based on the content, automatically… but that is something for another post…

References

HTH

Marco

7 Comments

  1. Marco Gralike January 30, 2012
  2. Anton Scheffer January 9, 2012
  3. Marco Gralike January 7, 2012
  4. Marco Gralike January 7, 2012
  5. Anton Scheffer January 7, 2012
  6. Marco Gralike December 29, 2011
  7. Morten Braten December 20, 2011