A few days ago a colleague of mine asked if I could make something for him to unzip a Microsoft Word 2007 docx file. And, of course, it had to run in the database and without using Java.
As it turns out, a docx file is just an ordinary zip file with some XML files stored in it. And because I had already built a little procedure to make zip files some weeks ago, it didn’t take me more than 3 hours to build a package to unzip a zip file from PL/SQL.
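To make the "docx is just a zip file" point concrete, here is a minimal sketch in Python (not the PL/SQL package itself). It builds a tiny docx-like archive in memory, lists its entries, and reads one of them back. The helper names `build_sample_docx`, `list_files` and `get_file` are made up for this illustration, and the sample archive is far smaller than what Word really writes.

```python
import io
import zipfile


def build_sample_docx():
    """Build a tiny docx-like zip in memory (a stand-in for a real Word file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # A real docx holds more parts; these two are enough for the demo.
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document>Hello</document>")
    return buf.getvalue()


def list_files(zip_bytes):
    """Return the names of all entries stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.namelist()


def get_file(zip_bytes, name):
    """Return the uncompressed bytes of a single entry."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(name)


if __name__ == "__main__":
    docx = build_sample_docx()
    for entry in list_files(docx):
        print(entry)
    print(get_file(docx, "word/document.xml").decode("utf-8"))
```

With a real document you would read the bytes from the .docx file instead of calling `build_sample_docx`; the main text normally lives in the `word/document.xml` entry.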
With this package you can get a list of all the files in a zip file, and unzip a file if you want. And if you know a little XML you can query the text from your Word document.
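The "little XML" part can be sketched as follows, again in Python rather than PL/SQL. The sketch assumes the text of a Word 2007 document sits in `w:t` elements inside `w:p` paragraphs of `word/document.xml`, using the WordprocessingML namespace. The function name `extract_text` and the `SAMPLE` document are invented for this example; a real document.xml contains a lot more markup.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by Word 2007 documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written, heavily simplified stand-in for word/document.xml.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    b'<w:document xmlns:w="' + W_NS.encode("ascii") + b'">'
    b"<w:body>"
    b"<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    b"<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    b"</w:body></w:document>"
)


def extract_text(document_xml):
    """Return one string per w:p paragraph, joining the text of its w:t runs."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for para in root.iter("{%s}p" % W_NS):
        runs = [t.text or "" for t in para.iter("{%s}t" % W_NS)]
        paragraphs.append("".join(runs))
    return paragraphs


if __name__ == "__main__":
    for line in extract_text(SAMPLE):
        print(line)
```

Collecting the text paragraph by paragraph, rather than selecting every text node in the file, is one way to keep the output tidy when the markup is more complicated than this sample.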

Say you have a Word document like this

Then you can query the text from it like this

As you can see, the text is shown twice. I didn’t put any time into trying to understand the Word format; I leave that to somebody else.

Anton

And here’s a link to the code used: as_zip, a package with zip and unzip.