Parsing a Microsoft Word docx, and unzipping zipfiles, with PL/SQL
Some days ago a colleague of mine asked if I could make something for him to unzip a Microsoft Word 2007 docx file. In the database, of course, and without using Java.
As it turns out, a docx file is just an ordinary zipfile with some XML files stored in it. And because I had already built a little procedure to make zipfiles some weeks ago, it took me no more than 3 hours to build a package to unzip a zipfile from PL/SQL.
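A quick way to convince yourself that a docx really is a zipfile is to look at its first four bytes: a zip archive that contains files starts with the local file header signature `PK\003\004` (hex `504B0304`). The block below is a minimal sketch of that check, not part of the package itself; the directory object `MY_DIR` and the file name `my.docx` are assumptions for illustration.

```sql
declare
  -- MY_DIR and my.docx are hypothetical: use your own directory object and file
  l_file bfile := bfilename( 'MY_DIR', 'my.docx' );
  l_sig  raw(4);
begin
  dbms_lob.fileopen( l_file, dbms_lob.file_readonly );
  -- read 4 bytes starting at offset 1 (LOB offsets are 1-based)
  l_sig := dbms_lob.substr( l_file, 4, 1 );
  dbms_lob.fileclose( l_file );

  if l_sig = hextoraw( '504B0304' )
  then
    dbms_output.put_line( 'starts with a zip local file header' );
  else
    dbms_output.put_line( 'does not look like a zipfile' );
  end if;
end;
/
```

Run it with `set serveroutput on` to see the result.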
With this package you can get a list of all the files in a zipfile, and unzip a file if you want. And if you know a little XML you can query the text from your Word document.
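Querying the text can be sketched as follows. In a docx the body text usually lives in the file `word/document.xml`, inside `w:t` elements in the WordprocessingML namespace. To keep the example self-contained it uses a small inline XML literal as a stand-in for an unzipped document; with a real file you would pass the unzipped BLOB instead, for instance via `xmltype( your_blob, nls_charset_id( 'AL32UTF8' ) )`.

```sql
select t.txt
from xmltable(
       xmlnamespaces( 'http://schemas.openxmlformats.org/wordprocessingml/2006/main' as "w" ),
       '//w:t'                       -- every text run in the document
       passing xmltype(
         -- stand-in for the content of word/document.xml
         '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
            <w:body>
              <w:p>
                <w:r><w:t>Hello</w:t></w:r>
                <w:r><w:t>world</w:t></w:r>
              </w:p>
            </w:body>
          </w:document>' )
       columns txt varchar2(4000) path '.'
     ) t;
```

The query returns one row per `w:t` element. Word splits a paragraph into several runs, so a single sentence may come back as multiple rows.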
In my query result the text shows up twice. I didn’t put time into trying to understand the Word format; I leave that to somebody else.
** Date: 29-04-2012
** fixed bug for large uncompressed files, thanks Morten Braten
** Date: 21-03-2012
** Take CRC32, compressed length and uncompressed length from
** Central file header instead of Local file header
** Date: 17-02-2012
** Added more support for non-ascii filenames
** Date: 25-01-2012
** Added MIT-license