Comments on: Parsing a Microsoft Word docx, and unzip zipfiles, with PL/SQL
https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/
Friends of Oracle and Java
Sat, 25 Apr 2015 11:36:47 +0000

By: Jason https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-7396 Wed, 04 Sep 2013 05:27:22 +0000 http://technology.amis.nl/blog/?p=8090#comment-7396 Thanks for this awesome library. I have a zipped XML file stored in the database, and I am using your package to unzip it inline and then run an XPath query on the result.

Example:
TO_DATE(
  EXTRACT(
    XMLTYPE(SCHEMA.BLOB_TO_CLOB(SCHEMA.as_zip.GET_FILE(F.PAYLOAD, REPLACE(F.FILENAME, '.zip', '.xml'), null))),
    '//Transaction[@transactionID = "' || CRN.TRAN_ID || '"]/CATSNotification/ChangeRequest/ChangeData/ActualChangeDate/text()'
  ).getStringVal(),
  'YYYY-MM-DD'
) AS ACTUALCHANGEDATE

Thanks again, Jason

By: Klaus Schuermann https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6162 Wed, 22 Feb 2012 08:01:26 +0000 http://technology.amis.nl/blog/?p=8090#comment-6162 Thank you for implementing these changes.

By: Anton Scheffer https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6161 Fri, 17 Feb 2012 14:58:43 +0000 http://technology.amis.nl/blog/?p=8090#comment-6161 I’ve changed the procedure add1file a little more to improve its support for non-ASCII filenames:
procedure add1file
  ( p_zipped_blob in out blob
  , p_name varchar2
  , p_content blob
  )
is
  t_now date;
  t_blob blob;
  t_len integer;
  t_clen integer;
  t_crc32 raw(4) := hextoraw( '00000000' );
  t_compressed boolean := false;
  t_name raw(32767);
begin
  t_now := sysdate;
  t_len := nvl( dbms_lob.getlength( p_content ), 0 );
  if t_len > 0
  then
    -- the offsets below treat the result as 10 header bytes, deflate data, 8 trailer bytes
    t_blob := utl_compress.lz_compress( p_content );
    t_clen := dbms_lob.getlength( t_blob ) - 18;
    t_compressed := t_clen < t_len;
    t_crc32 := dbms_lob.substr( t_blob, 4, t_clen + 11 );
  end if;
  if not t_compressed
  then
    -- compression did not make the content smaller: store it as-is
    t_clen := t_len;
    t_blob := p_content;
  end if;
  if p_zipped_blob is null
  then
    dbms_lob.createtemporary( p_zipped_blob, true );
  end if;
  t_name := utl_i18n.string_to_raw( p_name, 'AL32UTF8' );
  dbms_lob.append( p_zipped_blob
                 , utl_raw.concat( c_LOCAL_FILE_HEADER -- Local file header signature
                                 , hextoraw( '1400' ) -- version 2.0
                                 , case when t_name = utl_i18n.string_to_raw( p_name, 'US8PC437' )
                                     then hextoraw( '0000' ) -- no General purpose bits
                                     else hextoraw( '0008' ) -- set Language encoding flag (EFS)
                                   end
                                 , case when t_compressed
                                     then hextoraw( '0800' ) -- deflate
                                     else hextoraw( '0000' ) -- stored
                                   end
                                 , little_endian( trunc( to_number( to_char( t_now, 'ss' ) ) / 2 ) -- whole 2-second units
                                                + to_number( to_char( t_now, 'mi' ) ) * 32
                                                + to_number( to_char( t_now, 'hh24' ) ) * 2048
                                                , 2
                                                ) -- File last modification time
                                 , little_endian( to_number( to_char( t_now, 'dd' ) )
                                                + to_number( to_char( t_now, 'mm' ) ) * 32
                                                + ( to_number( to_char( t_now, 'yyyy' ) ) - 1980 ) * 512
                                                , 2
                                                ) -- File last modification date
                                 , t_crc32 -- CRC-32
                                 , little_endian( t_clen ) -- compressed size
                                 , little_endian( t_len ) -- uncompressed size
                                 , little_endian( utl_raw.length( t_name ), 2 ) -- File name length
                                 , hextoraw( '0000' ) -- Extra field length
                                 , t_name -- File name
                                 )
                 );
  if t_compressed
  then
    dbms_lob.copy( p_zipped_blob, t_blob, t_clen, dbms_lob.getlength( p_zipped_blob ) + 1, 11 ); -- compressed content
  elsif t_clen > 0
  then
    dbms_lob.copy( p_zipped_blob, t_blob, t_clen, dbms_lob.getlength( p_zipped_blob ) + 1, 1 ); -- content
  end if;
  if dbms_lob.istemporary( t_blob ) = 1
  then
    dbms_lob.freetemporary( t_blob );
  end if;
end add1file;
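
The time and date fields in the local file header above use the MS-DOS packed format: seconds in 2-second units, then minutes and hours in the time word, and day, month and years since 1980 in the date word. The block below is a minimal sketch of that packing for a fixed, made-up timestamp. The variable names t_time and t_date are illustrative and not part of the package, and the seconds are truncated explicitly so that an odd second count cannot round up.

```sql
-- Sketch only: packs a fixed example timestamp into MS-DOS time and date words.
-- t_time and t_date are illustrative names, not part of as_zip.
declare
  t_now  date := to_date( '2012-02-17 14:58:43', 'yyyy-mm-dd hh24:mi:ss' );
  t_time pls_integer;
  t_date pls_integer;
begin
  t_time := trunc( to_number( to_char( t_now, 'ss' ) ) / 2 )        -- bits 0-4: seconds / 2
          + to_number( to_char( t_now, 'mi' ) ) * 32                -- bits 5-10: minutes
          + to_number( to_char( t_now, 'hh24' ) ) * 2048;           -- bits 11-15: hours
  t_date := to_number( to_char( t_now, 'dd' ) )                     -- bits 0-4: day
          + to_number( to_char( t_now, 'mm' ) ) * 32                -- bits 5-8: month
          + ( to_number( to_char( t_now, 'yyyy' ) ) - 1980 ) * 512; -- bits 9-15: years since 1980
  dbms_output.put_line( 'time word (hex): ' || to_char( t_time, 'fm0XXX' ) );
  dbms_output.put_line( 'date word (hex): ' || to_char( t_date, 'fm0XXX' ) );
end;
/
```

For this timestamp the words work out to 7755 and 4051 in hex; written low byte first, they appear in the header as 55 77 and 51 40.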

By: Klaus Schuermann https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6160 Thu, 16 Feb 2012 12:01:07 +0000 http://technology.amis.nl/blog/?p=8090#comment-6160 Hi Anton,
I’m using your zip package in Oracle 11g XE. It’s great and very useful.
But I had some problems with German umlauts in the filename.
In the zip file the filename was truncated: for each umlaut, one or two characters were missing at the end.
I changed just one character in your code, adding a b so that the length is counted in bytes, and now it works:

procedure add1file

-> dbms_lob.append

/* -> little_endian( length( p_name ), 2 ) -- File name length */
-> little_endian( lengthb( p_name ), 2 ) -- File name length

Regards
Klaus
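
The difference between length and lengthb described above can be seen with a short anonymous block. This is only a sketch: the filename is a made-up example, and it assumes a database character set that can represent the umlauts (for instance AL32UTF8).

```sql
-- Sketch only: character count versus byte count for a name with non-ASCII characters.
-- The filename is a made-up example; unistr keeps the source independent of the client encoding.
declare
  t_name varchar2(100) := unistr( 'Gr\00FC\00DFe.txt' );  -- "Grüße.txt"
begin
  dbms_output.put_line( 'length  (characters): ' || length( t_name ) );
  dbms_output.put_line( 'lengthb (bytes, database character set): ' || lengthb( t_name ) );
  dbms_output.put_line( 'bytes as UTF-8: '
    || utl_raw.length( utl_i18n.string_to_raw( t_name, 'AL32UTF8' ) ) );
end;
/
```

The name has 9 characters, but ü and ß each take two bytes in UTF-8, so the byte count is 11. A header that records the character count therefore cuts two bytes off the end of the stored name.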

By: Blank Names https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6159 Wed, 28 Sep 2011 14:15:32 +0000 http://technology.amis.nl/blog/?p=8090#comment-6159 it is useful, thank you

By: mangesh https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6158 Fri, 16 Sep 2011 08:45:09 +0000 http://technology.amis.nl/blog/?p=8090#comment-6158 Thank you very much for sharing

By: maxie https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6157 Fri, 06 May 2011 13:57:58 +0000 http://technology.amis.nl/blog/?p=8090#comment-6157 I have problems using add1file when modifying a docx that contains a TIFF image. The procedure assumes the content is compressed, but Word stores the file uncompressed.
I have tried modifying it, but then the local header gets into trouble further down. Help!
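
The stored-versus-deflate choice raised here comes down to comparing the deflated size with the original size, which is what the add1file version posted in this thread does. The block below is a minimal sketch of that comparison for two made-up inputs. The local procedure show is hypothetical, and the sketch assumes that utl_compress.lz_compress returns a gzip stream with a 10-byte header and an 8-byte trailer, the same assumption behind the "- 18" in add1file.

```sql
-- Sketch only: shows the stored-versus-deflate decision for two made-up contents.
-- show is a hypothetical helper, not part of as_zip.
declare
  procedure show( p_label varchar2, p_content blob ) is
    t_gz   blob;
    t_len  integer := dbms_lob.getlength( p_content );
    t_clen integer;
  begin
    t_gz   := utl_compress.lz_compress( p_content );
    t_clen := dbms_lob.getlength( t_gz ) - 18;  -- assumed: 10 header + 8 trailer bytes
    dbms_output.put_line( p_label
      || ': first bytes ' || rawtohex( dbms_lob.substr( t_gz, 3, 1 ) )
      || ', original ' || t_len || ' bytes, deflated ' || t_clen || ' bytes -> '
      || case when t_clen < t_len then 'deflate (0800)' else 'stored (0000)' end );
    if dbms_lob.istemporary( t_gz ) = 1
    then
      dbms_lob.freetemporary( t_gz );
    end if;
  end;
begin
  show( 'repetitive text', utl_raw.cast_to_raw( rpad( 'abc', 300, 'abc' ) ) );
  show( 'three bytes', hextoraw( '0A0B0C' ) );
end;
/
```

Content that is already compressed, such as many image formats, will often take the stored branch, in which case the header must carry method 0000 and identical compressed and uncompressed sizes.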

By: tercüme https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6156 Sat, 26 Feb 2011 08:58:07 +0000 http://technology.amis.nl/blog/?p=8090#comment-6156 thank you very much for sharing. so unjust for word ???

By: docx to doc files https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6155 Wed, 17 Nov 2010 12:51:17 +0000 http://technology.amis.nl/blog/?p=8090#comment-6155 It’s amazing how complex docx and doc files can be.  I’ve tried to parse them with Python and they are quite difficult.  Our program does a unix conversion of docx to doc files in batch format.
Thanks for the post.

By: Anton Scheffer https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6154 Fri, 05 Nov 2010 08:38:32 +0000 http://technology.amis.nl/blog/?p=8090#comment-6154 The double text shown in my example is caused by a bug with XMLTYPE and blobs on my XE database.

By: Microblog https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6153 Tue, 22 Jun 2010 15:36:03 +0000 http://technology.amis.nl/blog/?p=8090#comment-6153 nice, thanks

By: moda https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6152 Sun, 13 Jun 2010 19:50:50 +0000 http://technology.amis.nl/blog/?p=8090#comment-6152 I could not understand why the words appear twice, but I am thinking about it.

By: prefabrik https://technology.amis.nl/2010/06/09/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql/#comment-6151 Sat, 12 Jun 2010 12:35:53 +0000 http://technology.amis.nl/blog/?p=8090#comment-6151 Thank you very much, sir!
