Utl_compress, gzip and zlib 20188367001

Utl_compress, gzip and zlib

Oracle has a a supplied package utl_compress, which can be used to compress and decompress data with PL/SQL. According to the documentation it uses the “Lempel-Ziv compression algorithme”, and “The output of the UTL_COMPRESS compressed data is compatible with gzip”. That means it’s following the RFC 1952 specs, RFC 1952. And that may be very useful (but I have never used it), but I need compression (I’m working on a PDF-generator in PL/SQL) and decompression (unwrapping wrapped PL/SQL!) in the zlib-format, RFC 1950. Both formats use the same algorithm, RFC 1951, but have different headers and trailers. So can utl_compress be used to for compressing/decompressing data according to the zlib-specs. Yes! In fact all you need is how to calculate the Adler32-checksum, which is used as the trailer of the zlib-format.

  function adler32( p_src in blob )
  return varchar2
  is
    s1 pls_integer := 1;
    s2 pls_integer := 0;
  begin
    for i in 1 .. dbms_lob.getlength( p_src )
    loop
      s1 := mod( s1 + to_number( rawtohex( dbms_lob.substr( p_src, 1, i ) ), 'XX' ), 65521 );
      s2 := mod( s2 + s1, 65521);
    end loop;
    return to_char( s2, 'fm0XXX' ) || to_char( s1, 'fm0XXX' );
  end;

With this Adler32 function compressing it zlib-format is nothing more as a simple wrapper for utl_compress.lz_compress

  function zlib_compress( p_src in blob )
  return blob
  is
    t_tmp blob;
    t_cpr blob;
  begin
    t_tmp := utl_compress.lz_compress( p_src );
    dbms_lob.createtemporary( t_cpr, false );
    t_cpr := hextoraw( '789C' ); -- zlib header
    dbms_lob.copy( t_cpr, t_tmp, dbms_lob.getlength( t_tmp ) - 10 - 8, 3, 11 );
    dbms_lob.append( t_cpr, hextoraw( adler32( p_src ) ) ); -- zlib trailer
    dbms_lob.freetemporary( t_tmp );
    return t_cpr;
  end;

Unwrapping the zlib-format with utl_compress takes just a little more effort. Utl_compress.lz_decompress only works if the input has a correct trailer, which includes a crc32-checksum of the uncompressed data. And that’s something we don’t have and can’t calculate with the data we have. But utl_compress has another procedure lz_uncompress_extract, which can be used to decompress the input byte for byte. And that works without the correct trailer. But only until the last byte of the uncompressed data. If we try to get the last decompressed byte an exception ORA-29294: A data error occurred during compression or uncompression, is raised. But the value of this last byte can derived from the Adler32 checksum we have!

  function zlib_decompress( p_src in blob )
  return blob
  is
    t_out blob;
    t_tmp blob;
    t_buffer raw(1);
    t_hdl binary_integer;
    t_s1 pls_integer; -- s1 part of adler32 checksum
    t_last_chr pls_integer;
  begin
    dbms_lob.createtemporary( t_out, false );
    dbms_lob.createtemporary( t_tmp, false );
    t_tmp := hextoraw( '1F8B0800000000000003' ); -- gzip header
    dbms_lob.copy( t_tmp, p_src, dbms_lob.getlength( p_src ) - 2 - 4, 11, 3 );
    dbms_lob.append( t_tmp, hextoraw( '0000000000000000' ) ); -- add a fake trailer
    t_hdl := utl_compress.lz_uncompress_open( t_tmp );
    t_s1 := 1;
    loop
      begin
        utl_compress.lz_uncompress_extract( t_hdl, t_buffer );
      exception
        when others
        then
          exit;
      end;
      dbms_lob.append( t_out, t_buffer );
      t_s1 := mod( t_s1 + to_number( rawtohex( t_buffer ), 'xx' ), 65521 );
    end loop;
    t_last_chr := to_number( dbms_lob.substr( p_src, 2, dbms_lob.getlength( p_src ) - 1 ), '0XXX')
                - t_s1;
    if t_last_chr < 0
    then
      t_last_chr := t_last_chr + 65521;
    end if;
    dbms_lob.append( t_out, hextoraw( to_char( t_last_chr, 'fm0X' ) ) );
    if utl_compress.isopen( t_hdl )
    then
      utl_compress.lz_uncompress_close( t_hdl );
    end if;
    dbms_lob.freetemporary( t_tmp );
    return t_out;
  end;

Anton
sql script
package with zip and unzip

** Changelog:
** Changelog:
** Date: 04-08-2016
** fixed endless loop for empty/null zip file
** Date: 28-07-2016
** added support for defate64 (this only works for zip-files created with 7Zip)
** Date: 31-01-2014
** file limit increased to 4GB
** Date: 29-04-2012
** fixed bug for large uncompressed files, thanks Morten Braten
** Date: 21-03-2012
** Take CRC32, compressed length and uncompressed length from
** Central file header instead of Local file header
** Date: 17-02-2012
** Added more support for non-ascii filenames
** Date: 25-01-2012
** Added MIT-license

12 Comments

  1. ibre5041 February 17, 2012
  2. orodri November 18, 2011
    • Anton Scheffer November 19, 2011
  3. William Robertson October 28, 2011
  4. Anton Scheffer October 28, 2011
  5. William Robertson October 28, 2011
  6. Anton Scheffer October 27, 2011
  7. William Robertson October 27, 2011
  8. Peter October 11, 2011
    • Anton Scheffer October 11, 2011
  9. Marco March 14, 2010
  10. Laurent Schneider March 14, 2010