Utl_compress, gzip and zlib

Anton Scheffer March 13, 2010 Database, IT, PL/SQL 12 Comments

Oracle has a a supplied package utl_compress, which can be used to compress and decompress data with PL/SQL. According to the documentation it uses the “Lempel-Ziv compression algorithme”, and “The output of the UTL_COMPRESS compressed data is compatible with gzip”. That means it’s following the RFC 1952 specs, RFC 1952. And that may be very useful (but I have never used it), but I need compression (I’m working on a PDF-generator in PL/SQL) and decompression (unwrapping wrapped PL/SQL!) in the zlib-format, RFC 1950. Both formats use the same algorithm, RFC 1951, but have different headers and trailers. So can utl_compress be used to for compressing/decompressing data according to the zlib-specs. Yes! In fact all you need is how to calculate the Adler32-checksum, which is used as the trailer of the zlib-format.

function adler32( p_src in blob )
return varchar2
is
  s1 pls_integer := 1;
  s2 pls_integer := 0;
begin
  for i in 1 .. dbms_lob.getlength( p_src )
  loop
    s1 := mod( s1 + to_number( rawtohex( dbms_lob.substr( p_src, 1, i ) ), 'XX' ), 65521 );
    s2 := mod( s2 + s1, 65521);
  end loop;
  return to_char( s2, 'fm0XXX' ) || to_char( s1, 'fm0XXX' );
end;

With this Adler32 function compressing it zlib-format is nothing more as a simple wrapper for utl_compress.lz_compress

function zlib_compress( p_src in blob )
return blob
is
  t_tmp blob;
  t_cpr blob;
begin
  t_tmp := utl_compress.lz_compress( p_src );
  dbms_lob.createtemporary( t_cpr, false );
  t_cpr := hextoraw( '789C' ); -- zlib header
  dbms_lob.copy( t_cpr, t_tmp, dbms_lob.getlength( t_tmp ) - 10 - 8, 3, 11 );
  dbms_lob.append( t_cpr, hextoraw( adler32( p_src ) ) ); -- zlib trailer
  dbms_lob.freetemporary( t_tmp );
  return t_cpr;
end;

Unwrapping the zlib-format with utl_compress takes just a little more effort. Utl_compress.lz_decompress only works if the input has a correct trailer, which includes a crc32-checksum of the uncompressed data. And that’s something we don’t have and can’t calculate with the data we have. But utl_compress has another procedure lz_uncompress_extract, which can be used to decompress the input byte for byte. And that works without the correct trailer. But only until the last byte of the uncompressed data. If we try to get the last decompressed byte an exception ORA-29294: A data error occurred during compression or uncompression, is raised. But the value of this last byte can derived from the Adler32 checksum we have!

function zlib_decompress( p_src in blob )
return blob
is
  t_out blob;
  t_tmp blob;
  t_buffer raw(1);
  t_hdl binary_integer;
  t_s1 pls_integer; -- s1 part of adler32 checksum
  t_last_chr pls_integer;
begin
  dbms_lob.createtemporary( t_out, false );
  dbms_lob.createtemporary( t_tmp, false );
  t_tmp := hextoraw( '1F8B0800000000000003' ); -- gzip header
  dbms_lob.copy( t_tmp, p_src, dbms_lob.getlength( p_src ) - 2 - 4, 11, 3 );
  dbms_lob.append( t_tmp, hextoraw( '0000000000000000' ) ); -- add a fake trailer
  t_hdl := utl_compress.lz_uncompress_open( t_tmp );
  t_s1 := 1;
  loop
    begin
      utl_compress.lz_uncompress_extract( t_hdl, t_buffer );
    exception
      when others
      then
        exit;
    end;
    dbms_lob.append( t_out, t_buffer );
    t_s1 := mod( t_s1 + to_number( rawtohex( t_buffer ), 'xx' ), 65521 );
  end loop;
  t_last_chr := to_number( dbms_lob.substr( p_src, 2, dbms_lob.getlength( p_src ) - 1 ), '0XXX')
              - t_s1;
  if t_last_chr &lt; 0
  then
    t_last_chr := t_last_chr + 65521;
  end if;
  dbms_lob.append( t_out, hextoraw( to_char( t_last_chr, 'fm0X' ) ) );
  if utl_compress.isopen( t_hdl )
  then
    utl_compress.lz_uncompress_close( t_hdl );
  end if;
  dbms_lob.freetemporary( t_tmp );
  return t_out;
end;

Anton
sql script
package with zip and unzip

** Changelog:
** Changelog:
** Date: 04-08-2016
** fixed endless loop for empty/null zip file
** Date: 28-07-2016
** added support for defate64 (this only works for zip-files created with 7Zip)
** Date: 31-01-2014
** file limit increased to 4GB
** Date: 29-04-2012
** fixed bug for large uncompressed files, thanks Morten Braten
** Date: 21-03-2012
** Take CRC32, compressed length and uncompressed length from
** Central file header instead of Local file header
** Date: 17-02-2012
** Added more support for non-ascii filenames
** Date: 25-01-2012
** Added MIT-license

Tags:as_zip, plsql

About The Author

Anton Scheffer

Oracle Consultant at AMIS

12 Comments

ibre5041 February 17, 2012

Hi,
there are other ways how to compress a blob.
You can also use java stored procedure.
I had to use it because I spotted Oracle bug 7337717, 3861722. UTL_COMPRESS has very bad compression ratio sometimes. This bug it officially fixed in 11gR2.
create or replace and compile java source named “LobCompressor” as
import java.lang.*;
import oracle.sql.*;
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Deflater;

/**
* A simple class for LOB compression and decompression in Oracle Database. Will work in 8i and better.
*
* @author <a href=”mailto:pjarmuz@poczta.onet.pl”>Piotr Jarmuz</a>
*/
public class LobCompressor {
/**
* Compresses the BLOB into BLOB
*
* @param slob the source BLOB (plain binary data)
* @param blob the target BLOB (will hold compressed binary data) it should be an empty BLOB retrieved for example with dbms_lob.createtemporary(l_blob,true);
* @throws Exception mostly I/O exception if ever
*/
public static void compress(BLOB slob, BLOB blob) throws Exception {
InputStream in=slob.getBinaryStream();
GZIPOutputStream zout=new GZIPOutputStream(blob.setBinaryStream(0L))
{
{Â Â Â
def.setLevel(Deflater.BEST_COMPRESSION);
}
}
;Â Â Â

try
{
byte[] buffer=new byte[slob.getBufferSize()];

int cnt;
while ((cnt=in.read(buffer)) > 0) {
zout.write(buffer,0,cnt);
}
}
finally
{
in.close();
zout.finish();
zout.flush();
zout.close();
}
}

/**
* Decompresses the BLOB into CLOB
*
* @param blob the source BLOB (compressed binary data)
* @param slob the target CLOB (will hold plain binary data) it should be an empty CLOB retrieved for example with dbms_lob.createtemporary(l_blob,true);
* @throws Exception mostly I/O exception if ever
*/
public static void decompress(BLOB blob, BLOB slob) throws Exception {
GZIPInputStream zin = new GZIPInputStream(blob.getBinaryStream());
OutputStream out=slob.getBinaryOutputStream();

try
{
byte[] buffer=new byte[blob.getBufferSize()];

int cnt;
while ((cnt=zin.read(buffer)) > 0) {
out.write(buffer,0,cnt);
}
}
finally
{
zin.close();
out.flush();
out.close();
}
}
};
/
show errors
orodri November 18, 2011

hello!
I want to read 7zip files with LZMA format
it is posible with this code? plsql
- Anton Scheffer November 19, 2011
  
  @orodri
  I doubt it, but then again, I’ve never tried it.
William Robertson October 28, 2011

Awesome – now .3 seconds for same 27K blob. Thanks! I’d tried to do something similar with a second pass to sweep up the trailing section, but couldn’t get it to work.
Anton Scheffer October 28, 2011

As explained in the blog uncompressing has to be done byte for byte. You can speed it up by doing it twice! First with a large buffer until you fail, and a second time with a large buffer and stop just before the end and finish then byte for byte. Something like

function zlib_decompress( p_src in blob )
return blob
is
t_out blob;
t_tmp blob;
t_buf raw(8132);
t_buffer raw(1);
t_hdl binary_integer;
t_max_loops pls_integer;
t_s1 pls_integer; — s1 part of adler32 checksum
t_last_chr pls_integer;
begin
dbms_lob.createtemporary( t_out, true );
dbms_lob.createtemporary( t_tmp, true );
t_tmp := hextoraw( ‘1F8B0800000000000003’ ); — gzip header
dbms_lob.copy( t_tmp, p_src, dbms_lob.getlength( p_src ) – 2 – 4, 11, 3 );
dbms_lob.append( t_tmp, hextoraw( ‘0000000000000000’ ) ); — add a fake trailer
t_hdl := utl_compress.lz_uncompress_open( t_tmp );
t_s1 := 1;
t_max_loops := 0;
loop
begin
utl_compress.lz_uncompress_extract( t_hdl, t_buf );
dbms_lob.append( t_out, t_buf );
for i in 1 .. 8132
loop
t_s1 := mod( t_s1 + utl_raw.cast_to_binary_integer( utl_raw.substr( t_buf, i, 1 ) ), 65521 );
end loop;
t_max_loops := t_max_loops + 1;
exception
when others
then
if utl_compress.isopen( t_hdl )
then
utl_compress.lz_uncompress_close( t_hdl );
end if;
t_hdl := utl_compress.lz_uncompress_open( t_tmp );
exit;
end;
end loop;
for i in 1 .. t_max_loops
loop
utl_compress.lz_uncompress_extract( t_hdl, t_buf );
end loop;
loop
begin
utl_compress.lz_uncompress_extract( t_hdl, t_buffer );
exception
when others
then
exit;
end;
dbms_lob.append( t_out, t_buffer );
t_s1 := mod( t_s1 + to_number( rawtohex( t_buffer ), ‘xx’ ), 65521 );
end loop;
t_last_chr := to_number( dbms_lob.substr( p_src, 2, dbms_lob.getlength( p_src ) – 1 ), ‘0XXX’) – t_s1;
if t_last_chr < 0
then
t_last_chr := t_last_chr + 65521;
end if;
dbms_output.put_line( t_s1 || 'x' || t_last_chr );
dbms_lob.append( t_out, hextoraw( to_char( t_last_chr, 'fm0X' ) ) );
if utl_compress.isopen( t_hdl )
then
utl_compress.lz_uncompress_close( t_hdl );
end if;
dbms_lob.freetemporary( t_tmp );
return t_out;
end;
William Robertson October 28, 2011

Thanks! That brings it down to around a second. Is there a sensible way to increase the size ofÂ the raw buffer? That seemed to help as well but I get out of my depth rather quickly when it comes to the raw/hex/number/string conversions.
Anton Scheffer October 27, 2011

@William Robertson
If you want to make this code a lot faster, just change the parameter “cache” of the calls to dbms_lob.createtemporary to true. No idea why I set it to false.
William Robertson October 27, 2011

Pity it has to read one byte at a time – it’s taking about a minute for a 27K blob. I tried making t_buffer raw(2) and adjusting thecorrespondingÂ ‘XX’ formats and it’s twice as fast, but it loses the final character.
Peter October 11, 2011

I have a zip file in my blob field.. When I try to uncompress it i get error
ORA-06502: PL/SQL: numeric or value error: hex to raw conversion error
ORA-06512: at line 39
ORA-06512: at line 63
The problem is also when i like to unzip it in java.. I get: java.util.zip.ZipException: invalid entry size
I can normaly open the file with winzip.. or winrar..
Â
- Anton Scheffer October 11, 2011
  
  @Peter
  The zip-format supports a lot of different compression methods. With my plsql-code I support only two methods.
  And some extensions to the zip-format support “big-files”. That something I don’t handle too.
Marco March 14, 2010

Â
Laurent Schneider March 14, 2010

great post Anton

Share this:

Like this:

Related Posts

About The Author

Anton Scheffer

12 Comments