Read an Excel xlsx with PL/SQL

74
Share this on .. Tweet about this on Twitter3Share on LinkedIn2Share on Facebook2Share on Google+8Email this to someoneShare on TumblrBuffer this page

At the OTN SQL and PLSQL forum I promised to publish some code I use for a project I’ still working on. This code allows you to select the content from an Excel document

select * from table( as_read_xlsx.read( as_read_xlsx.file2blob( 'DOC', 'Book1.xlsx' ) ) );

  SHEET_NR SHEET_NAME ROW_NR COL_NR CELL  CEL STRING_VAL NUMBER_VAL DATE_VAL
---------- ---------- ------ ------ ----- --- ---------- ---------- --------------------------
         1 Mijn naam       1      1 A1    N                      11
         1 Mijn naam       1      2 B1    N                      12
         1 Mijn naam       1      3 C1    N                      13
         1 Mijn naam       2      1 A2    N                      21
         1 Mijn naam       2      2 B2    N                      22
         1 Mijn naam       2      3 C2    N                      23
         1 Mijn naam       3      1 A3    N                      31
         1 Mijn naam       3      2 B3    N                      32
         1 Mijn naam       3      3 C3    N                      33
         1 Mijn naam       4      4 D4    S   D4
         1 Mijn naam       6      2 B6    D
         1 Mijn naam       7      2 B7    D
         1 Mijn naam       8      2 B8    D
         1 Mijn naam       9      2 B9    D
         1 Mijn naam      10      2 B10   D
         2 Sheet3          2      2 B2    S   Test
         2 Sheet3          3      3 C3    D                         19-JAN-2013 20:17:00
         2 Sheet3          4      1 A4    S   Anton

18 rows selected.

So here it is

Anton

** Changelog:
** 18-02-2013 – Ralph Bieber
Handle cell type “str” to prevent ORA-06502
if cell content is a string calculated by formula,
then cell type is “str” instead of “s” and value is inside tag
** 19-02-2013 – Ralph Bieber
Add column formula in tp_one_cell record, to show, if value is calculated by formula
** 20-02-2013 – Anton Scheffer
Handle cell types ‘inlineStr’ and ‘e’ to prevent ORA-06502
** 19-03-2013 – Anton Scheffer
Support for formatted and empty strings
Handle columns per row to prevent ORA-31186: Document contains too many nodes
** 12-06-2013 – Anton Scheffer
Handle sharedStrings.xml on older Oracle database versions
** 18-09-2013 – Anton Scheffer
** 18-09-2013 – Anton Scheffer
Fix for LPX-00200 could not convert from encoding UTF-8 to …
(Note, this is an error I can’t reproduce myself, maybe depending on database version and characterset)
Thank you Stanislav Safonov for this solution
Handle numbers with scientific notation
** 20-01-2014 – Anton Scheffer
Fix for a large number (60000+) of strings
** 16-05-2014 – Anton Scheffer
round to 15 digits

Share this on .. Tweet about this on Twitter3Share on LinkedIn2Share on Facebook2Share on Google+8Email this to someoneShare on TumblrBuffer this page

About Author

Oracle Consultant at AMIS

74 Comments

  1. @Anton – Thanks for the package. It is parsing the Excel file as expected.

    When I run the given query in SQL developer, after returning few 100 rows it hits “ORA-06502: PL/SQL: numeric or value error: character string buffer too small” at line 296 of the package.

    Can you please help?

    • Anton Scheffer on

      @Sagay. The max size of a varchar2 in a SQL-query is 4000 characters. If your excel contains larger strings you will get that error. I don’t handle that in purpose. If you do want to truncate those values, just change that line to t_one_cell.string_val := substr( t_strings( to_number( t_val ) ), 1, 4000 );

  2. Hi, this is great package, but I have one problem.

    When column in excel file contains only column name (in first row) and rest of values in taht column contaions NULLs, package will read only column name and maybe one more row with Null value.
    It will not read other NULL’s values from that column.

    How can it be fixed?

    • Anton Scheffer on

      In my opinion that doesn’t need to be fixed. The package reads values stored in a Excel file. If a column only contains a “column name”, that means there’s nothing stored in the other rows (for that column). And that means no output from this package. But if you know that the Excel contains a matrix, and you expect null values for empty cells, YOU can change the query to add those values, outer joining it to something for instance. But feel free to change to package yourself to fix it.

    • Here the way I proceed the Excel once loaded. I use this to rebuild the matrix to its original state using a single SQL query.
      I dump the excel into a Global Temporary table for faster access, detect the “header” row (it may not be the first row) and rebuild my matrix with a query similar to:

      SELECT MAX(DECODE(col_nr, 1, TRIM(string_val), NULL)) COL_A,
      MAX(DECODE(col_nr, 2, TRIM(string_val), NULL)) COL_B
      FROM GLOBAL_TEMPORARY_TABLE
      WHERE SHEET_NAME = ‘My Sheet’
      AND ROW_NR > nFirstRowID
      GROUP BY ROW_NR;

      That way, even if the package do not read empty cell – the output you get include the blank column with NULL.
      Use “string_val”, “number_val” or “date_val” in the query depending with datatype you expect.

      Hope this will help

  3. Martin Goblet on

    Hi Anton

    Again, many thanks for this wonderful package (and for the as_zip too)

    Just a little correction, because I got a error on a huge string : added a substr(…, 1, 4000) on 2 lines

    IF t_t IN (‘str’, ‘inlineStr’, ‘e’) THEN
    t_one_cell.cell_type := ‘S';
    t_one_cell.string_val := SUBSTR(t_val, 1, 4000);
    ELSIF t_t = ‘s’ THEN
    t_one_cell.cell_type := ‘S';

    IF t_val IS NOT NULL THEN
    t_one_cell.string_val := SUBSTR(t_strings(TO_NUMBER(t_val)), 1, 4000);
    END IF;
    ELSE

    An other problem I noticed : when I run the procedure several times, it takes more time at each run. When I often run it on large xlsx files, I’ve to close my session and reopen it to start from scratch.

    I guess that some memory is not freed up (DBMS_LOB.freetemporary, DBMS_SESSION.free_unused_user_memory, …) somewhere ?

    Regards
    Martin

    • Martin,
      I experienced the elapsed time increase with successive runs as well. There is a memory leak because a call to dbms_xmldom.getdocumentelement() is used as a parameter in a call to dbms_xmldom.newdocument(). The allocated memory is not released.
      I made a fix for that by using a local variable t_ndoc for the getdocumentelement() call and then explicitly freeing the memory.

      See my post on Oct 11, 2013.

      -–return dbms_xmldom.makenode( dbms_xmldom.getdocumentelement( –dbms_xmldom.newdomdocument( xmltype( p_blob, nls_charset_id( ‘AL32UTF8′ ) ) ) ) );

      t_ndoc := dbms_xmldom.newdomdocument( xmltype( p_blob, nls_charset_id( ‘AL32UTF8′ ) ) );
      return dbms_xmldom.makenode( dbms_xmldom.getdocumentelement( t_ndoc ) );

      • Martin Goblet on

        Andrew

        I didn’t see you post before :( Tried your solution, but the elapsed time still increases (a little bit less now) by each run. Could it be possible that the BLOB parameters memory is never reset (p_xlsx, p_blob, p_zipped_blob) ?

        Regards
        Martin

        • I haven’t noticed any continued degradation after that fix but it may be possible the memory for the blobs are not released as well. Perhaps my files are not large enough to notice a difference.

          There is this line which may have a similar effect since it is using a function to pass as a parameter.

          t_nd := blob2node( get_file( p_xlsx, ‘xl/workbook.xml’ ) );

          Perhaps setting the get_file() call to a local variable first will have an effect but that is just speculation right now. You may have to search through the code to find instances like that and try it out.

  4. Steven Holley on

    Anton,
    This is an excellent package that has saved me countless hours. I have only run into one problem. When reading values from the spreadsheet using the following command – select *
    from table( as_read_xlsx.read( as_read_xlsx.file2blob( gv_directory, p_filename ) ) ) – if there are two consecutive null cells in the row, the second one does not show up in the returned data set. Is there any way to fix this?

    Thanks for your time
    Steve

  5. Hi Anton

    Many thanks for this code, it saved me a lot of time.

    Just 2 points where I’ve some trouble with.
    – Numbers appears like this 0.82280000000000009 instead of 0.8228. There are some false decimals at the end. Workaround I use is a round(), because I know the expected decimals in this field.
    – I’ve a worksheet (link placed in the website) with a formula in a column (for each row). I only get the formula from the first row, the others are null.

    Regards
    Martin

    • Anton Scheffer on

      @Martin Goblet
      I’ve added rounding to the package.
      The formula’s are empty because it are “shared” formula’s. Something I didn’t know about before, and that’s not handled (yet)

  6. Nicolas DENIS on

    Hello,

    I did do a quick improvement to the code.
    When you put by mistake a large file (did test with a 500K text file) that is NOT a excel file, the get_file take pretty long time to return nothing.

    I did add this at the beginning of the get_file function:
    BEGIN
    IF dbms_lob.substr(p_zipped_blob, 4, 1) = hextoraw(‘504B0304′) THEN
    ….
    END IF;
    RETURN NULL;
    END;

    To test the ZIP Header signature. If not a ZIP file, I simply ignore the request. (I could have add an exception but was not required).

    I did also add a function to test if the file look like a valid Excel file:

    FUNCTION testFile(p_xlsX BLOB) RETURN VARCHAR2 IS
    cTest BLOB;
    BEGIN
    cTest := get_file(p_xlsx, ‘xl/workbook.xml’);
    IF cTest IS NULL THEN
    RETURN ‘N';
    ELSE
    RETURN ‘Y';
    END IF;
    END;

    Otherwise, Thanks for this useful piece of code!

    • Anton Scheffer on

      @Krishna Santosh
      Make sure your directory object points to an existing directory on the file system, you have enough right to use it, you have enough rights on the file system and you have an excel file at that place.

  7. I’ve been using the XML storage facility of Excel for some time (going back to 2003). It works, too bad Oracle is not delivering a more extensive library set for PL/SQL when compared to Java or .NET. For integration of Excel and Oracle we’ve created an add-in to Excel named Invantive Control. Works the other way around: the user uploads (and downloads) all Excel data to/from a database or data vault. In that way, he can throw away the Excel sheet when done :-)

  8. when I try to open my xlsx file that has 27,916 lines is presenting the error

    ORA-04030: out of process memory When trying to allocate 16,408 bytes

    How could I fix this?

    thank you

  9. It would be possible to show the select in the same way that I see in excel, ie columns and data?

    for example…..

    select * from table( as_read_xlsx.read( as_read_xlsx.file2blob( ‘DOC’, ‘test.xlsx’ ), ‘Sheet1′ ) )

    …. and show as below….

    table_cod table_name product unit description price
    1 name 1 166 CX test 46,56
    2 name 2 203 CX second test name 63,06

    thank you!

      • Martin Goblet on

        I do it this way :

        SELECT MAX(DECODE(col_nr, 1, SUBSTR(string_val, 1, 10), ”)) cod_taric
        , MAX(DECODE(col_nr, 1, SUBSTR(string_val, 12, 2), ”)) genre
        , MAX(DECODE(col_nr, 2, date_val, ”)) dat_valid_from
        , MAX(DECODE(col_nr, 3, string_val, ”)) utilisable
        FROM AS_READ…
        WHERE sheet_nr = 1 AND row_nr > 1 — Void description line
        GROUP BY sheet_nr, row_nr
        ORDER BY sheet_nr, row_nr;

  10. Wondering if you can assist? I’ve found this package extremely useful, however some XLSX files are producing the following error:

    ORA-31011: XML parsing failed
    ORA-19202: Error occurred in XML processing
    LPX-00200: could not convert from encoding UTF-8 to ISO-8859-1
    Error at line 257
    ORA-06512: at “SYS.XMLTYPE”, line 265
    ORA-06512: at “CSDPT.AS_READ_XLSX”, line 38
    ORA-06512: at “CSDPT.AS_READ_XLSX”, line 191
    ORA-06512: at line 1
    31011. 00000 – “XML parsing failed”
    *Cause: XML parser returned an error while trying to parse the document.
    *Action: Check if the document to be parsed is valid.

    As best as I can tell, it must be encountering invalid characters and killing the parser, however I do not have any control over the source XLSX file in order to see it remedied. Is there a method whereby the package can incorporate an error trap for this?

  11. Ralph Bieber on

    I ran into error ORA-06502, when parsing a number with exponential value.
    Cell content was 3.3013052861106202E+17 (number 330130528611062020)

    Following format string in the TO_NUMBER conversion raised the error:
    t_nr := to_number( t_val, translate( t_val, ‘.0123456789,-+’, ‘D9999999999′ ), ‘NLS_NUMERIC_CHARACTERS=.,’ );

    I replaced by:
    t_nr := to_number(t_val, RegExp_Replace(translate(t_val, ‘.0123456789,+-‘, ‘D9999999999′ ),'(E).*$’,’EEEE’,1,1,’i’), ‘NLS_NUMERIC_CHARACTERS=.,’ );

    • Anton Scheffer on

      Or you could use
      t_nr := to_number( t_val
      , case when instr( t_val, ‘E’ ) = 0
      then translate( t_val, ‘.012345678,-+’, ‘D999999999′ )
      else translate( substr( t_val, 1, instr( t_val, ‘E’ ) – 1 ), ‘.012345678,-+’, ‘D999999999′ ) || ‘EEEE’
      end
      , ‘NLS_NUMERIC_CHARACTERS=.,’
      );

  12. Thanks a lot, It’s really fast !
    I had a trouble with some empty cells where your package found some values !! (it’s empty in Excel not in Oracle Result) After copy and paste entire sheet in a new one the problem gone. I think the values came from a previous action like ‘cut’ to this cell and the xlsx format keep it.

  13. Amazing solution, works great! Thank you very much for sharing it.

    I have a question though: it does not work with xls files (older Excel format, 2003), it only works with xlsx files. Would be possible to get it to work with xls as well?

    Regards,
    M. Rusu, Canada

  14. Andrew Markiewicz on

    Thanks for this package. It saved me a lot of work.
    However, there currently seems to be a memory leak using dbms_xmldom. I am on Oracle 11.2.0.3 so I thought this would have been fixed from 10g.

    I was able to find a fix for this with a minor alteration to your package in the read function. I created a local domdocument variable instead of the implicit document parameter to dbms_xmldom.getdocumentelement()

    ….
    t_ndoc dbms_xmldom.domdocument;

    function blob2node( p_blob blob )
    return dbms_xmldom.domnode
    is
    begin
    if p_blob is null or dbms_lob.getlength( p_blob ) = 0
    then
    return null;
    end if;
    t_ndoc := dbms_xmldom.newdomdocument( xmltype( p_blob, nls_charset_id( ‘AL32UTF8′ ) ) );
    –return dbms_xmldom.makenode( dbms_xmldom.getdocumentelement( dbms_xmldom.newdomdocument( xmltype( p_blob, nls_charset_id( ‘AL32UTF8′ ) ) ) ) );
    return dbms_xmldom.makenode( dbms_xmldom.getdocumentelement( t_ndoc ) );
    end;

    This allows us to explicitly free up the memory at the end of the pipelined function with dbms_xmldom.freedocument(). I also call dbms_session.free_unused_user_memory() for good measure but that may not be necessary.

    ……
    dbms_xmldom.freedocument(t_ndoc);
    dbms_session.free_unused_user_memory;
    return;
    end;

    Thanks again
    Andrew

  15. Hi Anto,
    Today i copied the package , it is selecting the infinity empty rows after the data has finished in the Excel sheet. Please help on that

  16. Hello, Anton!
    I’ve too stuck with same error, but your cure didn’t help. Is any way to get rid of those unrecognized simbols, or substitute something in place?
    Thank you for your package!

    • Anton Scheffer on

      @Stan
      It’s a problem I can’t reproduce on my database. So I’m only guessing what the cause could be, and what a solution might be. So, if the “cure” doesn’t help I have no idea how to solve it.
      Anton

  17. Hi. I need somw help to execute. Where is the directory located when I execute the following?

    select *
    from table( as_read_xlsx.read( as_read_xlsx.file2blob( ‘DOC’, ‘Book1.xlsx’ ) ) )

    I created the package in TOAD under my ID and I am trying to execute the select statement in TOAD. I get an error that it can’t find the directory (ORA-22285). Where is it looking for the directory? I tried using c:\dir name but that didn’t work either. Thanks Much.

    • Anton Scheffer on

      That example is looking at directory object DOC. And that directory object could be pointing at a directory some where at your Oracle DB server. Something that you have to set up your self.

  18. Jon Renaut on

    Thanks very much for sharing this – you’ve saved me a ton of time. I am, however, having some trouble with the code. It seems to be both doubling the sheets and doubling the cells – if I parse an xlsx with one sheet and one populated cell, I get four entries. Any ideas?

  19. I am trying to read an XLSX file:

    select *
    from table(as_read_xlsx.read(as_read_xlsx.file2blob(‘CALM_RESULTS’,’Closing_Final3.xlsx’),’Main Sheet’, ‘AF7′));

    and getting the following error:

    ERROR at line 2:
    ORA-31011: XML parsing failed
    ORA-19202: Error occurred in XML processing
    LPX-00200: could not convert from encoding UTF-8 to ISO-8859-1
    Error at line 2
    ORA-06512: at “SYS.XMLTYPE”, line 265
    ORA-06512: at “DATASTORE.AS_READ_XLSX”, line 38
    ORA-06512: at “DATASTORE.AS_READ_XLSX”, line 191
    ORA-06512: at line 1

    any ideas?

    Thanks,

    • Anton Scheffer on

      @michael
      Change the character set of your database? The error message says it all, the Excel contains some UTF-8 characters which can’t be handled by your database.
      You could try to handle it by changing this line
      t_nd := blob2node( get_file( p_xlsx, ‘xl/sharedStrings.xml’ ) );
      to
      declare
      t_blob blob;
      t_clob clob;
      t_dest_offset integer;
      t_src_offset integer;
      t_lang_context number := dbms_lob.default_lang_ctx;
      t_warning integer;
      XML_parsing_failed exception;
      PRAGMA EXCEPTION_INIT( XML_parsing_failed, -31011 );
      begin
      t_nd := blob2node( get_file( p_xlsx, ‘xl/sharedStrings.xml’ ) );
      exception
      when XML_parsing_failed
      then
      dbms_lob.createtemporary( t_clob, true);
      t_dest_offset := 1;
      t_src_offset := 1;
      dbms_lob.converttoclob( t_clob
      , get_file( p_xlsx, ‘xl/sharedStrings.xml’ )
      , dbms_lob.lobmaxsize
      , t_dest_offset
      , t_src_offset
      , nls_charset_id( ‘AL32UTF8′ )
      , t_lang_context
      , t_warning
      );
      dbms_lob.createtemporary( t_blob, true);
      t_dest_offset := 1;
      t_src_offset := 1;
      dbms_lob.converttoblob( t_blob
      , t_clob
      , dbms_lob.lobmaxsize
      , t_dest_offset
      , t_src_offset
      , nls_charset_id( ‘AL32UTF8′ )
      , t_lang_context
      , t_warning
      );
      t_nd := blob2node( t_blob );
      dbms_lob.freetemporary( t_blob );
      dbms_lob.freetemporary( t_clob );
      end;

  20. Anton thanks a lot for your help. With your help I am able to see the data in the string_val column by replacing pattern from ‘.’ to ‘*/text()’

    Thanks again.

  21. I think in my case this line is causing string_val to be empty.
    t_strings( i ) := dbms_xslprocessor.valueof( dbms_xmldom.item( t_nl, i ), ‘.’ );

    It looks like patron ‘.’ is incorrect because dbms_xslprocessor.valueof( dbms_xmldom.item( t_nl, i ), ‘.’ ); is returning NULL.

    Do you think there is some kind of version issue with XML related packages?

  22. Query is:
    select *
    from table( as_read_xlsx.read( as_read_xlsx.file2blob( ‘UPLOAD_DIR’, ‘Spring2013.xlsx’ ) ) )

    Query can read file and correctly list all columns except STRING_VAL (I am suing formula)

    I will send you .xlsx file in few minutes.

    Regards,

  23. Anton Scheffer on

    You could show me the query you are using or send me the excel you use
    scheffer @ “the company I work” .nl

  24. Thanks for quick reply. My bad I though I gave the necessary info :(.
    In my case whenever the output for column CELL_TYPE is S in these cases STRING_VAL column is blank.
    I tried to play with line below but no success.
    t_one_cell.string_val := t_strings( to_number( t_val ) );

    What info you are looking to suggest any possible reason.

    Thanks a lot.

  25. Hello Anton,

    This is great utility. In my case String valuses are not comming up (blank). Number are comming correctly. We are at office 2010 and db is 11.1.0.7.0 – 64bit.

    Any Idea?

    Thanks,

    Naseer – Houston

    • Anton Scheffer on

      @Naseer.
      No, I have no ideas. But it could be that you gave just to little information to work on :).

  26. Excellent, but when I try *xls file, I cannot read * xls file. How I read *xls file in PL-SQL? is it possible?
    Thanks

  27. Excellent,
    Since I am using oracle 10g, I cannot use the PIVOT function
    Are you going to implement a formatted output the same as it looks in the excel ?

    • Anton Scheffer on

      @ynixon
      No, I have no plans to change anything in this code. But if you know that your Excel has always the same number of columns its very easy to change the code yourself to return all Excel columns in one row.

      Anton

      • Anton,

        thanks for providing this great code for parsing XLSX files. I was looking a long time for exactly this.
        The code works well, except if you use formula inside cells.
        I changed the code just a little to be able to handle and show formulas.

        Are you interested in the changed package?

        kind regards

        Ralph

Leave a Reply