Updateable External Tables

Introduction

External tables were first introduced in Oracle 9i. They provide easy access, by means of SQL, to data stored in files on the file system (i.e.: outside the database). The developer only has to create a table definition, which specifies the internal structure of the file. This specification is based on the SQL*Loader API. A major limitation of external tables is the fact that they are read-only. The developer can select data, but cannot perform any form of DML-processing. In this post, we will show you a way that allows you to perform inserts, updates and deletes on your external tables. Before we describe our approach to solve this problem, we would like to warn you that this article only intends to demonstrate a combination of simple Oracle techniques that you can use to create a work-around for this limitation. Unless for the simplest of uses, we do not recommend using this technique for any serious application. External tables are read-only for a good reason, as we shall see below. Also, do not confuse updateable external tables with writeable external tables. The latter has become possible in Oracle 10g by means of the Data Pump API. For more details see for example the article on the Dizwell site.

How does it work?

We start of with a small introduction to external tables in general. The file that we want to base our external table on must reside in a server-side directory. This directory will also hold the dis, bad and log files. An alias for this directory is specified using the CREATE DIRECTORY command.

SQL> create directory EXT_TABLES as 'd:utl';

Directory created.

We can use the ALL_DIRECTORIES view to get an overview.

SQL> select * from all_directories;

OWNER      DIRECTORY_NAME        DIRECTORY_PATH
---------- --------------------- ----------------------------------------
SYS        EXT_TABLES            d:utl
SYS        MEDIA_DIR             D:oracle920demoschemaproduct_media

The developer needs read / write privileges in this directory.

SQL> grant read, write on directory EXT_TABLES to peter;

Grant succeeded.

We base our demo on the well-known EMP and DEPT tables. CSV-files holding the corresponding data are placed in the fore-mentioned directory. These files have the following content:

10,ANALYSIS,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON

and

7369,SMITH,CLERK,7902,17-DEC-80,800,,20
7499,ALLEN,SALESMAN,7698,20-FEB-81,1600,300,30
7521,WARD,SALESMAN,7698,22-FEB-81,1250,500,30
7566,JONES,MANAGER,7839,02-APR-81,2975,,20
7654,MARTIN,SALESMAN,7698,28-SEP-81,1250,1400,30
7698,BLAKE,MANAGER,7839,01-MAY-81,2850,,30
7782,CLARK,MANAGER,7839,09-JUN-81,2450,,10
7788,SCOTT,ANALYST,7566,19-APR-87,3000,,20
7839,KING,PRESIDENT,,17-NOV-81,5000,,10
7844,TURNER,SALESMAN,7698,08-SEP-81,1500,0,30
7876,ADAMS,CLERK,7788,23-MAY-87,1100,,20

We can then create our external table, as follows:

create table dept_ext_tab
( DEPTNO  NUMBER(2)
, DNAME   VARCHAR2(14)
, LOC     VARCHAR2(30)
)
organization external
( type oracle_loader
  default directory ext_tables
  access parameters ( records delimited by newline
                      badfile     'dept_ext_tab.bad'
                      discardfile 'dept_ext_tab.dis'
                      logfile     'dept_ext_tab.log'
                      fields terminated by ','
                      optionally enclosed by '"'
                      missing field values are null
                    )
  location ('dept_ext_tab.csv')
)
reject limit unlimited
/

create table emp_ext_tab
( empno     number(4)
, ename     varchar2(10)
, job       varchar2(9)
, mgr       number(4)
, hiredate  date
, sal       number(7,2)
, comm      number(7,2)
, deptno    number(2)
)
organization external
( type oracle_loader
  default directory ext_tables
  access parameters ( records delimited by newline
                      badfile     'emp_ext_tab.bad'
                      discardfile 'emp_ext_tab.dis'
                      logfile     'emp_ext_tab.log'
                      fields terminated by ','
                      optionally enclosed by '"'
                      missing field values are null
                      ( empno
                      , ename
                      , job
                      , mgr
                      , hiredate    char date_format date mask "dd-MON-rr"
                      , sal
                      , comm
                      , deptno
                      )
                    )
  location ('emp_ext_tab.csv')
)
reject limit unlimited
/

Notice that for the EMP external table, we named all the columns, because of the need to specify the format for the date column. After creation, you can review all your external table definitions using the USER_EXTERNAL_TABLES view. Now that we have our tables, we can do a simple select, for example:

SQL> select   e.empno
  2         , e.ename
  3         , e.job
  4         , e.mgr
  5         , to_char(e.hiredate,'dd-MON-yyyy')  hiredate
  6         , e.sal
  7         , e.comm
  8         , e.deptno
  9         , d.dname
 10         , d.loc
 11  from     emp_ext_tab   e
 12         , dept_ext_tab  d
 13  where    e.deptno = d.deptno
 14  order by e.deptno
 15         , e.empno
 16  /

     EMPNO ENAME      JOB              MGR HIREDATE           SAL       COMM     DEPTNO DNAME          LOC
---------- ---------- --------- ---------- ----------- ---------- ---------- ---------- ------------   --------------
      7839 KING       PRESIDENT            17-NOV-1981       5100                    10 ANALYSIS       NEW YORK
      7469 SMITH      CLERK           7902 17-DEC-1980        900                    20 RESEARCH       DALLAS
      7788 SCOTT      ANALYST         7566 19-APR-1987       3050                    20 RESEARCH       DALLAS
      7876 ADAMS      CLERK           7788 23-MAY-1987       1200                    20 RESEARCH       DALLAS
      7877 NEWINSERT  CLERK           7788 01-APR-2005        800                    20 RESEARCH       DALLAS
      7976 ADAMS      CLERK           7788 23-MAY-1987                               20 RESEARCH       DALLAS
      7521 WARD       SALESMAN        7698 22-FEB-1981       1250        500         30 SALES          CHICAGO
      7621 WARD       SALESMAN        7698 22-FEB-1981       1350        500         30 SALES          CHICAGO
      7654 MARTIN     SALESMAN        7698 28-SEP-1981       1250       1400         30 SALES          CHICAGO
      7698 BLAKE      MANAGER         7839 01-MAY-1981       2850                    30 SALES          CHICAGO
      7754 MARTIN     SALESMAN        7698 28-SEP-1981       1350       1400         30 SALES          CHICAGO

11 rows selected.

External tables have limitations:

  • They are read-only which implies you cannot issue DML-statements.
  • They cannot be indexed which means creating a primary key is not possible.
  • They do not support foreign keys.
SQL> update emp_ext_tab
  2  set    sal    = 750
  3  where  empno  = 7976
  4  /
update emp_ext_tab
       *
ERROR at line 1:
ORA-30657: operation not supported on external organized table

SQL> alter table emp_ext_tab add
  2  ( constraint emp_ext_tab_self_key foreign key (MGR)
  3    references emp_ext_tab (EMPNO)
  4  )
  5  /
( constraint emp_ext_tab_self_key foreign key (MGR)
                                  *
ERROR at line 2:
ORA-30657: operation not supported on external organized table

SQL> alter table emp_ext_tab add
  2  ( constraint emp_ext_tab_foreign_key foreign key (DEPTNO)
  3    references dept_ext_tab (DEPTNO)
  4  )
  5  /
( constraint emp_ext_tab_foreign_key foreign key (DEPTNO)
                                     *
ERROR at line 2:
ORA-30657: operation not supported on external organized table

The trick we will use to make DML possible consists of three steps:

  1. Define a view on the external table.
  2. On this view, define INSTEAD OF triggers for insert, update and delete.
  3. Write PL/SQL code in these triggers to perform the required processing.

For practical purposes, we combine this code into a database package. We define our view as follows (we will only do DML on the EMP external table):

create or replace view emp_ext_tab_vw
as
  select       rownum         rownumber
         , emp.empno          empno
         , emp.ename          ename
         , emp.job            job
         , emp.mgr            mgr
         , emp.hiredate       hiredate
         , emp.sal            sal
         , emp.comm           comm
         , emp.deptno         deptno
  from     emp_ext_tab        emp
  order by rownumber asc
/

It is important to note that we have included the row number in the view. This row number corresponds to the actual line number in the physical file. The ordering on row number ascending is essential for keeping it that way. Next, we create the INSTEAD OF triggers against the view.

create or replace trigger emp_ext_tab_vw_brd
instead of delete on emp_ext_tab_vw
begin
  --
  emp_ext_tab_dml.delete_record ( :OLD.rownumber );
  --
end;
/

Please recall the following properties of INSTEAD OF triggers (see Oracle’s SQL manual for further details):

  • INSTEAD OF triggers are valid for DML events on views. They are not valid for DDL or database events.
  • If a view is inherently updatable and has INSTEAD OF triggers, then the triggers take preference. In other words, the database fires the triggers instead of performing DML on the view.
  • INSTEAD OF trigger statements are implicitly activated for each row.
  • You cannot specify trigger conditions for INSTEAD OF trigger statements.
  • You can read both the :OLD and the :NEW value, but you cannot write either the :OLD or the :NEW value.
  • INSTEAD OF triggers defined in a view are dropped when the view is re-created.

These properties are important for understanding the both basics and the limitations of our construction. Only the delete trigger is shown above, the insert and update triggers are very similar, and can be found in the accompanying zip-file. The procedures the triggers refer to are combined into a package that we have named emp_ext_tab_dml. This package is too large to show here in whole. It also has been included in the zip-file. The package essentially consists of 3 procedures (insert, update and delete), supplemented with a number of procedures for file handling (open, close, locking) and the formatting of a line.

  • The insert procedure takes the :NEW data values from the trigger as parameters, combines them into one line, and appends this line to the external table’s data file.
  • The delete procedure takes only the (:OLD) value of the row number to be deleted as a parameter. This value corresponds to the number of the line in the physical file. Since the UTL_FILE package that we use to do the file operations is not capable of manipulating single lines within a file, we copy all the lines in the data file to a temporary file, except for the line that corresponds to the row number parameter. When all the lines have been copied, we close both files and replace the external table’s data file with the temporary file. Note that the external table definition itself does not cause any file locks!
  • The update procedure takes both the :OLD value of the row number to be updated and the :NEW data values from the trigger as a parameter. Again, we copy all the lines in the data file to a temporary file. When the line number matches the row number parameter, we write a line with the new values. When all the lines have been written, we close both files and replace the external table’s data file with the temporary file.

The delete and update procedures make use of a cursor based on the view. The dataset retrieved from this cursor must be ordered ascending on row number as well, in order for our construction to work. Note: For clarity’s sake, we have made the package code as straightforward as possible. For example, we left out all but the most elementary exception handling. The functionality can be enhanced as required.

Results and limitations

Included in the zip-file are the data files, the create-scripts for the tables, the view, the triggers and the package, and a script that demonstrates the effect of it all (including its output listing). We suggest you take a look at the resulting output listing before you read any further. As it turns out, inserts, updates and a single delete work fine using our construction (see screenshots below). updateable external tables screenshot 1: demo of DML updateable external tables screenshot 2: DML results shown in Excel The problem is with doing a multiple delete (i.e. a delete statement that affects more than one row). As we noted before, the INSTEAD OF trigger is implicitly for each row. This means statement-level information is unavailable. Since the DML-statement determines the row numbers to delete at the statement level, and not at the row level, our procedure goes astray after the first record has been deleted. The first delete changes the numbering of lines in the physical file, so that subsequent calls to the delete procedure refer to the wrong record. Thus, we delete either the wrong records, or no records at all, or both (depending on the specific situation). We suspect (but have not investigated it further) that application of this construction in a form could be made to work, because the form-specific triggers do allow you to differentiate between the pre- and post-condition of the delete statement. Below, we list all limitations encountered:

  • We need to use a view in order to be able to define triggers.
  • Our construction bypasses any concept of a transaction. When the call to the corresponding procedure is successfully completed, the DML-statement is implicitly "committed". An equivalent for a rollback is not available.
  • In order to have the row numbers of the view correspond to the line numbers in the physical file, you must not do anything to upset this sequence.
    • Both the view and package cursor must be ordered ascending on row number.
    • The view must not have any restrictions (i.e.: no WHERE-clause).
    • Do not include header rows in your data file.
  • With regard to the checking of constraints:
    • The definition of an external table does not allow the null/not null clause. Therefore, when inserting or updating, the developer needs to check the whether the column is optional or mandatory.
    • It is not allowed to define a foreign key on an external table. Therefore, any referential integrity needs to be checked and maintained by the developer.
    • Any allowable values, business rules etc. need to be programmed explicitly by the developer.

    Note: The create scripts for tables, view, triggers and package as included in the zip-file could be generated by a separate procedure, provided some metadata is available about the name, location and layout of the data file. Any additional coding for table-specific checks could then be added afterwards.

  • Oracle automatically creates a log file (the name and location of which you can specify). This log file grows rapidly to considerable size.
  • Although we have not actually tested it, it doesn’t take a visionary to predict that our construction will perform poorly for larger files. The use of an external table inevitably causes some processing overhead. Our construction only worsens that situation. It is therefore not suited to manipulating large tables / files.

Conclusion

External tables as read-only objects have various uses. They appear to be especially interesting for ETL in a datawarehouse environment. To find a suitable application for updateable external tables is no trivial matter. One article we used mentions the use of (read-only) external tables “for shops where users can control systemwide parameters inside desktop spreadsheets and Oracle knows immediately about changes‿. Perhaps an updateable external table could be of relevance there as well. However, given the limitations of this technique, one should look very seriously at the consequences before applying it.

Acknowledgements

We would like to thank Lucas Jellema for his suggestion to investigate the possibility of updateable external tables.

Update April 13, 2005

In his comment of April 12, 2005, Anton Scheffer noted that Oracle most likely does not guarantee that the rownumber will always be assigned to the same record in the file, and sorting on the rownumber in the view won’t make much of a difference. He suggested adding the rownumber to the external table definition, and claims that this will handle bad and rejected records in the file as well. We have tested his suggestion, and have come to the following conclusions:

  • Including the rownumber in the external table means the package cursor can reference this rownumber directly, without having to rely on the select made by the view. Thus, the view no longer needs to be ordered. It can now also have restrictions (i.e.: a WHERE-clause).
  • The package cursor, although selecting directly from the external table, still needs to be ordered ascending on row number. This is necessary for handling an update statement that affects more than one record. For each record affected by the update, the external table file is rewritten. If the order in which this is done is changed to something other than ascending on rownumber, then the rownumbers to be updated (as determined at the pre-statement level) will no longer correspond to the actual line numbers.
  • Having header rows or rows of bad data in your file is no longer a problem. All DML-operations that are supported by this method still work fine. However, given the current implementation of the DML-package, the non-data records will disappear from the file the first time an update or delete statement is executed. After that, they are only available from the bad-file. One could change the implementation of the DML-package to compensate for this, but this will mean doing the whole processing with UTL_FILE instead of SQL. It would also mean we need to incorporate metadata about the file structure into the DML-package, other than just the structure of the data. This makes it a more complex and less generic approach, which is undesirable. Thus, we tend to conclude that it is still best to avoid the use of header rows.

The altered scripts are available in an updated version of the zipfile.

Update April 22, 2005

After we published these conclusions, Anton Scheffer noted in an additional personal e-mail that doing the whole processing with UTL_FILE is in fact quite simple. It allows you to do a multiple delete, and saves all bad records (headers, bad data etc.). Anton’s code and some testdata are available in an additional zipfile(by kind permission).

Again, we have investigated his work, and have noted the following. We started out on two major assumptions: "Do as much as you can with SQL, and as little as possible with UTL_FILE." and "Influence the original file as little as possible." The former explains our advice not to use header rows. The latter basically causes our restriction of not being able to do a multiple delete. Anton however has taken a different approach. He has added a piece of metadata about the file structure to the external table’s definition. More specifically: the "load when empno != blanks" clause tells Oracle to ignore any blank lines in the file. This allows him to implement a delete as an update to a blank line, thus preserving the match between pre-statement and actual line numbers. Hence, a mutiple delete can be processed correctly. In addition, he no longer uses a cursor to loop through the data records. Instead, he loops directly through the file, which allows him to preserve any record with bad data. The result of a delete statement is now a data file with one or more blank lines in it. In our example, MS-Excel is still able to open this file. However, one can easily imagine an application that is not able to handle blank lines in its inputfile, or that interprets a blank line as an END-OF-FILE character. To sum up: In our approach you stay close to the original file layout, with the limitation of not being able to do a multiple delete. In Anton’s approach, you can do all the DML you want, at the expense of altering the file structure. It is up to you to decide which approach is the preferable one in your situation.

 

13 Comments

  1. bvk June 14, 2007
  2. Big Daddy Kool Maish December 22, 2006
  3. Guna February 22, 2006
  4. Naveed Ahmed Siddiqui November 9, 2005
  5. GD October 24, 2005
  6. Upendra August 24, 2005
  7. Lucas June 29, 2005
  8. surendra June 29, 2005
  9. Bala June 18, 2005
  10. larry May 21, 2005
  11. Bhashetty May 20, 2005
  12. Anton Scheffer April 12, 2005
  13. Aino Andriessen April 5, 2005