Updateable External Tables
External tables were first introduced in Oracle 9i. They provide easy access, by means of SQL, to data stored in files on the file system (i.e.: outside the database). The developer only has to create a table definition, which specifies the internal structure of the file. This specification is based on the SQL*Loader API. A major limitation of external tables is the fact that they are read-only. The developer can select data, but cannot perform any form of DML-processing. In this post, we will show you a way that allows you to perform inserts, updates and deletes on your external tables. Before we describe our approach to solve this problem, we would like to warn you that this article only intends to demonstrate a combination of simple Oracle techniques that you can use to create a work-around for this limitation. Unless for the simplest of uses, we do not recommend using this technique for any serious application. External tables are read-only for a good reason, as we shall see below. Also, do not confuse updateable external tables with writeable external tables. The latter has become possible in Oracle 10g by means of the Data Pump API. For more details see for example the article on the Dizwell site.
How does it work?
We start of with a small introduction to external tables in general. The file that we want to base our external table on must reside in a server-side directory. This directory will also hold the dis, bad and log files. An alias for this directory is specified using the CREATE DIRECTORY command.
SQL> create directory EXT_TABLES as 'd:utl'; Directory created.
We can use the ALL_DIRECTORIES view to get an overview.
SQL> select * from all_directories; OWNER DIRECTORY_NAME DIRECTORY_PATH ---------- --------------------- ---------------------------------------- SYS EXT_TABLES d:utl SYS MEDIA_DIR D:oracle920demoschemaproduct_media
The developer needs read / write privileges in this directory.
SQL> grant read, write on directory EXT_TABLES to peter; Grant succeeded.
We base our demo on the well-known EMP and DEPT tables. CSV-files holding the corresponding data are placed in the fore-mentioned directory. These files have the following content:
10,ANALYSIS,NEW YORK 20,RESEARCH,DALLAS 30,SALES,CHICAGO 40,OPERATIONS,BOSTON
7369,SMITH,CLERK,7902,17-DEC-80,800,,20 7499,ALLEN,SALESMAN,7698,20-FEB-81,1600,300,30 7521,WARD,SALESMAN,7698,22-FEB-81,1250,500,30 7566,JONES,MANAGER,7839,02-APR-81,2975,,20 7654,MARTIN,SALESMAN,7698,28-SEP-81,1250,1400,30 7698,BLAKE,MANAGER,7839,01-MAY-81,2850,,30 7782,CLARK,MANAGER,7839,09-JUN-81,2450,,10 7788,SCOTT,ANALYST,7566,19-APR-87,3000,,20 7839,KING,PRESIDENT,,17-NOV-81,5000,,10 7844,TURNER,SALESMAN,7698,08-SEP-81,1500,0,30 7876,ADAMS,CLERK,7788,23-MAY-87,1100,,20
We can then create our external table, as follows:
create table dept_ext_tab ( DEPTNO NUMBER(2) , DNAME VARCHAR2(14) , LOC VARCHAR2(30) ) organization external ( type oracle_loader default directory ext_tables access parameters ( records delimited by newline badfile 'dept_ext_tab.bad' discardfile 'dept_ext_tab.dis' logfile 'dept_ext_tab.log' fields terminated by ',' optionally enclosed by '"' missing field values are null ) location ('dept_ext_tab.csv') ) reject limit unlimited / create table emp_ext_tab ( empno number(4) , ename varchar2(10) , job varchar2(9) , mgr number(4) , hiredate date , sal number(7,2) , comm number(7,2) , deptno number(2) ) organization external ( type oracle_loader default directory ext_tables access parameters ( records delimited by newline badfile 'emp_ext_tab.bad' discardfile 'emp_ext_tab.dis' logfile 'emp_ext_tab.log' fields terminated by ',' optionally enclosed by '"' missing field values are null ( empno , ename , job , mgr , hiredate char date_format date mask "dd-MON-rr" , sal , comm , deptno ) ) location ('emp_ext_tab.csv') ) reject limit unlimited /
Notice that for the EMP external table, we named all the columns, because of the need to specify the format for the date column. After creation, you can review all your external table definitions using the USER_EXTERNAL_TABLES view. Now that we have our tables, we can do a simple select, for example:
SQL> select e.empno 2 , e.ename 3 , e.job 4 , e.mgr 5 , to_char(e.hiredate,'dd-MON-yyyy') hiredate 6 , e.sal 7 , e.comm 8 , e.deptno 9 , d.dname 10 , d.loc 11 from emp_ext_tab e 12 , dept_ext_tab d 13 where e.deptno = d.deptno 14 order by e.deptno 15 , e.empno 16 / EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO DNAME LOC ---------- ---------- --------- ---------- ----------- ---------- ---------- ---------- ------------ -------------- 7839 KING PRESIDENT 17-NOV-1981 5100 10 ANALYSIS NEW YORK 7469 SMITH CLERK 7902 17-DEC-1980 900 20 RESEARCH DALLAS 7788 SCOTT ANALYST 7566 19-APR-1987 3050 20 RESEARCH DALLAS 7876 ADAMS CLERK 7788 23-MAY-1987 1200 20 RESEARCH DALLAS 7877 NEWINSERT CLERK 7788 01-APR-2005 800 20 RESEARCH DALLAS 7976 ADAMS CLERK 7788 23-MAY-1987 20 RESEARCH DALLAS 7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 SALES CHICAGO 7621 WARD SALESMAN 7698 22-FEB-1981 1350 500 30 SALES CHICAGO 7654 MARTIN SALESMAN 7698 28-SEP-1981 1250 1400 30 SALES CHICAGO 7698 BLAKE MANAGER 7839 01-MAY-1981 2850 30 SALES CHICAGO 7754 MARTIN SALESMAN 7698 28-SEP-1981 1350 1400 30 SALES CHICAGO 11 rows selected.
External tables have limitations:
- They are read-only which implies you cannot issue DML-statements.
- They cannot be indexed which means creating a primary key is not possible.
- They do not support foreign keys.
SQL> update emp_ext_tab 2 set sal = 750 3 where empno = 7976 4 / update emp_ext_tab * ERROR at line 1: ORA-30657: operation not supported on external organized table SQL> alter table emp_ext_tab add 2 ( constraint emp_ext_tab_self_key foreign key (MGR) 3 references emp_ext_tab (EMPNO) 4 ) 5 / ( constraint emp_ext_tab_self_key foreign key (MGR) * ERROR at line 2: ORA-30657: operation not supported on external organized table SQL> alter table emp_ext_tab add 2 ( constraint emp_ext_tab_foreign_key foreign key (DEPTNO) 3 references dept_ext_tab (DEPTNO) 4 ) 5 / ( constraint emp_ext_tab_foreign_key foreign key (DEPTNO) * ERROR at line 2: ORA-30657: operation not supported on external organized table
The trick we will use to make DML possible consists of three steps:
- Define a view on the external table.
- On this view, define INSTEAD OF triggers for insert, update and delete.
- Write PL/SQL code in these triggers to perform the required processing.
For practical purposes, we combine this code into a database package. We define our view as follows (we will only do DML on the EMP external table):
create or replace view emp_ext_tab_vw as select rownum rownumber , emp.empno empno , emp.ename ename , emp.job job , emp.mgr mgr , emp.hiredate hiredate , emp.sal sal , emp.comm comm , emp.deptno deptno from emp_ext_tab emp order by rownumber asc /
It is important to note that we have included the row number in the view. This row number corresponds to the actual line number in the physical file. The ordering on row number ascending is essential for keeping it that way. Next, we create the INSTEAD OF triggers against the view.
create or replace trigger emp_ext_tab_vw_brd instead of delete on emp_ext_tab_vw begin -- emp_ext_tab_dml.delete_record ( :OLD.rownumber ); -- end; /
Please recall the following properties of INSTEAD OF triggers (see Oracleâ€™s SQL manual for further details):
- INSTEAD OF triggers are valid for DML events on views. They are not valid for DDL or database events.
- If a view is inherently updatable and has INSTEAD OF triggers, then the triggers take preference. In other words, the database fires the triggers instead of performing DML on the view.
- INSTEAD OF trigger statements are implicitly activated for each row.
- You cannot specify trigger conditions for INSTEAD OF trigger statements.
- You can read both the :OLD and the :NEW value, but you cannot write either the :OLD or the :NEW value.
- INSTEAD OF triggers defined in a view are dropped when the view is re-created.
These properties are important for understanding the both basics and the limitations of our construction. Only the delete trigger is shown above, the insert and update triggers are very similar, and can be found in the accompanying zip-file. The procedures the triggers refer to are combined into a package that we have named emp_ext_tab_dml. This package is too large to show here in whole. It also has been included in the zip-file. The package essentially consists of 3 procedures (insert, update and delete), supplemented with a number of procedures for file handling (open, close, locking) and the formatting of a line.
- The insert procedure takes the :NEW data values from the trigger as parameters, combines them into one line, and appends this line to the external tableâ€™s data file.
- The delete procedure takes only the (:OLD) value of the row number to be deleted as a parameter. This value corresponds to the number of the line in the physical file. Since the UTL_FILE package that we use to do the file operations is not capable of manipulating single lines within a file, we copy all the lines in the data file to a temporary file, except for the line that corresponds to the row number parameter. When all the lines have been copied, we close both files and replace the external tableâ€™s data file with the temporary file. Note that the external table definition itself does not cause any file locks!
- The update procedure takes both the :OLD value of the row number to be updated and the :NEW data values from the trigger as a parameter. Again, we copy all the lines in the data file to a temporary file. When the line number matches the row number parameter, we write a line with the new values. When all the lines have been written, we close both files and replace the external tableâ€™s data file with the temporary file.
The delete and update procedures make use of a cursor based on the view. The dataset retrieved from this cursor must be ordered ascending on row number as well, in order for our construction to work. Note: For clarityâ€™s sake, we have made the package code as straightforward as possible. For example, we left out all but the most elementary exception handling. The functionality can be enhanced as required.
Results and limitations
Included in the zip-file are the data files, the create-scripts for the tables, the view, the triggers and the package, and a script that demonstrates the effect of it all (including its output listing). We suggest you take a look at the resulting output listing before you read any further. As it turns out, inserts, updates and a single delete work fine using our construction (see screenshots below). The problem is with doing a multiple delete (i.e. a delete statement that affects more than one row). As we noted before, the INSTEAD OF trigger is implicitly for each row. This means statement-level information is unavailable. Since the DML-statement determines the row numbers to delete at the statement level, and not at the row level, our procedure goes astray after the first record has been deleted. The first delete changes the numbering of lines in the physical file, so that subsequent calls to the delete procedure refer to the wrong record. Thus, we delete either the wrong records, or no records at all, or both (depending on the specific situation). We suspect (but have not investigated it further) that application of this construction in a form could be made to work, because the form-specific triggers do allow you to differentiate between the pre- and post-condition of the delete statement. Below, we list all limitations encountered:
- We need to use a view in order to be able to define triggers.
- Our construction bypasses any concept of a transaction. When the call to the corresponding procedure is successfully completed, the DML-statement is implicitly "committed". An equivalent for a rollback is not available.
- In order to have the row numbers of the view correspond to the line numbers in the physical file, you must not do anything to upset this sequence.
- Both the view and package cursor must be ordered ascending on row number.
- The view must not have any restrictions (i.e.: no WHERE-clause).
- Do not include header rows in your data file.
- With regard to the checking of constraints:
- The definition of an external table does not allow the null/not null clause. Therefore, when inserting or updating, the developer needs to check the whether the column is optional or mandatory.
- It is not allowed to define a foreign key on an external table. Therefore, any referential integrity needs to be checked and maintained by the developer.
- Any allowable values, business rules etc. need to be programmed explicitly by the developer.
Note: The create scripts for tables, view, triggers and package as included in the zip-file could be generated by a separate procedure, provided some metadata is available about the name, location and layout of the data file. Any additional coding for table-specific checks could then be added afterwards.
- Oracle automatically creates a log file (the name and location of which you can specify). This log file grows rapidly to considerable size.
- Although we have not actually tested it, it doesnâ€™t take a visionary to predict that our construction will perform poorly for larger files. The use of an external table inevitably causes some processing overhead. Our construction only worsens that situation. It is therefore not suited to manipulating large tables / files.
External tables as read-only objects have various uses. They appear to be especially interesting for ETL in a datawarehouse environment. To find a suitable application for updateable external tables is no trivial matter. One article we used mentions the use of (read-only) external tables â€œfor shops where users can control systemwide parameters inside desktop spreadsheets and Oracle knows immediately about changesâ€¿. Perhaps an updateable external table could be of relevance there as well. However, given the limitations of this technique, one should look very seriously at the consequences before applying it.
We would like to thank Lucas Jellema for his suggestion to investigate the possibility of updateable external tables.
Update April 13, 2005
In his comment of April 12, 2005, Anton Scheffer noted that Oracle most likely does not guarantee that the rownumber will always be assigned to the same record in the file, and sorting on the rownumber in the view won’t make much of a difference. He suggested adding the rownumber to the external table definition, and claims that this will handle bad and rejected records in the file as well. We have tested his suggestion, and have come to the following conclusions:
- Including the rownumber in the external table means the package cursor can reference this rownumber directly, without having to rely on the select made by the view. Thus, the view no longer needs to be ordered. It can now also have restrictions (i.e.: a WHERE-clause).
- The package cursor, although selecting directly from the external table, still needs to be ordered ascending on row number. This is necessary for handling an update statement that affects more than one record. For each record affected by the update, the external table file is rewritten. If the order in which this is done is changed to something other than ascending on rownumber, then the rownumbers to be updated (as determined at the pre-statement level) will no longer correspond to the actual line numbers.
- Having header rows or rows of bad data in your file is no longer a problem. All DML-operations that are supported by this method still work fine. However, given the current implementation of the DML-package, the non-data records will disappear from the file the first time an update or delete statement is executed. After that, they are only available from the bad-file. One could change the implementation of the DML-package to compensate for this, but this will mean doing the whole processing with UTL_FILE instead of SQL. It would also mean we need to incorporate metadata about the file structure into the DML-package, other than just the structure of the data. This makes it a more complex and less generic approach, which is undesirable. Thus, we tend to conclude that it is still best to avoid the use of header rows.
The altered scripts are available in an updated version of the zipfile.
Update April 22, 2005
After we published these conclusions, Anton Scheffer noted in an additional personal e-mail that doing the whole processing with UTL_FILE is in fact quite simple. It allows you to do a multiple delete, and saves all bad records (headers, bad data etc.). Anton’s code and some testdata are available in an additional zipfile(by kind permission).
Again, we have investigated his work, and have noted the following. We started out on two major assumptions: "Do as much as you can with SQL, and as little as possible with UTL_FILE." and "Influence the original file as little as possible." The former explains our advice not to use header rows. The latter basically causes our restriction of not being able to do a multiple delete. Anton however has taken a different approach. He has added a piece of metadata about the file structure to the external table’s definition. More specifically: the "load when empno != blanks" clause tells Oracle to ignore any blank lines in the file. This allows him to implement a delete as an update to a blank line, thus preserving the match between pre-statement and actual line numbers. Hence, a mutiple delete can be processed correctly. In addition, he no longer uses a cursor to loop through the data records. Instead, he loops directly through the file, which allows him to preserve any record with bad data. The result of a delete statement is now a data file with one or more blank lines in it. In our example, MS-Excel is still able to open this file. However, one can easily imagine an application that is not able to handle blank lines in its inputfile, or that interprets a blank line as an END-OF-FILE character. To sum up: In our approach you stay close to the original file layout, with the limitation of not being able to do a multiple delete. In Anton’s approach, you can do all the DML you want, at the expense of altering the file structure. It is up to you to decide which approach is the preferable one in your situation.