Using Table Functions Database 3

Using Table Functions

Overview of table functions

Table functions are functions that produce a collection or rows (either a nested table or a varray) that can be queried like a physical database table. You use a table function like the name of a database table, in the FROM clause of a query. The difference with a physical database table in the FROM clause of the query is that you need to use the TABLE() operator to tell the SQL engine that it should treat whatever comes out of the function as if it were a normal table.

Table function concepts

There a couple of steps to take when you are working with table functions. Like when you are working with normal tables you have to describe the way the records of the collection are set up.

Define Record Type

First of all you define a record type. This is pretty much the same records are defined in a database table, only difference is that you are now defining an object type instead of a table.record_type

CREATE TYPE script_line AS OBJECT
(line NUMBER   ,text VARCHAR2(32767))

This creates a type that can hold the contents of a single record. Compare this to a record in a physical table.

Define Nested Table type

After the creation of the record type you define a Nested Table type using the previously created Record Type. You can also create a Varray type if you are absolutely certain that you will not exceed the boundary of the vnestedtable_typearray.

CREATE or replace TYPE script_lines AS TABLE OF script_line;
Define function returning Nested Table type

After creating the types needed for our function we can define and implement the function in which we will do what we want it to do.

CREATE OR REPLACE FUNCTION createinsertscriptfor(
tablename_in IN VARCHAR2 ) RETURN script_lines

The complete implementation of the function can be found here.

This function can encapsulate our knowledge of the datadictionary and the way an insert statement should be constructed. All the things we tend to forget all the time, like the quotes that are needed when working with varchar2 fields. Or doubling the quotes when they are present in the data.

The function will return a nested table which is created and initialized in the declaration section of the function. For every record in the table there will be a row in the nested table.

Query function using the TABLE() operator

After having defined the types and the function we can query the function in SQL as if it were a physical table. Only thing we have to tell the SQL engine is to treat the result of the function as a fysical table, using the TABLE() operator, hence the name TABLE FUNCTION.

Nesting functions

Function can take the result of a query as input for (one of) their parameters. If we build multiple functions where one function can use the output of the other we can work with the results of the previous function without having to copy the entire collection from one function to another. When the functions are pipelined as well then there is a possibility to parallellize the execution of the functions.

Pipelined table functions

Functions can be pipelined. This means the results will become available when they are produced. As soon as a result is available instead of adding it to the nested table it will be piped out of the function.

CREATE OR REPLACE FUNCTION createinsertscriptfor(
tablename_in IN VARCHAR2 ) RETURN script_lines PIPELINED

The keyword PIPELINED is added to the function. Instead of extending the nested table and adding the result to it, it will be piped out of the function using PIPE ROW. When we are done with the code in the function control can be returned using the RETURN keyword. There are some strange things here. The function is said to return a NESTED TABLE but in fact in returns single records, one at a time. If you consider the output of the entire function then you can see that a table of records is being returned. Therefore the function should be defined as if it returns a NESTED TABLE. When the function is at its end there is nothing left to return but control. That is why we have a RETURN without any value.

If you query the function in SQL*Plus you will not see any difference in behavior. That is because SQL*Plus will retrieve the results and start displaying them when it has something like 15 results.

SQL> show arraysize

arraysize 15

When the arraysize is set to 1 then the results will be displayed one at a time as soon as they become available.

SQL> set arraysize 1

Parallel table functions

If your database permits it, that is when the database is setup to support it, then code can be executed in parallel. If the table is enabled for PARALLEL processing and the function is also parallel enabled then the function can be executed in parallel.

Normal execution is like this:

normalexecution

The data travels from function to function all being processed in memory. If the functions and the table were parallel enabled then execution would be something like this:

parallelexecution

In both approaches every row of data will be kept in memory until the function is done with it. Enabling parallel execution can speed up processing even on a laptop computer with a single processor but with multiple coress. If we add the pipelined option to the function then the stress on memory would be less, since the next function can start when the previous is done with a row. In theory there will not be more rows kept in memory than the ones being processed at the moment.

To enable a function for parallel execution all you have to do is add the PARALLEL_ENABLE clause to the functions signature:

{[ORDER | CLUSTER] BY column_list}
PARALLEL_ENABLE ({PARTITION p BY
[ANY | (HASH | RANGE)
column_list]} )

If you want to control how the data is parallelized then you should define one of the parameters of the function as a strongly typed ref cursor and use this parameter in the PARTITION p BY clause along with the type of partitioning (HASH or RANGE) and the columns you want the partitioning to be done by. If you are partitioning by ANY then you can also use a weak ref cursor. Partitioning is dividing the data being processed. If you don’t tell Oracle how the data should be partitioned, then it divides it evenly over all the parallel processes available. If you provide the HASH or RANGE parameter you have more control over what rows will be processed by which parallel process.

partitioning

Build a table function to export data

Using the knowledge we have about the datadictionary and our knowledge about table functions makes it possible to write code that creates a script to copy data from one scheme to another, for instance to copy settings from the development environment to the testing environment. Of course this script is something we could write during development, but experience teaches that we will always forget at least one row of data. If the table has a limited number of columns it’s really easy to create a SQL statement that would give us the script we want.

Let’s use the allknown DEPT table as an example. Suppose we want to create a script like this:

 

insert into DEPT( LOC, DNAME, DEPTNO) values ( 'NEW YORK', 'ACCOUNTING', 10);
insert into DEPT( LOC, DNAME, DEPTNO) values ( 'DALLAS', 'RESEARCH', 20);
insert into DEPT( LOC, DNAME, DEPTNO) values ( 'CHICAGO', 'SALES', 30);
insert into DEPT( LOC, DNAME, DEPTNO) values ( 'BOSTON', 'OPERATIONS', 40);

 

It is rather simple to create a SQL script that creates this script for us that will deliver us with exactly this script.

SELECT 'insert into DEPT( LOC, DNAME, DEPTNO) values ( '''||
LOC||''', '''||
DNAME||''', '||
DEPTNO||');' line
FROM DEPT;

This is really easy for simple tables with a few columns although you still have to remember to double the quotes in your query and what happens when a varchar2 field has a quote in it.

It would be so much easier to hide all this complexity behind a PL/SQL interface. And with table functtions we can.

I have created a function that does just this. It is by far not complete and it can probably be improved a lot but it’s a nice thing to play with tablefunctions. You can download the code from this location[download link] but I will highlight some pieces of the code

As described earlier I create a record type to hold a line for the insert script and a nested table to hold all the lines. The signature of the function is also given earlier.

The code first gets all the columns for the table from the datadictionary. With this information the first part of the statement is built. Then all the rows in the table are being processed. Depending on their datatype they will be handled differently. A varchar column for instance will have some extra processing to make sure quotes are put in the script correctly. When a row is processed, an SQL statement is ready and can be added to the nested table. When every row is processed the entire table is returned.

Available toolsUsing Table Functions data compare for oracle

Ofcourse there are many tools available that can do pretty much the same thing. But what if you want to create a script which you can include in your deploy script so the DBA can run the install from SQL*Plus. There is for instance Data Compare for Oracle by Red Gate. To create a script that can be included in your deployment script it is easiest to create a dummy schema with the same tables you want to create a script for, but with no data in them. Then let the tool do it’s magic and check the script which can be saved and included in the overall script.

The code generated by this tool is like this:

DECLARE
null_value CHAR(1) := NULL;
statement1 CHAR(46);
BEGIN
statement1 := 'INSERT INTO "EMPTY"."DEPT" VALUES (:0, :1, :2)';
EXECUTE IMMEDIATE statement1 USING 10, 'ACCOUNTING', 'NEW YORK';
EXECUTE IMMEDIATE statement1 USING 20, 'RESEARCH', 'DALLAS';
EXECUTE IMMEDIATE statement1 USING 30, 'SALES', 'CHICAGO';
EXECUTE IMMEDIATE statement1 USING 40, 'OPERATIONS', 'BOSTON';
END;
/
COMMIT;

This is a completely different approach to the scipt using tablefunctions, but in the end the result is the same. Creating the script ourselves using a PL/SQL function is in my opinion more fun, but creating the script using the tool is probably faster. You can download a fully functional trial of Data Compare for Oracle from the Red Gate website so you may as well give it a go and a generate some scripts for free this way to see for yourself.