MERGE! 20188367001

MERGE!

When I was confronted with the challenge to import about 260 tables from two different Rdb’s into a single Oracle database (9.0.1.x) on a daily basis via a database link my first thought was: Oh no, I am not going to write 260 times two individual DML statements, one UPDATE and one INSERT. Or one of the alternatives: UPDATE a row and if SQL%NOTFOUND then INSERT, or try to INSERT a row and if it fails with a DUP_VAL_ON_INDEX exception, then do the UPDATE.

So, I re-read the features of 9i and I re-discovered the MERGE statement. In various Oracle articles is mentioned that MERGE is a useful statement in a data warehouse environment, where tables need to be periodically refreshed with the new data arriving from on-line systems. Well, same here. There are two Rdb environments which come together in a single Oracle 9i environment, in which the data has to be kept at least a couple of years. So, I wrote procedures with the MERGE statement, a parameter is included to switch between the two Rdb’s.

The MERGE statement is of course not something new. It was introduced in Oracle 9i and it allows you to select rows from one table for update or insertion into another table. The decision whether to update or insert into the target table is based on a condition in the ON clause. This statement is a convenient way to combine at least two operations. It lets you avoid multiple INSERT and UPDATE DML statements.

And now in 10g, the MERGE statement has been extended to cover a larger variety of complex and conditional data transformations. This will allow faster loading of large volumes of data (1). And in 10g it is possible to DELETE rows from the target table (2).

MERGE [ hint ]
INTO [ schema. ]table [ t_alias ]
USING [ schema. ] { table | view | subquery }
[ t_alias ]
ON ( condition )
WHEN MATCHED THEN
UPDATE SET column = { expr | DEFAULT }
[, column = { expr | DEFAULT } ]…
[ where_clause ]
[ DELETE where_clause ]
WHEN NOT MATCHED THEN
INSERT [ (column [, column ]…) ]
VALUES ({ expr [, expr ]… | DEFAULT })
[ where_clause ]

In 10g it is possible to add a WHERE clause to the UPDATE statement and the INSERT statement. The condition can refer to either the data source or the target table. And it is possible to delete records in the update statement. The only rows affected by this clause are those rows in the destination table that are updated by the merge operation. That is, the DELETE WHERE condition evaluates the updated value, not the original value that was evaluated by the UPDATE SET … WHERE condition. If a row of the destination table meets the DELETE condition but is not included in the join defined by the ON clause, then it is not deleted (2).

But when do you use this MERGE statement?

In the PL/SQL User’s Guide and Reference 10g Release 1 (10.1) is stated that this statement is primarily useful in data warehousing situations where large amounts of data are commonly inserted and updated. If you only need to insert or update a single row, it is more efficient to do that with the regular PL/SQL techniques: try to update the row, and do an insert instead if the update affects zero rows; or try to insert the row, and do an update instead if the insert raises an exception because the table already contains that primary key (4).

And Tom Kyte says about MERGE: Truncate and reload would most likely be faster than merge if most of the table is going to be modified. The reload could be done in direct path, without indexes — rebuild indexes in parallel with no logging afterwards. Merge does an outer join between the two sets basically and updates matches as it hits them and inserts misses. It is not like a “procedure”, it is like insert, update or delete (5).

According to an Oracle Technical White Paper (3) the performance improved on average with 30% with a Serial Execution and 22% with a Parallel Execution.
(Serial Execution: table SALES_FACT contained 5 million rows and table SALES_NOV99 contained 1 million rows. Parallel Execution: table SALES_FACT contained 50 million rows (10 partitions), and table SALES_NOV99 contained 5 million rows).

Besides performance it is also good to know if there are some unknown features in this statement. After creating the procedures and testing them I came across a bug in 9i. If you are using a sequence in the target table, say as a unique ID, this sequence will be incremented for each row in the source table each time the MERGE statement is executed, even if there are no insert statements. So if you call your procedure every day and the source table contains 100 records, the sequence will be incremented by 100, even if there are no new records inserted. This bug is solved in 10g.

So, if you expect to update or insert a single row, just use UPDATE – SQL%NOTFOUND – INSERT. If the entire table is going to be modified; truncate and reload the table. In my situation I think it’s better to use MERGE. Every day there will be new records inserted, some updated. At this moment there are no exact figures available on the annually growth of these tables, but one thing is for sure: they will increase and multiple insertions and updates are going to take place on a daily basis. It’s too bad that I had to create triggers on the different tables in Oracle in order to use a sequence, but besides this minor problem I would say: use MERGE!

1. Oracle® Database New Features Guide 10g Release 1 (10.1) Part No. B10750-01 December 2003, p. 1-50
2. http://download-west.oracle.com/docs/cd/B14117_01/server.101/b10759/statements_9016.htm#SQLRF01606
3. Performance and Scalability in DSS Environment with Oracle9i, page 8-11
4. http://download-west.oracle.com/docs/cd/B14117_01/appdev.101/b10807/13_elems029.htm#sthref1735
5. http://asktom.oracle.com/pls/ask/f?p=4950:8:11788848576231155945::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:5318183934935

5 Comments

  1. Lucas Jellema December 10, 2005
  2. Arindam Mukherjee December 10, 2005
  3. Hank September 15, 2005
  4. Mike Jacobsen July 13, 2005
  5. Derek Lee June 3, 2005