One of the interesting new pieces of functionality in Oracle 11g is offered by the DBMS_COMPARISON package. This package allows us to compare two tables – or two views on tables – and inform us on the differences in data. More specifically: the package can summarize if there any changes between the data sets and if so how many. It can also inform us in detail about the specific rows that are different between the two. Finally, DBMS_COMPARISON can converge the data sets, taking one of the two as the master and fully aligning the other one. Using DBMS_COMPARISON, we get on-demand replication for ordinary tables, within schemas, across schemas or even across databases.
In this article, I will demonstrate – on the 11.1.0.6 (Beta) release for Windows – how I can compare table EMP with a clone, whose data has been modified. We will then see how we can re-align the clone with the original (master) table. In this example, I work within a single schema. However, everything I do can be done across schemas and even across a database link, catering for custom data replication scenarios.
Here is our challenge:
The following steps have been taken, to create a clone EMP_COPY of the EMP table in the SCOTT schema and make some changes in the data of EMP_COPY:
create table emp_copy as select * from emp / alter table emp_copy add constraint emp_copy_pk primary key (empno) / update emp_copy set comm = comm * 1.21 / update emp_copy set ename= initcap(ename) where job ='CLERK' / delete emp_copy where job = 'ANALYST' / insert into emp_copy (empno, ename, deptno, job) values (1313, 'JELLEMA', 30, 'TRAINER') / select empno , ename , comm from emp_copy / EMPNO ENAME COMM ---------- ---------- ---------- 7369 Smith 7499 ALLEN 363 7521 WARD 605 7566 JONES 7654 MARTIN 1694 7698 BLAKE 7782 CLARK 7839 KING 7844 TURNER 0 7876 Adams 7900 James EMPNO ENAME COMM ---------- ---------- ---------- 7934 Miller 1313 JELLEMA 13 rows selected.
Now let’s pretend EMP_COPY is a remote table that really ought to be synchronized with table EMP. Of course we could try to write a MERGE statement possibly updating far too many rows that have no differences – , combined with a separate DELETE operation for removing rows in EMP_COPY that do not exist in EMP. Or we can use DBMS_COMPARISON.
First our user needs execute privileges on DBMS_COMPARISON.
Then we create a comparison for these two objects:
BEGIN DBMS_COMPARISON.CREATE_COMPARISON ( comparison_name => 'compare_emp_and_clone' , schema_name => 'scott' , object_name => 'emp' , dblink_name => null , remote_schema_name=>'scott' , remote_object_name=>'EMP_COPY' ); END;
Note that since the two tables are in the same database, we specify the dblink_name parameter as null. The CREATE_COMPARISON procedure has more parameters, such as INDEX_NAME to indicate the index that is used for matching records in the two data sets and COLUMN_LIST to specify which columns are involved in comparing and converging.
Having created a comparison, we can perform a compare operation:
DECLARE consistent BOOLEAN; scan_info DBMS_COMPARISON.COMPARISON_TYPE; BEGIN consistent := DBMS_COMPARISON.COMPARE ( comparison_name => 'compare_emp_and_clone' , scan_info => scan_info , perform_row_dif => TRUE ); DBMS_OUTPUT.PUT_LINE('Scan ID: '||scan_info.scan_id); IF consistent=TRUE THEN DBMS_OUTPUT.PUT_LINE('No differences were found.'); ELSE DBMS_OUTPUT.PUT_LINE('Differences were found.'); END IF; END; Scan ID: 79 Differences were found. PL/SQL procedure successfully completed.
The result of this operation can be retrieved from a number of Data Dictionary Views: USER_COMPARISON, USER_COMPARISON_SCAN_SUMMARY, USER_COMPARISON_COLUMNS and USER_COMPARISON_ROW_DIF.
We can find out whether the two objects involved in this COMPARISON – or rather their data sets – are equal or have been found to have differences and if so, how many.
SELECT s.scan_id , c.COMPARISON_NAME , c.SCHEMA_NAME , c.OBJECT_NAME , s.CURRENT_DIF_COUNT FROM USER_COMPARISON c , USER_COMPARISON_SCAN_SUMMARY s WHERE c.COMPARISON_NAME = s.COMPARISON_NAME AND s.scan_id = 79 SCAN_ID Comparison Name Schema Name Object Name Differences ---------- -------------------- ----------- ----------- ----------- 79 COMPARE_EMP_AND_CLON SCOTT EMP 10 E
Drilling down, we can find out the rowid or index values of the rows that either exist only in one of the two data sets or have differences in their data values.
SELECT c.COLUMN_NAME , r.INDEX_VALUE , case when r.LOCAL_ROWID is null then 'No' else 'Yes' end LOCAL_ROWID , case when r.REMOTE_ROWID is null then 'No' else 'Yes' end REMOTE_ROWID FROM USER_COMPARISON_COLUMNS c , USER_COMPARISON_ROW_DIF r , USER_COMPARISON_SCAN s WHERE c.COMPARISON_NAME = 'COMPARE_EMP_AND_CLONE' AND r.SCAN_ID = s.SCAN_ID AND s.last_update_time > systimestamp - 1/24/15 AND r.STATUS = 'DIF' AND c.INDEX_COLUMN = 'Y' AND c.COMPARISON_NAME = r.COMPARISON_NAME ORDER BY r.INDEX_VALUE
Index Column Index Value Local Row Exists? Remote Row Exists?
————— ————— ——————– ——————–
EMPNO 1313 No Yes
EMPNO 7369 Yes Yes
EMPNO 7499 Yes Yes
EMPNO 7521 Yes Yes
EMPNO 7654 Yes Yes
EMPNO 7788 Yes No
EMPNO 7876 Yes Yes
EMPNO 7900 Yes Yes
EMPNO 7902 Yes No
EMPNO 7934 Yes Yes
10 rows selected.
Knowing all this, we can decide to go for the synchronization. Using DBMS_COMPARISON.CONVERGE, we can synchronize either the local table with the remote data set or vice versa. In this example, we will roll EMP into EMP_COPY, making sure that EMP_COPY is back in line with its master:
DECLARE scan_info DBMS_COMPARISON.COMPARISON_TYPE; BEGIN DBMS_COMPARISON.CONVERGE ( comparison_name => 'compare_emp_and_clone' , scan_id => 79 , scan_info => scan_info , converge_options => DBMS_COMPARISON.CMP_CONVERGE_LOCAL_WINS ); DBMS_OUTPUT.PUT_LINE('Local Rows Merged: '||scan_info.loc_rows_merged); DBMS_OUTPUT.PUT_LINE('Remote Rows Merged: '||scan_info.rmt_rows_merged); DBMS_OUTPUT.PUT_LINE('Local Rows Deleted: '||scan_info.loc_rows_deleted); DBMS_OUTPUT.PUT_LINE('Remote Rows Deleted: '||scan_info.rmt_rows_deleted); END; Local Rows Merged: 0 Remote Rows Merged: 9 Local Rows Deleted: 0 Remote Rows Deleted: 1 PL/SQL procedure successfully completed.
This simple statement – combined with the earlier creation of the COMPARISON itself – is all that is needed to synchronize the two tables! When we verify the situation in EMP_COPY, we find:
SQL> select empno 2 , ename 3 , comm 4 from emp_copy 5 / EMPNO ENAME COMM ---------- ---------- ---------- 7369 SMITH 7499 ALLEN 300 7521 WARD 500 7566 JONES 7654 MARTIN 1400 7698 BLAKE 7782 CLARK 7839 KING 7844 TURNER 0 7876 ADAMS 7900 JAMES EMPNO ENAME COMM ---------- ---------- ---------- 7934 MILLER 7788 SCOTT 7902 FORD 14 rows selected.
And indeed EMP_COPY is once again in line with EMP.
A second scenario: The most convoluted UPDATE statement in the world
In this second scenario, I wil show the probably most unlikely way to update the salaries of the managers in table EMP. However, it serves as a demonstration of two important facets of the COMPARISONS:
- Comparisons not only can be applied to Tables, they can also be used with (single table) Views and Materialized Views
- Comparisons do not have to apply to all columns in the tables or views they refer to – both for comparison and update, it is possible to specify a subset to process
Our challenge is to raise the salaries of the three MANAGERs in EMP by 15%. But we tread carefully. We first take a long and close look at the effect before we ask DBMS_COMPARISON to perform the update.
In the view CHANGED_MANAGERS, we see the result of our update:
create view changed_managers as select empno , ename , sal * 1.15 as sal from emp where job = 'MANAGER' / select * from changed_managers /
The original situation is visible in the view MANAGERS:
Now we create a COMPARISON between CHANGED_MANAGERS and MANAGERS:
< >BEGIN
DBMS_COMPARISON.CREATE_COMPARISON
( comparison_name => 'update_managers'
, schema_name => 'scott'
, object_name => 'MANAGERS'
, dblink_name => null
, column_list => 'EMPNO,SAL'
, remote_schema_name=>'scott'
, remote_object_name=>'CHANGED_MANAGERS'
);
END;
>
Two things to notice: we create the COMPARISON for two views rather than tables. Note that we could have used a Table and View. Also notice how we specified a column list: only differences in Salary are of interest right now (EMPNO is included since that is the INDEX_COLUMN through which the managers are identified.
Using this COMPARISON, we can do a compare:
DECLARE consistent BOOLEAN; scan_info DBMS_COMPARISON.COMPARISON_TYPE; BEGIN consistent := DBMS_COMPARISON.COMPARE( comparison_name => 'update_managers', scan_info => scan_info, perform_row_dif => TRUE); DBMS_OUTPUT.PUT_LINE('Scan ID: '||scan_info.scan_id); IF consistent=TRUE THEN DBMS_OUTPUT.PUT_LINE('No differences were found.'); ELSE DBMS_OUTPUT.PUT_LINE('Differences were found.'); END IF; END;
which does not tell us anything surprising. We can then perform a converge from CHANGED_MANAGERS to MANAGERS to apply the changes. Indirectly, this is the same as performing the update emp set sal = sal * 1.15 where job=’MANAGER’ – very indirectly!
prompt Merge the Remote (Changed_Managers) into the Local (Managers) data set (updating the EMP table in the process) DECLARE scan_info DBMS_COMPARISON.COMPARISON_TYPE; BEGIN DBMS_COMPARISON.CONVERGE( comparison_name => 'update_managers', scan_id => &scan_id, scan_info => scan_info, converge_options => DBMS_COMPARISON.CMP_CONVERGE_REMOTE_WINS); DBMS_OUTPUT.PUT_LINE('Local Rows Merged: '||scan_info.loc_rows_merged); DBMS_OUTPUT.PUT_LINE('Remote Rows Merged: '||scan_info.rmt_rows_merged); DBMS_OUTPUT.PUT_LINE('Local Rows Deleted: '||scan_info.loc_rows_deleted); DBMS_OUTPUT.PUT_LINE('Remote Rows Deleted: '||scan_info.rmt_rows_deleted); END; /
The result of the converge is – not surprisingly -:
SQL> select * 2 from managers 3 / EMPNO ENAME SAL ---------- ---------- ---------- 7566 JONES 3421.25 7698 BLAKE 3277.5 7782 CLARK 2817.5 3 rows selected.
With the COMPARISON we already had created, it is very easy to return the salaries back to the original situation. We recreate CHANGED_MANAGERS to:
create or replace view changed_managers as select empno , ename , sal/1.15 as sal from emp where job = 'MANAGER' /
and re-execute the CONVERGE operation
< >DECLARE
consistent BOOLEAN;
scan_info DBMS_COMPARISON.COMPARISON_TYPE;
BEGIN
consistent := DBMS_COMPARISON.COMPARE
( comparison_name => 'update_managers'
, scan_info => scan_info
, perform_row_dif => TRUE
);
DBMS_COMPARISON.CONVERGE
( comparison_name => 'update_managers'
, scan_id => scan_info.scan_id
, scan_info => scan_info
, converge_options => DBMS_COMPARISON.CMP_CONVERGE_REMOTE_WINS -- CHANGED_MANAGERS rolls into MANAGERS
);
END;
>
With the effect that the salaries of the Managers are back to where they were.
Resources
Oracle® Database 2 Day + Data Replication and Integration Guide – Chapter 10 – Comparing and Converging Data (http://download.oracle.com/docs/cd/B28359_01/server.111/b28324/tdpii_diverge.htm )
Oracle Database PL/SQL Packages and Types Reference – DBMS_COMPARISON (http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28419/d_comparison.htm#ARPLS868 )
PSOUG – Quick Reference DBMS_COMPARISON (http://psoug.org/reference/dbms_comparison.html )
Hi,
The procedure DBMS_COMPARISON.CREATE_COMPARISON creates a comparison. What if I want to delete the created comparison and its data in the dictionary views?
The challenge we have is not comparing table data (which is offered in other products) but to compare ALL the metadata associated with the two tables. This is the challenge and has been the challenge with profiling and cleansing. I have recently become a convert of an open-standard called JUMP Metadata (sourceforge project JUMPER has script plug-ins for some tools) and cannot wait for vendors to finally see the light and begin to adopt this method.
You did describe how to compare tables by setting the scan info. But i have a query how do you go for comparing two adjacent rows of same table. I was thinking to use varrays aong with data dictionary but the coulmn type wouldn’t be same for each column in the varray.
So how do i use the DBMS.compasrison package.