Removing double records (and keeping one of them)

Patrick Sinke

Assume you have a table with contaminated data (in this example: non-unique values on col1 and col2), for instance because no unique key was implemented.

col1 col2 value
2    9    Smith 

2 4 Jones

2 9 Andersen

8 6 Nicholson

8 6 Parker

If you want to prevent this from happening in the future, you can of course create a unique key or check constraint:

SQL> ALTER table TAB1 add (constraint  tab1_uk UNIQUE ( col1, col2 ) deferrable enable novalidate);

Table altered.

SQL> INSERT INTO tab1 VALUES ( 2, 4, 'Ray' );
ERROR at line 1:ORA-00001: unique constraint (TAB1.TAB1_UK) violated

This in fact works fine, every new record wil be validated and existing records which fail the rule will be ignored during creation of the constraint. However, on update the constraint will be enforced.

But what if you want to clean up the doubles? The statement which I find the most easy and reusable is this one:

DELETE from TAB1 where rowid NOT IN ( select min(rowid) from TAB1 group by col1, col2 );

As you noticed, I use the rowid because this is the only column which distinguishes two records with the same key. It’ll always keep one unique record, even if there was only one record to begin with. It also takes care of the situation where you have more than 2 doubles, as opposed to this solution:

DELETE from TAB1 where id IN (select (max(id) from TAB1 group by col1, col2 HAVING COUNT(‘x’) > 1 ) –> too much work!


Next Post

Testing, One, Two....

Just in time for the Oracle PL/SQL Programming Conference, OPP2007, the Code Tester for Oracle is released. I am very excited about this tool and more posts will follow. Congratulations Steven! Related posts: Rule based in Oracle 10g Best practices suck! Being Mr. Ellison… Uitnodiging: Kom kennismaken met AMIS Hotsos […]