Removing double records (and keeping one of them)

0

Assume you have a table with contaminated data (in this example: non-unique values on col1 and col2), for instance because no unique key was implemented.

col1 col2 value
2    9    Smith <br /><br />2    4    Jones<br /><br />2    9    Andersen<br /><br />8    6    Nicholson<br /><br />8    6    Parker

If you want to prevent this from happening in the future, you can of course create a unique key or check constraint:

SQL&gt; ALTER table TAB1 add (constraint  tab1_uk UNIQUE ( col1, col2 ) deferrable enable novalidate);<br /><br />Table altered. <br /><br />SQL&gt; INSERT INTO tab1 VALUES ( 2, 4, 'Ray' );<br />ERROR at line 1:ORA-00001: unique constraint (TAB1.TAB1_UK) violated

This in fact works fine, every new record wil be validated and existing records which fail the rule will be ignored during creation of the constraint. However, on update the constraint will be enforced.

But what if you want to clean up the doubles? The statement which I find the most easy and reusable is this one:

DELETE from TAB1 where rowid NOT IN ( select min(rowid) from TAB1 group by col1, col2 );

As you noticed, I use the rowid because this is the only column which distinguishes two records with the same key. It’ll always keep one unique record, even if there was only one record to begin with. It also takes care of the situation where you have more than 2 doubles, as opposed to this solution:

DELETE from TAB1 where id IN (select (max(id) from TAB1 group by col1, col2 HAVING COUNT(‘x’) > 1 ) –> too much work!

 

Share.

About Author

Comments are closed.