Finding and getting rid of duplicate records in a table is easy….
The row_number() over ( <query_partition_clause> <order_by_clause> ) analytic function is helpful in determining not only the number of duplicate records, but also in identifying them. Setting up the partition clause though with each and every attribute that is in your partition query, can become a bit cumbersome when dealing with tables with a lot of columns.
So I wanted to have an option to generate not only all necessary column names for this clause, but also to exclude one or more columns from the list of column names, based on my input.
For this purpose, I created 2 overloaded dedup.get_partition_cols functions.
How to use (one of) them, is shown in the following example code.
The package code is provided at the end of this article.
— Create test table t with 600 records, a composite primary key, and duplicate values in the last column
SQL~ create table t 2 as 3 select name 4 , seqnr 5 , name||' record ' note 6 from (select case when rownum between 1 and 100 then 'harry' 7 when rownum between 101 and 200 then 'ann' 8 when rownum between 201 and 300 then 'bob' 9 when rownum between 301 and 400 then 'ronald' 10 when rownum between 401 and 500 then 'patrick' 11 when rownum between 501 and 600 then 'marie' end name 12 , case when rownum between 1 and 100 then rownum-1 13 when rownum between 101 and 200 then rownum-101 14 when rownum between 201 and 300 then rownum-201 15 when rownum between 301 and 400 then rownum-301 16 when rownum between 401 and 500 then rownum-401 17 when rownum between 501 and 600 then rownum-501 end seqnr 18 from dual 19 connect by rownum != 601); Table created. SQL~ alter table t add (constraint t_pk primary key (name, seqnr)); Table altered. SQL~ select count(*) from t; COUNT(*) ---------- 600 1 row selected.
— Check if t contains duplicate records, and use a partition clause with all columns
— At generation, the column list in i, in combination with the “new_value” directive of SQLPLUS, is
— transferred to a substitution variable [1], which can then be used in various “duplicate rows” statements.
— I purposely don’t use a noprint option for i, because I like the value of i to be shown here.
SQL~ set verify off heading off feedback off pagesize 0 define + SQL~ col i new_value 1 SQL~ select dedup.get_partition_cols ('t',0) i from dual; NAME,SEQNR,NOTE SQL~ prompt ++1 NAME,SEQNR,NOTE SQL~ select count(*) 2 from ( select row_number() over ( partition by ++1 order by rownum ) rn 3 from t ) 4 where rn != 1 ; 0
— No surprise here, because the table contains a primary key and all columns are setup in the partition clause…
— But now with pk columns excluded!
SQL~ select dedup.get_partition_cols ('t',1) i from dual; NOTE SQL~ prompt ++1 NOTE SQL~ select count(*) 2 from ( select row_number() over ( partition by ++1 order by rownum ) rn 3 from t ) 4 where rn != 1 ; 594
— Delete duplicates
SQL~ delete t 2 where rowid in ( select rid 3 from ( select rowid rid 4 , row_number() over ( partition by ++1 order by rownum ) rn 5 from t ) 6 where rn != 1 ); SQL~ select count(*) from t; 6
— Package and package body dedup
CREATE OR REPLACE PACKAGE Dedup IS /* Q: How to use the dedup functions? A: The first dedup.get_partition_cols function has 2 parameters, one for the table name, and one for -- excluding no columns (value=0), -- excluding primary key columns (value=1) -- excluding primary key columns and unique key columns (value=2). With the second, overloaded function, you can exclude specific columns (10 max). Q: How are the column names extracted? A: To list all table columns, 11gR2 listagg syntax is used: ============ SQL~ select listagg(column_name,',') within group(order by rownum) from user_tab_columns where table_name = 'T'; ============ This statement gets executed when you call dedup.get_partition_cols('t',0). All other calls result in an exclusion of one or more column names from the list of all columns. */ ------------------------------------------------------------------------------------------------- FUNCTION Get_Partition_Cols ( P_table_name varchar2 , P_exclude_col01 varchar2 := NULL , P_exclude_col02 varchar2 := NULL , P_exclude_col03 varchar2 := NULL , P_exclude_col04 varchar2 := NULL , P_exclude_col05 varchar2 := NULL , P_exclude_col06 varchar2 := NULL , P_exclude_col07 varchar2 := NULL , P_exclude_col08 varchar2 := NULL , P_exclude_col09 varchar2 := NULL , P_exclude_col10 varchar2 := NULL ) RETURN VARCHAR2; ------------------------------------------------------------------------------------------------- FUNCTION Get_Partition_Cols ( P_table_name varchar2 , P_exclude_cols pls_integer ) RETURN VARCHAR2; ------------------------------------------------------------------------------------------------- END Dedup; /
CREATE OR REPLACE PACKAGE BODY Dedup IS ------------------------------------------------------------------------------------------------- FUNCTION Get_Partition_Cols ( P_table_name varchar2 , P_exclude_col01 varchar2 := NULL , P_exclude_col02 varchar2 := NULL , P_exclude_col03 varchar2 := NULL , P_exclude_col04 varchar2 := NULL , P_exclude_col05 varchar2 := NULL , P_exclude_col06 varchar2 := NULL , P_exclude_col07 varchar2 := NULL , P_exclude_col08 varchar2 := NULL , P_exclude_col09 varchar2 := NULL , P_exclude_col10 varchar2 := NULL ) RETURN VARCHAR2 AS v_table_name varchar2(30) := upper(trim(P_table_name )); v_ret varchar2(2000); begin if ( P_exclude_col01 is null and P_exclude_col02 is null and P_exclude_col03 is null and P_exclude_col04 is null and P_exclude_col05 is null and P_exclude_col06 is null and P_exclude_col07 is null and P_exclude_col08 is null and P_exclude_col09 is null and P_exclude_col10 is null ) then select listagg(column_name,',') within group(order by rownum) into v_ret from user_tab_columns where table_name = v_table_name; else select listagg(column_name,',') within group(order by rownum) into v_ret from user_tab_columns uts where uts.table_name = v_table_name and column_name not in ( nvl(upper(trim(P_exclude_col01)),' ') , nvl(upper(trim(P_exclude_col02)),' ') , nvl(upper(trim(P_exclude_col03)),' ') , nvl(upper(trim(P_exclude_col04)),' ') , nvl(upper(trim(P_exclude_col05)),' ') , nvl(upper(trim(P_exclude_col06)),' ') , nvl(upper(trim(P_exclude_col07)),' ') , nvl(upper(trim(P_exclude_col08)),' ') , nvl(upper(trim(P_exclude_col09)),' ') , nvl(upper(trim(P_exclude_col10)),' ') ); end if; return v_ret; exception when others then raise; end Get_Partition_Cols; ------------------------------------------------------------------------------------------------- FUNCTION Get_Partition_Cols ( P_table_name varchar2 , P_exclude_cols pls_integer ) RETURN VARCHAR2 AS v_table_name varchar2(30) := upper(trim(P_table_name )); v_ret varchar2(2000); begin if P_exclude_cols = 0 then -- no exclusion select listagg(column_name,',') within group(order by rownum) into v_ret from user_tab_columns where table_name = v_table_name; elsif P_exclude_cols = 1 then -- exclude pk cols only select listagg(column_name,',') within group(order by rownum) into v_ret from user_tab_columns uts where uts.table_name = v_table_name and not exists ( select 1 from user_ind_columns ucs , user_indexes uis where ucs.index_name = uis.index_name and ucs.table_name = uis.table_name and uis.table_name = v_table_name and uis.uniqueness = 'UNIQUE' and ucs.column_name = uts.column_name and uts.nullable = 'N' ); elsif P_exclude_cols = 2 then -- exclude pk and uk cols select listagg(column_name,',') within group(order by rownum) into v_ret from user_tab_columns uts where uts.table_name = v_table_name and not exists ( select 1 from user_ind_columns ucs , user_indexes uis where ucs.index_name = uis.index_name and ucs.table_name = uis.table_name and uis.table_name = v_table_name and uis.uniqueness = 'UNIQUE' and ucs.column_name = uts.column_name ); end if; return v_ret; exception when others then raise; end Get_Partition_Cols; ------------------------------------------------------------------------------------------------- END Dedup; /