Summary: in this tutorial, you will learn step by step how to delete duplicate records in Oracle Database using the `DELETE` statement with a subquery.

Once you’ve found the duplicate records in a table, you often want to delete the unwanted copies to keep your data clean.

If a table has a few duplicate rows, you could do this manually one by one by using a simple DELETE statement. However, it is time-consuming to do it manually if the table has a large number of duplicate records. It is better to construct a single statement that removes all unwanted copies in one go.

Before removing the duplicate records, you must decide which instances you want to keep. For example, you may want to preserve the newest or oldest row. In this case, you need a column in the table like id column that is not the part of the group used to evaluate duplicate.

Consider the following fruits table:

how to delete duplicate records in oracle - sample table

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

-- create fruit table

CREATE TABLE fruits (

        fruit_id   NUMBER generated BY DEFAULT AS IDENTITY,

        fruit_name VARCHAR2(100),

        color VARCHAR2(20),

        PRIMARY KEY (fruit_id)

);

-- insert sample rows

INSERT INTO fruits(fruit_name,color) VALUES('Apple','Red');

INSERT INTO fruits(fruit_name,color) VALUES('Apple','Red');

INSERT INTO fruits(fruit_name,color) VALUES('Orange','Orange');

INSERT INTO fruits(fruit_name,color) VALUES('Orange','Orange');

INSERT INTO fruits(fruit_name,color) VALUES('Orange','Orange');

INSERT INTO fruits(fruit_name,color) VALUES('Banana','Yellow');

INSERT INTO fruits(fruit_name,color) VALUES('Banana','Green');

-- query data from the fruit table

SELECT * FROM fruits;

Suppose you want to keep the row with the highest fruit_id and delete all other copies. The following query returns the last entry for each combination of fruit_name and color:

1

2

3

4

5

6

7

8

9

SELECT

  MAX(fruit_id)

FROM

  fruits

GROUP BY

  fruit_name,

  color

ORDER BY

  MAX(fruit_id);

how to delete duplicate records in oracle - max id

You use the following DELETE statement with a subquery to delete rows whose values in the fruit_id column are not the highest ones.

1

2

3

4

5

6

7

8

9

10

11

12

13

DELETE FROM

  fruits

WHERE

  fruit_id NOT IN

  (

    SELECT

      MAX(fruit_id)

    FROM

      fruits

    GROUP BY

      fruit_name,

      color

  );

Three rows were deleted which is what we expected:

1

2

3

4

SELECT

  *

FROM

  fruits;

how to delete duplicate records in oracle - after removing duplicates

Likewise, if you want to keep the row with the lowest id, you use the MIN() function instead of the MAX() function:

1

2

3

4

5

6

7

8

9

10

11

12

13

DELETE FROM

  fruits

WHERE

  fruit_id NOT IN

  (

    SELECT

      MIN(fruit_id)

    FROM

      fruits

    GROUP BY

      fruit_name,

      color

  );

This method above works if you have a column that is not part of the group for evaluating duplicate. If all values in all columns can have copies, then you cannot use the fruit_id column anymore.

Let’s drop and create the fruits table with a new structure and data as follows:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

DROP TABLE fruits;

CREATE TABLE fruits (

        fruit_id   NUMBER,

        fruit_name VARCHAR2(100),

        color VARCHAR2(20)

);

INSERT INTO fruits(fruit_id,fruit_name,color) VALUES(1,'Apple','Red');

INSERT INTO fruits(fruit_id,fruit_name,color) VALUES(1,'Apple','Red');

INSERT INTO fruits(fruit_id,fruit_name,color) VALUES(2,'Orange','Orange');

INSERT INTO fruits(fruit_id,fruit_name,color) VALUES(2,'Orange','Orange');

INSERT INTO fruits(fruit_id,fruit_name,color) VALUES(2,'Orange','Orange');

INSERT INTO fruits(fruit_id,fruit_name,color) VALUES(3,'Banana','Yellow');

INSERT INTO fruits(fruit_id,fruit_name,color) VALUES('4,Banana','Green');

SELECT * FROM fruits;

how to delete duplicate records in oracle - sample table with duplicates in all columns

In the fruits table, the values in all columns fruit_id, fruit_name, and color have copies.

In this case, you can use the rowid which is a physical locator that specifies where on storage Oracle stores the row. Because the rowid is unique to each row, you can use it to remove the duplicates as shown below:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

DELETE

FROM

  fruits

WHERE

  rowid NOT IN

  (

    SELECT

      MIN(rowid)

    FROM

      fruits

    GROUP BY

      fruit_id,

      fruit_name,

      color

  );

The following query verifies the delete operation:

1

2

3

4

SELECT

  *

FROM

  fruits;

how to delete duplicate records in oracle - after deleting copies

Second Approach:

5 ways to delete duplicate records Oracle

JANUARY 29, 2013 ~ SQLANDPLSQL

In Oracle there are many ways to delete duplicate records. Note that below example are described to just explain the different possibilities.

Consider the EMP table with below rows

create table emp(
EMPNNO integer,
EMPNAME varchar2(20),
SALARY number);

10    Bill    2000
11    Bill    2000
12    Mark    3000
12    Mark    3000
12    Mark    3000
13    Tom    4000
14    Tom    5000
15    Susan    5000

1. Using rowid

SQL > delete from emp
where rowid not in
(select max(rowid) from emp group by empno);

This technique can be applied to almost scenarios. Group by operation should be on the columns which identify the duplicates.

2. Using self-join

SQL > delete from emp e1
where rowid not in
(select max(rowid) from emp e2
where e1.empno = e2.empno );

3. Using row_number()

SQL > delete from emp where rowid in
(
select rid from
(
select rowid rid,
row_number() over(partition by empno order by empno) rn
from emp
)
where rn > 1
);

This is another efficient way to delete duplicates

4. Using dense_rank()

SQL > delete from emp where rowid in
(
select rid from
(
select rowid rid,
dense_rank() over(partition by empno order by rowid) rn
from emp
)
where rn > 1
);

Here you can use both rank() and dens_rank() since both will give unique records when order by rowid.

5. Using group by

Consider the EMP table with below rows

10    Bill    2000
11    Bill    2000
12    Mark    3000
13    Mark    3000

SQL > delete from emp where
(empno,empname,salary) in
(
select max(empno),empname,salary from emp
group by empname,salary
);

This technique is only applicable in few scenarios.

Always take extra caution while deleting records.

1. First identify the duplicates using select.

2. Double verify those are actual ‘duplicates’ or not

3. Take backup if necessary

4. Apply commit only if you are sure.

Did you find above post useful ? Your comments are highly valuable.

Oracle ADF

Saturday, October 12, 2019

How to Delete Duplicate Records in Oracle

Summary: in this tutorial, you will learn step by step how to delete duplicate records in Oracle Database using the `DELETE` statement with a subquery.

5 ways to delete duplicate records Oracle

No comments:

Post a Comment