duplicate records

JeanneZ · Jun 4, 2003

Hi, experts,
I have a question to consult you. I have a table which contains some duplicate records on some columns. Some of them have two duplicate records, and some of tme have three, and so on. I can find them but I want to keep only one and delete others.
For example,
select id, dept, name, location
from tab
group by id, dept, name, location
having count(*) > 1;
Thanks very much.
Jeanne

nbateman · Jun 4, 2003

Jeanne,

You say the table contains SOME duplicate records on SOME columns.

If you mean that there are columns other than the 4 listed in your SQL, that do not contain duplicate values, then the first question is how do you decide which "duplicate" to keep.
For example, if there are two further non-duplicate columns: salary and firehim, whichever row you choose, Bill is gonna be pretty narked.

Code:

id	dept	name	loc	salary	firehim
1    	1	bill	home	10000	No
1    	1	bill	home	15000	Yes

The answer to this question will determine the solution to some extent. Does this table have a primary key?

If the records are completly duplicate across all columns, then a safe method is to select distinct into a new table from where you can verify exactly what you have thrown away, before either moving the data back into the original table, or just renaming the two tables to achieve the same.

nick

JeanneZ · Jun 4, 2003

Hi, Nick:
Thanks a lot for you answering my question.
My problem is there is no primary key on this table and I can not add one either. Just like what you said, it has same id, dept, location and diff salary. I want to summary the salary and update only one record and delete another.
Thanks.
Jeanne

sem · Jun 4, 2003

select id, dept, name, location
from tab t1
where exists (select 1 from tab t2
where t1.id = t2.id and t1.dept=t2.dept
and t1.name=t2.name and t1.location=t2.location
and t2.rowid>t1.rowid )

You may need to process nulls separately (compare some NVL expressions)

Regards, Dima

nbateman · Jun 5, 2003

Jeanne,

Try:
Create temp_tab as SELECT id, dept, name, location, sum(salary) salary
FROM tab
GROUP BY id, dept, name, location;

You can then do some checking on temp_tab before using it to replace the contents of tab.

The checking I would do on temp_tab would be to ensure that the number of rows in temp_tab is the same as the number of "distinct id, dept, name, location" rows in tab.
Also that sum(salary) is the same for both old and new tables.

Nick.

nbateman · Jun 5, 2003

Oops! meant "create TABLE temp_tab as ..." (of course).

ajayarora · Jun 5, 2003

Hi,

In response to your problem to remove the duplicates, you can try the following delete statement :

delete tab a
where a.rowid < ( select max(b.rowid )
from tab b
where a.id = b.id
and a.dept = b.dept
and a.name = b.name
and a.location = b.location );

(Above stmnt should leave the latest instance of each record )

Please Note : You might need to use NVL in the WHERE clause if the joining fields are nullable.

Regards

JeanneZ · Jun 6, 2003

Hi, experts,
Thanks a lot for your help. It works.
Jeanne

JeanneZ · Jun 6, 2003

Hi,
I still have a problem.
If I use my SQL (group by), 44 records found. And if I use your SQL (rowid), 45 records found.
I compared two results and found one id has three records with same dept, name, and location in the table, it shown twice after I ran rowid SQL. I do not know what's problem is.

In the table,
rowid id dept name location
AAAGMeAAFAAACO+ACM 101 A1 Joe West
AAAGMeAAFAAACO+ACf 101 A1 Joe West
AAAGMeAAFAAADroAAg 101 A1 Joe West

After I ran my SQL
select id, dept, name, location
from tab
group by id, dept, name, location
having count(*) > 1;
It returned only one record.

But if I ran your SQL (rowid) it returned two:
rowid id dept name location
AAAGMeAAFAAACO+ACM 101 A1 Joe West
AAAGMeAAFAAACO+ACf 101 A1 Joe West

Thanks a lot.

Jeanne

rajeevr · Jun 9, 2003

try this

delete tab a
where a.rowid not in ( select max(b.rowid )
from tab b
where a.id = b.id
and a.dept = b.dept
and a.name = b.name
and a.location = b.location );

HIH

Rajeev

R1972 · Jun 9, 2003

how abt the following statement

delete tab a where a.rowid not in ( select max(b.rowid )
from tab group by id, dept, name, location)

ajayarora · Jun 10, 2003

Hi Jeane,

I've slightly modified ur query as below :

select id, dept, name, location,count(*)
from tab
group by id, dept, name, location
having count(*) > 1;

The value of count(*) should be three in this case and if you'll run delete/select (using rowid) it will delete/return the two records and leave the latest one in the table.

Regards
Ajay.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

duplicate records

JeanneZ

Programmer

nbateman

MIS

JeanneZ

Programmer

sem

Programmer

nbateman

MIS

nbateman

MIS

ajayarora

Programmer

JeanneZ

Programmer

JeanneZ

Programmer

rajeevr

Programmer

R1972

Programmer

ajayarora

Programmer

Similar threads

Part and Inventory Search

Sponsor