Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

duplicate records

Status
Not open for further replies.

JeanneZ

Programmer
May 1, 2003
55
US
Hi, experts,
I have a question to consult you. I have a table which contains some duplicate records on some columns. Some of them have two duplicate records, and some of tme have three, and so on. I can find them but I want to keep only one and delete others.
For example,
select id, dept, name, location
from tab
group by id, dept, name, location
having count(*) > 1;
Thanks very much.
Jeanne
 
Jeanne,

You say the table contains SOME duplicate records on SOME columns.

If you mean that there are columns other than the 4 listed in your SQL, that do not contain duplicate values, then the first question is how do you decide which "duplicate" to keep.
For example, if there are two further non-duplicate columns: salary and firehim, whichever row you choose, Bill is gonna be pretty narked.
Code:
id	dept	name	loc	salary	firehim
1    	1	bill	home	10000	No
1    	1	bill	home	15000	Yes

The answer to this question will determine the solution to some extent. Does this table have a primary key?

If the records are completly duplicate across all columns, then a safe method is to select distinct into a new table from where you can verify exactly what you have thrown away, before either moving the data back into the original table, or just renaming the two tables to achieve the same.

nick
 
Hi, Nick:
Thanks a lot for you answering my question.
My problem is there is no primary key on this table and I can not add one either. Just like what you said, it has same id, dept, location and diff salary. I want to summary the salary and update only one record and delete another.
Thanks.
Jeanne
 
select id, dept, name, location
from tab t1
where exists (select 1 from tab t2
where t1.id = t2.id and t1.dept=t2.dept
and t1.name=t2.name and t1.location=t2.location
and t2.rowid>t1.rowid )

You may need to process nulls separately (compare some NVL expressions)

Regards, Dima
 
Jeanne,

Try:
Create temp_tab as SELECT id, dept, name, location, sum(salary) salary
FROM tab
GROUP BY id, dept, name, location;

You can then do some checking on temp_tab before using it to replace the contents of tab.

The checking I would do on temp_tab would be to ensure that the number of rows in temp_tab is the same as the number of "distinct id, dept, name, location" rows in tab.
Also that sum(salary) is the same for both old and new tables.


Nick.
 
Oops! meant "create TABLE temp_tab as ..." (of course).
 
Hi,

In response to your problem to remove the duplicates, you can try the following delete statement :

delete tab a
where a.rowid < ( select max(b.rowid )
from tab b
where a.id = b.id
and a.dept = b.dept
and a.name = b.name
and a.location = b.location );

(Above stmnt should leave the latest instance of each record )

Please Note : You might need to use NVL in the WHERE clause if the joining fields are nullable.

Regards
 
Hi, experts,
Thanks a lot for your help. It works.
Jeanne
 
Hi,
I still have a problem.
If I use my SQL (group by), 44 records found. And if I use your SQL (rowid), 45 records found.
I compared two results and found one id has three records with same dept, name, and location in the table, it shown twice after I ran rowid SQL. I do not know what's problem is.

In the table,
rowid id dept name location
AAAGMeAAFAAACO+ACM 101 A1 Joe West
AAAGMeAAFAAACO+ACf 101 A1 Joe West
AAAGMeAAFAAADroAAg 101 A1 Joe West

After I ran my SQL
select id, dept, name, location
from tab
group by id, dept, name, location
having count(*) > 1;
It returned only one record.

But if I ran your SQL (rowid) it returned two:
rowid id dept name location
AAAGMeAAFAAACO+ACM 101 A1 Joe West
AAAGMeAAFAAACO+ACf 101 A1 Joe West

Thanks a lot.

Jeanne



 
try this

delete tab a
where a.rowid not in ( select max(b.rowid )
from tab b
where a.id = b.id
and a.dept = b.dept
and a.name = b.name
and a.location = b.location );


HIH

Rajeev
 
how abt the following statement

delete tab a where a.rowid not in ( select max(b.rowid )
from tab group by id, dept, name, location)
 
Hi Jeane,

I've slightly modified ur query as below :

select id, dept, name, location,count(*)
from tab
group by id, dept, name, location
having count(*) > 1;

The value of count(*) should be three in this case and if you'll run delete/select (using rowid) it will delete/return the two records and leave the latest one in the table.

Regards
Ajay.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top