Remove duplicate records

Kendel · Jun 26, 2003

Hi,

I want to remove some duplicated records in a table. What is the comand?

Delete from Table where ....???

Thanks.

Kendel · Jun 26, 2003

When there is a duplicate, I want to keep 1 and delete the dups. The first field is where the dupd. The other fields might have dups as well but I don't care.

Rhys666 · Jun 26, 2003

Is there an identity field on the table?

Are all fields duplicated, or just the one?

Rhys

Will code for food...

Kendel · Jun 26, 2003

Thanks Rhys. There is no identity field in the table. If there are 2 or more identical records(I meant 2 records with same first field, I want to keep just one.

Thanks

Kendel · Jun 26, 2003

Select fstField in Table
group by fstField
having count(fstField)>1

This will give a list of dups. How can I keep only 1 record in each group?

Thanks

Rhys666 · Jun 27, 2003

The below should do the trick but you'll need to configure the datatype for the @fstField variable to be the same as fstField in the table you're playing with.
I've used fstField for the Field Name you're trying to get to hold only distinct values, and TableName for the name of the Table

<CODE>
DECLARE @fstField fstfieldDataType

SELECT DISTINCT * INTO #tmp_Table FROM TableName
TRUNCATE TABLE TableName

DECLARE cur_Cases CURSOR FOR SELECT DISTINCT fstField FROM #tmp_Table
OPEN cur_Cases --Open the CURSOR

FETCH NEXT FROM cur_Cases INTO @fstField
WHILE @@FETCH_STATUS = 0 --As Long as we got data keep going.
BEGIN

INSERT INTO TableName SELECT TOP 1 * FROM #tmp_Table WHERE fstField = @fstField

/* Get the next value FROM the CURSOR AND put IN variable. */
FETCH NEXT FROM cur_Cases INTO @fstField
END

CLOSE cur_Cases --We no longer need CURSOR so close
DEALLOCATE cur_Cases --and free memory

DROP TABLE #tmp_Table
GO
</CODE>

Basically this makes a copy of your table to a temporartu table, then truncates the original. It then uses a cursor for the distinct values held in the field you want to make distinct, and for every distinct value it inserts back from the temporary table to the original table the TOP 1 records matching by the field value, so in 'INSERT INTO TableName SELECT TOP 1 * FROM #tmp_Table WHERE fstField = @fstField', even if there are 30 rows with the same value in fstField it'll only put the first of them back to your original.

OK?

Post back if you're confused, or to let me know how it goes...

Rhys

Be careful that the light at the end of the tunnel isn't a train coming the other way.

SQLSister · Jun 27, 2003

Ugh, don't need cursors! Temp tables or anything. THe FAQ on deleting records gave a method to delete in one step. This is the FAQ I'm referring to:
Handle duplicate records
faq183-2682

This is the code I just used to delete duplicate records from a table I'm importing. If you don't have an id field I suggest putting an identity field on the table first so it can use this method.

Delete dbo.VACharacteristics from
(Select CharacteristicID, MenuItemID, AttractionID from VACharacteristics
where exists(
Select NUll from VACharacteristics b
Where b.MEnuItemID = VACharacteristics.MenuItemID
and B.AttractionID = VACharacteristics.AttractionID
Group BY
b.MenuItemID, B.AttractionID
Having
VACHAracteristics.CharacteristicID <Max(b.CharacteristicID))) a
Where A.CharacteristicId = VACHARACTERISTICs.CharacteristicID

Suggest you run the select part first to ensure you are getting the records you want. I didn't need to return all three fields for the delete to work, I just put them in so I could check the data in the select before I deleted things.

If this is not a table you are importing from another source, I also suggest you fix the database structure and user interface to avoid having this happen again.

Kendel · Jun 28, 2003

Rhys666,

Thank you for the code. It seems to work but...it's kinda unusual. I ran that procedure yesterday around noon, and it is still running now. It's almost 24 hrs. I have about 1.5 mil records. Is it usual? Please advise.

many thanks.

Kendel · Jun 28, 2003

Thanks SQL Sister. I am not allowed to put an ID filed into the table so I can't use your method. Thanks anyway.

Kendel · Jun 28, 2003

In the results pannel, I got:

( 0 row(s) affected)

( 0 row(s) affected)

.........so on

Is it supposed to be 1 row(s) affected???

Kendel · Jul 2, 2003

I still have problem to remove the dups. Anyone can help???

Like I said, my table doesn't have pk so I use Rhys666's method. I have 150 records, each has 12 fields. It never finish running with (0 row affected) each line.

Anyone know a better method?

Thanks alot.

SQLSister · Jul 2, 2003

A cursor will take forever to run on 1.5 million records. (you said this earlier, although in the last post you said 150 records) No table with this number of records should EVER be without a primary key or some other unique identifier!! GO to your boss and have him overrule the idiot who won't let you put one on the table. If necessary, put an identity field on temporarily til you clean up the records and then remove it!

I hesitate to ask, but you are doing this on a development server, right? You need to test your method on development before you affect the records in a production table of this size. I'm also concerned that you don't care at all what the other fields contain, how can you be sure you are keeping the record you want to keep if the data inthe other fields is different?

Oh yeah, if it really is 150 records, you could have removed the duplicates manually by now.

Kendel · Jul 2, 2003

Thank you SQLSister. You brink up a very good point.

Let me expain alittle bit more about this table.
Not that I don't care about other fileds. I just want to remove some dups in filed1 so I can do a COUNT for a report. This one a copy(temp) of the production table.

Data is load from outside source. Some id#s have more than 1 products, and one products can be owned by an id#. That's why they didn't a PK on the table.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Remove duplicate records

Kendel

Programmer

Kendel

Programmer

Rhys666

Programmer

Kendel

Programmer

Kendel

Programmer

Rhys666

Programmer

SQLSister

Programmer

Kendel

Programmer

Kendel

Programmer

Kendel

Programmer

Kendel

Programmer

SQLSister

Programmer

Kendel

Programmer

Similar threads

Part and Inventory Search

Sponsor