Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Huge table with every record duped. How to delete dupes?

Status
Not open for further replies.

red54

Technical User
Sep 15, 2001
90
0
0
US
I have inherited a database with a table containing scanned documents. One or more docs per row. The table is in the range of 3-4 gigs. Had to split the table. Problem, in splitting the table I inadverently duped every row. Need a simple way to remove only the even numbered rows of the sorted data. The create new table with structure only and use an append query takes too long, too much disk space. The table is in SQL 2000 database. Table has 30000+ rows. Don't mind creating another table with only the 15000 rows, but would rather delete as it will be in place.

Time flies when you don't know what you're doing...
 
What fields have you got in the table? You need to have a field(s) which contains the duplicate values and then another field(s) which you can use to determine which one of the duplicates you want to keep.

--James
 
Thank you for the reply. I actually have 3 fields that comprise the key.

But I think I found an answer in Microsoft: Access Module Coding Forum. See: Eliminate Dupes with VB. And: How to delete a record in a table? in the same forum. I know that there are exactly two of each record, and I can simply delete the even number ones. They aren't indexed or used as keys so I won't have the problem of deleting one and both going because Access can't determine which one of the two I mean to delete.

I'll post my final solution to this here.

Time flies when you don't know what you're doing...
 
You could do it in Query Analyser using a temporary table

-- First select only unique records
SELECT DISTINCT *
INTO #TempTable
FROM YourTableName

-- Empty your original table
TRUNCATE TABLE YourTableName

-- Repopulate it
INSERT INTO YourTableName
SELECT * FROM #TempTable

--Delete the temp table
DROP TABLE TempTable
 
Tried that on a test system. SELECT DISTINCT *
returns:

The TEXT, NTEXT or IMAGE data types cannot be used with distinct.

I have oleobjects in actual table. The table I'm using for test doesn't have this field, but still can't be done.

Any suggestions are greatly appreciated. Tried another and it wants to delete both records using recordset DAO. Guess I'll have to run rs DAO and create new table from each pair.

Time flies when you don't know what you're doing...
 
You could also try this: create a atable with the same data structure as yours but index the three fields that make up the composite key WITH NO DUPLICATES. Insert from the original table into this one and all the dups will be automatically left out. TRUNCATE the original table and insert back from the second table and drop the latter.
 
That's the standard way to do it BugSlayer, but I'm having trouble with eating up gigs of space when I do anything. Cleared 22 gigs of logs yesterday, and I'm afraid that I'll run out of space; plus with the scanned docs in the table as ole fields it takes a long time to find out that it didn't work. Had a "not enough memory" working with this table earlier; after an hour and a half of query running.



Time flies when you don't know what you're doing...
 
Here's what worked for me. Had to create "structure only" of new file(fdia1). And then used this code to copy the second row to new file, if and only if the two consecutive rows in source table have equal key fields.

Actual run it got a "temporary disk file full" error and stopped on the .update. This after 13,000 of the 31,000 rows.

I got the record that it was ready to write and mod'd the query to start at that record.


Dim i As Integer
Dim rs As DAO.Recordset
Dim rsout As DAO.Recordset

in actual code I dim'd t1-t6 and used them rather than
textbox controls on form

fdia is a query that sorts the rows into t1,t2,t3
order so that the consecutive rows should be equal

NOTE: this won't work if there are not exactly two consecutive rows that are equal

Set rs = CurrentDb.OpenRecordset("fdia")
Set rsout = CurrentDb.OpenRecordset("fdia1")

rs.MoveLast 'removed this as it took a long time to do
rs.MoveFirst
Do While Not rs.EOF

Text1 = rs.Fields(0)
Text9 = rs.Fields(1)
Text5 = rs.Fields(2)
rs.MoveNext
Text3 = rs.Fields(0)
Text7 = rs.Fields(1)
Text11 = rs.Fields(2)

If Text1 = Text3 And Text9 = Text7 And Text5 = Text11 Then
'rs.Delete this deleted two records from source table

With rsout
.AddNew
.Fields(0) = rs.Fields(0)
.Fields(1) = rs.Fields(1)
.Fields(2) = rs.Fields(2)
.Update
End With
end if

rs.MoveNext

Loop



Time flies when you don't know what you're doing...
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top