Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Deleting Dupes program

Status
Not open for further replies.

dawnd3

Instructor
Jul 1, 2001
1,153
US
Has anyone seen any kind of dupe removal program for Access (2010)? I was trying out some code someone wrote for me for a 2003 database that would find all the dupes in one table and update the child table with the ID of the dupe records that were replacing the records being deleted. But it isn't working for 2010. So, I was hoping there was something new and improved out there, or I can just post the code I have for this one and see if anyone can spot the issue. I am not good with VBA so I do a lot of reusing and it isn't working. Thanks,

Dawn

 
Well, if it was working before, but now, then it could be as simple as debugging and finding the issue. What error do you get, and/or what results do you get? WHAT IS IT that isn't working? We can't begin to help unless we know that.

If it is in the code, if you're getting an error, hit the "Debug" button, then paste the code here, and highlight the part that is giving the error. That will give you the best chance of getting it fixed. It could be something as simple as a reference or object type that needs to be replaced, b/c is no longer used in Access 2010. Well, I say as simple as, I suppose it still depends. [smile]

I'd suggest to put the code within the tek-tips [ignore]
Code:
[/ignore]
tags, and then use the [ignore][highlight][/highlight][/ignore] tags to show the problem line(s) of code... and be sure to include as much information as possible... if it's an error on a field, wouldn't hurt to give us the field name and data type, perhaps some sample data as well.




"But thanks be to God, which giveth us the victory through our Lord Jesus Christ." 1 Corinthians 15:57
 
Hi Guys, thanks for responding. Ok here is what is happening. I call the function from a button. I just put the varibles right in the call for ease since I am just trying to do this for one table right now. It is a contact table with 350,000 records and I want to get rid of matching records based on duplicate email addresses.

Call fncRemoveDupes("tblEmployees", "strEmail", "lngEmployeeID", "tblEmployedByJoin")


Anyway, It ran all night and never completed. When I did Ctrl+break to get out of it, it was highlighted on that one line of code that is highlighted below. And I looked at the tblRemoveDupes it only had one pair of dupes added to the table. So, I am not real VBA savvy. My programmer husband helps me with it but he isn't VBA savvy either, he just knows many programming languages and can figure it out. So, hoping you experts can point us in the right direction. Would it help to run the code on a query that has JUST the dupes in it? Thanks guys?

Code:
Dim strDupe As Variant
Set dbLocal = CurrentDb()
strSQL = "DELETE * FROM tblREMOVEDUPES"
Set qdf = dbLocal.CreateQueryDef("", strSQL)
qdf.Execute


Set rst = dbLocal.OpenRecordset(strTable, dbOpenTable)
If rst.RecordCount = 0 Then Exit Function
MsgBox "beginning process" & rst.RecordCount
Do Until rst.EOF
    strDupe = Nz(rst(strFieldName).Value, "")
   [Highlight] Set rstDupe = dbLocal.OpenRecordset("SELECT * FROM " & strTable & " WHERE " & strFieldName & " = """ & strDupe & """ AND " & strFieldID & " <> " & rst(strFieldID).Value & " AND " & strFieldID & " NOT IN (SELECT DupeID FROM tblREMOVEDUPES) AND " & strFieldID & " NOT IN(SELECT MainID FROM tblRemoveDupes)")[/highlight]
    If rstDupe.RecordCount <> 0 Then
        Do Until rstDupe.EOF
            strSQL = "INSERT INTO tblREMOVEDUPES (MainID, DupeID) VALUES (" & rst(strFieldID).Value & ", " & rstDupe(strFieldID).Value & ")"
            Set qdf = dbLocal.CreateQueryDef("", strSQL)
            qdf.Execute
            rstDupe.MoveNext
        Loop
    End If
    rst.MoveNext
Loop
MsgBox "still going"
' Remove duplicates from other tables
numTables = fncParseString(strOtherTables, 1, ";", True)
If numTables = 0 Then numTables = 1
For i = 1 To numTables
    strCleanTable = fncParseString(strOtherTables, i, ";")
    Set rstDupe = dbLocal.OpenRecordset("tblREMOVEDUPES")
    
        Do Until rstDupe.EOF
            strSQL = "UPDATE " & strCleanTable & " SET " & strFieldID & " = " & rstDupe!MainID & " WHERE " & strFieldID & " = " & rstDupe!DupeID
            Set qdf = dbLocal.CreateQueryDef("", strSQL)
            qdf.Execute
            rstDupe.MoveNext
       Loop
    
Next
' remove duplicates from the main table
    If rstDupe.RecordCount = 0 Then Exit Function
    
MsgBox "almost done"

rstDupe.MoveFirst
Do Until rstDupe.EOF
    strSQL = "DELETE * FROM " & strTable & " WHERE " & strFieldID & " = " & rstDupe!DupeID
        Set qdf = dbLocal.CreateQueryDef("", strSQL)
        qdf.Execute
        rstDupe.MoveNext
Loop

MsgBox "Done!"

Exit Function
err_dupes:

    MsgBox Err.Description
    


End Function

Function fncParseString(ByVal strToParse As String, numElement As Integer, Optional strSeparator As String, Optional ReturnCount As Boolean) As String
' Returns the numElement of the strToParse using the strSeparator (default is ';')
' if the ReturnCount flag is true, then the value returned is a string value of the actual number of elements in the "array"
Dim numSep As Integer, strTempL As String, strTempR As Variant
Dim i As Integer, strToReturn() As String
i = 0
If Nz(strSeparator) = "" Then strSeparator = ";"
If Not InStr(1, strToParse, strSeparator) > 0 Then
    ReDim strToReturn(1)
    strToReturn(i) = strToParse
    If ReturnCount = True Then
        fncParseString = i
    Else
        fncParseString = strToReturn(i)
    End If
    Exit Function
End If

Do While InStr(1, strToParse, strSeparator)
    numSep = InStr(1, strToParse, strSeparator)
    strTempL = Left(strToParse, numSep - 1)
    strTempR = Right(strToParse, Len(strToParse) - (numSep + (Len(strSeparator) - 1)))
    strToParse = strTempR
    ReDim Preserve strToReturn(i)
    strToReturn(i) = strTempL
    If i = numElement And Not ReturnCount = True Then Exit Do
    i = i + 1
Loop
    
If Nz(strToParse) = "" Then strToParse = ""

ReDim Preserve strToReturn(i)
strToReturn(i) = strToParse

If ReturnCount Then
    fncParseString = Str(UBound(strToReturn) + 1)
ElseIf numElement > UBound(strToReturn) + 1 Then
    fncParseString = "Err: Out of Range"
Else
    fncParseString = strToReturn(numElement - 1)
End If
End Function


 
PS Clapper, I know how to run a query to find the dupes, I just don't know how to delete them and update the related table with the remaining ID's. :)

 
It looks like you didn't copy the complete code. I would expect the first lines to be something like:
Code:
Public Function fncRemoveDupes(strTable as String, ....) AS ...
    Dim dbLocal AS DAO.Database
    Dim qdf As DAO.QueryDef
    Dim rst As DAO.Recordset
    Dim rstDupe As DAO.Recordset

Duane
Hook'D on Access
MS Access MVP
 
You are right! (I was just testing you. LOL) Kidding. Anyway here is the first lines of code

Code:
Function fncRemoveDupes(strTable As String, strFieldName As String, strFieldID As String, strOtherTables As String)
Dim rst As Recordset, rstDupe As Recordset
Dim i As Integer, numTables As Integer, strCleanTable As String
Dim strSQL As String, qdf As QueryDef

 
The first thing I would do is change the code to:
Code:
Function fncRemoveDupes(strTable As String, strFieldName As String, strFieldID As String, strOtherTables As String)
Dim rst As [red][b]DAO.[/b][/red]Recordset, [red][b]DAO.[/b][/red]rstDupe As Recordset
Dim i As Integer, numTables As Integer, strCleanTable As String
Dim strSQL As String, qdf As [red][b]DAO.[/b][/red]QueryDef
[red][b]Dim dbLocal as DAO.Database[/b][/red]

Duane
Hook'D on Access
MS Access MVP
 
Thanks Duane, I will try it. And do you think it will help with speed for the function to run against a query that pulls up dupes first? I think there are only about 40K dupes in a table with 350K records.

Dawn

 
Hi guys, so I made the changes that Duane suggested and I also tried it out on a very small table with only 5 sets of dupes, just to see if it works. It gets through the first part and then errors on the highlighted code. (Note, my husband also inserted some code so that the immediate window would display what record it was de-duping.) The error is a "runtime 3061. Too few parameters expected. Expected 1."

Code:
Function fncRemoveDupes(strTable As String, strFieldName As String, strFieldID As String, strOtherTables As String)
Dim rst As DAO.Recordset, rstDupe As DAO.Recordset
Dim i As Integer, numTables As Integer, strCleanTable As String
Dim strSQL As String, qdf As DAO.QueryDef
Dim dbLocal As DAO.Database

Dim strDupe As Variant
Set dbLocal = CurrentDb()
strSQL = "DELETE * FROM tblREMOVEDUPES"
Set qdf = dbLocal.CreateQueryDef("", strSQL)
qdf.Execute


Set rst = dbLocal.OpenRecordset(strTable, dbOpenTable)

If rst.RecordCount = 0 Then Exit Function

MsgBox "Starting process"
i = 0

Do Until rst.EOF
    strDupe = Nz(rst(strFieldName).Value, "")
    Set rstDupe = dbLocal.OpenRecordset("SELECT * FROM " & strTable & " WHERE " & strFieldName & " = """ & strDupe & """ AND " & strFieldID & " <> " & rst(strFieldID).Value & " AND " & strFieldID & " NOT IN (SELECT DupeID FROM tblREMOVEDUPES) AND " & strFieldID & " NOT IN(SELECT MainID FROM tblRemoveDupes)")
   
    i = i + 1
    Debug.Print "Row[" & i & "]  dedups[" & rstDupe.RecordCount & "]"
   
    If rstDupe.RecordCount <> 0 Then
        Do Until rstDupe.EOF
            strSQL = "INSERT INTO tblREMOVEDUPES (MainID, DupeID) VALUES (" & rst(strFieldID).Value & ", " & rstDupe(strFieldID).Value & ")"
            Set qdf = dbLocal.CreateQueryDef("", strSQL)
            qdf.Execute
            rstDupe.MoveNext
        Loop
    End If
    rst.MoveNext
Loop
MsgBox "done with de dupe search"
' Remove duplicates from other tables
numTables = fncParseString(strOtherTables, 1, ";", True)
If numTables = 0 Then numTables = 1
For i = 1 To numTables
    strCleanTable = fncParseString(strOtherTables, i, ";")
    Set rstDupe = dbLocal.OpenRecordset("tblREMOVEDUPES")
    
        Do Until rstDupe.EOF
            strSQL = "UPDATE " & strCleanTable & " SET " & strFieldID & " = " & rstDupe!MainID & " WHERE " & strFieldID & " = " & rstDupe!DupeID
            Set qdf = dbLocal.CreateQueryDef("", strSQL)
            [highlight]qdf.Execute[/highlight]
            rstDupe.MoveNext
        Loop
    
Next
' remove duplicates from the main table
    If rstDupe.RecordCount = 0 Then Exit Function

rstDupe.MoveFirst
Do Until rstDupe.EOF
    strSQL = "DELETE * FROM " & strTable & " WHERE " & strFieldID & " = " & rstDupe!DupeID
        Set qdf = dbLocal.CreateQueryDef("", strSQL)
        qdf.Execute
        rstDupe.MoveNext
Loop
MsgBox "all done"
Exit Function
err_dupes:

    MsgBox Err.Description
    


End Function

Function fncParseString(ByVal strToParse As String, numElement As Integer, Optional strSeparator As String, Optional ReturnCount As Boolean) As String
' Returns the numElement of the strToParse using the strSeparator (default is ';')
' if the ReturnCount flag is true, then the value returned is a string value of the actual number of elements in the "array"
Dim numSep As Integer, strTempL As String, strTempR As Variant
Dim i As Integer, strToReturn() As String
i = 0
If Nz(strSeparator) = "" Then strSeparator = ";"
If Not InStr(1, strToParse, strSeparator) > 0 Then
    ReDim strToReturn(1)
    strToReturn(i) = strToParse
    If ReturnCount = True Then
        fncParseString = i
    Else
        fncParseString = strToReturn(i)
    End If
    Exit Function
End If

Do While InStr(1, strToParse, strSeparator)
    numSep = InStr(1, strToParse, strSeparator)
    strTempL = Left(strToParse, numSep - 1)
    strTempR = Right(strToParse, Len(strToParse) - (numSep + (Len(strSeparator) - 1)))
    strToParse = strTempR
    ReDim Preserve strToReturn(i)
    strToReturn(i) = strTempL
    If i = numElement And Not ReturnCount = True Then Exit Do
    i = i + 1
Loop
    
If Nz(strToParse) = "" Then strToParse = ""

ReDim Preserve strToReturn(i)
strToReturn(i) = strToParse

If ReturnCount Then
    fncParseString = Str(UBound(strToReturn) + 1)
ElseIf numElement > UBound(strToReturn) + 1 Then
    fncParseString = "Err: Out of Range"
Else
    fncParseString = strToReturn(numElement - 1)
End If
End Function

THANK YOU!!!!!

 
It sounds like the query you're building is not finding a value on the DupeID. That error is because you're creating a parameterized query (from Access's point of view), and it's not finding the parameter.

By the way, any reason you need to put this in a query anyway? Looks like you are building the SQL here and just executing it?

If there is no specific reason to run it within a query object, then I'd just do this:

Code:
Dim strSQL As String
strSQL = "INSERT INTO MyTable (MyFields) SELECT MyFields FROM WhateverTable;"
DoCmd.SetWarnings FALSE
DoCmd.RunSQL strSQL
DoCmd.SetWarnings TRUE

"But thanks be to God, which giveth us the victory through our Lord Jesus Christ." 1 Corinthians 15:57
 
dawnd3,
Learn how to debug your code by inserting Debug.Print statements and setting break points. Check out faq705-7148 so you can reply back with the actual value of strSQL.

Duane
Hook'D on Access
MS Access MVP
 
KJV and Duane. I am not a very good coder. I know Access pretty well otherwise, but most of what I do involves data structure and queries. Then there is my husband who is a senior level programmer, but not VBA savvy, but he knows enough about programming languages in general to help me figure things out. So, this code was from a database myself and a partner worked on together years ago, and he said I can use it but isn't that great about helping me out since he has moved on to bigger and better things. So I am open to all ideas and suggestions. I just want to be able to quickly delete dupes from a table and have it update the ID's in the child table. And Duane, my husband did put some Debug.Print statements in but I will have him read through that article with me.

 
What I like to do sometimes, so I know what I'm looking at is something like this:
Code:
Debug.Print "strSQL "
Debug.Print "-------------------------"
Debug.Print strSQL
Debug.Print "-----------------------------------------------------------------"

Or else put something else instead of "strSQL" - just something to help you remember what you're looking at... For some things, it isn't as big a deal as others.

But otherwise, I think if you want to make this current code work without hitches, the easiest thing will be to build the SQL statement, and run it as SQL, don't bother putting it into a Query object, and executing. The only benefit I can see for using a query object is logging things like RecordsAffected.

So this is what I was talking about doing... instead of this:
Code:
        Do Until rstDupe.EOF
            strSQL = "UPDATE " & strCleanTable & " SET " & strFieldID & " = " & rstDupe!MainID & " WHERE " & strFieldID & " = " & rstDupe!DupeID
            Set qdf = dbLocal.CreateQueryDef("", strSQL)
            qdf.Execute
            rstDupe.MoveNext
        Loop

You could simply have:
Code:
        DoCmd.SetWarnings FALSE
        Do Until rstDupe.EOF
            strSQL = "UPDATE " & strCleanTable & " SET " & strFieldID & " = " & rstDupe!MainID & " WHERE " & strFieldID & " = " & rstDupe!DupeID
            DoCmd.RunSQL strSQL
            rstDupe.MoveNext
        Loop
        DoCmd.SetWarnings TRUE

I didn't notice any logging being used in there... all you're doing is loading into a query object and running it...

And to Debug.Print there, as soon as you build your SQL statement in strSQL, you should put the Debug.Print statement immediately after.... so.... you'd have...
Code:
strSQL = "UPDATE " & strCleanTable & " SET " & strFieldID & " = " & rstDupe!MainID & " WHERE " & strFieldID & " = " & rstDupe!DupeID
Debug.Print strSQL

Then looking below with [Ctrl]+[G] (or access the Immediate Window from the View menu), you can see what your query looks like.

And here's something else I'll do:
1. Use the Debug.Print to see the SQL code
2. Copy and Paste that SQL code into a brand new Access Query (Create a query, just close the table selector tool, change to SQL view, and paste your SQL code. Then see if it'll let you into Design or Datasheet view, and go from there - it may help you find your issues.

Also, if you wanted to try running the code straight from SQL instead of the using the QueryDef object, you could simply comment out the line(s) using the querydef, and paste in the lines using the SQL directly. To comment, just put an apostrophe at the head of a line, or else use the Comment button/menu option from top of the Visual Basic Editor window.

"But thanks be to God, which giveth us the victory through our Lord Jesus Christ." 1 Corinthians 15:57
 
Thank you Duane and KJV. We will try this out tonight. Thanks for handholding me. I need it!

DAwn

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top