Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Westi on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Eliminate records with duplicate values in one of two fields.

Status
Not open for further replies.

neeko1226

MIS
Jul 24, 2003
82
US
Good morning/afternoon and happy holidays.

I have a process in which a set of data is loaded to a database each month. The new file is appended to an accumulative file which is the main table in the database. Each record contains an Application Number, Loan Number and File Date field (among others). Occasionally a record will come accross with the same application or loan number as another record that was added already to the database. In this case, the analyst would like to see the most recent record (the one from the latest monthly file) kept, and the older record discarded.

My plan was to run a find dup query on both the application number and on the loan number after the monthly append, and then identify the record with the Max file date from that subset. I've been able to get this far, but now I can't come up with a logical way to get the Max file date dup records, and all the records with no dups assembled together in a table again.

Does anyone have any experience with this and, if so, can you provide me with a possible solution?

Thanks in advance for any assistance you can provide.

Neeko
 
Have a look here: faq701-5721

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Thank you for the quick reply PH. Will the process detailed in your FAQ delete only the value in the delete field [DF] or will it delete the entire record?
 
In the SQL world, the DELETE instruction removes whole rows ...

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
In a test copy of the db, I attempted to execute your code and it slowed my system down, but didn't finish running. I had to <ctrl><break> out of it. It was almost like it was stuck in a loop. Here's the exact code from the SQL view of my query wizzard:

DELETE CurrentExceptions.Applnumb, CurrentExceptions.FileDt
FROM CurrentExceptions
WHERE (((CurrentExceptions.FileDt) Not In (SELECT Max([FileDt]) FROM [CurrentExceptions] GROUP BY [Applnumb])));

I tried changing the query type to a select query and it did the same thing. Any ideas?
 
I don't see any reference to Loan Number in your posted SQL code.
Could you please post the SQL code of the dup query you mentionned ?

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
I tried it on application number first, I would repeat the same steps for loan number after the application number dups are eliminated.

Here's the SQL for the find dup query:

SELECT CurrentExceptions.Applnumb, CurrentExceptions.FileDt INTO tbl_AE_Applnumb_Dups
FROM CurrentExceptions
WHERE (((CurrentExceptions.Applnumb) In (SELECT [Applnumb] FROM [CurrentExceptions] As Tmp GROUP BY [Applnumb] HAVING Count(*)>1 )))
ORDER BY CurrentExceptions.Applnumb;
 
get the Max file date dup records, and all the records with no dups assembled together
SELECT A.*
FROM CurrentExceptions AS A INNER JOIN (
SELECT Applnumb, MAX(FileDt) AS LastDt FROM CurrentExceptions GROUP BY Applnumb
) AS B ON A.Applnumb = B.Applnumb AND A.FileDt = B.LastDt

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Solution for the future: think of a unique index on the table in such a way that duplicates cannot be entered.

For current situation:


SQL version works IF there is at least a field that can make the difference between the duplicates.

Depending on the specific needs/application/design, that may be impossible (all duplicates are 'perfect clones' of an original record that cannot be determined.

In VBA, you can open a recordset (dbOpenDynaset in DAO or adOpenStatic in ADO, really depends on your Access version) based on all duplicate records for a specific set, then:

rst.MoveNext
While not rst.EOF
rst.Delete
rst.MoveNext
Wend

(or something like this)

HTH



[pipe]
Daniel Vlas
Systems Consultant

 
I've attached the vba code I wrote in an attempt to get rid of the dups I don't need. It is only to delete records with duplicate "Applnumb", once I get it to work I will create a simalar process to delete duplicate "LoanNum" records. I appologize for the naming conventions, this db was built by someone else and I'm only trying to help them with a portion of it.

To rehash and clarify, the goal is to append the records from the monthly download file ("CurrentExceptions") to the accumulative file ("Accumulated Exceptions MBP Year-to-Date"). Each table contains the fields "LoanNum" and "Applnumb". It's possible that there could be records on the monthly download file that have the same LoanNum or Applnumb as records on the Accumulative file. It's also possible that the record on the monthly download file could contain more recent data than the record on the accumulative file so they would NOT be an exact dup. They would like to keep the most recent info in the accumulative file, but they do not want records with the same LoanNum or Applnumb to be kept.

For whatever reason, this code seems to be working when I step through it, but it won't finish. There are only 10000 records in the monthly file and it ran for over 4 hours (before I killed it) on Friday. Can anyone tell me what I'm doing wrong or give me a more efficient solution?

Dim db as datatbase
dim rs as recordset
dim rs1 as recorset
dim rscount as integer
dim acctnum as string


set db = currentdb


Set rs = db.OpenRecordset("Select * from CurrentExceptions")

With rs
.MoveFirst

Do Until rs.EOF

acctnum = rs!Applnumb

Set rs1 = db.OpenRecordset("Select * from [Accumulated Exceptions MBP Year-to-Date] where Applnumb = '" & acctnum & "' ")

rscount = rs1.RecordCount

If rscount > 0 Then

DoCmd.RunSQL "DELETE [Accumulated Exceptions MBP Year-to-Date].* " _
& " FROM [Accumulated Exceptions MBP Year-to-Date] " _
& " WHERE Applnumb = '" & acctnum & "' "

End If

.MoveNext

Loop

End With
 
Why not simply this ?
DoCmd.RunSQL "DELETE * " _
& " FROM [Accumulated Exceptions MBP Year-to-Date] " _
& " WHERE Applnumb IN (SELECT Applnumb FROM CurrentExceptions)"

Performances tip: create an index on Applnumb in both tables.

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top