Query for close or near duplicates

tyemm · Apr 21, 2006

I need to examine a table for duplicates; not exact duplicates, as PKs would be different, but one or in all probability several fields would be very similar, due to mistyping, etc.,. So I need a query that would return records where

fieldA is LIKE fieldA; fieldB is LIKE fieldB; fieldC is LIKE fieldC and so on. Ideally I want to be able to adjust the tolerance of the LIKE to increase or decrease discrimination, or to cast a wider or narrower net for possible duplicates.

THWatson · Apr 22, 2006

tyemm
Would the following approach work for you?

1. Create a Search form. Let's call it frmSearch.
2. Create unbound text boxes for as many fields through which you wish to search. (It would appear you want to search on multiple fields) Let's call one text box txtFieldA, a second one txtFieldB, and so on.
3. In the query column for FieldA, enter criteria
Like "*" & Forms!frmSearch!txtFieldA & "*"
4. In the query column for FieldB, enter criteria
Like "*" & Forms!frmSearch!txtFieldB & "*"

Thus, if entries in FieldB were Jak, Jake, Jakk, Ake...and in the Search form you entered "ak", all of these would be returned.

Tom

MichaelRed · Apr 22, 2006

Also use search / advanced search for "Soundex", which functions similar to "Like". It is an excellent tool for alternative spellings but has other limitations which may cause it to miss "typo" errors where different letters apprar.

MichaelRed

tyemm · Apr 25, 2006

Thank you both. I'm trying, but I don't see why I have to use a form for this. In simple English, I'm trying to create a query that will examine two or more fields in a table for duplicates that may have been missed by others, due to misspellings, etc. Thus 'among those records where the lastname fields are similar, which also have firstname fields similar?'

Right now I'm getting say a bunch of Adlers, but if one is Robert and one is Naomi, I don't want to flag either of those records!

Tnx.

MichaelRed · Apr 25, 2006

sorry that you are not following the suggestion(s), but you did ask for help in the area. hopefully, someone else can / will explain it to you better. wheather you use a module, query, form or an amalgamation is not the issue. If you want to identify similar to some degree you need to do comparisions on the objects to test that similarity quotient. "like", as a function, selects a specific set of characters in a pattern, "soundex" selects values based on a computed value of the "sounds like" quality of a (SHORT) group of letters. combinations of these may be used to aproximate what you described initally, although you need to add considerable logic and functionality to the basic concepts. these fora often do not give concise answers to larger questions, as it is assumed that, as the catch phrase below the logo states, this is for "professionals" - and threse are presumed to be competent and capable (as well as professional).

MichaelRed

tyemm · Apr 25, 2006

MichaelRed--

Thank you! I certainly hope you didn't think I was criticizing a forum, when I wrote that I didn't see why I need to use a "form" to correctly resolve my query!! All I meant was--I have a query that identifies duplicate records on the basis of absolute duplicates in one field, and I want to modify that with a "Like" expression, such that I am pulling all records with, say, the middle 5 characters the same, in case someone wrote Schickele as a last name and another person wrote Shickeley for the same person. In other words, a gross filter rather than a fine one. Then I want to be able to return all items which meet the first criterion, and that also meet a similar criterion for first names, such that I might get Peter Schickele and Peter Shikeley, but not George Schickele, etc. I'm just cleaning up a table I inherited, so I thought I didn't need a form, only a query that might lie behind a form, to be sure!
Below is my example for a query looking for first names that have the first 7 characters the same:

SELECT [MS Table 1].ID, [MS Table 1].[First Name], [MS Table 1].[Last Name], [MS Table 1].[Street Address], [MS Table 1].Facility, Left([MS Table 1].[First Name],7) AS Expr1
FROM [MS Table 1] LEFT JOIN [MS Table 1] ON [qry Has SS & DOB or is Recent].ID = [MS Table 1].ID
WHERE (((Left([MS Table 1].[First Name],7)) In (SELECT Left([MS Table 1].[First Name],7) from [MS Table 1]
group by Left([MS Table 1].[First Name],7) having count(*) > 1)))
ORDER BY ([MS Table 1].[Last Name]);

Tom

UnicornRainbow · Apr 26, 2006

Tom, understand that for a computer, Schickele is not "Like" Shikeley.

While these names may sound alike, as far as the characters and their locations in each name, the only resemblance between the two is that they have an "S" in the first character slot.

The logic behind having a computer reconcile these is mammoth (IMHO). The Soundex program Mr. Red referred to seems to already contain that logic.

Short of that, I have used totals queries to search for typos in the name field by grouping on the street name and the city:

Code:

SELECT Count(tblCust.cName) AS CountOfcName, Trim([csAddr1]) & Trim([csCity]) AS StreetTown
FROM tblCust
GROUP BY Trim([csAddr1]) & Trim([csCity]);

This shows me when two names are listed at the same address. This is more manual than you would like, I'm sure, and I can't know if there are appropriate fields you can compare.

tyemm · Apr 26, 2006

Thanks--

I'll try that; I see that the logic is to look for SOME field that is likely to have consistent data. Obviously if each and every field can be a mess of errors, then God Help Us.

I have run into this situation, re Last Names: name in one table contains a hyphenated second portion, that is not in the first table. For that I tried Left([tblCustName].[Last_Name],5), etc., to see if the first few letters were the same. Doesn't get everything, obviously.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Query for close or near duplicates

tyemm

Technical User

THWatson

Technical User

MichaelRed

Programmer

tyemm

Technical User

MichaelRed

Programmer

tyemm

Technical User

UnicornRainbow

Technical User

tyemm

Technical User

Similar threads

Part and Inventory Search

Sponsor