Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Employee List Compare 2

Status
Not open for further replies.

robcarr

Programmer
May 15, 2002
633
GB
Hi,

I am looking for a betterway than vlookup's to find matches or differences in names from 2 sets of columns.

Column A has the name from one system
Column B has the name from another system

is it possible to compare the lists and show all possible matches for each name from one of the systems.

for example

a2 has John Smith

what I would like to do is check for all matches of the name john smith, and copy the name that is being checked and any matches to another sheet all on a row, like: -

John Smith jon Smith John Smythe John Smith

if we only have 1 match and it is a 100% match then I need to be able to show this.

We have approx 300 employees that have to have the names checked and pay each month to ensure we have 2 systems that are matching, vlookups work well to a point but only highlight exact mtaches, I am looking for a way to check for variances in cells.

Like the different ways to spell, Claire, JOhn, Steve, Chris etc.

I was hoping a loop whcih could check each cell and then use a % of match to determine if it is a close match or not then copy the find to a row as above to give me the options.

I am not sure if i have highlighted correctly what I am after but google hasnt shown me any real ways to do this.


Hope this is of use, Rob.[yoda]
 


Hi,
John Smith jon Smith John Smythe John Smith
Matches???

COUNTIF will COUNT how many and give you essentially the same result.

However fuzzy matches: don't think so, at least not with native Excel features.

Skip,
[sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
May be better to do this in a couple of basic tables and queries in Access. Access as some comparison functions that are more robust than Excel. Worth looking into.
 



dekkerdesign,

Could you suggest some query techniques that would do the fuzzy matches?

Where
[tt]
A B
John Smith John Smith
jon Smith
John Smythe
John Smith

[/tt]






Skip,
[sub]
[glasses]Just traded in my old subtlety...
for a NUANCE![tongue][/sub]
 
>don't think so, at least not with native Excel features

True, but it isn't hard to implment a basic Soundex function, e.g.:
Code:
[blue]Public Function TSQLSoundex(strSource As String) As String
  Dim strWork As String
  Dim lng As Long
  
  strWork = UCase(strSource)
  TSQLSoundex = Left(strWork, 1)
  strWork = Mid(strWork, 2)
  strWork = ReplaceChars(strWork, "AEHIOUWY ", vbNullString)
  strWork = ReplaceChars(strWork, "BFPV", "1")
  strWork = ReplaceChars(strWork, "CGJKQSXZ", "2")
  strWork = ReplaceChars(strWork, "DT", "3")
  strWork = ReplaceChars(strWork, "L", "4")
  strWork = ReplaceChars(strWork, "MN", "5")
  strWork = ReplaceChars(strWork, "R", "6")
  
  For lng = 0 To 9
    strWork = ReplaceChars(strWork, CStr(lng) & CStr(lng), CStr(lng))
  Next lng
  
  strWork = Left(strWork & "000", 3)
  TSQLSoundex = TSQLSoundex & strWork
End Function

Private Function ReplaceChars(strSource As String, strReplace As String, strNew As String) As String
  Dim lng As Long
  
  ReplaceChars = strSource
  For lng = 1 To Len(strReplace)
    ReplaceChars = Replace(ReplaceChars, Mid(strReplace, lng, 1), strNew)
  Next lng
End Function[/blue]
 
this looks like it might do the trick how do I get to use this in access, or can it be used in Excel.

Hope this is of use, Rob.[yoda]
 
I found this VBA code for Excel for a Soundex function:
Code:
Public Function SoundEx(Word As String) As String

    Dim Num As String ' Holds the generated code
    Dim Char As String
    Dim WordLength As Long
    Dim LastCode As String
    Dim Pos As Long
    
    Num = UCase(Mid$(Word, 1, 1)) ' Get the first letter
    LastCode = GetSoundCodeNumber(Num)
    WordLength = Len(Word)
        
    ' Create the code starting at the second letter.
    For Pos = 2 To WordLength
        Char = GetSoundCodeNumber(UCase(Mid$(Word, Pos, 1)))
        ' If two letters that are the same are next to each other only count one of them
        If Len(Char) > 0 And LastCode <> Char Then
            Num = Num & Char
        End If
        LastCode = Char
    Next
    
    SoundEx = Mid$(Num, 1, 4) ' Make sure code isn't longer then 4 letters
    If Len(Num) < 4 Then ' Make sure the code is at least 4 characters long
        SoundEx = SoundEx & String(4 - Len(Num), "0")
    End If
    
End Function

Private Function GetSoundCodeNumber(Char As String) As String

   Select Case Char
      Case "B", "F", "P", "V"
         GetSoundCodeNumber = "1"
      Case "C", "G", "J", "K", "Q", "S", "X", "Z"
         GetSoundCodeNumber = "2"
      Case "D", "T"
         GetSoundCodeNumber = "3"
      Case "L"
         GetSoundCodeNumber = "4"
      Case "M", "N"
         GetSoundCodeNumber = "5"
      Case "R"
         GetSoundCodeNumber = "6"
   End Select
   
End Function

Public Function SoundExPhrase(ByVal Source As String) As String

   Dim Tokens() As String
   Dim Index As Long

   Tokens = Split(Source, " ")
   For Index = LBound(Tokens) To UBound(Tokens)
      Tokens(Index) = SoundEx(Tokens(Index))
   Next Index
   SoundExPhrase = Join(Tokens, " ")

End Function
where you use it in a cell formula like:
Code:
=soundexphrase(B1)=soundexphrase(A1)
... to return TRUE or FALSE.

Cheers, Glenn.

Beauty is in the eye of the beerholder.
 
Very cool, strongm.

->
star.gif


[tt][blue]-John[/blue][/tt]
[tab][red]The plural of anecdote is not data[/red]

Help us help you. Please read FAQ 181-2886 before posting.
 
From my StringFunction module;
Code:
Option Explicit

' Maximum length for Soundex string.
Private Const dhcLen = 4

Private Declare Function IsCharAlphaA _
 Lib "user32" _
 (ByVal bytChar As Byte) As Long

Private Declare Function IsCharAlphaNumericW _
 Lib "user32" _
 (ByVal intChar As Integer) As Long

Private Declare Function IsCharAlphaW _
 Lib "user32" _
 (ByVal intChar As Integer) As Long
 
Private Declare Function GetCPInfo Lib "kernel32" _
 (ByVal CodePage As Long, lpCPInfo As CPINFO) As Long
 
Private Const CP_ACP = 0  '  default to ANSI code page
Private Const MAX_DEFAULTCHAR = 2
Private Const MAX_LEADBYTES = 12
Private Type CPINFO
   MaxCharSize As Long                    '  max length (Byte) of a char
   DefaultChar(MAX_DEFAULTCHAR) As Byte   '  default character
   LeadByte(MAX_LEADBYTES) As Byte        '  lead byte ranges
End Type
Private Function CharCode(strChar As String) As Integer
    Select Case strChar
        Case "A", "E", "I", "O", "U", "Y", "H", "Y"
            CharCode = 0
        Case "C", "G", "J", "K", "Q", "S", "X", "Z"
            CharCode = 2
        Case "B", "F", "P", "V"
            CharCode = 1
        Case "D", "T"
            CharCode = 3
        Case "M", "N"
            CharCode = 5
        Case "L"
            CharCode = 4
        Case "R"
            CharCode = 6
        Case Else
            CharCode = -1
    End Select
End Function
Public Function dhIsCharAlpha(strText As String) As Boolean
    If dhIsCharsetWide() Then
        dhIsCharAlpha = CBool(IsCharAlphaW(AscW(strText)))
    Else
        dhIsCharAlpha = CBool(IsCharAlphaA(Asc(strText)))
    End If
End Function
Public Function dhIsCharsetWide() As Boolean
    ' Get the maximum character width of the
    ' operating system font.
    
    Dim tSystemFontInfo As CPINFO
    
    Call GetCPInfo(CP_ACP, tSystemFontInfo)
    dhIsCharsetWide = (tSystemFontInfo.MaxCharSize > 1)
End Function

Public Function dhSoundex(ByVal strIn As String) As String
    
    ' Create a Soundex lookup string for the
    ' input text.
    
    Dim strOut As String
    Dim intI As Integer
    Dim intPrev As Integer
    Dim strChar As String * 1
    Dim intChar As Integer
    Dim blnPrevSeparator As Boolean
    Dim intPos As Integer
    
    strOut = String(dhcLen, "0")
    strIn = UCase(strIn)
    blnPrevSeparator = False
    
    strChar = Left$(strIn, 1)
    intPrev = CharCode(strChar)
    Mid$(strOut, 1, 1) = strChar
    
    intPos = 1
    For intI = 2 To Len(strIn)
        ' If the output string is full, quit now.
        If intPos >= dhcLen Then
            Exit For
        End If
        ' Get each character, in turn. If the
        ' character's a letter, handle it.
        strChar = Mid$(strIn, intI, 1)
        If dhIsCharAlpha(strChar) Then
            ' Convert the character to its code.
            intChar = CharCode(strChar)
                    
            ' If the character's not empty, and if it's not
            ' the same as the previous character, tack it
            ' onto the end of the string.
            If (intChar > 0) Then
                If blnPrevSeparator Or (intChar <> intPrev) Then
                    intPos = intPos + 1
                    Mid$(strOut, intPos, 1) = intChar
                    intPrev = intChar
                End If
            End If
            blnPrevSeparator = (intChar = 0)
        End If
    Next intI
    dhSoundex = strOut
End Function

Public Function dhSoundsLike(ByVal strItem1 As String, _
 ByVal strItem2 As String, _
 Optional blnIsSoundex As Boolean = False) As Integer
 
    ' Return a number between 0 and 4 (4 being the best) indicating
    ' the similarity between the Soundex representation for
    ' two strings.

    ' Note:
    '   This code is extremely low-tech. Don't laugh! It just compares
    '   the two Soundex strings until it doesn't find a match, and returns
    '   the position where the two diverged.
    '
    '   Remember, two Soundex strings are completely different if the
    '   original words start with different characters. That is, this
    '   function always returns 0 unless the two words begin with the
    '   same character.
    
    Dim intI As Integer
    
    If Not blnIsSoundex Then
        strItem1 = dhSoundex(strItem1)
        strItem2 = dhSoundex(strItem2)
    End If
    For intI = 1 To dhcLen
        If Mid$(strItem1, intI, 1) <> Mid$(strItem2, intI, 1) Then
            Exit For
        End If
    Next intI
    dhSoundsLike = (intI - 1)
End Function
so if you have a Sub like:
Code:
Sub TrySoundex()
Select Case dhSoundsLike(InputBox("First word"), InputBox("Second word."))
   Case 0
      MsgBox "Nope, they are quite different.  Value = 0"
   Case 1
      MsgBox "Sort of similar, but not a lot.  Value = 1"
   Case 2
      MsgBox "Moderately similar.  value = 2"
   Case 3
      MsgBox "Quite similar.  Value = 3"
   Case 4
      MsgBox "Very similar. Value = 4"
End Select
End Sub
Obviously you can get the two input strings from anywhere, they do not have to come from input boxes.

John Smith & jon smith = "Very similar. Value = 4"
John Smith & John Smythe = "Very similar. Value = 4"
John Smith & John Dandy = "Moderately similar. value = 2"
John Whatever & John HoHum = "Moderately similar. value = 2"
John Smith & Harry Smith = "Nope, they are quite different. Value = 0"

The first letter must match, otherwise it returns 0.

This goes back (yikes) almost 10 years, so obviously, I need to reasses this process as strongm's code seems to be MUCH tighter.

Gerry
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top