Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Weird sorting request 1

Status
Not open for further replies.

dgillz

Instructor
Mar 2, 2001
10,043
US
Ok weirdest request I have had in a while. I've got 4,000 records of customer or prospects in Excel. This is already sorted alphabetically by company name. I want to add a "sort order" column at the far right. Here it gets weird. Let's say the first name in my excel spreadsheet is "Anaheim Memorial Hospital", while way down at record 2000, I have "Memorial Hospital of Anaheim". Because of the similarity - they have at least 2 words in common - I want both of these to have sort value of 1. In my second record, I want VBA to loop through the records, finding the same words in perhaps another situation say "Birmingham Iron Works" and Iron Works - Birmingham", giving both the sort value of 2. It is of course possible that 4 or 5 records would be a potential match, and be assigned the same sort order.

In the case above the 4 records actually represent only 2 companies. My customer just had inconsistent data entry habits over the years. The end goal is to present a re-sorted list back to my customer who can then peruse through the list and say "yes, these 2 entries are actually the same company", and modify the spreadsheet accordingly. Eventually this all be imported into a MS SQL database.

So if anyone has any ideas how to do this I would be grateful. I also to ignore word like "the", "a", "an", "of", "in", etc. in the vba string comparison.

Software Training, Implementation, Programming and Support for Macola Progression, Macola ES, Macola 10, Synergy, and Crystal Reports.
Check out our Macola tools:
 
Since it's a one-time fix, I don't think you have to use all VBA.
One thing I'd think MIGHT help speed it up or make it easier would be to go ahead and add some sort of ID to each row, just in case you need it later.
But what I'm mainly thinking about is (non-vba portion) using Data - Text to Columns to split that field on spaces. Then you may be able to move things around.

One idea I have, and it may turn out to be more trouble this way, is to then basically stack all the split values into one column, then use conditional formatting to highlight dups, and work from there. The trick here will be keeping that ID I mentioned in the first statement with each piece.

There may be another way to fix this as well using MS Query, but I've used it so rarely that I dare not venture a guess at the details.

For VBA, what I would likely do is use an array string variable to split the values of the names. You could then compare all the pieces to all other pieces, and when you have all matching pieces, just in different order, then you definitely have a match, and you can give them both the same ID and/or sort order. I suppose you'll need 2 array variables for what I'm thinking here. One for being String1, and the second for String2. So basically, you loop through all rows once for String1 variables, and for each instance in that loop, you loop through all the rows again, skipping the one that's the same row as String1, of course, since it's comparing to itself. Whenever you find a match, you give it the same ID/sortNumber/NormzliedName (I would be tempted to do all 3 at first)

Regardless of what you do for sort, I would highly suggest that if you're doing all this anyway, that you create a "normalized name" field where you put what should be the standard name. That way, if they have no obligation to keep the mess they started with, they can instead use the normalized name and not have as much a headache in the future.

This is just early morning brain storming. Hopefully it's at least a little help.

"But thanks be to God, which giveth us the victory through our Lord Jesus Christ." 1 Corinthians 15:57
 
Far from being perfect, but it may get you started :)

Let's say you have this in Excel:

[pre]
A B
1
2 Anaheim Memorial Hospital [blue] =MessWithText(A2)[/blue]
3 Birmingham Iron Works
4 Memorial Hospital of Anaheim
5 Iron Works - Birmingham
[/pre]

And a User Defined Function (in Module):

Code:
Public Function MessWithText(ByRef str As String) As String
Dim myArray() As String
Dim x As Long, y As Long
Dim TempTxt1 As String
Dim TempTxt2 As String

If Len(Trim(str)) = 0 Then
    MessWithText = ""
    Exit Function
End If

str = Replace(str, " - ", " ")
str = Replace(str, " of ", " ")

myArray = Split(str, " ")
[green]
'Alphabetize Names in Array [/green]
  For x = LBound(myArray) To UBound(myArray)
    For y = x To UBound(myArray)
      If UCase(myArray(y)) < UCase(myArray(x)) Then
        TempTxt1 = myArray(x)
        TempTxt2 = myArray(y)
        myArray(x) = TempTxt2
        myArray(y) = TempTxt1
      End If
     Next y
  Next x

MessWithText = Join(myArray, " ")

End Function

If you copy [blue]the formula from B2[/blue] down to B3, B4, B5, you get:

[pre]
A B
1
2 Anaheim Memorial Hospital Anaheim Hospital Memorial
3 Birmingham Iron Works Birmingham Iron Works
4 Memorial Hospital of Anaheim Anaheim Hospital Memorial
5 Iron Works - Birmingham Birmingham Iron Works
[/pre]


Have fun.

---- Andy

There is a great need for a sarcasm font.
 
That is definitely a great start. I'm gonna tweak the code a little bit but I think is going to address 90% of of the issue, and way less trouble than looping through 4000+ records to give each record a numeric sort order. Many thanks and have a star.

Software Training, Implementation, Programming and Support for Macola Progression, Macola ES, Macola 10, Synergy, and Crystal Reports.
Check out our Macola tools:
 
Glad to help :)
Forgot to give a credit to Applying An Alphabetical Sort To Your VBA Array List

And I agree with kjv1611:

kjv1611 said:
I would highly suggest that if you're doing all this anyway, that you create a "normalized name" field where you put what should be the standard name. That way, if they have no obligation to keep the mess they started with, they can instead use the normalized name and not have as much a headache in the future.

Have fun.

---- Andy

There is a great need for a sarcasm font.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top