Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

value points to C1 #2 4

Status
Not open for further replies.

natedeia

Technical User
May 8, 2001
365
0
0
US
I can't post another on the other thread for some reason, max 25 posts maybe.
WOW, that is all I can say.
I have to say that I love your last set up, selecting the threshhold manually works terrifically. I actually use your work on other information, surprisingly I can use it on addresses, we have alot of online purchasing too, millions a month of $ just from one site which is one of the largest retail companies in the US and several others too. But I like the threshhold selection just because I am using it on DL's and FullMicr's too. I set it at .80 for the FM's which already have 9 digits matching due to the ABA (routing #'s) and alot of acct #'s which follow the ABA are similar already so I set it at .80 to .90 otherwise it flaggs too much similar data. A far as the smaller sets digits I use it there and select .50 to .70, generally we find that it varies but usually it is going to be around 4 or less digits being changed.
So basically I really like the last one with the selector.
I am speechless that you have taken brought this idea to life and have perfected it like you have. In this new overhaul, is it possible to put that idea I mentioned before about highlighting the digits that do not match in RED or something? That is one of the concerns when others try it.
Update, the co-worker that ran the macro on 64,000 rows says it is on about 14,000, it is 11am now and he stared it around 2 or 3 i think yesterday, sorry so longggggggggggg.
 
As I said, if you can restrict the specs, I can make it go much faster. (How does 64,000 rows in 10 minutes sound?)

Although if you are looking for more than 2 digits changed, I'll have to go back to the drawing board.

Here is a routine (along with a test sub) that compares two cells and turns non-matching digits to red:
Code:
Option Explicit

Sub test()
  [D1] = "'123456789"
  [D2] = "'128456739"
  ColorDifferentDigits Range("D1"), Range("D2")
End Sub

Sub ColorDifferentDigits(A As Range, B As Range)
Dim sA As String
Dim sB As String
Dim i As Integer
Dim n1 As Integer
Dim n2 As Integer
  sA = A.Text
  sB = B.Text
  For i = 1 To Len(sA)
    If Mid(sA, i, 1) <> Mid(sB, i, 1) Then
      If n1 = 0 Then n1 = i Else n2 = i
    End If
  Next i
  If n1 > 0 Then
    A.Characters(Start:=n1, Length:=1).Font.ColorIndex = 3
    B.Characters(Start:=n1, Length:=1).Font.ColorIndex = 3
  End If
  If n2 > 0 Then
    A.Characters(Start:=n2, Length:=1).Font.ColorIndex = 3
    B.Characters(Start:=n2, Length:=1).Font.ColorIndex = 3
  End If
End Sub
You should be able to adapt this to the existing code by inserting one line of code in the main Sub (and copying the entire "ColorDifferentDigits" Sub to the module.)
Code:
 :
 :
   If NearSame(sAtRow, sAtRowX) Then
     .Cells(nRow, 1).Font.Bold = True
     .Cells(nRowX, 1).Font.Bold = True
     ColorDifferentDigits .Cells(nRow, 1), .Cells(nRowX, 1)
   End If
 :
 :
Note that in order for this to work, the data must be Text. It may be that your numbers are numbers in which case you will need to do some fiddling to force them to be Text.


 
Here is a simpler and better version of the Sub:
Code:
Sub ColorDifferentDigits(A As Range, B As Range)
Dim sA As String
Dim sB As String
Dim i As Integer
  sA = A.Text
  sB = B.Text
  For i = 1 To Len(sA)
    If Mid(sA, i, 1) <> Mid(sB, i, 1) Then
      A.Characters(Start:=i, Length:=1).Font.ColorIndex = 3
      B.Characters(Start:=i, Length:=1).Font.ColorIndex = 3
    End If
  Next i
End Sub
Please disregard the version in the previous post.

 
How about 64,000 rows in 4 minutes, 30 seconds (on an old computer)?

:)

Makes use of the fact that you can't add two identical indexes to a collection object. Creates a list in column B of the row(s) containing values within 2 digits.
Code:
Sub FlagMatches()
Dim myList As New Collection
T1 = Now()
On Error GoTo FlagRow:
z = 0
For a = 1 To 9
   For b = (a + 1) To 9
      For x = 1 To 65000
         SourceNum = Right("000000000" & Range("A" & x).Value, 9) ' pad with zeros to 9 digits
         Pattern = Left(SourceNum, a - 1) & "*" & Mid(SourceNum, a + 1)
         Pattern = Left(Pattern, b - 1) & "*" & Mid(Pattern, b + 1)
         myList.Add x, Pattern
      Next x
      ' Now empty the collection for the next pass
      For x = 1 To myList.Count
         myList.Remove 1
      Next x
      z = z + 1
      Debug.Print "Pass " & z & " finished."
   Next b
Next a
T2 = Now
MsgBox "Finished! Started at " & T1 & ", finished at " & T2 & "."
End
Just make sure your data starts in row one (or adjust the loop), and keep column B clear for the results (again, or adjust the code to write to a different column).

VBAjedi [swords]
 
Accidentally left off the last part of my code with the error handler. Doh!

Code:
Sub FlagMatches()
Dim myList As New Collection
T1 = Now()
On Error GoTo FlagRow:
z = 0
For a = 1 To 9
   For b = (a + 1) To 9
      For x = 1 To 65000
         SourceNum = Right("000000000" & Range("A" & x).Value, 9) ' pad with zeros to 9 digits
         Pattern = Left(SourceNum, a - 1) & "*" & Mid(SourceNum, a + 1)
         Pattern = Left(Pattern, b - 1) & "*" & Mid(Pattern, b + 1)
         myList.Add x, Pattern
      Next x
      ' Now empty the collection for the next pass
      For x = 1 To myList.Count
         myList.Remove 1
      Next x
      z = z + 1
      Debug.Print "Pass " & z & " finished."
   Next b
Next a
T2 = Now
MsgBox "Finished! Started at " & T1 & ", finished at " & T2 & "."

End
FlagRow:
If Len(Range("B" & x).Value) = 0 Then
   Range("B" & x).Value = "Within 2 digits of row " & myList(Pattern)
Else
   Range("B" & x).Value = Range("B" & x).Value & ", " & myList(Pattern)
End If
Resume Next

End Sub
There are two drawbacks. . . Changing the threshold requires adding/deleting a loop. But that's the price you pay for speed. You'd just have to create one of these for each different task you do like this.

The second "drawback" is that if two items are a better match than the threshold, you get multiple hits. So if the number in row 5 is within one digit of the number in row 3, column B on row 5 would read "Within 2 digits of row 3, 3, 3, 3, 3, 3, 3, 3". But that can actually be used to help you spot very close matches. . .

VBAjedi [swords]
 
Very nice, VBAjedi!

I took the liberty of combining your code with mine and came up with this:
Code:
Option Explicit

Sub FlagMatches()
Dim myList As New Collection
Dim T1, T2
Dim A As Integer
Dim B As Integer
Dim x As Long
Dim SourceNum As String
Dim Pattern As String
Dim z As Long
Dim v As Variant
Dim i As Long

T1 = Now()
Range("B:B").Clear
z = 0
v = Intersect(Range("A:A"), ActiveSheet.UsedRange)
For i = LBound(v) To UBound(v)
  v(i, 1) = Right("000000000" & v(i, 1), 9)[COLOR=green] ' pad with zeros to 9 digits[/color]
Next i
On Error GoTo FlagRows:
For A = 1 To 8
   For B = (A + 1) To 9
      z = z + 1
      Application.StatusBar = "Pass " & z & " of 36."
      DoEvents
      For x = LBound(v) To UBound(v)
         SourceNum = v(x, 1)
         Pattern = Left(SourceNum, A - 1) & "*" & Mid(SourceNum, A + 1)
         Pattern = Left(Pattern, B - 1) & "*" & Mid(Pattern, B + 1)
         myList.Add x, Pattern
      Next x[COLOR=green]
      ' Now empty the collection for the next pass[/color]
      For x = 1 To myList.Count
         myList.Remove 1
      Next x
   Next B
Next A
T2 = Now
Application.StatusBar = False[COLOR=green]
' Comment out the next line for production version:[/color]
MsgBox "Finished! Started at " & T1 & ", finished at " & T2 & "."
Exit Sub

FlagRows:
If Range("B" & x) = "" Then
   Range("B" & x) = "Within 2 digits of row " & myList(Pattern)
   ColorAsterisks Range("A" & x), Pattern
Else
   Range("B" & x) = Range("B" & x) & ", " & myList(Pattern)
End If
If Range("B" & myList(Pattern)) = "" Then
   Range("B" & myList(Pattern)) = "Within 2 digits of row " & x
   ColorAsterisks Range("A" & myList(Pattern)), Pattern
Else
   Range("B" & myList(Pattern)) = Range("B" & myList(Pattern)) & ", " & x
End If
Resume Next
End Sub

Sub ColorAsterisks(A As Range, B As String)
Dim sA As String
Dim sB As String
Dim i As Integer
  sA = A.Text
  sB = B
  For i = 1 To Len(sA)
    If Mid(sA, i, 1) <> Mid(sB, i, 1) Then
      A.Characters(Start:=i, Length:=1).Font.ColorIndex = 3
    End If
  Next i
End Sub
It includes the coloration that natedeia is looking for as well as flagging both rows when a near match is found.

 
I am absolutely flabergasted. I can not believe you guys, totally out done yourself. This makes a huge point here, bigger than you think. These people here are looking into something (software)that is supposed to find this type of stuff, I disliked it. Tell you the truth, I bet you that they will not go forward after what they see this do! There were intrigued with what I showed them already, they were going to run test against their new trial software..... it may be dead in the water, esp since VBAjedi (million thanks) provide that version, and you Zathras with your revamped version then taking the LIBERTY to combine them.[thumbsup2]
I seen it last night, did not post, couldn't sleep. Now I am at work, the only one so holding down the fort, I have ran the combinded version and now my head aches. Need a manual!
I noticed that if it matches exactly that it will highlight the first two digits, can you give a run down of what it it is showing, looks for just two matches? Runs fast h..
Zathras, the new one you made before combining it with VBA jedi's, can't seem to get that to work properly. I think I am doing it right and in D1 and D2 it places two #'s in there that have two nonmatching #'s. Need some guys like you two here,man.
 
To eliminate the spurious flagging of the first 2 digits when the numbers are identical, insert these lines after the "FlagRows:" label:
Code:
If Range("A" & x) = Range("A" & myList(Pattern)) Then
  Resume Next
End If
I don't know why the earlier example (with D1 and D2) doesn't work for you. I just pasted it into a new blank spreadsheet and it works ok for me. Be aware that it makes a BIG difference if the data are text or numbers. The coloration only works when the data are text.

If you wind up saving a few bucks by not having to go third party, please remember that this site is "Member Supported" and consider making a contribution for its continuance. Neither VBAJedi nor I will ever see a penny. We are all just volunteers and work for the joy of helping people and solving puzzles (not necessarily in that order), but the folks who maintain the site could use all of the financial help they can get to keep it up and running.
 
i think the earlier version is what i will stick with using, i like the way you set that threashold, using on different data is ideal. the speed was not like your new one, that one produced so much data to scan visually to see which rows it matched cause of it putting the matches of exact ones in colum b so much that if it matched another it would be at the end of the line in B. i got that one to work with the color coding on the prev code. sorry, prob doing something dumb the first try.
too bad large companies will not recognize the small guys, i know this site has helped me in many ways, several projects,etc. i do however suggest this site to everyone one i know when it comes to finding answers to technical questions.
 
If you want the output to be in the form of pairs of near-match numbers in, for example, columns C and D, that is no big deal.

Add this line at the bottom of the main sub, just before the "Resume Next" (The second one, not the one after the tag.)
Code:
PostResults Range("A" & x), Range("A" & myList(Pattern))
Then add these routines at the end of the module:
Code:
Sub PostResults(A As Range, B As Range)
Dim r As Range
Dim nOffset As Long

  Set r = Range("C1")[COLOR=green] '<------ modify for different location[/color]
  If Not AlreadyInResults(r, A.Text, B.Text) Then
    If r.Text = "" Then
      nOffset = 0
    Else
      If r.Offset(1, 0).Text = "" Then
        nOffset = 1
      Else
        nOffset = r.End(xlDown).Row
      End If
    End If
    A.Copy r.Offset(nOffset, 0)
    B.Copy r.Offset(nOffset, 1)
  End If
  Set r = Nothing
End Sub

Private Function AlreadyInResults(ARange As Range, _
                A As String, B As String) As Boolean
Dim r As Range
Dim f As Range

  Set r = Intersect(ActiveSheet.UsedRange, Columns(ARange.Column))
  Set f = r.Find(A)
  If Not f Is Nothing Then
    If (f.Text = A And f.Offset(0, 1) = B) Or _
       (f.Text = B And f.Offset(0, 1) = A) Then
      AlreadyInResults = True
    End If
  End If
  If Not AlreadyInResults Then
    Set f = r.Find(B)
    If Not f Is Nothing Then
      If (f.Text = A And f.Offset(0, 1) = B) Or _
         (f.Text = B And f.Offset(0, 1) = A) Then
        AlreadyInResults = True
      End If
    End If
  End If
  Set f = Nothing
  Set r = Nothing
End Function
 
you know how you have that real quick one that puts the near matche cell #'s in column B, and highlight the digits red that do not match. there may be someone i can take your code to make some changes, but what i am asking is, 'Is it possible to only show a cell once if it matches another in column B, and is it possible to make your code look for more than two or less non matching digits?" Hoping to see if someone here can make it do that then have a macro to dump all the rows with Bold cells on another sheet. I am working on that part but mainly interested to know if the code can be worked that way or best bet to stick with the other one which is slower but offers that capibilities. thanks again, going to try out your new addition.

[thumbsup2]
 
Sorry, I didn't understand most of your last request.

Yes, it is possible to make the code look for more or fewer than two non-matching digits. As VBAjedi said:
VBAjedi said:
Changing the threshold requires adding/deleting a loop. But that's the price you pay for speed. You'd just have to create one of these for each different task you do like this.

 
yea, for the most part, the feature with column B is ingenious, just cant see the cell's contents entirely because it can get kinda long with all the redundant matching cells showing up. that was one part that would make it easier to look at, because the goal is not to spend too much time locating those types of trends.
like the color coding alot but am worried that if i can get someone to work on jedi's and your code, that by taking jedi's approach that it will match up 2 and 2, 3 and 3 but this type of trend we see does not normally stick to the rules. most of the time it may be that one cell is matching 5 and the other matching six, so dont know if it would pick up on those . may have to deal with the speed issue and tell the invesigators try and run less data or something. thanks again, been invaluable and hope others can beneifit from this.
 
how can i get this one to stop at last cell that has data, this one keeps going to 65 thousand,

Sub FlagMatches()
Dim myList As New Collection
T1 = Now()
Range("A:A").Font.ColorIndex = 0
Range("A:A").Font.Bold = False
On Error GoTo FlagRow:
z = 0
For A = 1 To 9
For B = (A + 1) To 9
For x = 1 To 65000
SourceNum = Right("000000000" & Range("A" & x).Value, 9) ' pad with zeros to 9 digits
Pattern = Left(SourceNum, A - 1) & "*" & Mid(SourceNum, A + 1)
Pattern = Left(Pattern, B - 1) & "*" & Mid(Pattern, B + 1)
myList.Add x, Pattern
Next x
' Now empty the collection for the next pass
For x = 1 To myList.Count
myList.Remove 1
Next x
z = z + 1
Debug.Print "Pass " & z & " finished."
Next B
Next A
T2 = Now
MsgBox "Finished! Started at " & T1 & ", finished at " & T2 & "."

End
FlagRow:
If Len(Range("B" & x).Value) = 0 Then
Range("B" & x).Value = "Within 2 digits of row " & myList(Pattern)
Else
Range("B" & x).Value = Range("B" & x).Value & ", " & myList(Pattern)
End If
Resume Next

End Sub


 
For x = 1 To 65000

i have been trying to replace this but can not seem to get it right, keep getting errors,
this one is very quick, under 10 minutes for 65000 lines

 
Just add this code towards the top of the sub:
Code:
Dim LastRow As Integer
LastRow = Sheets("Sheet1").Cells.Find(what:="*", after:=Range("IV65536"), _
    searchorder:=xlByRows, searchdirection:=xlPrevious).Row
and then change the "For" line to read:
Code:
For x = 1 to LastRow

That should do it!

VBAjedi [swords]
 
well i can tell you i did not come close on that one. i was taking a different approach to that, thank you so much JEDI.
I hope I have not used up all my questions on this subject, but as I use the one with the threashhold, it does not work on #'s that are around this long?

7777777772049900126013 the 2 and 1 towards end highlighted
7777777772049900141823 the 4 and 2 towards end highlighted

I was trying to show this off and it performs great on smaller sets #'s. The example above it made the
77777777720499001<bold>2</bold>60<bold>1</bold>3
and
77777777720499001<bold>4</bold>18<bold>2</bold>3

so wondering will this logic not work as well on the larger #'s?
thanks again Jedi and Zathras

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top