Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations biv343 on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

occurence of substrings in another string

Status
Not open for further replies.

davikokar

Technical User
May 13, 2004
523
IT
hallo,
i have to find the longest substring of one string into another string. The substrings must all start with the first character, like this:

String to be found: "ABDC", I have to find "A", "AB", "ABC", "ABCD"

String to be searched:
ACDBDDDBCAB

--> in this case the output should be "AB", because is the longest occurence.

Does anyone has a suggestion on how to proceed or have some code that does something similar? thanks for help
 
karerda,
If string and substring are relatively short you could use a function based on [tt]InStr()[/tt].
Code:
Public Function LongestSubstring(StringToSearch As String, SubString As String) As String
  Dim intLength As Integer, intStart As Integer
  Dim strSub As String
  For intLength = Len(SubString) To 1 Step -1
    strSub = Left$(SubString, intLength)
    intStart = InStr(1, StringToSearch, strSub)
    If intStart <> 0 Then
      LongestSubstring = Mid$(StringToSearch, intStart, intLength)
      Exit Function
    End If
  Next intLength
End Function

Hope this helps,
CMP

[small]For the best results do what I'm thinking, not what I'm saying.[/small]
(GMT-07:00) Mountain Time (US & Canada)
 
There are a few ways to do this. The first way I'll show you is the easiest and involves looping through the string:
Code:
Private Sub Command1_Click()
MsgBox FindLongestString(Text1.Text, "ABCD")
End Sub

Private Function FindLongestString(X As String, strFind As String) As String
Dim strBestSoFar                As String
Dim strCurrentMatch             As String
Dim I                           As Long
Dim K()                         As String

ReDim K(Len(X) - 1)

For I = 0 To Len(X) - 1
   K(I) = Mid(X, I + 1, 1)
Next I

For I = LBound(K) To UBound(K)
    strCurrentMatch = IIf(InStr(1, strFind, strCurrentMatch & K(I)) = 1, strCurrentMatch & K(I), vbNullString)
    If Len(strCurrentMatch) > Len(strBestSoFar) Then
    strBestSoFar = strCurrentMatch
    If strBestSoFar = strFind Then Exit For
    End If
Next

FindLongestString = strBestSoFar

End Function
It puts the code into an array and then loops the array checking to see if it's found a matching string. If it has it tries to add the next character and if that doesn't match the find string it will blank it and keep trying. Hope that makes sense, if you need any more explaining just post back.

Another way would be to use Regular Expressions. There are several examples on the web and in this forum if you want to go down that route. It's worth a look at they're incredibly useful.

Hope this helps

HarleyQuinn
---------------------------------
The most overlooked advantage to owning a computer is that if they foul up there's no law against wacking them around a little. - Joe Martin

Get the most out of Tek-Tips, read FAQ222-2244 before posting.
 
A regular expression would probably be the best solution, but how to write the Pattern for this instance.

Everybody is somebodys Nutter.
 
This is one way to do it with RegExp - note to all - I am pretty poor at RegExp so there is probably a way to do this within 1 regexp statement but I can't figure it out and this works pretty quick so....
Code:
Function Find_Longest_String(rng As Range)
Dim RgExp As Variant

Set RgExp = CreateObject("VBScript.RegExp")

Find_Longest_String = "No Match"

For i = 1 To 4

With RgExp

    Select Case i
        Case 1
            .Pattern = "[A][B][C][D]"
        Case 2
            .Pattern = "[A][B][C]"
        Case 3
            .Pattern = "[A][B]"
        Case 4
            .Pattern = "[A]"
    End Select

    If .test(rng) = True Then
        Find_Longest_String = Replace(Replace(.Pattern, "[", ""), "]", "")
        Exit Function
    End If
    
End With
    
        
Next i
End Function

Rgds, Geoff

We could learn a lot from crayons. Some are sharp, some are pretty and some are dull. Some have weird names and all are different colours but they all live in the same box.

Please read FAQ222-2244 before you ask a question
 
Geoff - I'll be honest, I'd been trying to come up with a one-liner pattern but the closest I'd got was something pretty similar (well exactly the same in idea, albeit slightly different in execution) to what you've got.

Cheers

HarleyQuinn
---------------------------------
The most overlooked advantage to owning a computer is that if they foul up there's no law against wacking them around a little. - Joe Martin

Get the most out of Tek-Tips, read FAQ222-2244 before posting.
 
Must be a toughie if you're having to think about it! [wink]

HarleyQuinn
---------------------------------
The most overlooked advantage to owning a computer is that if they foul up there's no law against wacking them around a little. - Joe Martin

Get the most out of Tek-Tips, read FAQ222-2244 before posting.
 
I'm thinking it may not be possible in a 1 liner as RegExp is testing for a matching pattern and just returns true if it matches. There are 4 patterns to test for, in order, so I'm thinking there does actually need to be 4 seperate tests......Please oorrect me if I'm wrong but I don't think that RegExp can produce a distinction between which particular part of a pattern matches...

Rgds, Geoff

We could learn a lot from crayons. Some are sharp, some are pretty and some are dull. Some have weird names and all are different colours but they all live in the same box.

Please read FAQ222-2244 before you ask a question
 
Something along these lines:
Code:
[blue]    Dim re As RegExp

    
    Set re = New RegExp
    
    re.Global = True
    re.Pattern = "A((BCD)|(BC)|(B))"
    
    Debug.Print re.Execute(Text1.Text).Item(0)[/blue]
 
strongm - nearly - but it finds the 1st element that matches. So, in string

ABACABCFFFF

AB will be returned rather than ABC

Rgds, Geoff

We could learn a lot from crayons. Some are sharp, some are pretty and some are dull. Some have weird names and all are different colours but they all live in the same box.

Please read FAQ222-2244 before you ask a question
 
I like CMP's approach. It's simple and dynamic. Here's a slightly shorter version. :)
Code:
Public Function Longest_Substring(target As String, search As String) As String
   Dim i As Integer
   
   For i = Len(search) To 1 Step -1
      search = Left(search, i)

      If InStr(1, target, search) <> 0 Then
         Longest_Substring = search
         Exit Function
      End If
   Next i
End Function
 
Well - it does get the match ... it's just a question of getting the match out properly. Of course, doing that makes it a bit longer ...
Code:
[blue]Dim re As RegExp
    Dim wombat As Match
    Dim maxstring As String
    Set re = New RegExp
    re.Global = True
    re.Pattern = "(ABCD)|(ABC)|(AB)|(A)" ' slightly reworked from initial idea
    For Each wombat In re.Execute(Text1.Text)
        If wombat.Length > Len(maxstring) Then maxstring = wombat.Value
    Next
    Debug.Print maxstring[/blue]
 
My god, that was pretty much the pattern I tried [sad]

WinBlowsMe - I still think the array and IIF is a faster way to get it without RegExp...

HarleyQuinn
---------------------------------
The most overlooked advantage to owning a computer is that if they foul up there's no law against wacking them around a little. - Joe Martin

Get the most out of Tek-Tips, read FAQ222-2244 before posting.
 
Yep. Whilst on the surface it looks like a Regular Expression might be the way to go, the reality looks to be otherwise.

I may give this a little more thought, but I don't hold out much hope
 
I'm still liking the wombat Mr Strong... [smile]

Geoff, using the matches collection you can go through any matches that the pattern finds.

HarleyQuinn
---------------------------------
The most overlooked advantage to owning a computer is that if they foul up there's no law against wacking them around a little. - Joe Martin

Get the most out of Tek-Tips, read FAQ222-2244 before posting.
 
A quick not on Regular Expressions. It's incredibly fast when you have a large amount of text or need to find multiple occurances of a substring but the overhead to create the object and the fact that you need to write a UDF just to create the pattern...

Since we've completely highjacked the thread I thought it would be fun to demonstrate with a little speed testing.

[tt]Cycle = 1/2,992,540,000 sec
Testing String: ACDBDDDBCAB against Substring: ABDC

WinblowsME: Longest_Substring
-> 7.81 cycles
CautionMP: LongestSubstring
-> 8.79 cycles
HarleyQuinn: FindLongestString
-> 79.70 cycles
xlbo: Find_Longest_String
-> 2,779.73 cycles[/tt]
Code:
Declare Function QueryPerformanceCounter Lib "Kernel32" (X As Currency) As Boolean
Declare Function QueryPerformanceFrequency Lib "Kernel32" (X As Currency) As Boolean
Const cString As String = "ACDBDDDBCAB"
Const cSub As String = "ABDC"

Sub TimeThem()
  Dim Ctr1 As Currency, Ctr2 As Currency, Freq As Currency
  Dim intCounter As Integer
  Dim strOut As String
  
  If QueryPerformanceCounter(Ctr1) Then
    QueryPerformanceCounter Ctr2
    QueryPerformanceFrequency Freq
    Debug.Print "Cycle = 1/" & _
                Format(Freq * 10000, "#,###"); " sec"
  Else
    Debug.Print "High-resolution counter not supported."
    Exit Sub
  End If
  
  Debug.Print "Testing String: "; cString, " against Substring: "; cSub
  Debug.Print
  
  'Longest_Substring
  Ctr1 = 0: Ctr2 = 0: strOut = ""
  QueryPerformanceCounter Ctr1
  For intCounter = 1 To 10
    strOut = Longest_Substring(cString, cSub)
  Next intCounter
  QueryPerformanceCounter Ctr2
  Debug.Print "WinblowsME: Longest_Substring"
  Debug.Print "-> "; Format$(Ctr2 - Ctr1, "#,###.00"); " cycles"
  
  'LongestSubstring
  Ctr1 = 0: Ctr2 = 0: strOut = ""
  QueryPerformanceCounter Ctr1
  For intCounter = 1 To 10
    strOut = LongestSubstring(cString, cSub)
  Next intCounter
  QueryPerformanceCounter Ctr2
  Debug.Print "CautionMP: LongestSubstring"
  Debug.Print "-> "; Format$(Ctr2 - Ctr1, "#,###.00"); " cycles"
  
  'FindLongestString
  Ctr1 = 0: Ctr2 = 0: strOut = ""
  QueryPerformanceCounter Ctr1
  For intCounter = 1 To 10
    strOut = FindLongestString(cString, cSub)
  Next intCounter
  QueryPerformanceCounter Ctr2
  Debug.Print "HarleyQuinn: FindLongestString"
  Debug.Print "-> "; Format$(Ctr2 - Ctr1, "#,###.00"); " cycles"
  
  'Find_Longest_String
  Ctr1 = 0: Ctr2 = 0: strOut = ""
  QueryPerformanceCounter Ctr1
  For intCounter = 1 To 10
    strOut = Find_Longest_String(cString)
  Next intCounter
  QueryPerformanceCounter Ctr2
  Debug.Print "xlbo: Find_Longest_String"
  Debug.Print "-> "; Format$(Ctr2 - Ctr1, "#,###.00"); " cycles"
End Sub

CMP
 
[lol] I suck !

Rgds, Geoff

We could learn a lot from crayons. Some are sharp, some are pretty and some are dull. Some have weird names and all are different colours but they all live in the same box.

Please read FAQ222-2244 before you ask a question
 
xlbo said:
Don't forget, you used Late Binding, creating a new instance in each iteration. If the code was optimized, it would do better. It's not a fair interpretation. :)

Regards,
Zack Barresse

Simplicity is the ultimate sophistication. What is a MS MVP? PODA
- Leonardo da Vinci
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top