Searching strings, other method apart from InStr?

Status
Not open for further replies.

Rish

Programmer
Jul 14, 2001
12
0
0
US
I have two strings, and I am comparing them to see if one string's contents exist in the other:

e.g.
Dim A as string
Dim B as string

A$ = "hello"
B$ = "lo"

then using: Instr(A$, B$)

I am (in my project) assigning large amounts of data to the strings.
Is there any other method, i.e. FASTER to search?
 
From your example I gather that you are not using vbTextCompare, which really slows things down, e.g.
lngPos = InStr(1, A, B, vbTextCompare)

If you do not care about WHERE the string is located, you can try the Like operator.
blnThere = A Like "*" & B & "*"

The process of concatenating "*" though may make it worse.
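
For reference, a minimal sketch of the forms discussed above. The variable names and sample values are illustrative only. Note that when a compare constant is passed, InStr also requires the start argument, and that Like treats *, ?, # and [ in the pattern as wildcards, which matters if B can hold arbitrary data.

```vb
Dim A As String
Dim B As String
Dim lngPos As Long
Dim blnThere As Boolean

A = "hello"
B = "LO"

' Binary (case-sensitive) search: the default under Option Compare Binary
lngPos = InStr(1, A, B, vbBinaryCompare)   ' 0, because "LO" <> "lo"

' Text (case-insensitive) search: the start argument is required here
lngPos = InStr(1, A, B, vbTextCompare)     ' 4

' Like only reports whether a match exists, not where
blnThere = A Like "*" & LCase$(B) & "*"    ' True
```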
 
What I'm actually doing is this:

I have an EXE file located at C:\FILE.EXE

I have opened this file etc. etc. and assigned its data to a string called FileData.

Now my search criteria is stored as Search$.

FileData is going to hold a lot of data from the EXE file. That is the dilemma in further detail.

I'll try the LIKE operator as you have suggested.
 
I tried the following:

FileData$ Like "*" & Search$ & "*"

It worked, but after timing it, using Like has made little or no difference compared with InStr.

Thanks for that suggestion John.

Any other suggestions?

Thanks again.
 
Nope,
That is as fast as it gets, except for a faster CPU.
If you are searching for multiple strings, you could try going byte by byte and checking for each string. Not likely faster though.
 
I am searching for multiple strings, I currently have an array which holds all the search criteria. I then search through the FileData$ searching for each string in turn (from the array) and if a match is found stop the current process otherwise carry on to the next string in the array. The process where I said "searching for each string in turn" is causing the speed difficulty.
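
The approach described above might look roughly like the sketch below. The function name FirstMatch and the zero-based String array aryStr are assumptions for illustration, not the actual project code. vbBinaryCompare is passed explicitly so the search stays byte-exact regardless of any Option Compare setting.

```vb
' Returns the index of the first search string found in FileData,
' or -1 if none of them occurs.
' aryStr is assumed to be a String array holding the search criteria.
Function FirstMatch(FileData As String, aryStr() As String) As Long
    Dim N As Long

    FirstMatch = -1
    For N = LBound(aryStr) To UBound(aryStr)
        ' InStr reports a match for an empty search string, so skip those
        If Len(aryStr(N)) > 0 Then
            If InStr(1, FileData, aryStr(N), vbBinaryCompare) > 0 Then
                FirstMatch = N
                Exit Function   ' stop at the first match
            End If
        End If
    Next N
End Function
```

Each call to InStr rescans FileData from the start, so the total work grows with the number of search strings times the file size.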

Do you know any other methods to optimize my method above? Another way of going about it?

Also could you please explain the following point further:

"If you are searching for multiple strings, you could try byte by byte and check for each string."

Again, many thanks.
 
Byte by Byte is
Code:
Dim I As Long
Dim L As Long
Dim M As Long
Dim N As Long
Dim blnFound As Boolean
L = Len(FileData)
M = UBound(aryStr)
For I = 1 To L
    For N = 0 To M
        ' Only compare when the search string still fits at position I
        If L - I + 1 >= Len(aryStr(N)) Then
            If Mid$(FileData, I, Len(aryStr(N))) = aryStr(N) Then
                blnFound = True
                Exit For ' If quit on any Match
            End If
        End If
    Next
    If blnFound Then Exit For
Next
 
I am a bit curious. If I may ask a question or so?

Is this a 'regular' exercise? One that will be executed on a regular (repetitive) basis? Is the executable being searched the same for every execution?

What is the point or purpose of finding specific strings in an executable?

Are the strings being searched for the same for each run of the procedure? What are the lengths of the strings? How many are there? Are the strings being searched for always low ASCII (e.g. printable) characters and the space, cr, lf, and tab characters? Are your strings "whole words" or, as in your example, embedded substrings (parts of words)?

e.g. looking for 'lo' in 'hello' or looking for "Hello" in " Hello World".

MichaelRed
mred@att.net

There is never time to do it right but there is always time to do it over
 
John, thanks for that, after a bit of tweaking that method is in fact slower than using InStr. Thanks anyways, appreciate it.

Well, there are about 2500+ different strings. The executable is opened, and each string in turn is searched for in the executable file. The strings are not whole words like "hello"; the strings contain binary information.

The program I am making detects Trojans. If you are a member of AOL you will be aware that there are Trojans that can be sent to you which steal your account password. I have made programs in the past (up to 2 yrs ago) which detect Trojans such as these (and distributed them as "freeware") but I am making a new scanner and treating it as a "proper" project since I found out that a friend of mine who was also into this made over $6000 selling his program!

I have now changed my scanning mechanism. Instead of scanning every file against the whole database of 2500+ strings, I made a proc which decides which strings a file needs to be scanned with (for example, strings 1 to 500). So in some cases a particular file is only scanned with 200 strings rather than 2500+, which increases the speed of the scanning.

Now since I've got the core scanning almost complete, to the design board for the rest of the project!

Does that answer your question, Michael?
 
Yes, but it also means that my few ideas on helping you are not applicable. The only other approach I know of is to 'index' the target file for all possible strings (the strings in the file, NOT your 'find' strings). Then you could just check each of the find strings against the indices. The indexing would be somewhat time-consuming, but the actual checking after the indexing would be very fast compared to the search process. It is possible to limit the indexing to the length of the longest string you would search for. That could save some processing time, but this is not really clear.

I am not familiar with AOL / Trojan virus which you mention. It has been a few years since I really looked at viral code, so I'm sure it has gone well beyond my knowledge.


MichaelRed
mred@att.net

There is never time to do it right but there is always time to do it over
 
In that case, construct two arrays: one holding each character in the exe, and a corresponding one holding 0 through N-1, where N is the number of bytes. Sort the two arrays together, with the character as the major key and the starting position as the minor key. Construct a third array dimensioned 0 to 255. In the third array, put the location of the first occurrence in ary1 of the character corresponding to the index, i.e. ary3(0) = 1st entry in ary1 with 0, ary3(1) = 1st entry in ary1 with 1. Now, for each string, you can use the binary value of its first character as an index into ary3, which points into ary1, whose corresponding ary2 entry points to a byte in the exe.
Code:
'ary1()    binary character array, sorted: all 00s, then all 01s, all 02s, etc.
'ary2()    Long index into the exe of the corresponding ary1 character
'ary3(255) ary3(0) points to the 1st 00 in ary1,
'          ary3(1) points to the 1st 01 in ary1 (0 if none)
For I = 0 To UBound(aryStr) ' For each string
    ' Get 1st character of string
    J = Asc(Left$(aryStr(I), 1))
    K = ary3(J)    ' get index into ary1/ary2
    ' If character does not occur in EXE then index is 0
    Do While K <= UBound(ary1) ' stay inside the arrays
        If ary1(K) <> J Then Exit Do ' past the run of character J
        ' ary2(K) has 0-relative start position in exe
        If aryStr(I) = Mid$(Exe, ary2(K) + 1, Len(aryStr(I))) Then
             .... got it
        End If
        K = K + 1
    Loop
Next
Of course you have to "tweak" it for however you are handling binary. This algorithm lets you skip every position in the exe that does not begin with the same character as the string you are looking for.
 
I think I know what you are getting at. Here is how I have amended the code so far:

Dim Ary1()
Dim Ary2()
ReDim Ary1(0)
ReDim Ary2(0)
Dim Data As String
Dim Ary3(255)
Dim Found As Boolean
Found = False

X = 0 'X will hold number of bytes in file

Num = 0 'Used solely in the for loop below

For Z = 1 To 255
Num = Num + 1
Ary3(Z) = Num
Next

Open "c:\1.txt" For Binary As #1
X = LOF(1)
For J = 1 To X

start_pos = J
stop_pos = J + 1

Data = Space(stop_pos - start_pos + 1)
Get #1, start_pos, Data

ReDim Preserve Ary1(0 To UBound(Ary1) + 1)
Ary1(J) = Data

ReDim Preserve Ary2(0 To UBound(Ary2) + 1)
Ary2(J) = J - 1
Next J

Close #1

Search$ = "hello"

For I = 0 To X - 1 ' For Each string
' Get 1st character of string
J = Asc(Left(Search$, 1))

K = Ary3(J) ' get index into ary1-2

' If character does not occur in EXE then index is 0
Do While (Ary1(K) = J) ' as long as ary1(K) = character
' ary2(K) has 0-relative start position in exe
If Search$ = Mid(Exe, Ary2(K) + 1, Len(Search$)) Then
Found = True
MsgBox "found"
Exit Do
End If
K = K + 1
Loop
If Found = True Then Exit For
Next
MsgBox "not found"


I changed it just for ONE search.

What is the EXE in Mid(Exe.... for? What should be assigned to the variable Exe?

Is what I have done correct so far?

Thanks.
 
I can show it in pseudocode
Code:
Dim ary1() As Integer
Dim ary2() As Long
For each byte I, starting at 0, in file
    ' also save the exe so it can be addressed with Mid or MidB below
    ary1(I) = byte I from file
    ary2(I) = I
    ReDim as necessary
Next
Sort ary1, ary2 ' sort both together: major key ary1, minor key ary2
Dim ary3(255) As Long
' ary3 starts out all 0, which is already correct for the character at ary1(0)
For I = 1 To UBound(ary1)
    J = ary1(I)
    If J <> ary1(0) And ary3(J) = 0 Then ' 1st J in ary1
        ary3(J) = I   ' index into ary1/ary2 of the first character J
    End If
Next
For I = 0 To UBound(aryStr) ' For each string
    J = Asc(Left$(aryStr(I), 1)) ' 1st character from string
    K = ary3(J)    ' get index into ary1/ary2
    ' If character does not occur in EXE then index is 0
    Do While K <= UBound(ary1) ' stay inside the arrays
        If ary1(K) <> J Then Exit Do ' past the run of character J
        ' ary2(K) has 0-relative start position in exe
        If aryStr(I) = MidB(Exe, ary2(K) + 1, Len(aryStr(I))) Then
             .... got it
        End If
        K = K + 1
    Loop
Next

     01234567
EXE="CCEEFABB"
ary3        ary1       ary2       EXE
Pos Value   Pos Value  Pos Value  Pos Value
"A"   0      0   "A"    0   5      0   "C"
"B"   1      1   "B"    1   6      1   "C"
"C"   3      2   "B"    2   7      2   "E"
"D"   0      3   "C"    3   0      3   "E"
"E"   5      4   "C"    4   1      4   "F"
"F"   7      5   "E"    5   2      5   "A"
             6   "E"    6   3      6   "B"
             7   "F"    7   4      7   "B"
Look for "EF"
J = Asc("E")
For
    K = ary3(J) ' 5
    ary1(5) = "E"
    ary2(5) = 2
    MidB(Exe, 2 + 1, 2) = "EE"   ' no match
    K = K + 1 ' 6

    ary1(6) = "E"
    ary2(6) = 3
    MidB(Exe, 3 + 1, 2) = "EF"   ' match


 
So does there have to be another array to store all the EXE data (in its original order)?

Does anyone know of a sorting algorithm I can use that meets the specification above?

Thanks.
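
One option that fits the specification is a counting sort, since there are only 256 possible byte values. The sketch below is an illustration, not the method posted above: the names BuildIndex, bytExe, aryPos and aryFirst are hypothetical, and it assumes the file contents are already in a zero-based Byte array. aryPos plays the role of ary2 and aryFirst the role of ary3. ary1 is not needed, because the byte value is implied by which group a position falls in.

```vb
' Builds the position index with a counting sort instead of a general sort.
' bytExe()   : file contents, zero-based (assumed already loaded)
' aryPos()   : out, positions in bytExe grouped by byte value
' aryFirst() : out, aryFirst(b) = first slot in aryPos for byte value b;
'              aryFirst(b + 1) is one past the last slot for b,
'              so aryFirst(b) = aryFirst(b + 1) means b never occurs
Sub BuildIndex(bytExe() As Byte, aryPos() As Long, aryFirst() As Long)
    Dim I As Long
    Dim B As Long
    Dim aryNext(0 To 255) As Long

    ReDim aryFirst(0 To 256)
    ReDim aryPos(0 To UBound(bytExe))

    ' Pass 1: count occurrences of each byte value (stored one slot up)
    For I = 0 To UBound(bytExe)
        B = bytExe(I)
        aryFirst(B + 1) = aryFirst(B + 1) + 1
    Next I

    ' Running totals turn the counts into starting slots
    For B = 1 To 256
        aryFirst(B) = aryFirst(B) + aryFirst(B - 1)
    Next B

    ' Pass 2: drop each position into the next free slot for its byte
    For B = 0 To 255
        aryNext(B) = aryFirst(B)
    Next B
    For I = 0 To UBound(bytExe)
        B = bytExe(I)
        aryPos(aryNext(B)) = I
        aryNext(B) = aryNext(B) + 1
    Next I
End Sub
```

For the "CCEEFABB" example this gives aryPos = 5, 6, 7, 0, 1, 2, 3, 4, with the "E" group starting at slot 5. The candidate start positions for a search string whose first byte is b are aryPos(aryFirst(b)) through aryPos(aryFirst(b + 1) - 1), already in ascending order.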
 
Right, I've tried your method John but unfortunately it takes a very long time to load the executable file into the Array so it is not appropriate.

A very nice idea though.

Does anyone else have any suggestions or other methods of trying to solve my problem?

Thanks in advance
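
A possible reason for the long load time is that the loop posted earlier issues one Get and two ReDim Preserve calls per byte. A minimal sketch of reading the whole file with a single Get follows. The function name LoadBytes and the parameter strPath are hypothetical, and the sketch assumes the file exists and is not empty.

```vb
' Reads an entire file into a zero-based Byte array with a single Get.
' strPath is a hypothetical parameter; assumes a non-empty, readable file.
Function LoadBytes(ByVal strPath As String) As Byte()
    Dim intFile As Integer
    Dim bytData() As Byte

    intFile = FreeFile
    Open strPath For Binary Access Read As #intFile
    ' Size the array once, up front, instead of growing it per byte
    ReDim bytData(0 To LOF(intFile) - 1)
    Get #intFile, 1, bytData
    Close #intFile

    LoadBytes = bytData
End Function
```

Whether this makes the indexing approach fast enough overall would still need to be timed against the plain InStr loop.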
 