Search Text Files for Multiple Keywords

Status
Not open for further replies.

johnpayback

IS-IT--Management
Oct 2, 2006
110
US
I would like to know how I can search a text file for multiple keywords. I would like it to find only the lines in the text file that contain all of the keywords, so it is all or nothing. If you know how to make VBScript do this, please let me know. An example would be great. I've been trying it with the grep command, roughly as shown below, but it only pulls lines matching the last keyword rather than lines with all keywords. I know grep is not the way to go, but for really large files it seemed to be much quicker.

Code:
dim str1, str2, str3, str4, str5, str6
dim keyword1, keyword2, keyword3, keyword4, keyword5, keyword6

str1 = "grep -i " & Chr(34) keyword1 Chr(34)
str2 = " | grep -i " & Chr(34) keyword2 Chr(34)
str3 = " | grep -i " & Chr(34) keyword3 Chr(34)
str4 = " | grep -i " & Chr(34) keyword4 Chr(34)
str5 = " | grep -i " & Chr(34) keyword5 Chr(34)
str6 = " | grep -i " & Chr(34) keyword6 Chr(34)

strgrep = str1 & str2 & str3 & str4 & str5 & str6

FolderToSearch = "C:\pathto\folderto\filename.txt"

strcmd = "%comspec% /Q /C " & strgrep & " " & FolderToSearch

set oexec=oWSH.exec(strcmd)
do while not oexec.StdOut.AtEndOfStream
   thing = oexec.StdOut.ReadAll
   strPResult = Replace(thing,vblf,"Chr(13)")
   wscript.echo "Chr(13)" & strPResult
   wscript.flush
loop

The example above works except that it only returns the lines with the last keyword entered. Also...I didn't put it above but I do ask for user input for the keywords.

I do not believe grep is the best way to do this but for really large files it seems to work very fast whereas when I tried regexp it would take up to 10 minutes to pull the data.

JP
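
A possible explanation for the "last keyword only" symptom, offered as an assumption rather than something confirmed in this thread: the file name is appended to the end of the whole pipeline, so only the last grep is handed the file, and a grep that is given a file name reads that file instead of what is piped into it. A minimal sketch that hands the file to the first grep and lets the later ones filter its output (the keyword values are hypothetical stand-ins for the user input):

```vbscript
Option Explicit

' Hypothetical keywords; in the real script they come from user input.
Dim keywords, fileToSearch, strGrep, i
keywords = Array("error", "timeout")
fileToSearch = "C:\pathto\folderto\filename.txt"

' The file name goes on the FIRST grep; every later grep filters piped input.
strGrep = "grep -i " & Chr(34) & keywords(0) & Chr(34) & " " & fileToSearch
For i = 1 To UBound(keywords)
    strGrep = strGrep & " | grep -i " & Chr(34) & keywords(i) & Chr(34)
Next

' Show the command that would be passed to oWSH.Exec.
WScript.Echo "%comspec% /Q /C " & strGrep
```

The sketch only builds and prints the command line, so it can be checked without grep installed.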
 
If you copied and pasted your code in here, you're missing some string concatenation with each of your str variables.
Code:
str1 = "grep -i " & Chr(34) [b][red]&[/red][/b] keyword1 [b][red]&[/red][/b] Chr(34)

Lee
 
Oh...I typed it in and left the & out. My bad. I do have it like you have it. Sorry about that.

JP
 
Okay...new code. How can I make something like this work? How do I get this to display all lines of the text file that match the pattern? Is there any way to have it match multiple keywords rather than just the one? Where "error" is in the code below...is there a way to have it match and return the lines that contain multiple keywords if the line contains all of the keywords that match? Let me know if my questions do not make sense.

Code:
Const ForReading = 1

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\output.txt")

strContents = objFile.ReadAll

objFile.Close

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.IgnoreCase = True
objRegEx.Global = True
objRegEx.Pattern = "Error"

Set colMatches = objRegEx.Execute(strContents)  
    
For Each Match in colMatches
    strReturnStr = "Match found at position "
    strReturnStr = strReturnStr & match.FirstIndex & ". Match Value is '"
    StrReturnStr = strReturnStr & match.value & "'." & "<BR>" & VBCrLf
    WScript.Echo(strReturnStr)
Next

JP
 
Did you copy and paste your code in, or type it in again? It's always best to copy and paste so we see exactly what the computer is working with.

Lee
 
In general, avoid relying on the grep utility except as a handy tool for very simple and straightforward situations.

>How can I make something like this work?
This is how.

[1] Suppose the file C:\output.txt contains this.
[tt][green]
'c:\output.txt

La peur et le courage de vivre et de mourir
La mort si difficile et si facile

Hommes pour qui ce trésor fut chanté
Hommes pour qui ce trésor fut gâché

Hommes réels pour qui le désespoir
Alimente le feu dévorant de l'espoir
Ouvrons ensemble le dernier bourgeon de l'avenir

Parias la mort la terre et la hideur
De nos ennemis ont la couleur
Monotone de notre nuit
Nous en aurons raison. [/green]
[/tt]
[2] Then the script should look like this for a specially selected keyword set and a Unicode text file. (If the file is in ASCII or the system default encoding, change the highlighted parameter to 0 or -2 respectively. The example is written for a Unicode file, but the same approach works with the system default, and it also demonstrates a delicate detail in matching.)
[tt]
sfile="c:\output.txt"
set fso=createobject("scripting.filesystemobject")
on error resume next
s=fso.opentextfile(sfile,1,true,[highlight]-1[/highlight]).readall 'may have to use 0 for ascii file
if err.number<>0 then s=""
on error goto 0
set fso=nothing

akeyword=array("hommes","tr\xe9sor") '\xe9 for é
set rx=new regexp
with rx
.global=true
.multiline=true
.ignorecase=true
.pattern="^.*?"
for i=0 to ubound(akeyword)
.pattern=.pattern & akeyword(i) & ".*?"
next
.pattern=.pattern & "$"
end with

set cm=rx.execute(s)
'this shows the successful results
for each m in cm
wscript.echo m & vbcrlf & escape(m)
next
[/tt]
How you store the matches is incidental to the functionality. The script above just echoes them out.
 
Further note:
One may use a more concise one-liner to build the pattern. It is just a convenience; the more classic loop is no less efficient.
[tt] .pattern="^.*?" & join(akeyword,".*?") & ".*?$"[/tt]
 
tsuji, very nice example of searching a text file for a pattern! I have to award you a star. I have seen many posts, but not many of them take the time to post such concise and easily read code.



Thanks

John Fuhrman
Titan Global Services
 
Wow, after Georges Brassens, now Paul Eluard !
tsuji, are you by chance "francophile" ?
 
Thanks, John!

I have to add a note on the approach. It searches for the keywords in a fixed order, namely the order in which they appear in the keyword array. If the keywords may simply appear in any order, the pattern would need every permutation of the keywords (n factorial alternatives for n keywords), joined by the pipe "|" (or). That would quickly become ugly. In that case, successive regex matching, with each regex embedding a single keyword, would even be preferable.
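
A minimal sketch of that successive-matching idea, assuming it is applied to one line at a time. The function name HasAllKeywords is hypothetical, and each keyword is treated as a regex fragment, as in the script above:

```vbscript
Option Explicit

' Returns True only if every keyword occurs somewhere in sLine, in any order.
' One single-keyword regex test is run per keyword.
Function HasAllKeywords(sLine, akeyword)
    Dim rx, i
    Set rx = New RegExp
    rx.IgnoreCase = True
    HasAllKeywords = True
    For i = 0 To UBound(akeyword)
        rx.Pattern = akeyword(i)
        If Not rx.Test(sLine) Then
            HasAllKeywords = False
            Exit Function
        End If
    Next
End Function

Dim akeyword
akeyword = Array("tr\xe9sor", "hommes")   ' reversed order on purpose
' ChrW(233) is é, written this way to avoid depending on the script file's encoding.
WScript.Echo HasAllKeywords("Hommes pour qui ce tr" & ChrW(233) & "sor fut chant" & ChrW(233), akeyword)
WScript.Echo HasAllKeywords("La mort si difficile et si facile", akeyword)
```

The number of tests grows linearly with the number of keywords, instead of factorially as the permutation pattern would.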

PHV, let me think... I guess quite so, yes.
 
This works perfectly. The biggest issue that I had with regexp was with large files. It was very slow and didn't seem very efficient, but after trying the other way (grep) I found out quickly that running a little slower is sometimes the better way to go. Another star for you, tsuji.


That was very kind, and it shows why this forum is the greatest of all forums and why Tek-Tips is the best tech forum on the net. Thank you.


;-) JP
 
tsuji, I'm trying to use it in ASP, so I made the necessary changes such as adding Server. and Response., and now I am getting the following error. This is the same thing that I ran into before I experimented with grep, and it seems as though these 1GB+ text/log files are too large for something like this to run on. Below is the error that I see. Could this be a timeout issue?

Code:
Error Type:
Active Server Pages, ASP 0115 (0x80004005)
A trappable error (E06D7363) occurred in an external object. The script cannot continue running.
/basepage/searchresult.asp

JP
 
Okay...just to test my theory about the file being too large I used a much smaller text file and I get this error below. What you are seeing in the string is the first line of the text file I am trying to run the script against. Any ideas? I know I threw some ASP into this but I didn't want to make it complicated before.

Code:
Error Type:
Microsoft VBScript runtime (0x800A01A8)
Object required: '[string: "2006/08/22 Tue 14:03"]'
/basepage/searchresult.asp, line 47

JP
 
JP, if the file is that big, you had better process it line by line using ReadLine instead of ReadAll (the multiline setting is then no longer needed). Dealing with a file that big is always worrying. Could it be a consequence of a less than desirable design in the first place?
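
A minimal sketch of that line-by-line approach, assuming the same ordered keyword matching as the earlier script. The function name SearchByLine is hypothetical, and the format argument follows OpenTextFile (-1 Unicode, 0 ASCII, -2 system default):

```vbscript
Option Explicit
Const ForReading = 1

' Echoes every line of sFile that contains the keywords in array order
' and returns the number of matching lines.
Function SearchByLine(sFile, akeyword, nFormat)
    Dim fso, ts, rx, sLine, nHits
    Set rx = New RegExp
    rx.IgnoreCase = True
    ' No Multiline or Global needed: each Test call sees exactly one line.
    rx.Pattern = Join(akeyword, ".*?")
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(sFile, ForReading, False, nFormat)
    nHits = 0
    Do Until ts.AtEndOfStream
        sLine = ts.ReadLine            ' only the current line is held in memory
        If rx.Test(sLine) Then
            nHits = nHits + 1
            WScript.Echo sLine
        End If
    Loop
    ts.Close
    SearchByLine = nHits
End Function

Dim sPath
sPath = "c:\output.txt"
If CreateObject("Scripting.FileSystemObject").FileExists(sPath) Then
    WScript.Echo SearchByLine(sPath, Array("hommes", "tr\xe9sor"), -1) & " matching line(s)"
End If
```

Memory use stays roughly at the size of the longest line, which is the point of the suggestion for very large files. In ASP, Response.Write would take the place of WScript.Echo.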

As to the smaller test set, your error seems to suggest some kind of typo, or a method being called on the wrong object. Maybe you could show a relevant excerpt so that members can take a look at it?
 
tsuji, I am actually trying to use the ReadByte method, as shown below. It is very fast: 0.28 seconds for a 1MB file, compared with 73.56 seconds for the ReadAll method on the same file. It should only take a little over 4 minutes to read a 1GB file, but it is still getting the trappable error. So do you think line by line would be best and would keep me from getting this error? Below is what I've come up with. It works great for files up to 300MB, but anything larger gets the trappable error.

The s1-s6 variables are my keywords, taken in from an ASP page as user input. The rest is your code plus my code using the ReadByte method. It works very well but still cannot handle really large files. The files I am reading are extremely large text files. This is the reason I was trying to make the grep command work, as nothing in VBScript could handle this. I am willing to try any suggestions. Thank you for taking the time to help me on this.

Code:
s1=request("texttosearch_1")
s2=request("texttosearch_2")
s3=request("texttosearch_3")
s4=request("texttosearch_4")
s5=request("texttosearch_5")
s6=request("texttosearch_6")

FolderToSearch = "\myfolder\mypath\bp2000.log"

set objFile = fso.GetFile(Server.MapPath(FolderToSearch))
set objTS = objFile.OpenAsTextStream(ForReading, TristateFalse)

s = objTS.Read(objFile.Size)

'On Error Resume Next
If err.number<>0 then s=""
on Error GoTo 0
set fso=Nothing
akeyword=array(s1,s2,s3,s4,s5,s6)
set rx=new regexp
with rx
    .global=True
    .multiline=True
    .ignorecase=True
    .pattern="^.*?"
    for i=0 to UBound(akeyword)
        .pattern=.pattern & akeyword(i) & ".*?"
    Next
    .pattern=.pattern & "$"
end with

Set cm=rx.execute(s)
'this shows the successful results
for Each m in cm
    Response.Write m & "<br><hr>"
Next

Thanks,

JP
 
tsuji, is there a way to make this find the lines with the keywords no matter what order they are searched? Right now if I have the line below it works as long as the search words are in the order that they appear in the line.

keywords searched in order are: how cow

How now brown cow.

If I do the above it works perfectly.

keywords searched in order are: cow how

If I do the above it does not return anything.

Please let me know if that makes sense and if there is a way around it.

JP
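
One way to get order-independent matching in a single pattern is to give each keyword its own lookahead, which the VBScript RegExp object supports. A minimal sketch, assuming the pattern is tested against one line at a time (the function name AnyOrderPattern is hypothetical):

```vbscript
Option Explicit

' Builds one lookahead per keyword, e.g. "^(?=.*cow)(?=.*how).*$".
' Each lookahead scans the line from its start, so keyword order does not matter.
Function AnyOrderPattern(akeyword)
    AnyOrderPattern = "^(?=.*" & Join(akeyword, ")(?=.*") & ").*$"
End Function

Dim rx
Set rx = New RegExp
rx.IgnoreCase = True
rx.Pattern = AnyOrderPattern(Array("cow", "how"))

WScript.Echo rx.Pattern
WScript.Echo rx.Test("How now brown cow.")
```

With the keywords "cow how", the sample line "How now brown cow." is matched even though the keywords appear in the opposite order in the line.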
 
tsuji, maybe I could have the user select a date, and then build a new text file containing only the entries for that date. Then I will let them search the new text file using the code above. This is the only way I can think of to search 1GB+ text files. They are log files, which is why they are so large. There is nothing I can do about that right now, so what I've described above is what I will try unless you have another suggestion.

JP
 
Okay...now I just get a simple "Out of Memory" error. This is because the file is so large that not even the Read Bytes method will work. This was why I was attempting to use grep. I have yet to try splitting the file by having my users select a date from a calendar, but I think that is my next attempt to make this work. By the way, the code in this script works very well for files <= 100MB. Anything larger and it just fizzles out. If you use the Read Bytes method to read the file into your string rather than ReadAll, it can handle files <= 500MB. Anything over that and it fizzles out as well. Ok..I'm off to see what I can come up with to parse these 1GB+ files. I'll let you know if I come up with something. If by chance you come up with a way to get around the "Out of Memory" problem, please let me know. I think you mentioned reading line by line, but I'm not sure how to do that without running into the same problem. Again...great job and thanks.

JP
 
Perhaps you could read one line at a time, rather than the whole thing? I wrote a script that reads a proxy log file (which can be huge) and extracts log entries based on 'key' words. I ditched regExp's in favor of a simple instr and OR logic (only checking one line of text at a time and it seemed to speed things up considerably).

The basic logic is:

Do while not EOF
    Read a line
    If it contains a keyword
        Write that line to another file (or, better, an array)
    Else
        ...
    End if
Loop

That is the really simple logic. To speed things up, instead of writing the good ones into a file directly, I put them into an array.

Putting the good ones in an array will give you lots of flexibility in what the output looks like. I made my output a series of web pages.
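
The logic above can be sketched in VBScript roughly as follows. This is an illustration under assumptions, not the script described in this post: the function names are hypothetical, and the keyword test uses AND logic (all keywords must be present, as in the original question) rather than OR.

```vbscript
Option Explicit
Const ForReading = 1

' True if sLine contains every keyword (case-insensitive, any order).
Function ContainsAll(sLine, akeyword)
    Dim i
    ContainsAll = True
    For i = 0 To UBound(akeyword)
        If InStr(1, sLine, akeyword(i), vbTextCompare) = 0 Then
            ContainsAll = False
            Exit Function
        End If
    Next
End Function

' Reads sFile one line at a time and returns the matching lines as an array.
Function CollectMatches(sFile, akeyword)
    Dim fso, ts, sLine, aHits(), nHits
    nHits = 0
    ReDim aHits(99)
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(sFile, ForReading)
    Do Until ts.AtEndOfStream
        sLine = ts.ReadLine
        If ContainsAll(sLine, akeyword) Then
            ' Double the array when it is full instead of growing it per hit.
            If nHits > UBound(aHits) Then ReDim Preserve aHits(UBound(aHits) * 2 + 1)
            aHits(nHits) = sLine
            nHits = nHits + 1
        End If
    Loop
    ts.Close
    If nHits = 0 Then
        CollectMatches = Array()
    Else
        ReDim Preserve aHits(nHits - 1)   ' trim unused slots
        CollectMatches = aHits
    End If
End Function

Dim sPath, aResult
sPath = "C:\pathto\folderto\filename.txt"
If CreateObject("Scripting.FileSystemObject").FileExists(sPath) Then
    aResult = CollectMatches(sPath, Array("how", "cow"))
    WScript.Echo (UBound(aResult) + 1) & " matching line(s)"
End If
```

Only the matching lines are kept in memory, so this stays small as long as the hits are a small fraction of the log. If most lines match, writing them straight to an output file would be the safer choice.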

Hope this helps



strebor
 