Need help searching for multiple keywords

Status
Not open for further replies.

bigbalbossa

Programmer
Mar 21, 2002
87
US
All,

I have been working on a small project that pulls keywords from a file, matches them against an array of control files, inserts matching records into a table, and increments a count. The problem is how I'm matching keywords to the control files. I'm hoping one of you guys can help.

I'm constructing a regex like so:
Code:
my $keyword_re = "(" . join( "|", map(quotemeta, sort(keys(%rap_kwds))) ) . ")";
Then, while looping through my control files, I get my matches:
Code:
while (<FILE2>) {
   chomp;

   my @matches = ( $_ =~ /Air: Last Activity/g );
   foreach my $match ( @matches ) {
      print "Match = $match\n";
   }
}
This does a good job of getting matches, but the regex doesn't pick the longest keyword. For example, if a control file contains Air: Last Activity and Air: Last Activity All on the same line, I match Air: Last Activity twice and never Air: Last Activity All.

Any thoughts on fixing this code or alternative solutions will help keep me sane :)

Thanks,
 
Regexes are not my strongest suit, if I have one ...
but you're only asking it to match Air: Last Activity, and not asking it to match Air: Last Activity All.

Does this make sense?
--Paul

It's important in life to always strike a happy medium, so if you see someone with a crystal ball, and a smile on their face ... smack the fecker
 
Nice...I cut the wrong piece of code. This is what it should read:
Code:
my $keyword_re = "(" . join( "|", map(quotemeta, sort(keys(%rap_kwds))) ) . ")";

while (<FILE2>) {
   chomp;

   my @matches = ( $_ =~ /$keyword_re/g );

   foreach my $match ( @matches ) {
      print "Match = $match\n";
   }
}

 
A lot will depend on the structure of the file you're reading and the terms you're trying to match. Your problem here is that there's an element of overlap between the two terms you've told us about.

You could try sorting your regexp terms by length (i.e. trying to match the longest term first). That would fix this particular problem but whether or not it'll work perfectly depends on the degree of overlap between your terms.
Code:
my $keyword_re = "(" . join( "|", map(quotemeta, sort { length $b <=> length $a } (keys(%rap_kwds))) ) . ")";
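To see why the ordering matters, here is a minimal sketch, assuming a made-up two-entry %rap_kwds and a made-up input line (neither is from the original data). Perl tries the alternatives left to right at each position and takes the first one that matches, so with a plain sort the shorter keyword shadows the longer one that starts with it.

```perl
use strict;
use warnings;

# Hypothetical keyword hash and input line, for illustration only.
my %rap_kwds = ( 'Air: Last Activity' => 0, 'Air: Last Activity All' => 0 );
my $line     = 'Air: Last Activity, Air: Last Activity All';

# Plain sort: the shorter keyword comes first in the alternation.
my $alpha_re = "(" . join( "|", map { quotemeta($_) } sort keys %rap_kwds ) . ")";

# Longest first: the longer keyword is tried before its own prefix.
my $length_re = "("
    . join( "|", map { quotemeta($_) } sort { length($b) <=> length($a) } keys %rap_kwds )
    . ")";

# In list context, /g returns every captured keyword on the line.
my @alpha_matches  = ( $line =~ /$alpha_re/g );
my @length_matches = ( $line =~ /$length_re/g );

print "Plain sort:    $_\n" for @alpha_matches;
print "Longest first: $_\n" for @length_matches;
```

With the plain sort both matches come back as Air: Last Activity; with the length sort the second one comes back as Air: Last Activity All. Matches found with /g never overlap, so keywords that overlap only partially (one ending where another begins) can still hide each other.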

As an alternative (though I don't have time to benchmark them to see which is quicker), since you're just matching strings, you could use 'index'.
Code:
while(<FILE2>) {
   chomp;
    
   foreach my $match ( keys %rap_kwds ) {
      print "Match = $match\n" if ( index( $_, $match ) >= 0 );
   }
}
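A small self-contained sketch of the index approach, again with a hypothetical keyword hash and sample line. index returns the position of the substring, or -1 if it isn't there, and it needs its own parentheses so that >= compares the return value rather than becoming part of the second argument.

```perl
use strict;
use warnings;

# Hypothetical keywords and input line, for illustration only.
my %rap_kwds = (
    'Air: Last Activity'     => 0,
    'Air: Last Activity All' => 0,
    'Sea: First Activity'    => 0,
);
my $line = 'Air: Last Activity All';

# Keep every keyword that occurs somewhere in the line.
# sort is only here to make the output order predictable.
my @found = grep { index( $line, $_ ) >= 0 } sort keys %rap_kwds;

print "Match = $_\n" for @found;
```

Each keyword is checked on its own here, so a keyword that is a prefix of a longer one is reported whenever the longer one is present: the line above yields both Air: Last Activity and Air: Last Activity All. This also reports a keyword at most once per line, however many times it occurs.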
 
Ishnid,

I'm beginning to think you are a prodigy :)

I never thought about sorting; this will help a lot in the future. Thanks for the help...again.
 