Matching Alternative Patterns slows down search tremendously!

Status
Not open for further replies.

cptk

Technical User
Mar 18, 2003
305
US
Using something like ...
@matched = grep /ABC/g, <FILE>;
On a single 16,000-line text file, this takes only 0.7 seconds.

But if I add an alternative ...
@matched = grep /ABC|XYZ/g, <FILE>;
On the same 16,000-line text file, this takes 17.0 seconds.

Why would adding an alternative pattern cause such an increase in execution time in perl?

Funny, though, that when I test with and without alternative patterns using egrep, there's no difference in execution time. It takes approx. 0.58 seconds either way!
egrep "ABC|XYZ" file

 
Try adding "o" to the modifiers.

Code:
@matched = grep /ABC|XYZ/og, <FILE>;

o makes the regexp compile only one time. Since there are no variables in the regexp, it only needs to compile once. It might speed things up.

Cuvou.com | My personal homepage
Code:
perl -e '$|=$i=1;print" oo\n<|>\n_|_";x:sleep$|;print"\b",$i++%2?"/":"_";goto x;'
 
I've already looked into the "o" modifier - it makes no difference to the timing.

grep - the POSIX version using -E yielded the same favorable times as egrep.
fgrep - doesn't support reg. expressions.

... my search for an explanation continues ...
 
I meant without the alternation... since you mentioned that you tried that. grep -E is usually exactly equivalent to egrep (the latter often just being a symlink).

I guess my point is that perl is obviously using a different code path when the expression contains any kind of extended regular expression. Perl does it implicitly, whereas grep forces you to make the choice by using -E, -F or neither (for basic regular expressions).

It does seem to be very slow though... it would be quicker for you to do separate greps for ABC and XYZ and join the results together! The only disadvantage there being that the order of the original input data is compromised...
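A minimal sketch of a variation on that idea, assuming the lines are already in an array (the sample data here is hypothetical): testing each line against two plain patterns in a single pass avoids the alternation while keeping the original order. Whether this is actually faster on an older perl has not been measured here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lines that would be read from <FILE>.
my @lines = ("one ABC\n", "two\n", "three XYZ\n", "four ABC XYZ\n", "five\n");

# Test each line against two plain patterns instead of one alternation.
# grep visits the lines in order, so the input ordering is preserved and
# a line containing both strings is kept only once.
my @matched = grep { /ABC/ || /XYZ/ } @lines;

print @matched;
```

Running two separate greps and concatenating the results would instead group all ABC lines before all XYZ lines and could list a line twice.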

Annihilannic.
 
What version of perl and what operating system? There can be a very wide range of difference between versions and operating systems and the slowness you experience may not be typical of all perl users.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Annihilannic -
When I test with and without alternative patterns using both egrep and grep, there's no significant difference in execution time - all permutations take approx. the same amount of time (i.e. < 3/4 second)

KevinADC -
!!! BINGO !!!!
I'm on Solaris 9, but I was testing some proof-of-concept stuff in an old script which was using perl 5.6.1 .... yikes!!!

When I switched over to 5.10.0 (which I normally use), the execution time dropped to a little over 1 second - a big difference from 17 seconds.
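For anyone wanting to compare perl versions the same way, here is a rough timing sketch. It assumes generated test data rather than the original file, and time_grep is a hypothetical helper name; run it under each perl binary and compare the printed times.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical test data: 16,000 lines, some tagged ABC, some tagged XYZ.
my @lines;
for my $n (1 .. 16_000) {
    my $tag = $n % 100 == 0 ? 'ABC' : $n % 150 == 0 ? 'XYZ' : 'nothing';
    push @lines, "line $n $tag\n";
}

# Run one grep over the data; return the match count and elapsed seconds.
sub time_grep {
    my ($re) = @_;
    my $start = time;
    my $count = grep { $_ =~ $re } @lines;   # scalar context gives the count
    return ($count, time - $start);
}

# Report the perl version ($]) alongside each timing.
for my $pattern ('ABC', 'ABC|XYZ') {
    my ($count, $elapsed) = time_grep(qr/$pattern/);
    printf "perl %s  /%s/  %d matches  %.3f s\n", $], $pattern, $count, $elapsed;
}
```

The absolute numbers will differ from the ones quoted in this thread, since the data and hardware are not the same; only the ratio between the two patterns on a given perl is of interest.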

Thanks Kevin for pointing that out ...
 
The o switch is only useful when you're using a variable in the regular expression, calling the regex compare multiple times, and (crucially) the value of the variable won't change. Like this:

Code:
my $var = 'string to search for';
while (<>) {
    print if /$var/o;
}


o, in this case, tells Perl to interpolate $var and compile the pattern only once, rather than checking what's in $var on every pass through the while loop.

In short, o is only useful when the regex contains a variable AND the regex gets called multiple times.
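A small sketch of why the "won't change" condition matters, using made-up values for $var. It also shows qr//, which is not mentioned above but is a common way to precompile a pattern explicitly.

```perl
use strict;
use warnings;

my (@with_o, @without_o, @with_qr);
for my $var ('ABC', 'XYZ') {
    # /o freezes the pattern at the first value of $var it sees ('ABC'),
    # so the second pass still matches against ABC.
    push @with_o, ('XYZ' =~ /$var/o ? 1 : 0);

    # Without /o, the current value of $var is used on every pass.
    push @without_o, ('XYZ' =~ /$var/ ? 1 : 0);

    # qr// compiles a pattern object from the current value of $var.
    my $re = qr/$var/;
    push @with_qr, ('XYZ' =~ $re ? 1 : 0);
}

print "with /o:    @with_o\n";
print "without /o: @without_o\n";
print "with qr//:  @with_qr\n";
```

The /o version never matches 'XYZ' because it keeps using the pattern built from 'ABC', while the other two match on the second pass.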

Mike

 