Matching Alternative Patterns slows down search tremendously!

cptk · Sep 16, 2009

Using something like ...
@matched = grep /ABC/g, <FILE>;
on a single 16,000-line text file takes only 0.7 seconds.

But if I add an alternative ...
@matched = grep /ABC|XYZ/g, <FILE>;
on a single 16,000-line text file, it takes 17.0 seconds.

Why would adding an alternative pattern cause such an increase in execution time in perl?

Funnily enough, when I test with and without the alternation using egrep, there's no difference in execution time. It takes approx. 0.58 seconds either way!
egrep "ABC|XYZ" file

Kirsle · Sep 16, 2009

Try adding "o" to the modifiers.

Code:

@matched = grep /ABC|XYZ/og, <FILE>;

o makes the regexp compile only once. Since there are no variables in the regexp, it only needs to be compiled once. It might speed things up.

Annihilannic · Sep 16, 2009

Out of curiosity, try comparing fgrep, grep and egrep...

Annihilannic.

cptk · Sep 17, 2009

I've already tried the "o" modifier - it made no difference.

grep - the POSIX version using -E gave the same favorable times as egrep.
fgrep - doesn't support regular expressions.

... my search for an explanation continues ...

Annihilannic · Sep 17, 2009

I meant without the alternation... since you mentioned that you tried that. grep -E is usually exactly equivalent to egrep (the latter often just being a symlink).

I guess my point is that perl is obviously using a different code path when the expression contains any kind of extended regular expression. Perl does it implicitly, whereas grep forces you to make the choice by using -E, -F or neither (for basic regular expressions).

It does seem to be very slow though... it would be quicker for you to do separate greps for ABC and XYZ and join the results together! The only disadvantage there being that the order of the original input data is compromised...
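The order problem goes away if both simple patterns are tested against each line in a single pass. A minimal sketch, assuming the lines are already in an array; @lines and its contents are made up for illustration, and whether this beats the alternation will depend on the perl version:

```perl
use strict;
use warnings;

# Hypothetical sample input; in the thread this would come from <FILE>.
my @lines = ("one ABC\n", "two\n", "three XYZ\n", "four ABC XYZ\n");

# Two simple matches joined with ||, evaluated per line, so the
# original order is kept and each line is returned at most once.
my @matched = grep { /ABC/ || /XYZ/ } @lines;

print @matched;
```

A line containing both ABC and XYZ is only reported once, which is the same result the alternation would give.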

Annihilannic.

KevinADC · Sep 17, 2009

What version of perl and what operating system? There can be a very wide range of difference between versions and operating systems and the slowness you experience may not be typical of all perl users.
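For reference, a script can report which perl and platform it is actually running under, which helps when an old script picks up a different interpreter than expected. A small sketch using the built-in variables $^V and $^O (the name $report is just for illustration):

```perl
use strict;
use warnings;

# $^V is the version of the running interpreter,
# $^O is the name of the operating system perl was built for.
my $report = sprintf "perl %vd on %s", $^V, $^O;
print "$report\n";
```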

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

cptk · Sep 18, 2009

Annihilannic -
When I test with and without alternate patterns using both egrep and grep, there's no significant difference in execution time - all permutations take approx. the same amount of time (i.e. < 3/4 second)

KevinADC -
!!! BINGO !!!!
I'm on Solaris 9, but I was testing some proof-of-concept stuff in an old script which was using perl 5.6.1 .... yikes!!!

When I switched over to 5.10.0 (which I normally use), the execution time was a little over 1 second - a big difference from 17 seconds

Thanks Kevin for pointing that out ...
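To repeat this kind of comparison on another perl, something along these lines could be run under each version. It is only a sketch: the generated lines are a stand-in for the real 16,000-line file, the iteration count of 20 is arbitrary, and the labels single and alternation are made up. It uses the core Benchmark module:

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical data standing in for the 16,000-line file:
# every 100th line contains ABC, no line contains XYZ.
my @lines = map { "line $_ " . ($_ % 100 ? "nothing" : "ABC") . "\n" } 1 .. 16_000;

# grep in scalar context returns the number of matching lines.
my %count;
my $results = timethese(20, {
    single      => sub { $count{single}      = grep { /ABC/ }     @lines },
    alternation => sub { $count{alternation} = grep { /ABC|XYZ/ } @lines },
});
```

timethese prints the timings for each label; comparing the two rows under 5.6.1 and 5.10.0 should show whether the alternation is the slow part on a given install.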

MikeLacey · Sep 20, 2009

The o switch is only useful when you're using a variable in the regular expression, calling the regex compare multiple times, and (crucially) the value of the variable won't change. Like this:

my $var = 'string to search for';
while (<>) {
    print if /$var/o;    # /o: interpolate $var and compile the pattern only once
}

o, in this case, tells Perl to interpolate $var and compile the pattern only once, and not to bother working out what's in $var each time the while loop operates.

Only useful when using a variable in a regex AND the regex will get called multiple times.
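The same "compile once" effect can also be had with qr//, which builds a reusable compiled pattern and is generally the more common idiom in newer code. A sketch, with made-up sample lines in place of the <> input; the \Q...\E quoting is an assumption that $var holds literal text rather than a regex:

```perl
use strict;
use warnings;

my $var = 'string to search for';

# qr// compiles the pattern once; \Q...\E makes any regex
# metacharacters in $var match literally (drop it if $var is a regex).
my $re = qr/\Q$var\E/;

# Hypothetical input lines standing in for the while(<>) loop.
my @lines = ("no match here\n", "a string to search for, found\n");
my @hits  = grep { /$re/ } @lines;

print @hits;
```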

Mike

http://www.myspace.com/micahhowzat
