Having trouble in grep the right string

Status
Not open for further replies.

whn

Programmer
Oct 14, 2007
265
US
I need some help understanding how to grep the right file names in Perl.

Here is the requirement. I have some files in a subdir './somedir':
Code:
% ls
abc.5.1.0_b249.tar.gz
abc.5.1.0.sp3_b249.tar.gz
abc.5.1.1_b008.tar.gz
abc.5.2.0_b187.tar.gz
abc.5.2.0.ga_b187.tar.gz
abc.5.2.1_b188.tar.gz
abc.5.2_b186.tar.gz
abc.5.2.sp2_b186.tar.gz
abc.5.3.0_b161.tar.gz
abc.5.3.0.ga_b161.tar.gz
abc.5.3.1_b141.tar.gz
abc.5.3_b170.tar.gz
abc.5.3.sp1_b170.tar.gz
efg
lmn
The files I want to retrieve are:
Code:
% ls abc*5.?_b*
abc.5.2_b186.tar.gz
abc.5.3_b170.tar.gz

I wrote a piece of perl code to do this job:

Code:
my $dir = './dir4RdDir/';
opendir(DIR, $dir) or die "Cannot open $dir: $!";
# BLUE line:
#my @files = sort(grep (/\d\.\d_b/ && !/\d\.\d\.[0-9]_b/, readdir(DIR)));
# RED lines (the next two):
#my @files = sort(grep (/\d\.\d\_b/, readdir(DIR)));
my @files = sort(grep (/\d\.\d_b/, readdir(DIR)));
closedir(DIR);
my $i = 1;
foreach (@files) {
  print "$i, \$_ = $_\n";
  $i++;
}

I know the line in blue works, but the part ‘!/\d\.\d\.[0-9]_b/’ looks redundant to me. However, without that part, neither line in red works as I expected!

Here is the output while the line in red is used:

1, $_ = abc.5.1.0_b249.tar.gz
2, $_ = abc.5.1.1_b008.tar.gz
3, $_ = abc.5.2.0_b187.tar.gz
4, $_ = abc.5.2.1_b188.tar.gz
5, $_ = abc.5.2_b186.tar.gz
6, $_ = abc.5.3.0_b161.tar.gz
7, $_ = abc.5.3.1_b141.tar.gz
8, $_ = abc.5.3_b170.tar.gz

Why would ‘grep (/\d\.\d_b/, readdir(DIR))’ return something with \d\.\d\.\d_b? Why not only return \d\.\d_b? What is the better way to code this?

Many thanks!!
 
Code:
my @files = grep {/\D\.\d\.\d_b/} readdir DIR;

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Thank you, Kevin! That works!!

However, I still don't understand why mine would not work.

In addition, suppose I also have a file named as abc1.5.2_b150.tar.gz, which I want to retrieve together with abc.5.2_b*, too, then your code would not work.

Again, why would \d\.\d_b match \d\.\d\.\d_b? It looks like a Perl bug to me.
 
My regexp is \D\.\d\.\d_b and not \d\.\d\.\d_b

I hope you see the difference. \D (the complement of \d) matches any non-digit character, so my regexp only matches \.\d\.\d_b if the character before the first dot is a non-digit. Yours matches if any digit comes before a dot, including the middle digit of a three-part version number.
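A minimal sketch of that difference, comparing \D\.\d\.\d_b with the \d\.\d_b pattern from the first post. It assumes a hypothetical in-memory list `@names` (a few names copied from the listing above, plus abc1.5.2_b150.tar.gz) instead of a real directory read:

```perl
use strict;
use warnings;

# Sample names copied from the listing in the first post, plus the
# abc1.5.2_b150.tar.gz name from the follow-up (array name is illustrative).
my @names = qw(
    abc.5.1.0_b249.tar.gz
    abc.5.2_b186.tar.gz
    abc.5.3.0_b161.tar.gz
    abc.5.3_b170.tar.gz
    abc1.5.2_b150.tar.gz
);

# \d before the first dot: any digit will do, including the middle
# digit of a three-part version such as 5.3.0.
my @with_digit = grep { /\d\.\d_b/ } @names;

# \D before the first dot: that character must NOT be a digit, so the
# two-digit version has to start right after a letter.
my @with_nondigit = grep { /\D\.\d\.\d_b/ } @names;

print "with \\d: @with_digit\n";
print "with \\D: @with_nondigit\n";
```

With this sample list, the \d version keeps all five names, while the \D version keeps only abc.5.2_b186.tar.gz and abc.5.3_b170.tar.gz. It also drops abc1.5.2_b150.tar.gz, because the character before the first dot there is the digit 1.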

To answer your other question:

In addition, suppose I also have a file named as abc1.5.2_b150.tar.gz, which I want to retrieve together with abc.5.2_b*, too, then your code would not work.

You would need a slightly more complex regexp, or two regexps. Maybe as one:

Code:
/^[a-z]{3}\d?\.\d\.\d_b/

Maybe as two:

Code:
/^[a-z]{3}\.\d\.\d_b/ || /^[a-z]{3}\d\.\d\.\d_b/

Whenever possible, it's a very good idea to add string anchors. Note that I added ^ to the regexp, which forces the match to start at the beginning of the string. Not only is this usually faster, it also removes much of the guesswork perl has to go through to find a substring match that could start anywhere in the string.
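As a rough sketch of what the anchor buys you, the hypothetical helper `is_two_part_build` below wraps one anchored pattern. It assumes the names always start with three lowercase letters, optionally followed by a single digit, as in the examples in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) when a name is three lowercase letters,
# an optional single digit, then exactly two one-digit version parts
# followed by "_b". The ^ anchor pins the match to the start of the name.
sub is_two_part_build {
    my ($name) = @_;
    return $name =~ /^[a-z]{3}\d?\.\d\.\d_b/ ? 1 : 0;
}

for my $name (qw(abc.5.2_b186.tar.gz abc1.5.2_b150.tar.gz
                 abc.5.2.0_b187.tar.gz abc.5.2.sp2_b186.tar.gz)) {
    printf "%-26s %s\n", $name, is_two_part_build($name) ? 'keep' : 'skip';
}
```

Because the pattern can only start at the first character, abc.5.2.0_b187.tar.gz is skipped: after abc.5.2 the regexp requires _b, finds .0 instead, and has no other starting position to try.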


------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Thank you again, Kevin.

I did notice that your regexp starts with \D, a non-digit character. I understand everything you said in both of your posts, which helped me a great deal.

However, I still don't understand why my regexp \d\.\d_b would match both *5.3_b* & *5.3.0_b*?

My understanding is somewhat like this:

A regexp \d\.\d_b should only match a pattern with TWO single digits separated by ONE dot, e.g. *5.2_b*, *5.3_b*. And it should NOT match a pattern with THREE single digits separated by TWO dots, e.g. *5.3.0_b*.

Kevin, could you please explain it a bit more? I appreciate your help very much.

 
OK. I will attempt to explain.

A regexp \d\.\d_b should only match a pattern with TWO single digits separated by ONE dot, e.g. *5.2_b*, *5.3_b*. And it should NOT match a pattern with THREE single digits separated by TWO dots, e.g. *5.3.0_b*.

The above is wrong. The regexp /\d\.\d_b/ matches if the pattern occurs anywhere in the string. In abc.5.3.0_b161.tar.gz, for example, the substring 3.0_b satisfies \d\.\d_b on its own, and the 5. in front of it is simply ignored. There could be a million characters before and a million characters after the matched text; as long as perl finds the pattern somewhere in the string, the match is true.

That explains why it matches. That does not explain what you could do to avoid finding false matches or make your regular expressions more efficient and accurate. But that is a book or at least a long article. [wink]
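One way to see the substring match for yourself is to capture what the pattern actually matched. This is only a small sketch; `matched_part` is a hypothetical helper name, and the sample names are taken from the listing in the first post:

```perl
use strict;
use warnings;

# Hypothetical helper: return the exact substring that /\d\.\d_b/
# matched inside $name, or undef when the pattern is not found.
sub matched_part {
    my ($name) = @_;
    return $name =~ /(\d\.\d_b)/ ? $1 : undef;
}

for my $name (qw(abc.5.3_b170.tar.gz abc.5.3.0_b161.tar.gz abc.5.3.sp1_b170.tar.gz)) {
    my $hit = matched_part($name);
    print "$name => ", defined $hit ? "matched '$hit'" : "no match", "\n";
}
```

For abc.5.3_b170.tar.gz the captured text is 5.3_b, for abc.5.3.0_b161.tar.gz it is 3.0_b, and abc.5.3.sp1_b170.tar.gz does not match at all because the character before 1_b is a letter, not a dot.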

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 