Can a reg ex do the job?

Status
Not open for further replies.

tonykent

IS-IT--Management
Jun 13, 2002
251
GB
Can anyone come up with any ideas for addressing this issue? I've been asked to produce some figures from my Change Request database, including how many Change Requests went through peer review but subsequently failed testing, and this one has me stumped. I have pulled out the raw data and ended up with several thousand lines like these:

Code:
(line 1) in_review assigned IN_peer_review IN_in_test resolved
(line 2) in_review assigned IN_peer_review IN_in_test resolved concluded
(line 3) in_review IN_design_assigned assigned IN_peer_review IN_in_test resolved
(line 4) in_review assigned IN_peer_review IN_in_test resolved
(line 5) entered assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test resolved
(line 6) in_review assigned IN_in_test resolved
(line 7) in_review assigned IN_peer_review assigned IN_peer_review IN_in_test resolved
(line 8) in_review assigned IN_peer_review IN_in_test resolved
(line 9) assigned IN_peer_review IN_in_test resolved

What I need is the number of lines like line 5 above, where 'IN_peer_review' appears JUST ONCE before the first 'IN_in_test', and 'IN_in_test' appears TWO or more times. 'IN_peer_review' may appear several times after (i.e. to the right of) any appearance of 'IN_in_test'; those are of no consequence to me.

Thus,

....IN_peer_review.....IN_in_test.....IN_in_test.....

is the pattern I am trying to match. Can a regular expression manage this?
 
This goes part of the way:

Code:
m/^.+IN_peer_review.+IN_in_test.+IN_in_test.+/

But it matches even if IN_peer_review appears more than once before the first IN_in_test, which is not what is wanted. IN_peer_review must only appear once on the left of the first IN_in_test.
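
To make the over-match concrete, here is a minimal sketch that runs the pattern above against one wanted and one unwanted line. The sub name `first_attempt` and the second sample line are made up for illustration; they are not from the real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The first-attempt pattern from this post, wrapped in a sub.
# (The sub name is hypothetical, chosen only for this sketch.)
sub first_attempt {
    my ($line) = @_;
    return $line =~ m/^.+IN_peer_review.+IN_in_test.+IN_in_test.+/ ? 1 : 0;
}

my @samples = (
    # Wanted: one peer review before the first test, then a second test.
    'entered assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test resolved',
    # Not wanted: two peer reviews before the first test.
    'entered assigned IN_peer_review assigned IN_peer_review IN_in_test assigned IN_in_test resolved',
);

# Both lines are reported as a match, which is the problem.
for my $line (@samples) {
    printf "%-9s %s\n", first_attempt($line) ? 'match' : 'no match', $line;
}
```

The greedy `.+` between the tokens happily swallows the extra IN_peer_review, so the pattern cannot tell the two lines apart.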
 
Try this:

Perl:
#!/usr/bin/perl -w
use strict;

while (<DATA>) {
        if (m/^(.+?IN_peer_review.+?)IN_in_test.+IN_in_test/) {
                print unless ($1 =~ /IN_peer_review.+IN_peer_review/);
        }
}

__DATA__
in_review assigned IN_peer_review IN_in_test resolved
in_review assigned IN_peer_review IN_in_test resolved concluded
in_review IN_design_assigned assigned IN_peer_review IN_in_test resolved
in_review assigned IN_peer_review IN_in_test resolved
entered assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test resolved
in_review assigned IN_in_test resolved
in_review assigned IN_peer_review assigned IN_peer_review IN_in_test resolved
in_review assigned IN_peer_review IN_in_test resolved
assigned IN_peer_review IN_in_test resolved
entered assigned IN_peer_review assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test resolved

Rather than trying to do too much in one regex, it captures the substring before the first "IN_in_test" using non-greedy wildcards, and then checks whether that substring contains more than one "IN_peer_review".

I'm sure it could be done in a regex but I generally find myself avoiding them due to lack of readability.
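
For comparison, one possible single-regex version uses a negative lookahead so that the gaps between tokens cannot contain either token. This is only a sketch, not something tested against the real database export; the names `$gap`, `$single` and `failed_after_one_review` are made up here. Note one difference from the two-step script: this pattern also rejects a line where IN_in_test appears before the first IN_peer_review.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A run of characters in which neither token starts.
my $gap = qr/(?:(?!IN_peer_review|IN_in_test).)*/;

# Exactly one IN_peer_review before the first IN_in_test,
# then at least one more IN_in_test later on the line.
my $single = qr/^${gap}IN_peer_review${gap}IN_in_test.*IN_in_test/;

sub failed_after_one_review {
    my ($line) = @_;
    return $line =~ $single ? 1 : 0;
}

my @samples = (
    'entered assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test resolved',
    'entered assigned IN_peer_review assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test resolved',
    'in_review assigned IN_peer_review IN_in_test resolved',
);

for my $line (@samples) {
    print "$line\n" if failed_after_one_review($line);
}
```

Only the first sample line is printed. It works, but it does illustrate the readability point: the two-step version is much easier to check by eye.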

Annihilannic
 
The most reliable way to solve this type of problem is to break it up. Yes, you can write the solution to this in a single regular expression. However, the more complicated the regex, the more likely you are to overlook cases or introduce a bug.

Therefore, use non-greedy matching to capture everything up to the first IN_in_test, and then use a second regex to see if there is more than one IN_peer_review in that section of the line:

Code:
#!/usr/bin/perl -w

use strict;
use warnings;

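while (<DATA>) {
	# Non-greedy capture of everything up to the first IN_in_test that
	# follows an IN_peer_review; a second IN_in_test must appear later.
	if (/(.*?IN_peer_review.*?)IN_in_test.*?IN_in_test/) {
		my $prefix = $1;
		# Reject lines with more than one IN_peer_review in that prefix.
		if ($prefix !~ /IN_peer_review.*IN_peer_review/) {
			print "Matching line: $_";
		}
	}
}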

__DATA__
(line 1) in_review assigned IN_peer_review IN_in_test resolved
(line 2) in_review assigned IN_peer_review IN_in_test resolved concluded
(line 3) in_review IN_design_assigned assigned IN_peer_review IN_in_test resolved
(line 4) in_review assigned IN_peer_review IN_in_test resolved
(line 5) entered assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test resolved
(line 5b) entered assigned IN_peer_review IN_peer_review IN_in_test assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test resolved
(line 6) in_review assigned IN_in_test resolved
(line 7) in_review assigned IN_peer_review assigned IN_peer_review IN_in_test resolved
(line 8) in_review assigned IN_peer_review IN_in_test resolved
(line 9) assigned IN_peer_review IN_in_test resolved

- Miller
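
Since the original question asks for a count rather than a listing, the same rule can also be sketched without any regex by splitting each line on whitespace and walking the states in order. This swaps in a different technique from the scripts above, and the sub name `failed_after_single_review` and the `@lines` sample are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns 1 when exactly one IN_peer_review precedes the first IN_in_test
# and IN_in_test occurs at least twice on the line, otherwise 0.
sub failed_after_single_review {
    my ($line) = @_;
    my ($reviews_before_test, $tests) = (0, 0);
    for my $state (split ' ', $line) {
        if ($state eq 'IN_in_test') {
            $tests++;
        }
        elsif ($state eq 'IN_peer_review' && $tests == 0) {
            $reviews_before_test++;
        }
    }
    return ($reviews_before_test == 1 && $tests >= 2) ? 1 : 0;
}

# A few sample lines taken from this thread.
my @lines = (
    '(line 1) in_review assigned IN_peer_review IN_in_test resolved',
    '(line 5) entered assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test resolved',
    '(line 5b) entered assigned IN_peer_review IN_peer_review IN_in_test assigned IN_peer_review IN_in_test assigned IN_peer_review IN_in_test resolved',
    '(line 6) in_review assigned IN_in_test resolved',
    '(line 7) in_review assigned IN_peer_review assigned IN_peer_review IN_in_test resolved',
);

# grep in scalar context returns the number of matching elements.
my $count = grep { failed_after_single_review($_) } @lines;
print "$count\n";    # prints 1 (only line 5 qualifies)
```

Tokens such as "(line" and "5)" are simply ignored, so the sketch works with or without the line labels.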
 
Thank you for your thoughts, guys. I will look at this during the day.
 