pattern match problem

Status
Not open for further replies.

preethib

Programmer
Jul 20, 2000
39
IN
Hi,

I have a text file containing (A#, B#) pairs that looks like this:
"
A1: B1: "rank"
A2: B2: "rank term"
A3: B3: "rank term
A4: B4: rank all
A5: B5: rank all true
.."

I am trying to find errors in the 'B#:' lines when their format is incorrect.
I want to print the line content as OK if the syntax is OK, i.e. the line has both an opening and a closing quote, or no quotes at all (in the example above, B1, B2 and B4 have good syntax).
I want to print the line content as INCORRECT if there is a syntax error, i.e. the line has mismatched quotes (in the example above, B3 has bad syntax).

Please suggest a good 'regular expression' to find all good and bad lines in 'B#:' lines.

Any help would be great. thanks,
 
There might be a way to do it all in one regex but it would probably get pretty ugly. You can just check either possibility with two regexes like:
Code:
while (<DATA>) {
    next unless my ($n) = /^B(\d+)/;
    if ( /:\s+".*?"$/ or /:\s+[^"]+$/ ) {
        print "$n OK\n";
    } else {
        print "$n INCORRECT\n";
    }
}
If a line could contain three quotes, you might want to check for that as well.
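
One way to cover the three-quote case is to count the quote characters rather than rely on the two regexes alone. This is only a minimal sketch: `check_b_line` is a hypothetical helper name, and it assumes every line of interest starts with `B<n>:` as in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (number, status) for a "B<n>:" line,
# or an empty list for any other line.
sub check_b_line {
    my ($line) = @_;
    # Capture the B number and the text after the colon, minus surrounding whitespace.
    my ($n, $rest) = $line =~ /^B(\d+):\s*(.*?)\s*$/ or return;
    # tr with an empty replacement list counts characters without changing $rest.
    my $quotes = ($rest =~ tr/"//);
    my $ok = $quotes == 0                          # no quotes at all
          || ($quotes == 2 && $rest =~ /^".*"$/);  # exactly one pair wrapping the text
    return ($n, $ok ? 'OK' : 'INCORRECT');
}

for my $line ('B1: "rank"', 'B3: "rank term', 'B4: rank all', 'B6: "rank" term"') {
    my ($n, $status) = check_b_line($line) or next;
    print "$n $status\n";
}
```

With this approach a line such as `B6: "rank" term"` is reported as INCORRECT because it contains three quotes, even though it starts and ends with one.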

jaa
 
Hi jaa,

"if ( /:\s+".*?"$/ )" returns even the lines that have unmatched quotes, while
"if ( /:\s+[^"]+$/)" will print lines without quotes.


;<
 
Hi jaa,

"if ( /:\s+".*?"$/ )" prints "lines NOT OK" for lines with matched quotes as well as lines with unmatched quotes,
while
"if ( /:\s+[^"]+$/)" prints "lines OK" for the lines without quotes, which is good.

resending..
 
The choice of regexes isn't yours, it's Perl's. The 'or' is part of the code. Both regexes joined together with the 'or' give you the correct output (you can replace 'or' with '||' if that is less confusing). I tested it using your examples and it worked for me.
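
To see how the two tests combine, a small sketch like the following reports each regex separately and then the 'or' result. The sample lines are taken from the original post; `classify` is just an illustrative helper name and not part of the code above.

```perl
use strict;
use warnings;

my $quoted   = qr/:\s+".*?"$/;   # text wrapped in a pair of quotes
my $unquoted = qr/:\s+[^"]+$/;   # text containing no quotes at all

# Illustrative helper: returns (quoted-match, unquoted-match, verdict).
sub classify {
    my ($line) = @_;
    my $q = $line =~ $quoted   ? 1 : 0;
    my $u = $line =~ $unquoted ? 1 : 0;
    return ($q, $u, ($q || $u) ? 'OK' : 'INCORRECT');
}

for my $line ('B1: "rank"', 'B3: "rank term', 'B4: rank all') {
    printf "%-16s quoted=%d unquoted=%d => %s\n", $line, classify($line);
}
```

A line is OK as soon as either test matches; only a line that fails both, such as B3 with its single quote, is INCORRECT.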

jaa
 
That's weird. When I run the following code:
my $file = "test.html";
my @title = split (/\./, $file);
print "file=$file\n";
print "title1=$title[1]\n";

The output is:
file=test.html
title1=html

What output do you get?
 
jaa,

I tried it with 'or' and '||', I get the same result.

Here is my code:
"
..
if ($line =~ m/^B/) {
    if ( ($line =~ m/:\s+".*?"$/) || ($line =~ m/:\s+[^"]+$/) ) {
        $myrank = $line;
        print "lines OK : $myrank \n";
    }
    else {
        print "lines NOT OK : $line \n";
    }
}
..
"

My output:
"
line NOT OK: "image file

line NOT OK: "rank Image Search"

line NOT OK: "rank image search"

line NOT OK: rank search software"

line NOT OK: "Search Software"

line OK: rank video file

line NOT OK: rank: "Enterprise Software"

line OK: rank enterprise software
"

Any idea why I am seeing different results?

thanks,
 
I am sorry, I sent the wrong output file. Here is the actual output:
"
line NOT OK: B: "image file

line NOT OK: B: "rank Image Search"

line NOT OK: B: "rank image search"

line NOT OK: B: rank search software"

line NOT OK: B: "Search Software"

line OK: B: rank video file

line NOT OK: B: "Enterprise Software"

line OK: B: rank enterprise software
"
 
I would be more interested in your input data. What does it look like? I copied and pasted your code and used your output as the input and get:
Code:
lines NOT OK : B: "image file

lines OK : B: "rank Image Search"

lines OK : B: "rank image search"

lines NOT OK : B: rank search software"

lines OK : B: "Search Software"

lines OK : B: rank video file

lines OK : B: "Enterprise Software"

lines OK : B: rank enterprise software

jaa
 
jaa,

Is there an email address where I can reach you, so I can send the input file and the program? I don't want to run into a problem with the policy on giving out information.

thanks,
 
It's possible that you have trailing whitespace at the end of your lines. The first regex would not match if that were true because it is expecting a " character as the last thing before the newline. Try changing the first regex to

($line =~ m/:\s+".*?"\s*$/)

Perhaps that will solve the problem. Try that first and post back. If that doesn't work you can email me the input.
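
An alternative that avoids touching the regexes is to strip trailing whitespace from each line before testing it. The sketch below assumes that stray spaces, tabs or a "\r" from Windows line endings are the cause, which has not been confirmed in this thread; `line_ok` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the line passes either test, 0 otherwise.
sub line_ok {
    my ($line) = @_;
    $line =~ s/\s+$//;   # drop trailing spaces, tabs, "\r" and "\n"
    return $line =~ /:\s+".*?"$/ || $line =~ /:\s+[^"]+$/ ? 1 : 0;
}

for my $line ("B: \"Search Software\" \r\n", "B: \"image file\n", "B: rank video file \n") {
    (my $shown = $line) =~ s/\s+$//;   # tidy copy for display only
    print line_ok($line) ? "lines OK : $shown\n" : "lines NOT OK : $shown\n";
}
```

The second regex already tolerates trailing whitespace because `[^"]+` matches spaces too, which would explain why only the unquoted lines were reported as OK.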

jaa
 