
Figuring out the regex problem


Ramnarayan (Programmer, US), Jan 15, 2003

Hi,

I have written a script to find the _it, _ft, _wt, _wa, _ua, _ba, _wk, _yc and _1g to _9g fields in the toc file below. These fields can appear only within a _t3 field, as shown below. Hence, if a _t3 field contains at least one of them (_it, _ft, _wt, _wa, _ua, _ba, _wk, _yc, _1g to _9g), the script should add 1 to the $trans variable.

Here is a snippet of the code:

open (DTC, "<$toc") or fatal("Can't open $toc: $!");
while (my $line = <DTC>)
{
    if ($line =~ m/^(_t3\s*.+\r?\n)$/)
    {
        $flag = 1;
    }
    else
    {
        $flag = 0;
    }
    if ($line =~ m/^_(it|ft|wt|wa|ua|ba|wk|yc|[1-9]g)/ && $flag == 1)
    {
        ++$trans;
    }
}
close DTC;
print "Trans: $trans\n";
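To see why this snippet always leaves $trans at 0, here is a minimal sketch that runs the same flag logic over an in-memory copy of a small _t3 block. The sub name count_original and the in-memory filehandle are assumptions for illustration, not part of the original script. Because the else branch clears $flag on every line that does not start with _t3, the flag is already 0 by the time an _it or _wt line is tested.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the flag logic from the snippet above,
# reading from a string instead of the DTC file.
sub count_original {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
    my ($flag, $trans) = (0, 0);
    while (my $line = <$fh>) {
        if ($line =~ m/^(_t3\s*.+\r?\n)$/) {
            $flag = 1;
        }
        else {
            $flag = 0;    # cleared on every non-_t3 line, including _it/_ft lines
        }
        # Only a _t3 line leaves $flag at 1, and a _t3 line never
        # matches this pattern, so the increment is never reached.
        if ($line =~ m/^_(it|ft|wt|wa|ua|ba|wk|yc|[1-9]g)/ && $flag == 1) {
            ++$trans;
        }
    }
    close $fh;
    return $trans;
}

my $sample = "_t3 Blah\n_it blah blah\n_ft blah blah blah\n";
print "Trans: ", count_original($sample), "\n";    # prints "Trans: 0"
```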

Here is an extract of the toc file. It contains examples of _it and _wt:

_t3 AP000024 00368075 AP990837 99A00100
_g1 Scientific Books
_ty BRV
_tw Reproduction artificielle de mineraux au XIXe siecle
_wt Reproduction artificielle de minéraux au XIXe siècle
_aw P. N. Tchirwinsky
_pg 66-68
_mf [Raw ASCII] 26 27 28
_mf [TIFF 6.0] 26 27 28
_t3 AP000024 00368075 AP990837 99A00110
_ti Miastor Larvae
_it Miastor Larvæ
_au E. P. Felt
_pg 583
_mf [Raw ASCII] 31
_mf [TIFF 6.0] 31

==================
Please note that each _t3 may contain any of the fields (_it, _ft, _wt, _wa, _ua, _ba, _wk, _yc, _1g to _9g), and any of them can occur anywhere within the _t3 block.

Can someone help me out? I would be very grateful to anyone who can help!

Thanks
 
Sorry, but can you please rephrase the question? What should the script do?
 
I agree with uida1154. You need to explain more clearly what you're trying to do. For example, what is your definition of a "field"? Is it the same as a line of data in your example? What do you mean when you say
"these fields can appear only within a _t3 field"?
 
A "field" is any one of the following:
_it, _ft, _wt, _wa, _ua, _ba, _wk, _yc, _1g to _9g
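As a sketch, this field list can be captured in one anchored regex. The names $FIELD_RE and is_field are hypothetical, and the trailing \b is an added assumption so that a longer tag such as _itx is not counted. Writing _1g to _9g as [1-9]g means the _g1 line in the sample above is correctly rejected.

```perl
use strict;
use warnings;

# Hypothetical name: matches a line that starts with _it, _ft, _wt, _wa,
# _ua, _ba, _wk, _yc or _1g .. _9g. The \b (an assumption) stops "_itx".
my $FIELD_RE = qr/^_(?:it|ft|wt|wa|ua|ba|wk|yc|[1-9]g)\b/;

sub is_field {
    my ($line) = @_;
    return $line =~ $FIELD_RE ? 1 : 0;
}

for my $line ('_it Miastor Larvae', '_3g something', '_g1 Scientific Books', '_ti Miastor Larvae') {
    printf "%-25s %s\n", $line, is_field($line) ? 'field' : 'not a field';
}
```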


So if there is a _t3 with an _it and an _ft, as below:

_t3 Blah
_it blah blah
_ft blah blah blah

it should count as one occurrence in the variable $trans.

Hope this makes sense!
 
Where does a _t3 field end? Does it end when we see a line that begins with something other than
_it, _ft, _wt, _wa, _ua, _ba, _wk, _yc, _1g to _9g?

For example, from your data above:
_t3 AP000024 00368075 AP990837 99A00100
_g1 Scientific Books <- t3 field ends here?

_t3 AP000024 00368075 AP990837 99A00110
_ti Miastor Larvae
_it Miastor Larvæ
_au E. P. Felt <- t3 field ends here?

So, using your example data, $trans would equal 2 at the end of the loop?


 
Yes, that is exactly right! To put it simply, a _t3 block ends where the next _t3 begins. That is all.
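Since a block runs from one _t3 line to the next, another way to get the count (a swapped-in technique, not the flag loop discussed in this thread) is to read the whole file into one string and split it before each _t3 line. This is a minimal sketch: count_blocks is a hypothetical name, the sample is an abridged ASCII copy of the extract above (accents dropped), and lines before the first _t3 are assumed not to belong to any block. On the real file, the string could be read with `my $text = do { local $/; <DTC> };`.

```perl
use strict;
use warnings;

# Hypothetical sub: counts the _t3 blocks that contain at least one of
# the listed fields, however many of them the block contains.
sub count_blocks {
    my ($text) = @_;
    my $field_re = qr/^_(?:it|ft|wt|wa|ua|ba|wk|yc|[1-9]g)\b/m;
    my $trans = 0;
    # Zero-width split before every line that starts with _t3, so each
    # piece begins with its own _t3 line.
    for my $block (split /^(?=_t3)/m, $text) {
        next unless $block =~ /^_t3/;       # skip text before the first _t3
        ++$trans if $block =~ $field_re;    # at most one count per block
    }
    return $trans;
}

my $toc = <<'END';
_t3 AP000024 00368075 AP990837 99A00100
_g1 Scientific Books
_wt Reproduction artificielle de mineraux au XIXe siecle
_t3 AP000024 00368075 AP990837 99A00110
_ti Miastor Larvae
_it Miastor Larvae
_au E. P. Felt
END

print "Trans: ", count_blocks($toc), "\n";    # prints "Trans: 2"
```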


 
Okay, so once we've seen a _t3, the flag never gets turned off, since the next _t3 we see turns it back on?
I think this will do it:
Code:
while (my $line = <DTC>)
{
    if ($line =~ m/^_t3/)
    {
           $flag = 1;
           next;
    }
    if ($line =~ m/^_(it|ft|wt|wa|ua|ba|wk|yc|[1-9]g)/)
    {
           if ($flag)
           {
              ++$trans;
           }
    }
}
print "Trans: $trans\n";
This prints 2 with your example data.
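That result of 2 holds because each sample block contains only one of the listed fields. As a sketch (count_lines and the in-memory filehandle are hypothetical wrappers around the loop above), running the same loop on the _t3 block with both an _it and an _ft line from earlier in the thread shows that it counts matching lines rather than blocks:

```perl
use strict;
use warnings;

# Same logic as the loop above, wrapped in a hypothetical sub that
# reads from a string instead of DTC.
sub count_lines {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
    my ($flag, $trans) = (0, 0);
    while (my $line = <$fh>) {
        if ($line =~ m/^_t3/) {
            $flag = 1;
            next;
        }
        if ($line =~ m/^_(it|ft|wt|wa|ua|ba|wk|yc|[1-9]g)/) {
            ++$trans if $flag;    # $flag stays 1, so every matching line counts
        }
    }
    close $fh;
    return $trans;
}

my $block = "_t3 Blah\n_it blah blah\n_ft blah blah blah\n";
print "Trans: ", count_lines($block), "\n";    # prints "Trans: 2", but 1 was wanted
```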
 
Assuming the regex works: since he wants to count only one occurrence per _t3, you need to reset $flag to 0 where you increment $trans.

if ($flag)
{
    ++$trans;
    $flag = 0;
}
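Putting the reset together with the earlier loop gives the following sketch. count_trans is a hypothetical wrapper that takes an already opened filehandle, so it can be tried on an in-memory string, and the regex group is made non-capturing, which does not change what it matches.

```perl
use strict;
use warnings;

# $flag is raised at each _t3 line and lowered after the first matching
# field, so each _t3 block adds at most 1 to $trans.
sub count_trans {
    my ($fh) = @_;
    my ($flag, $trans) = (0, 0);
    while (my $line = <$fh>) {
        if ($line =~ m/^_t3/) {
            $flag = 1;    # a new block starts and has not been counted yet
            next;
        }
        if ($flag && $line =~ m/^_(?:it|ft|wt|wa|ua|ba|wk|yc|[1-9]g)/) {
            ++$trans;
            $flag = 0;    # ignore further fields until the next _t3
        }
    }
    return $trans;
}

# With the real file this would be, for example:
#   open(my $dtc, '<', $toc) or die "Can't open $toc: $!";
#   print "Trans: ", count_trans($dtc), "\n";
#   close $dtc;
my $text = "_t3 Blah\n_it blah blah\n_ft blah blah blah\n_t3 Other\n_ti no field here\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
print "Trans: ", count_trans($fh), "\n";    # prints "Trans: 1"
close $fh;
```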
 
Now it works fine. Thanks for your valuable time and patience; I appreciate your efforts to help me out.
Laserbeam: your resetting of $flag = 0 is correct. Without it, the script gives the wrong count.
 