Tek-Tips is the largest IT community on the Internet today!

print string that is on the next line below the regex 1

Status
Not open for further replies.

FedoEx

Technical User
Oct 7, 2008
49
US
A simple example input file:
Code:
regex1 regex2 regex3
1        2      3 
4        5      6 
regex1  regex3  regex2
aa       bb      cc
7        8       9

To print the whole line that follows a line containing one of the regexes in my input file, I use:

Code:
awk '/regex1/{getline;print  $0}' inputfile

How can I print only the field under a specified regex?
I'm trying to accomplish this by looping over NF and calling the match() function.
Any suggestions?
Thanks
 
Hi

FedoEx said:
I'm trying to accomplish this by looping over NF and calling the match() function.
That is the right approach.
FedoEx said:
Any suggestions?
Next time, please post the code as you modified it, so we can see whether your theory or only the implementation is wrong and correct whichever needs it.
Code:
awk '/regex1/{for(i=1;i<=NF;i++)if($i~/regex1/)n=i;getline;print$n}' inputfile
Or the same using the match() function:
Code:
awk '/regex1/{for(i=1;i<=NF;i++)if(match($i,/regex1/))n=i;getline;print$n}' inputfile
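For reference, here is a minimal sketch of the first one-liner run against the sample input from the first post, looking under regex2 instead of regex1. The temporary file and the shell variable names input and result are only there for illustration.

```shell
# Sample input from the first post, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
regex1 regex2 regex3
1        2      3
4        5      6
regex1  regex3  regex2
aa       bb      cc
7        8       9
EOF

# On each header line, remember the column number of regex2,
# then read the next line and print that column.
result=$(awk '/regex2/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /regex2/) n = i
    getline
    print $n
}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

With that sample it should print 2 and cc, the fields sitting under regex2 on the lines that directly follow each header.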

Feherke.
 
Thanks.
My initial try was this.
Code:
 awk '{for(i=1;i<=NF;++i)if(match($i,/regex1/)); getline; print $i }'
What I don't understand is why the /regex1/ outside the {...} block is needed.
If you remove it, awk should run the for loop inside the {...} for each line of the input file, and yet if I remove the /regex1/ in front of the braces it does not work.
Also, is there a way to do this without getline?

My problem is that I have a huge number of regexes, and not all of them appear on every line.
Here is a more detailed example input file.
Code:
#begin 1  
#P1 regex1 regex2 regex3
#     11     12     13
#P2  regex4 regex5 regex6 
#     14     15     16   
#end block

#begin 2
#P1 regex4 regex3  regex1
#     24     23      21 
#P2 regex5 regex6  regex2
#     25     36      22
#end block

#begin 3
#P1 regex4  regex1  regex2
#     34       31    32
#P2 regex6  regex5  regex3
#     36       35    33 
#end block
Say I want to print the values under regex1 and regex2 for each block. I tried this:
Code:
#!/bin/awk
/#begin/{bl=$1}

/^#P/{for(i=1;i<=NF;i++) 
                       if($i~/regex1/)n=i; getline; val1=$m
                       if($i~/regex2/)m=i; getline; val2=$m 
     }
/^end/{print bl" "val1"  "val2}
I have another problem with this script.
If I comment out the second if(...) and remove val2 from the last block, I get the values under regex1 from two blocks below. Why is that? When the for(...) loop runs for #P2, the if(...) won't evaluate to true, so val1 should not be reassigned.
Thanks.
 
Hi

FedoEx said:
Code:
awk '{for(i=1;i<=NF;++i)if(match($i,/regex1/)); getline; print $i }'
There your for loop always runs until i<=NF evaluates to false, that is, until i equals NF+1, so $i is empty by the time you print it. You have to break out of the for loop when the if condition is true:
Code:
awk '{for(i=1;i<=NF;++i)if(match($i,/regex1/)) break; getline; print $i }' /input/file
But this is not enough.
FedoEx said:
What I don't understand is why the /regex1/ outside the {...} block is needed.
I left the block condition in place to avoid performing NF regular expression matches when they are not needed.

You can remove the block condition, but then you have to put a condition inside the block. As it stands, you perform a getline whether or not the current line contains the required expression, so effectively you only test the odd lines of the input and skip the even ones.

You need a condition to get the next line only if the regular expression was found in the current one :
Code:
awk '{for(i=1;i<=NF;++i)if(match($i,/regex3/)) break; if(i<=NF) { getline; print $i } }' /input/file
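To make the difference visible, here is a small sketch that runs both variants on the first sample input. The brackets around the printed field, the temporary file and the shell variable names unguarded and guarded are my additions for illustration only.

```shell
# Sample input from the first post, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
regex1 regex2 regex3
1        2      3
4        5      6
regex1  regex3  regex2
aa       bb      cc
7        8       9
EOF

# Unguarded: getline runs on every record, so the second header line is
# swallowed as a "next line" and never searched.
unguarded=$(awk '{
    for (i = 1; i <= NF; ++i) if (match($i, /regex3/)) break
    getline
    print "[" $i "]"
}' "$input")

# Guarded: getline runs only when the loop stopped on a matching field.
guarded=$(awk '{
    for (i = 1; i <= NF; ++i) if (match($i, /regex3/)) break
    if (i <= NF) { getline; print "[" $i "]" }
}' "$input")

printf 'unguarded:\n%s\n' "$unguarded"
printf 'guarded:\n%s\n' "$guarded"
rm -f "$input"
```

On this input the unguarded version should print [3] followed by two empty [] entries, because the second header is consumed by getline and i ends up at NF+1. The guarded version should print [3] and [bb].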
FedoEx said:
Now is there a way doing this without the getline....
Of course there is. Just store the result of the search in a variable so you can access it when processing the next line :
Code:
awk 'n{print$n;n=0;next}{for(i=1;i<=NF;++i)if(match($i,/regex3/))n=i}' /input/file
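If the expression changes often, the same getline-free idea can take the pattern from the command line. This is only a sketch: the awk variable name pat and the shell names input and result are assumptions of mine, and the pattern is used as a dynamic regex through the ~ operator instead of match().

```shell
# Sample input from the first post, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
regex1 regex2 regex3
1        2      3
4        5      6
regex1  regex3  regex2
aa       bb      cc
7        8       9
EOF

# n holds the column found on the previous line (0 means nothing pending).
result=$(awk -v pat='regex3' '
    n { print $n; n = 0; next }     # previous line set n: print that column
    { for (i = 1; i <= NF; i++) if ($i ~ pat) n = i }
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

For the sample this should give 3 and bb, the same as the getline version, but every input line passes through the normal awk record loop.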
Regarding your multiple-expression requirement, sorry, I do not fully understand it. Could you post the desired output for that sample input?


Feherke.
 
Thanks again feherke.
The desired output for regex1 and regex2 would be
Code:
1 11 12
2 21 22
3 31 32
To explain:
The first value in each row is the block number.
The second and third values are the fields directly under regex1 and regex2.
 
Hi

Ah. Got it. This is trickier.
Code:
awk 'BEGIN{split("regex1 regex2",r)}$1=="#begin"{bl=$2}$1~/^#P/{for(i=1;i in p;i++)delete p[i];for(i=1;i in r;i++)for(j=1;j<=NF;j++)if($j~r[i])p[i]=j;getline;for(i=1;i in r;i++)if(p[i])v[i]=$p[i]}$1=="#end"{printf"%s",bl;for(i=1;i in r;i++)printf",%s",v[i];print""}' /input/file
Tested with gawk and mawk.

It is a bit lengthier than needed, but it is extensible: if more expressions are needed, just add them to the string that is split() in the BEGIN block.
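For readability, here is the same approach spread over several lines and run on the detailed sample. Treat it as a sketch, not a drop-in replacement: it separates the values with spaces to match the desired output posted above, clears the position array with split("", p), and also resets the values at each #begin so a block that lacks an expression prints an empty field. Those three choices, and the shell variable names, are my assumptions.

```shell
# Detailed sample input from the thread, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
#begin 1
#P1 regex1 regex2 regex3
#     11     12     13
#P2  regex4 regex5 regex6
#     14     15     16
#end block

#begin 2
#P1 regex4 regex3  regex1
#     24     23      21
#P2 regex5 regex6  regex2
#     25     36      22
#end block

#begin 3
#P1 regex4  regex1  regex2
#     34       31    32
#P2 regex6  regex5  regex3
#     36       35    33
#end block
EOF

result=$(awk '
BEGIN { n = split("regex1 regex2", r) }          # expressions to look for
$1 == "#begin" { bl = $2; split("", v) }         # new block: forget old values
$1 ~ /^#P/ {
    split("", p)                                 # forget old column positions
    for (i = 1; i <= n; i++)
        for (j = 1; j <= NF; j++)
            if ($j ~ r[i]) p[i] = j              # column of expression i
    if ((getline) > 0)                           # move on to the value line
        for (i = 1; i <= n; i++)
            if (i in p) v[i] = $(p[i])
}
$1 == "#end" {
    printf "%s", bl
    for (i = 1; i <= n; i++) printf " %s", v[i]
    print ""
}' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

On the sample this should print 1 11 12, 2 21 22 and 3 31 32, one block per line. Adding a third expression to the split() string adds a third value column.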


Feherke.
 