print string that is on the next line below the regex 1

FedoEx · Mar 30, 2010

A simple example input file:

Code:

regex1 regex2 regex3
1        2      3 
4        5      6 
regex1  regex3  regex2
aa       bb      cc
7        8       9

To print the line that follows a line matching regex1 in my input file, I use:

Code:

awk '/regex1/{getline;print  $0}' inputfile

How can I print only the field under the specified regex?
I'm trying to accomplish this by looping over NF and calling the match() function.
Any suggestions?
Thanks

feherke · Mar 31, 2010

Hi

FedoEx said:
I'm trying to accomplish this by looping over NF and calling the match() function.

That is the right approach.

FedoEx said:
Any suggestions?

Next time please post the code as you modified it, so we can see whether your theory or only the implementation is wrong and correct whichever needs it.

Code:

awk '/regex1/{for(i=1;i<=NF;i++)if($i~/regex1/)n=i;getline;print$n}' inputfile

Or the same using the match() function:

Code:

awk '/regex1/{for(i=1;i<=NF;i++)if(match($i,/regex1/))n=i;getline;print$n}' inputfile
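
For readability, the same approach can also be wrapped in a small shell function. This is only a sketch: the function name field_below and the awk variable re are illustrative names that do not appear in the one-liners above. Unlike those, it stops at the first matching field and checks that getline actually returned a line.

```shell
# field_below PATTERN: hypothetical helper, reads the input on stdin.
field_below() {
    awk -v re="$1" '
        $0 ~ re {
            n = 0
            for (i = 1; i <= NF; i++)          # find the field that holds the pattern
                if ($i ~ re) { n = i; break }
            if (n && (getline) > 0)            # move to the next line, if there is one
                print $n                       # same field position, one line below
        }'
}

# Sample input copied from the first post.
sample='regex1 regex2 regex3
1 2 3
4 5 6
regex1 regex3 regex2
aa bb cc
7 8 9'

printf '%s\n' "$sample" | field_below regex2
```

With the sample input from the first post this should print 2 and then cc.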

Feherke.

http://free.rootshell.be/~feherke/

FedoEx · Mar 31, 2010

Thanks.
My initial try was this.

Code:

 awk '{for(i=1;i<=NF;++i)if(match($i,/regex1/)); getline; print $i }'

What I don't understand is why the /regex1/ outside the {...} block is needed.
If I remove it, awk should run the for loop inside the {...} for each line of the input file, and yet when I remove the /regex1/ in front of the braces it does not work.
Now, is there a way of doing this without getline?

My problem is that I have a huge number of regexes, and not all of them appear on each line.
Here is a more detailed example input file.

Code:

#begin 1  
#P1 regex1 regex2 regex3
#     11     12     13
#P2  regex4 regex5 regex6 
#     14     15     16   
#end block

#begin 2
#P1 regex4 regex3  regex1
#     24     23      21 
#P2 regex5 regex6  regex2
#     25     36      22
#end block

#begin 3
#P1 regex4  regex1  regex2
#     34       31    32
#P2 regex6  regex5  regex3
#     36       35    33 
#end block

Say I want to print the values under regex1 and regex2 for each block. I tried this:

Code:

#!/bin/awk
/#begin/{bl=$1}

/^#P/{for(i=1;i<=NF;i++) 
                       if($i~/regex1/)n=i; getline; val1=$m
                       if($i~/regex2/)m=i; getline; val2=$m 
     }
/^end/{print bl" "val1"  "val2}

I have another problem with this script.
If I comment out the second if(...) and remove val2 from the last block, I get the values under regex1 from two blocks below. Why is that? When the for(...) loop is triggered for #P2, the if(...) won't evaluate to true, so val1 should not be reassigned.
Thanks.

feherke · Mar 31, 2010

Hi

FedoEx said:
Code:

awk '{for(i=1;i<=NF;++i)if(match($i,/regex1/)); getline; print $i }'

There your for loop exits only when i<=NF evaluates to false, that is, when i equals NF+1. You have to break out of the for loop when the if condition is true:

Code:

awk '{for(i=1;i<=NF;++i)if(match($i,/regex1/)) break; getline; print $i }' /input/file
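
A quick way to see what the break changes is to print i right after the loop. The three-field test line and the shell variable names below are made up for illustration.

```shell
# Without break the loop always runs to the end, so i finishes at NF+1.
no_break=$(echo 'x regex1 y' |
    awk '{ for (i = 1; i <= NF; ++i) if (match($i, /regex1/)); print i }')

# With break the loop stops at the field that matched.
with_break=$(echo 'x regex1 y' |
    awk '{ for (i = 1; i <= NF; ++i) if (match($i, /regex1/)) break; print i }')

echo "without break: i=$no_break, with break: i=$with_break"
```

This should report i=4 without the break (one past NF, so $i is empty) and i=2 with it.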

But this is not enough.

FedoEx said:
What I don't understand is why the /regex1/ outside the {...} block is needed.

I left the block condition in place to avoid performing NF regular expression matches when they are not needed.

You can remove the block condition, but then you will have to put a condition inside the block. As it stands, you perform a getline regardless of whether the current line contains the required expression. So effectively you are testing only the odd lines of the input and skipping the even ones.
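
The skipping is easy to reproduce on a throwaway input. In this small demonstration (the five-line input and the variable name tested are invented for illustration) every record the main loop sees is reported, and the unconditional getline swallows the one after it.

```shell
# Each record is reported, then getline consumes the following line unseen.
tested=$(printf '%s\n' line1 line2 line3 line4 line5 |
    awk '{ print "tested: " $0; getline }')

printf '%s\n' "$tested"
```

Only line1, line3 and line5 are reported as tested.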

You need a condition so that the next line is fetched only if the regular expression was found in the current one:

Code:

awk '{for(i=1;i<=NF;++i)if(match($i,/regex3/)) break; if(i<=NF) { getline; print $i } }' /input/file

FedoEx said:
Now, is there a way of doing this without getline?

Of course there is. Just store the result of the search in a variable so you can access it when processing the next line:

Code:

awk 'n{print$n;n=0;next}{for(i=1;i<=NF;++i)if(match($i,/regex3/))n=i}' /input/file
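
Spelled out over several lines, the getline-free idea might look like the sketch below. The function name below_no_getline and the variables re and col are assumptions made for the example; col plays the role of n in the one-liner.

```shell
# below_no_getline PATTERN: hypothetical helper, reads the input on stdin.
below_no_getline() {
    awk -v re="$1" '
        # The previous line set col: print that field, reset, skip the search.
        col { print $col; col = 0; next }

        # Otherwise search the current line and remember the matching field.
        { for (i = 1; i <= NF; i++) if ($i ~ re) col = i }
    '
}

# Sample input copied from the first post.
sample='regex1 regex2 regex3
1 2 3
4 5 6
regex1 regex3 regex2
aa bb cc
7 8 9'

printf '%s\n' "$sample" | below_no_getline regex3
```

On the sample input from the first post this should print 3 and then bb. Because the state lives in a variable, every line still passes through the normal awk main loop.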

Regarding your multiple expression requirement, sorry, I do not fully understand it. Could you post the desired output for that sample input?

Feherke.

http://free.rootshell.be/~feherke/

FedoEx · Mar 31, 2010

Thanks again feherke.
The desired output for regex1 and regex2 would be:

Code:

1 11 12
2 21 22
3 31 32

To explain:
The first value of each row is the block number.
The second and third values are the fields right under regex1 and regex2.

feherke · Apr 1, 2010

Hi

Ah. Got it. This is trickier.

Code:

awk 'BEGIN{split("regex1 regex2",r)}$1=="#begin"{bl=$2}$1~/^#P/{for(i=1;i in p;i++)delete p[i];for(i=1;i in r;i++)for(j=1;j<=NF;j++)if($j~r[i])p[i]=j;getline;for(i=1;i in r;i++)if(p[i])v[i]=$p[i]}$1=="#end"{printf"%s",bl;for(i=1;i in r;i++)printf",%s",v[i];print""}' /input/file

Tested with gawk and mawk.

It is a bit lengthier than needed, but it is extensible: if more expressions are needed, just add them to the string that is split() in the BEGIN block.
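
For anyone who finds the one-liner hard to read, here is a multi-line sketch of the same idea. It is not a drop-in copy: it avoids getline by remembering that a #P line was just seen, clears the stored values at every #begin, and joins the output with spaces to match the requested format (the one-liner prints commas). The names values_per_block, names, cnt and pending are made up for this example.

```shell
# values_per_block "PATTERN1 PATTERN2 ...": hypothetical helper, input on stdin.
values_per_block() {
    awk -v names="$1" '
        BEGIN { cnt = split(names, r, " ") }

        # New block: remember its number and forget the old values.
        $1 == "#begin" { bl = $2; pending = 0; for (i = 1; i <= cnt; i++) v[i] = "" }

        # Line right below a #P header: pick up the fields located there.
        pending {
            for (i = 1; i <= cnt; i++) if (p[i]) v[i] = $(p[i])
            pending = 0
            next
        }

        # Header line: record in which field each expression appears (0 = absent).
        $1 ~ /^#P/ {
            for (i = 1; i <= cnt; i++) {
                p[i] = 0
                for (j = 2; j <= NF; j++) if ($j ~ r[i]) p[i] = j
            }
            pending = 1
        }

        # End of block: print the block number followed by the collected values.
        $1 == "#end" {
            line = bl
            for (i = 1; i <= cnt; i++) line = line " " v[i]
            print line
        }
    '
}

# Sample input copied from the detailed example earlier in the thread.
sample='#begin 1
#P1 regex1 regex2 regex3
#     11     12     13
#P2  regex4 regex5 regex6
#     14     15     16
#end block

#begin 2
#P1 regex4 regex3  regex1
#     24     23      21
#P2 regex5 regex6  regex2
#     25     36      22
#end block

#begin 3
#P1 regex4  regex1  regex2
#     34       31    32
#P2 regex6  regex5  regex3
#     36       35    33
#end block'

printf '%s\n' "$sample" | values_per_block "regex1 regex2"
```

On the sample input this should give the three requested rows: 1 11 12, 2 21 22 and 3 31 32.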

Feherke.

http://free.rootshell.be/~feherke/
