gawk RS using regex 2

Status
Not open for further replies.

frankli

Technical User
Apr 6, 2005
44
CA
Hello List,

I am using a regex to define RS in gawk. Everything works fine, except that I don't know the trick to print the text that actually matched RS. Please see if you can help.

source:

Mon Oct line1
line2
line3
Tue Nov line4
line5
Fri Dec line6
line7

statement
gawk 'BEGIN { RS = "(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Dec|Nov)" } { print NR RS " " $0 }'

output
1(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Dec|Nov)
2(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Dec|Nov) line1
line2
line3

3(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Dec|Nov) line4
line5

4(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Dec|Nov) line6
line7

desired output

1 Mon Oct line1
line2
line3

2 Tue Nov line4
line5

3 Fri Dec line6
line7


Thanks for any input.





 
I don't think setting RS to a regex is a good way to do this, because the strings that match RS are thrown away. Interesting idea though! :)

Instead I would just use the regex as an ordinary pattern to match the header lines, and keep my own counter instead of NR, e.g.

Code:
awk '
        /(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Dec|Nov)/ { print ++n,$0; next }
        { print }
' inputfile

Annihilannic.
 
Thanks for your input Annihilannic!

I made a mistake by telling only half the story. The reason I wanted to use the RS shortcut is that I have another pattern to match within the desired output records, something like:

desired output

1 Mon Oct line1
line2
line3 (ERROR_1)

2 Tue Nov line4
line5

3 Fri Dec line6
line7 (ERROR_2)

I need to pick out records 1 and 3, but I don't think I can put the above code in the BEGIN section and do another match in the main body or END section. I will need some time to think about it.
 
Hi

Annihilannic said:
I don't think setting RS to a regex is a good way to do this, because the strings that match RS are thrown away.
Correct for POSIX, but frankli uses gawk.
man gawk said:
RT The record terminator. Gawk sets RT to the input text that matched the character or regular expression specified by RS.
( Thanks again for vlad's suggestion made in thread271-1114091. )

Feherke.
 
Hi feherke,

Something still seems to be missing when I use RT. Could you elaborate?

gawk 'BEGIN { RS = "(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Dec|Nov)" } /ERROR_/ { print NR " " RT " " $0 }'

input

Mon Oct line1
line2
line3 (ERROR_1)
Tue Nov line4
line5
Fri Dec line6
line7 (ERROR_2)

output

2 Tue Nov line1
line2
line3 (ERROR_1)

4 line6
line7 (ERROR_2)

Thanks!

 
Hi

Indeed. You misunderstand it. The record comes first, then the delimiter :
Code:
<Mon Oct>|[ line1
line2
line3 (ERROR_1)
]<Tue Nov>|[ line4
line5
]<Fri Dec>|[ line6
line7 (ERROR_2)]|
Here <...> marks RT and [...] marks $0. The extra | marks are added only to visually delimit the records.

To use this for your task you should transform it somehow like this :
Code:
gawk -vRS='\\(ERROR_[0-9]+\\)' 'RT{print NR,gensub(/.*(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Dec|Nov)/,"\\1 \\2","",$0),RT}' /input/file
But note that you will not have the same NR values as before.

So Annihilannic's suggestion may still be a better way.

Feherke.
 
Thanks for the RT tip feherke. I generally try and avoid the GNU extensions, but it could be useful...

Try this perhaps:

Code:
awk '
        # increment record counter, reset array
        /(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Dec|Nov)/ { n++; i=0 }
        # accumulate record lines in array
        { a[++i]=$0 }
        # error found, print record number and array contents
        /ERROR_/ {
                printf "%d ",n
                for (j=1; j<=i; j++) { print a[j] }
        }
' inputfile
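
One thing to watch with printing at the moment the ERROR_ line is seen: lines that follow it in the same record are not printed, and a second ERROR_ line in that record would print the record again. A possible variation, sketched here with plain awk and the made-up names error_blocks, emit and bad, buffers the whole record and prints it once, when the next header or the end of input is reached.

```shell
#!/bin/sh
# Hypothetical helper: print complete records that contain ERROR_ anywhere.
error_blocks() {
    awk '
        # print the buffered record if it had a header and an error
        function emit(    j) {
            if (n && bad) {
                printf "%d ", n
                for (j = 1; j <= i; j++) print a[j]
                print ""    # blank line between records
            }
        }
        # new header: flush the previous record, then start a new one
        /(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/ {
            emit(); n++; i = 0; bad = 0
        }
        # accumulate record lines in array
        { a[++i] = $0 }
        # remember that this record contains an error
        /ERROR_/ { bad = 1 }
        # flush the last record
        END { emit() }
    '
}

# Sample input with the error in the middle of a record.
printf '%s\n' 'Mon Oct line1' 'line2 (ERROR_1)' 'line3' \
              'Tue Nov line4' \
              'Fri Dec line6 (ERROR_2)' 'line7' | error_blocks
```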

Annihilannic.
 
Thanks very much for all the tips and comments. You guys are great!
 