Korn shell - grep'ing the correct data

Status
Not open for further replies.

dsparc

IS-IT--Management
Jun 18, 2003
6
GB
Hi

I wonder if anyone can offer any help on this. I'm trying to manipulate some text from a file using a korn shell script.

I have the following data in a text file:

------------------------
C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH
P, IMPORTANT, DATA
P, MORE, IMPORTANT, DATA

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH
-------------------------

I'd like to cut out only the section that contains the lines beginning with P,. But I need to include the HEADER1 and HEADER2 lines above that data. The other sections with only 2 lines (C, and R,) can be ignored.

Any ideas? I can't think of any way of using grep to get that section but ignore the others.

Cheers in advance
 
Certain versions of grep can display the lines before and after a match (GNU grep, for example, has the -B and -A options). Use 'man grep' to see if yours does.

Alternatively it can be done using awk.

Ceci n'est pas une signature
Columb Healy
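
As a rough illustration of the grep route mentioned above: GNU grep (and some other versions) accept -B N to print N lines of context before each match. The sketch below assumes such a grep is installed and that exactly two header lines precede the first P, line of a block. The wrapper name p_with_context is made up for the example.

```shell
# Hypothetical wrapper: needs a grep with -B support (GNU grep, for instance).
# Prints every line starting with "P," plus the 2 lines before it.
p_with_context() {
  grep -B 2 '^P,' "$@"
}

printf '%s\n' \
  'C, HEADER1, A' 'R, HEADER2, A' '' \
  'C, HEADER1, B' 'R, HEADER2, B' 'P, IMPORTANT, DATA' 'P, MORE, IMPORTANT, DATA' |
  p_with_context
# prints the four lines of the second block (C, R and both P lines)
```

Overlapping context is merged, so consecutive P, lines do not repeat the headers. When several separate blocks match, GNU grep puts a line containing -- between them.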
 
Hi

Thanks for the replies so far. My version of grep doesn't seem to support -B so I guess I will need to use awk in some way.

The problem I have is that there is not a set number of lines beginning with P,. So even if I have some kind of check to see whether the line following R, begins with P, I cannot tell it to print x lines, because the number of lines beginning with P, can vary.

Is there a way to say "if the next line after an instance of R, begins with P, then print those lines as well as the 2 previous lines (C, and R,) until the next line DOESN'T begin with P,"?

Sorry if this is a bit confusing. It's easy to explain face to face but hard on here :eek:)

Here's an extended piece of the file - I need to extract the lines marked with asterisks:

------------------------
C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH *************
R, HEADER2, BLAH, BLAH *************
P, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************

C, HEADER1, BLAH, BLAH *************
R, HEADER2, BLAH, BLAH *************
P, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH *************
R, HEADER2, BLAH, BLAH *************
P, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************
-------------------------

Many thanks

 
That's easy, just grep for asterisks. :p

Just kidding, try this:

[tt]awk '
/^$/ { i=0 ; next }
/^P/ { for (j=1; j<=i; j++) { print a[j] } ; i=0; print; next }
{ a[++i]=$0 }
' inputfile[/tt]

Annihilannic.
 
Excellent! Worked perfectly. I've been trying unsuccessfully to decipher that code, though. Can you elaborate on how it works? Cheers

 
It uses a buffer array a[] to store the previous lines so that whenever an "interesting" P line is encountered it can dump the contents of the buffer, then print the P line(s). The buffer index is reset to the beginning of the buffer whenever a blank line is encountered (i.e. no interesting lines were found) or a P line is found (because the data is no longer required after it has been printed).

[tt]awk '
# empty line, reset buffer index and skip to next line
/^$/ { i=0 ; next }
# P line, print any buffered previous lines, reset
# buffer index, print this line, and skip to next line
/^P/ { for (j=1; j<=i; j++) { print a[j] } ; i=0; print; next }
# any other line, add it to the buffer and increment
# the buffer index
{ a[++i]=$0 }
' inputfile [/tt]

Annihilannic.
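
To watch the buffer at work, the script above can be wrapped in a shell function and fed a small sample. This is only a sketch: the function name extract_p_blocks and the sample lines are invented for the demonstration, while the awk program is the one posted above.

```shell
# Hypothetical wrapper name; the awk program is the one from the post above.
extract_p_blocks() {
  awk '
    /^$/ { i = 0; next }                        # blank line: discard buffered lines
    /^P/ { for (j = 1; j <= i; j++) print a[j]  # P line: flush the buffer,
           i = 0; print; next }                 # then print the P line itself
    { a[++i] = $0 }                             # any other line: buffer it
  ' "$@"
}

printf '%s\n' \
  'C, HEADER1, A' 'R, HEADER2, A' '' \
  'C, HEADER1, B' 'R, HEADER2, B' 'P, IMPORTANT, DATA' 'P, MORE, IMPORTANT, DATA' |
  extract_p_blocks
# prints:
# C, HEADER1, B
# R, HEADER2, B
# P, IMPORTANT, DATA
# P, MORE, IMPORTANT, DATA
```

The first block is buffered and then thrown away at the blank line. The second block is flushed when the first P line arrives, and the second P line finds an empty buffer, so it is printed on its own.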
 
Ok, thanks, I think I'm getting there. One last query...

I would now like to lay out this data as below:

HEADER1, HEADER2, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA

HEADER1, HEADER2, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA

HEADER1, HEADER2, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA

Where the HEADER1 and HEADER2 are the same for each block (i.e. the HEADERs that applied to that section of data in the first list).

Would this require a separate awk command or could it be incorporated into the first?

A thousand thank you's!
 
My solution accounted for any number of lines before a matching P line. If the format of your file is quite rigid and only contains C, R and P lines, then the script can be simpler:

Code:
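awk '
        BEGIN { FS=OFS=", " }
        # P line: strip the leading "P, " and print it after the saved headers
        /^P/ { sub("P, ",""); print c,r,$0 }
        # C and R lines: remember the second field (the header value)
        /^C/ { c=$2 }
        /^R/ { r=$2 }
        # blank line: no action given, so awk prints it unchanged
        /^$/
' inputfile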

It simply stores the values of the C and R headers when they are encountered, strips off the "P, " on a P line, and prints out the results. FS and OFS are special variables that define the input and output field separators (i.e. those that separate $1, $2, etc.).

Annihilannic.
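
A quick way to see what FS and OFS do is to run a one-liner on a single sample line. The helper name second_and_third is hypothetical and exists only for this demonstration.

```shell
# Hypothetical helper: print fields 2 and 3 of a ", "-separated line.
second_and_third() {
  awk 'BEGIN { FS = OFS = ", " } { print $2, $3 }'
}

echo 'C, HEADER1, BLAH, BLAH' | second_and_third
# prints: HEADER1, BLAH

# With the default separators the commas stay attached to the fields:
echo 'C, HEADER1, BLAH, BLAH' | awk '{ print $2, $3 }'
# prints: HEADER1, BLAH,
```

Setting FS to ", " makes the comma and the space part of the separator, so $2 holds just HEADER1, and setting OFS to the same string puts the separator back between the values passed to print.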
 
Thanks Annihilannic!

I've actually used both awk sections. Your first solution made the correct assumption: there is a varying number of lines before a matching P.

But, I then wanted to manipulate that output and have used your second solution to do that.

Everything seems to be working.

Thanks a lot!
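
On the earlier question of whether both steps could live in one awk command, a single-pass variant might look like the sketch below. It is not what was run in this thread. It assumes every block starts with a C line followed by an R line, and the function name headers_with_data is invented for the example.

```shell
# Hypothetical single-pass variant: remember the headers, print them in
# front of every P line, and put one blank line between printed blocks.
headers_with_data() {
  awk '
    BEGIN { FS = OFS = ", " }
    /^C/ { c = $2; newblock = 1; next }   # C line starts a block: save header 1
    /^R/ { r = $2; next }                 # R line: save header 2
    /^P/ {
      if (newblock && printed) print ""   # separate from the previous printed block
      newblock = 0; printed = 1
      sub(/^P, /, "")                     # drop the leading "P, "
      print c, r, $0
    }
  ' "$@"
}

printf '%s\n' \
  'C, H1A, X' 'R, H2A, Y' '' \
  'C, H1B, X' 'R, H2B, Y' 'P, IMPORTANT, DATA' 'P, MORE, IMPORTANT, DATA' '' \
  'C, H1C, X' 'R, H2C, Y' 'P, IMPORTANT, DATA' |
  headers_with_data
# prints:
# H1B, H2B, IMPORTANT, DATA
# H1B, H2B, MORE, IMPORTANT, DATA
#
# H1C, H2C, IMPORTANT, DATA
```

Blocks without P lines produce no output at all, and blank lines in the input are ignored, so the same function works on the raw file or on the output of the first script.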
 