Korn shell - grep'ing the correct data

dsparc · Aug 16, 2007

Hi

I wonder if anyone can offer any help on this. I'm trying to manipulate some text from a file using a Korn shell script.

I have the following data in a text file:

------------------------
C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH
P, IMPORTANT, DATA
P, MORE, IMPORTANT, DATA

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH
-------------------------

I'd like to cut out only the section that contains the lines beginning with P,. But I need to include the HEADER1 and HEADER2 lines above that data. The other sections with only 2 lines (C, and R,) can be ignored.

Any ideas? I can't think of any way of using grep to get that section but ignore the others.

Cheers in advance

columb · Aug 16, 2007

Certain versions of grep (GNU grep, for example) can display the lines before and after each match, using the -B and -A options. Use 'man grep' to see if yours does.

Alternatively it can be done using awk.
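
As a rough illustration of the grep route: the sketch below assumes a grep that accepts -B NUM (GNU grep does; many older vendor greps do not) and that every group of P lines is preceded by exactly two header lines. The sample data is copied from the question. grep prints a "--" line between non-adjacent groups, so a second grep filters those out.

```shell
# Sample data from the question, held in a variable so no file is needed.
sample='C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH
P, IMPORTANT, DATA
P, MORE, IMPORTANT, DATA

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH'

# -B 2 prints the two lines before each match (assumes -B is supported).
# The second grep removes the "--" group separators that grep may add.
grep_result=$(printf '%s\n' "$sample" | grep -B 2 '^P,' | grep -v '^--$')
printf '%s\n' "$grep_result"
```

This only works while the number of header lines is fixed at two; with a varying number of lines before the P lines, awk is the safer choice.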

Ceci n'est pas une signature
Columb Healy

Annihilannic · Aug 16, 2007

A similar question:

http://www.tek-tips.com/viewthread.cfm?qid=1398984&page=1

Annihilannic.

dsparc · Aug 17, 2007

Hi

Thanks for the replies so far. My version of grep doesn't seem to support -B so I guess I will need to use awk in some way.

The problem I have is that there is not a set number of lines beginning with P,. So even if I have some kind of check to see if the line following R, begins with P, I cannot tell it to print x number of lines, as the number of lines beginning with P, can vary.

Is there a way to say "if the line after an instance of R, begins with P, then print these lines as well as the 2 previous lines (C, and R,) until the next line DOESN'T begin with P,"?

Sorry if this is a bit confusing. It's easy to explain face to face but hard on here :)

Here's an extended piece of the file - I need to extract the lines marked with asterisks:

------------------------
C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH *************
R, HEADER2, BLAH, BLAH *************
P, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************

C, HEADER1, BLAH, BLAH *************
R, HEADER2, BLAH, BLAH *************
P, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH *************
R, HEADER2, BLAH, BLAH *************
P, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************
P, MORE, IMPORTANT, DATA *************
-------------------------

Many thanks

Annihilannic · Aug 17, 2007

That's easy, just grep for asterisks.

Just kidding, try this:

[tt]awk '
/^$/ { i=0 ; next }
/^P/ { for (j=1; j<=i; j++) { print a[j] } ; i=0; print; next }
{ a[++i]=$0 }
' inputfile[/tt]

Annihilannic.

dsparc · Aug 17, 2007

Excellent! Worked perfectly. I've been trying unsuccessfully to decipher that code though. Can you elaborate on how it works? Cheers

Annihilannic · Aug 17, 2007

It uses a buffer array a[] to store the previous lines so that whenever an "interesting" P line is encountered it can dump the contents of the buffer, then print the P line(s). The buffer index is reset to the beginning of the buffer whenever a blank line is encountered (i.e. no interesting lines were found) or a P line is found (because the data is no longer required after it has been printed).

[tt]awk '
# empty line, reset buffer index and skip to next line
/^$/ { i=0 ; next }
# P line, print any buffered previous lines, reset
# buffer index, print this line, and skip to next line
/^P/ { for (j=1; j<=i; j++) { print a[j] } ; i=0; print; next }
# any other line, add it to the buffer and increment
# the buffer index
{ a[++i]=$0 }
' inputfile [/tt]
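
To see the buffer at work, here is a small run of the same script on made-up input. The "X, EXTRA, LINE" row is hypothetical and not from the original file; it is only there to show that any number of lines before the first P line is kept, while a block ending in a blank line is discarded.

```shell
# Hypothetical input: one header-only block, then a block with three
# lines before its P lines.
sample='C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH
X, EXTRA, LINE
P, IMPORTANT, DATA
P, MORE, IMPORTANT, DATA'

# Same awk program as above, reading from a pipe instead of inputfile.
buffered=$(printf '%s\n' "$sample" | awk '
/^$/ { i=0 ; next }
/^P/ { for (j=1; j<=i; j++) { print a[j] } ; i=0; print; next }
{ a[++i]=$0 }
')
printf '%s\n' "$buffered"
```

The first block never reaches a P line, so the blank line resets i and its two lines are simply overwritten by the next block.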

Annihilannic.

dsparc · Aug 17, 2007

Ok, thanks, I think I'm getting there. One last query...

I would now like to lay out this data as below:

HEADER1, HEADER2, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA

HEADER1, HEADER2, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA

HEADER1, HEADER2, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA
HEADER1, HEADER2, MORE, IMPORTANT, DATA

Where the HEADER1 and HEADER2 are the same for each block (i.e. the HEADERs that applied to that section of data in the first list).

Would this require a separate awk command or could it be incorporated into the first?

A thousand thank you's!

Annihilannic · Aug 17, 2007

My solution accounted for any number of lines before a matching P line. If the format of your file is quite rigid and only contains C, R and P lines, then the script can be simpler:

Code:

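awk '
        BEGIN { FS=OFS=", " }
        # P line: strip the leading "P, ", print it after the stored headers,
        # then skip the remaining rules so the stripped line is not re-tested
        /^P/ { sub(/^P, /,""); print c,r,$0; next }
        # C and R lines: remember the second field as the current header
        /^C/ { c=$2 }
        /^R/ { r=$2 }
        # blank line: print it unchanged to keep the blocks separated
        /^$/
'

It simply stores the values of the C and R headers when they are encountered, strips off the "P, " on a P line, and prints out the results. The next on the P rule stops a stripped line that happens to begin with C or R from overwriting a stored header. FS and OFS are special variables that define the input and output field separators (i.e. those that separate $1, $2, etc.).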
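
A quick way to see what FS and OFS do, using one header line from the sample data (a minimal sketch; the variable names fields and default_fs are just for illustration):

```shell
# With FS=", " the line splits on comma-plus-space, and OFS=", " is what
# print puts between its comma-separated arguments.
fields=$(echo 'C, HEADER1, BLAH, BLAH' | awk 'BEGIN { FS=OFS=", " } { print $2, $4 }')
printf '%s\n' "$fields"

# With the default FS (runs of whitespace) the comma stays attached.
default_fs=$(echo 'C, HEADER1, BLAH, BLAH' | awk '{ print $2 }')
printf '%s\n' "$default_fs"
```

The first command prints "HEADER1, BLAH" and the second prints "HEADER1," with the trailing comma, which is why the script sets FS before reading $2.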

Annihilannic.

dsparc · Aug 17, 2007

Thanks Annihilannic!

I've actually used both awk sections. Your solution for the first was made with the correct assumption - there is a varying number of lines before a matching P.

But, I then wanted to manipulate that output and have used your second solution to do that.
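
For anyone reading later, chaining the two awk commands might look roughly like the sketch below. The sample input is a shortened, made-up variant of the data in this thread, and the P rule ends with next so that a stripped line cannot be re-matched by the C or R rules.

```shell
# Shortened sample: one header-only block and two blocks with P lines.
sample='C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH
P, IMPORTANT, DATA
P, MORE, IMPORTANT, DATA

C, HEADER1, BLAH, BLAH
R, HEADER2, BLAH, BLAH
P, IMPORTANT, DATA'

# Stage 1 keeps only blocks that contain P lines (and drops blank lines).
# Stage 2 prefixes each P line with the headers of its block.
combined=$(printf '%s\n' "$sample" | awk '
/^$/ { i=0 ; next }
/^P/ { for (j=1; j<=i; j++) { print a[j] } ; i=0; print; next }
{ a[++i]=$0 }
' | awk '
BEGIN { FS=OFS=", " }
/^P/ { sub(/^P, /,""); print c,r,$0; next }
/^C/ { c=$2 }
/^R/ { r=$2 }
')
printf '%s\n' "$combined"
```

Because the first stage discards blank lines, the output blocks run together; the blank lines between blocks would have to be added separately if they are wanted.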

Everything seems to be working.

Thanks a lot!
