printing lines before and after a match (revisited)

Status
Not open for further replies.

Bootstrap

Technical User
Nov 9, 2001
12
US
Hi Everyone,

In reference to thread thread271-1214284.

Code:
awk '/pattern/{for(i=0;i<19;i++)print a[(i+NR)%19];print;for(i=0;i<26;i++){getline;print}}{a[NR%19]=$0}' input.file

(Thanks Feherke. Used this many times.)

I need to print the lines before /match/, but instead of a constant number of lines, all I have is a constant string, "Reply", that marks where to start. I can't use a range, because the block that contains /match/ appears several times; the blocks are distinguished only by a unique id that I get from the "Request" that corresponds to the "Reply". The problem is that the number of lines between "Reply" and /match/ is not constant: "Reply" can be 4, 5, or 6 lines before /match/.

<start of transaction <start><Request> </start><string1>text</string1><string2>match regex</string2><uniqueID>ID:123:456</uniqueID></start of transaction>

<end of transaction><end>Reply</end>{tags with text}<uniqueID>ID:123:456</uniqueID><string with regex>match</string with regex>{more tags with text}</end of transaction>

I've tried modifying Feherke's example to consistently get the lines between "Reply" and /match/, but just can't get it right. I can consistently get from match to </end of transaction>. The goal is getting the entire transaction with Request and entire Reply for /match/ and the unique id.

Thanks in advance.

Bootstrap(Robert)



 
Do you just want to output the matching Request and Reply lines, or do you want some (or all) of the data between them?

Annihilannic.
 
I need the line matching Request, and the lines that make up the Reply.

I should have used a better example. These are SOAP envelopes and I'll use the wikipedia example with a few changes.

Request <soap:Envelope xmlns:soap=" <soap:Body>
<getProductDetails xmlns=" <productID>827635</productID>
<clientID>ID:xyz123</clientID>
</getProductDetails>
</soap:Body>
</soap:Envelope>

Other info lines and Requests/Replies I don't need.

Reply <soap:Envelope xmlns:soap=" <soap:Body>
<getProductDetailsResponse xmlns=" <getProductDetailsResult>
<productName>Toptimate 3-Piece Set</productName>
<productID>827635</productID>
<clientID>ID:xyz123</clientID>
<description>3-Piece luggage set. Black Polyester.</description>
<price currency="NIS">96.50</price>
<inStock>true</inStock>
</getProductDetailsResult>
</getProductDetailsResponse>
</soap:Body>
</soap:Envelope>

I have a file with a few different clientIDs, and I need all the transactions for each of them. There are several transactions per clientID, but only one unique productID per transaction. A Request is a single long line. This makes it easy to find the Request line with the clientID, get the unique productID, then print the line.

Later in the log the Reply appears. The Reply envelope is all separate lines, but contiguous beginning to end.

My nawk script is a function in a ksh script (different format than example).
At first I thought I could define a var for the productID and use it from there like this.
Code:
get_soap ()
{
 nawk -v cid="$CID" -F">" '

    $0 ~ cid && /Request/ {pid = substr($9,1,match($9,"<")-1);print $0}

#Now use pid to find it in the reply.  From that line that matches, get lines before and after for the whole reply.
#The Reply line being anywhere from 3 - 7 lines before the match.  
#The end, </soap:Envelope> being any number of lines away. The complete reply is always from Reply to </soap:Envelope>.

    $0 ~ pid && /<cng:responseTo>/ {
        # a[] holds the previous 7 lines, oldest at (NR%7); find the buffered Reply line
        m = 7
        for (i=0;i<7;i++) if (a[(i+NR)%7] ~ "Reply") m = i
        # print from the Reply line up to the line before this match
        for (i=m;i<7;i++) print a[(i+NR)%7]
    }
    {a[NR%7] = $0}

    $0 ~ pid && /<cng:responseTo>/ , /<\/soap:Envelope>/ {print $0}
     ' $LOG
}

After trial and error, then a lot more reading, I realized that the value of pid is lost as the next lines are evaluated. I couldn't figure out how to handle that, so I have this function above split into two functions/nawk scripts. First function prints the cid and pid to a file. Second function uses pid from the file to get the Request and Reply.

After my post, I was able to figure out how to get the lines from
Reply <soap:Envelope
to
$0 ~ pid && /<cng:responseTo>/.

Looks ugly and I'm sure there's a better way. It's also terribly inefficient. I'd now ask the questions:

How can I use this as one function, keeping the value of pid?
What's a better way to use the array for getting the whole reply?
Could I see how it should be done?

Thanks.

Bootstrap
 
It would really help if you could supply some of the real input data instead of an example from Wikipedia, with private information obfuscated of course.

Have you considered using a pidarray indexed by pid to hold the multiple pids? Then you can use the if (pid in pidarray) construct to test for its presence.
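A minimal sketch of the pidarray idea, not tested against the real log. It relies on awk variables and arrays keeping their values from one input line to the next, so the pids collected on Request lines are still available when later lines are read. The function names find_pid_lines and getpid are made up for illustration, and it assumes each Request is one line beginning with "Request" that carries <productID> and <clientID> tags as in the Wikipedia-style sample; the real log format may need different patterns. It uses awk rather than nawk, sticking to features both should share.

```shell
# Hypothetical helper: print "line-number productID" for every non-Request line
# whose productID was earlier seen on a Request line for the given clientID.
# Usage: find_pid_lines CLIENT_ID LOGFILE
find_pid_lines() {
    awk -v cid="$1" '
        # return the text between <productID> and the next "<", or "" if absent
        function getpid(s) {
            if (match(s, /<productID>[^<]*</))
                return substr(s, RSTART + 11, RLENGTH - 12)
            return ""
        }
        # Request lines: remember the pid when the clientID is the wanted one
        /^Request/ {
            if (index($0, "<clientID>" cid "</clientID>")) {
                pid = getpid($0)
                if (pid != "") pidarray[pid] = 1
            }
            next
        }
        # every other line: report it if its pid is a remembered one
        {
            pid = getpid($0)
            if (pid in pidarray) print NR, pid
        }
    ' "$2"
}
```

Because the test is "pid in pidarray" rather than "$0 ~ pid", an empty pid never matches every line, and any number of pids for the same clientID can be tracked in one pass.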

I would just use a linearray to store everything between "Reply" and "<cng:responseTo>" for every transaction using a range, and then decide after that whether or not it should be printed depending on whether a matching cid was seen.... if I'm understanding the problem correctly.
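Something along these lines might do it in a single function; treat it as a sketch under assumptions rather than a drop-in answer. It buffers each reply in linearray from the "Reply" line through </soap:Envelope> (the stated extent of a complete reply, rather than stopping at <cng:responseTo>) and prints the buffer only if a productID inside it was seen on a Request for the wanted clientID. It assumes Request and Reply lines begin with those words and that the tags look like the sample; getpid and the variable names are made up for illustration, and awk is used in place of nawk.

```shell
# Hypothetical single-pass version of get_soap.
# Usage: get_soap CLIENT_ID LOGFILE
get_soap() {
    awk -v cid="$1" '
        # return the text between <productID> and the next "<", or "" if absent
        function getpid(s) {
            if (match(s, /<productID>[^<]*</))
                return substr(s, RSTART + 11, RLENGTH - 12)
            return ""
        }
        # Request: one long line; print it and remember its pid if the cid matches
        /^Request/ {
            if (index($0, "<clientID>" cid "</clientID>")) {
                pid = getpid($0)
                if (pid != "") { pidarray[pid] = 1; print }
            }
            next
        }
        # Reply: start a fresh buffer
        /^Reply/ { n = 0; wanted = 0; inreply = 1 }
        # while inside a reply, store each line and note a matching pid
        inreply {
            linearray[++n] = $0
            pid = getpid($0)
            if (pid in pidarray) wanted = 1
        }
        # end of the envelope: print the buffer only if it was a wanted reply
        inreply && /<\/soap:Envelope>/ {
            if (wanted) for (i = 1; i <= n; i++) print linearray[i]
            inreply = 0
        }
    ' "$2"
}
```

Called as get_soap "$CID" "$LOG", this avoids guessing how many lines lie between "Reply" and the matching line, since the whole reply is already in the buffer by the time the decision is made.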

Annihilannic.
 