how to concatenate lines based on the value of a field, with awk

yyyy · Jul 4, 2001

I have a log from a parallel machine in which each job has two lines: one for its start (containing the field START) and one for its end (containing the field STOP).
The two lines share a common field, the job id.
What I'm trying to do is concatenate (part of) the STOP line to its corresponding START line.
I thought I should read all the STOP lines into an array and then parse the file again to find the matching START line, based on the job id.
What I would like to ask is:
1) With this solution I have to parse the huge file twice, and scan at least part of the array containing the STOP lines for each START line. Is there a simpler solution to my problem using awk?
2) How can I refer to just one field from each line stored in the array? I thought I should use split, but I couldn't find any example that selects just one field from each element of the array. (array[j] would refer to the entire line; would array[j][k] work?)

I would highly appreciate any helpful hint!!!
Thanks a lot.

Anna
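
On question 2: in POSIX awk an array element that holds a line is just a string, so array[j][k] will not pick out a field (gawk 4 and later has arrays of arrays, but a stored line is still a plain string). A minimal sketch of the split approach follows. The helper name stored_field and the two sample lines are made up for illustration, and it assumes whitespace-separated fields:

```shell
# stored_field K: store every input line in an array, then print field K
# of each stored line by splitting it on demand (hypothetical helper).
stored_field() {
    awk -v k="$1" '
        { line[NR] = $0 }                     # keep whole lines, as in the question
        END {
            for (j = 1; j <= NR; j++) {
                n = split(line[j], f, " ")    # f[1..n] are the fields of line j
                if (k <= n) print f[k]        # skip lines that are too short
            }
        }'
}

# Illustrative input: prints "b" and then "e"
printf '%s\n' 'a b c' 'd e f' | stored_field 2
```

Passing " " as the separator makes split use awk's default whitespace splitting, so the field numbers match $1, $2, ... on the original line.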

grega · Jul 6, 2001

Can you post an example of the file structure?

Greg.

flogrr · Jul 6, 2001

Hi Anna,

You didn't say whether you wanted to retain all the
intervening lines between START and STOP, so
here are a couple of ways to do it, and you can use
whichever one suits your needs. The first keeps the
intervening lines, the second skips them.

awk '
/START/ {                    # remember the START line, reset the buffer
    start = $0
    i = 0
    next
}
/STOP/ {
    concat = $5 " " $6       # using field 5 and 6 - change as required
    print start " " concat   # format as desired
    for (j = 1; j <= i; j++) print lines[j]   # print intervening lines
    print                    # print current STOP line
    i = 0
    next
}
{ lines[++i] = $0 }          # store intervening lines
' inputfile > outputfile

awk '
/START/ {                    # remember the START line
    start = $0
    next
}
/STOP/ {
    concat = $5 " " $6       # using field 5 and 6 - change as required
    print start " " concat   # format as desired
}
# intervening lines match neither rule, so they are skipped
' inputfile > outputfile

This may not exactly work, but illustrates how it can be done.

Hope this helps!

Jesse

flogrr
flogr@yahoo.com

yyyy · Jul 9, 2001

Hi!
Thank you very much for your answers.
Here is a small sample of my file (after cutting off the
irrelevant fields). For each line with START there is a
matching line with STOP somewhere in the file.
They can be matched using the large number right before the
user field, which is the job id. What I want to do is
concatenate all these pairs of lines (a START line with its
corresponding STOP line).
Unfortunately I was not very successful with it :(

Anna

64 17-08-2000 13:37:19 : START ./mcast_throughput (finish 17-08 13:41) 308390 user = versto
64 17-08-2000 13:37:39 : STOP ./mcast_throughput (finish 17-08 13:41) 308390 user = versto
32 17-08-2000 13:44:52 : STOP ./run (finish 17-08 13:45) 308371 user = wdittmer
32 17-08-2000 13:59:13 : START ./run (finish 17-08 14:15) 308391 user = wdittmer
64 17-08-2000 14:43:05 : START ./mantaStart (finish 17-08 15:44) 308393 user = arnold
64 17-08-2000 14:44:08 : STOP ./mantaStart (finish 17-08 15:44) 308393 user = arnold
64 17-08-2000 14:49:59 : START ./mantaStart (finish 17-08 15:50) 308394 user = arnold

grega · Jul 9, 2001

Here are my thoughts.

If the STOP line always comes before the next START line, then it can be done in awk with only 1 pass through the file. An awk script along these lines should work (it's up to you what you do with the lines once you find them - here I just print them):

BEGIN {startline="";jobid=""}
/START/ {startline=$0;jobid=$10}
/STOP/ {if (jobid==$10) {print startline, $0}}
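
If jobs can overlap, so that another job's START or STOP falls between a pair, the single startline variable above is not enough. Here is a sketch of a one-pass variant that keeps pending START lines in an array indexed by job id (field 10 in the sample). The function name pair_jobs and the choice to silently skip a STOP that has no stored START are assumptions for illustration:

```shell
# pair_jobs [FILE...]: print each START line followed by its STOP line,
# matched on the job id in field 10 (hypothetical helper).
pair_jobs() {
    awk '
        $5 == "START" { start[$10] = $0; next }   # remember START by job id
        $5 == "STOP" && ($10 in start) {
            print start[$10], $0                  # START line, then STOP line
            delete start[$10]                     # forget finished jobs
        }
    ' "$@"
}

# Sample lines taken from the post above
pair_jobs <<'EOF'
64 17-08-2000 13:37:19 : START ./mcast_throughput (finish 17-08 13:41) 308390 user = versto
64 17-08-2000 13:37:39 : STOP ./mcast_throughput (finish 17-08 13:41) 308390 user = versto
32 17-08-2000 13:44:52 : STOP ./run (finish 17-08 13:45) 308371 user = wdittmer
32 17-08-2000 13:59:13 : START ./run (finish 17-08 14:15) 308391 user = wdittmer
64 17-08-2000 14:43:05 : START ./mantaStart (finish 17-08 15:44) 308393 user = arnold
64 17-08-2000 14:44:08 : STOP ./mantaStart (finish 17-08 15:44) 308393 user = arnold
64 17-08-2000 14:49:59 : START ./mantaStart (finish 17-08 15:50) 308394 user = arnold
EOF
```

On this sample it prints two joined lines, for jobs 308390 and 308393. To append only part of the STOP line, replace $0 in the print with the fields you want, for example $2, $3. Memory use grows only with the number of jobs that have started but not yet stopped.
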

Another approach is to use the Unix join command to do the work. As a simple example, this script will join together the START and STOP lines on job id. This is a "quick and dirty" solution, based on the structure of the example file you gave us, but you could play around with the principle. If you got the relevant lines joined together, you could then push the resulting output through awk to do whatever formatting you need.

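#!/bin/ksh
# join needs both inputs sorted on the join field, so sort on field 10
grep START logfile | sort -k 10b,10 > f1.tmp
grep STOP logfile | sort -k 10b,10 > f2.tmp
# join files together on the 10th field (uses blank space
# as default field separator)
join -1 10 -2 10 f1.tmp f2.tmp > outfile
rm f1.tmp f2.tmp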

Greg.
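
For the final formatting step: join writes the join field first, then the remaining fields of the first file, then those of the second. Here is a sketch of pushing that output through awk, assuming the 13-field layout of the sample, so that each joined line has 25 fields. The function name format_joined and the choice of columns are illustrative only:

```shell
# format_joined [FILE...]: pick a few columns out of join's output
# (hypothetical helper).
format_joined() {
    # $1 = job id, $2..$13 = rest of START line, $14..$25 = rest of STOP line
    awk '{ print $1, $13, $7, $3, $4, $16 }' "$@"
}

# One joined line, as join would emit it for job 308390 in the sample.
# prints: 308390 versto ./mcast_throughput 17-08-2000 13:37:19 13:37:39
echo '308390 64 17-08-2000 13:37:19 : START ./mcast_throughput (finish 17-08 13:41) user = versto 64 17-08-2000 13:37:39 : STOP ./mcast_throughput (finish 17-08 13:41) user = versto' | format_joined
```

The columns printed are job id, user, command, start date, start time and stop time. Adjust the field numbers if your log has a different layout.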
