Hi forum,
I've been away from awk in the last few years but have recently come back to it and am struggling . Can anyone help with the following?
Summary
I have a ascii file containing 20million+ lines.
The FS = <space>
Field 1 is always an accountid. There are multiple lines per accountid
Field 2 is always a type. (not sorted in alphabetical order)
Field 3 is the value (as per field 2 - not sorted)
Outcome
I want to generate an CSV output file that appends all details for each unique accountid onto 1 line. The 'values' need to be written in order even though the input may not be in order.
Sample Input File
accountid type value
12345 A 100
12345 B 200
12345 C 500
12345 D 900
12345 E 1000
23456 B 200
23456 D 300
23456 A 400
23456 E 500
23456 C 600
Expected Output
accountid,typeA,typeB,typeC,typeD,typeE
12345 100,200,500,900,1000
23456 400,200,600,300,500
Here's what I have currently;
The problem I'm having is that its skipping lines, so some of the 'values' aren't updated for the current accountid.
I suspect I'm making it way to complicated than it needs to be so if anyone can help it would be much appreciated.
thanks.
I've been away from awk in the last few years but have recently come back to it and am struggling . Can anyone help with the following?
Summary
I have a ascii file containing 20million+ lines.
The FS = <space>
Field 1 is always an accountid. There are multiple lines per accountid
Field 2 is always a type. (not sorted in alphabetical order)
Field 3 is the value (as per field 2 - not sorted)
Outcome
I want to generate an CSV output file that appends all details for each unique accountid onto 1 line. The 'values' need to be written in order even though the input may not be in order.
Sample Input File
accountid type value
12345 A 100
12345 B 200
12345 C 500
12345 D 900
12345 E 1000
23456 B 200
23456 D 300
23456 A 400
23456 E 500
23456 C 600
Expected Output
accountid,typeA,typeB,typeC,typeD,typeE
12345 100,200,500,900,1000
23456 400,200,600,300,500
Here's what I have currently;
Code:
nawk '{ if ( substr($1,1,1) => '1' ) {
accountid=$1
if ( $2 == "A" )
TA=$3
else if ( $2 == "B" )
TB=$3
else if ( $2 == "C" )
TC=$3
else if ( $2 == "D" )
TD=$3
else if ( $2 == "E" )
TE=$3
if ((getline tmp) > 0) {
if (substr(tmp,1,6) != accountid ) {
print accountid ","TA","TB","TC","TD","TE
close(tmp)
}
}
}
}' sample.file
The problem I'm having is that its skipping lines, so some of the 'values' aren't updated for the current accountid.
I suspect I'm making it way to complicated than it needs to be so if anyone can help it would be much appreciated.
thanks.