I'm trying to replace a convoluted piece of processing, which involves reading the same file 6 times, with a single nawk script.
However, I'm having severe performance problems with the first stage of the processing. This involves cutting a set of fields from the file using a comma-delimited list supplied as a parameter.
Code:
BEGIN {
        # Parse the comma-delimited column list into an array of field numbers
        no_fields = split(inColumnList, columnList, ",")
}
{
        # Build the output record one field at a time
        outputline = $columnList[1]
        for (i = 2; i <= no_fields; i++)
                outputline = outputline FS $columnList[i]
        printf("%s\n", outputline) >> outputFile
}
I call the command using:
Code:
export _columnList="3,16,17,18,19,20,21,22,25,26,29,30,33,41,44,45,48,52,53,54,55,57,58,59,61,63,64,66,67,70,71,72,73,74
,75,76,77,81,82,84,85,86,87,88,89,90,91,93,94,95,96,97,98,99,100,101,102,103,104,107,108,109,110,111,112,113,114,115,116
,117,118,119,120,121,122,123,124,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147
,148,149,150,151,152,153,154,155,156,157,163,164,165,166,167,168,170,171,172,174,175,176,177,181,184,191,192,196,198,205
,206,208,220,223,225,226,228"
nawk -F"|" -f decomposeFile.awk -v inColumnList=$_columnList -v outputFile=$_outputFile $_inputFile
The script takes a good two hours to run on a million-row file (compared to about 10 minutes using something like "cut" to achieve the same effect). If I hard-code the column list using:
Code:
{
printf("%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s," \
       "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s," \
       "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s," \
       "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s," \
       "%s,%s,%s,%s,%s,%s,%s\n", \
       $3,$16,$17,$18,$19,$20,$21,$22,$25,$26,$29,$30,$33,$41,$44,$45,$48,$52,$53,$54,$55,$57,$58,$59,$61,$63,$64,$66,$67,$70,$71,$72,$73,$74, \
       $75,$76,$77,$81,$82,$84,$85,$86,$87,$88,$89,$90,$91,$93,$94,$95,$96,$97,$98,$99,$100,$101,$102,$103,$104,$107,$108,$109,$110,$111,$112,$113,$114,$115,$116, \
       $117,$118,$119,$120,$121,$122,$123,$124,$126,$127,$128,$129,$130,$131,$132,$133,$134,$135,$136,$137,$138,$139,$140,$141,$142,$143,$144,$145,$146,$147, \
       $148,$149,$150,$151,$152,$153,$154,$155,$156,$157,$163,$164,$165,$166,$167,$168,$170,$171,$172,$174,$175,$176,$177,$181,$184,$191,$192,$196,$198,$205, \
       $206,$208,$220,$223,$225,$226,$228) >> outputFile
}
it takes around ten minutes, so it seems to be the looping and the per-field string concatenation that are slow.
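One variant I have sketched (untested) avoids building the record up in a variable by printing each field as it goes, though I don't know whether the repeated printf calls will actually be any cheaper:
Code:
BEGIN {
        no_fields = split(inColumnList, columnList, ",")
}
{
        # Print each selected field directly instead of concatenating
        for (i = 1; i < no_fields; i++)
                printf("%s%s", $columnList[i], FS) >> outputFile
        printf("%s\n", $columnList[no_fields]) >> outputFile
}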
Anyone got any ideas how I can speed this up? Or would I be better off using something like Perl to do this? I am on Solaris 5.8.
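Since the hard-coded version runs at cut-like speed, one idea I have been toying with (untested sketch, assuming a POSIX shell) is to generate that hard-coded program from the column list and feed it to nawk:
Code:
# Turn "3,16,..." into: BEGIN { OFS = FS } { print $3, $16, ..., $228 }
_prog=$(echo "$_columnList" | nawk -F"," '{
        printf("BEGIN { OFS = FS } { print ")
        for (i = 1; i < NF; i++) printf("$%s, ", $i)
        printf("$%s }\n", $NF)
}')
nawk -F"|" "$_prog" "$_inputFile" > "$_outputFile"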