I'm having problems with a Gawk script that looks something like this:
-----------------------------------------
while ( (getline < input_file) > 0 ) {
    # skip file header
    n_nr++
    if ( n_nr < 3 ) continue
    summary_detail_array=$2"|"$3
    summary_detail[summary_detail_array]+=$1
}
for ( item in summary_detail )
    print summary_detail[item],item > summary_output_file
-----------------------------------------
The input file has over 20 million lines.
When I run the script, it aborts due to insufficient memory ("fatal: newnode: nextfree: can't allocate memory (Not enough space)"). Although the array has only 5000 elements (after 10 million input lines read), Gawk is using more than 2GB of RAM at the time of the abort!
I've tried running the same script in Awk and the same thing happens (Awk however seems to use less memory).
My only explanation for this is that Gawk always adds a new element to the array, even when you are just updating an existing element. The old element is deleted, but the memory is not deallocated.
Does this make sense? If so, is there any workaround?
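For comparison, the same aggregation can be written with awk's main input loop instead of the getline loop. This is only a sketch to help narrow down where the memory goes, not a confirmed fix. The function name summarize, the file name sample.txt and the sample records are made up for illustration, and it assumes the header is exactly the first two lines, as in the script above.

```shell
# Hypothetical helper: sum column 1 per "$2|$3" key, skipping a 2-line header.
summarize() {
    awk 'NR > 2 { summary_detail[$2 "|" $3] += $1 }
         END { for (item in summary_detail) print summary_detail[item], item }' "$1" |
        LC_ALL=C sort   # for-in order is unspecified, so sort for stable output
}

# Tiny made-up sample: two header lines, then "value key1 key2" records.
printf '%s\n' 'header one' 'header two' '5 a x' '7 a x' '1 b y' > sample.txt
summarize sample.txt
```

If memory stays flat with this form on the real input, the getline loop would be the more likely suspect than the array update.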
Thank you very much,
Romeu