AWK performance


Romeu (MIS)
Jun 25, 2002, PT
In my earlier scripts I could add a bunch of loops and arrays without a significant impact on run time. In other words, the most time-consuming operation was scanning the file, and I could add more and more validations to each line with only a minor impact on performance (one script has more than 1000 lines of code and takes 120 seconds to scan a 320,000-line file).
In more recent scripts that no longer happens: each line validation consumes more and more time. A script of just over 500 lines is taking almost 1000 seconds to scan 2.5 million lines, which means its performance, relative to the amount of validation code, is roughly twice as bad as the earlier script's.
I cannot find anything fundamentally different about the two scripts. What kinds of operations can affect awk's performance by these amounts?
Thanks in advance for any help.
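For readers picturing the kind of script being described, here is a minimal, hypothetical sketch of a per-line validation program in that style (the field separator, field positions, rules and the name validate.awk are invented for illustration, not Romeu's actual code):

    # validate.awk - one pattern/action block per validation rule,
    # applied to every line of the input file
    BEGIN { FS = ";" }                          # assumed field separator
    NF < 5            { bad["fields"]++; next } # structural check
    $2 !~ /^[0-9]+$/  { bad["numeric"]++ }      # field 2 must be numeric
    length($3) > 40   { bad["toolong"]++ }      # field 3 length limit
    END { for (k in bad) printf "%-10s %d\n", k, bad[k] }

Run as: awk -f validate.awk datafile. Each extra rule of this shape adds only a constant amount of work per input line.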
 
From the numbers you give, it seems that both scripts are processing about 2500 lines of the input file per second. So to answer your question, I think that for most scripts, reading the input is by far the most time-consuming part of the job. CaKiwi
 
CaKiwi,

Yes, after re-reading my own text I have to agree that your answer is the only one that makes sense given what I wrote. My fault, sorry. I didn't explain properly that the more recent script takes increasingly more time to run as I add more and more validations to each line, in a more or less linear progression. The conclusion is that with the same number of validations (roughly the same number of code lines), it would still have roughly 50% poorer performance. What can affect awk's performance by these amounts? It seems that simply adding an element to an array has a great impact on the time it takes to loop over the array, and a simple IF adds significant processing time. That didn't happen in my earlier scripts (and they still run fast today), and I don't think my writing style has changed significantly. What can it be?
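One construct that behaves exactly as described (this is an assumption about the code, not something Romeu posted) is a loop over a growing array inside the main per-line block; the cost of that loop rises with every line read, so the total run time grows roughly with the square of the line count:

    # Slow pattern (hypothetical): the inner loop scans the whole array
    # on every input line, and the array keeps growing.
    {
        dup = 0
        for (k in seen) if (k == $1) { dup = 1; break }
        if (dup) dups++
        seen[$1] = 1
    }

    # Fast alternative: "in" is a single lookup, so the cost per line
    # stays constant no matter how large the array gets.
    {
        if ($1 in seen) dups++
        seen[$1] = 1
    }

    END { print dups, "duplicate keys" }

(Only one of the two main blocks would be used in practice; they are shown together for comparison.)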
Many thanks for any help you guys can provide me.
 
Is your script one big awk program?

If your timings need a boost, why not turn your script into several small awk scripts pipelined together? The separate processes can take advantage of pipelined processing and run in parallel. Unix tools like sed can be much faster, and simpler tools like cut can speed up the time devoted to a single operation. A hypothetical example follows below.
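As a purely illustrative example of that suggestion (the field numbers and script names are invented), one large awk program could be split into a pipeline like this:

    # Stage 1: cut keeps only the fields the checks actually need.
    # Stage 2: sed drops comment lines cheaply.
    # Stages 3-4: two smaller awk programs, one per group of validations.
    cut -d';' -f1,3,7 bigfile |
      sed '/^#/d' |
      awk -F';' -f numeric_checks.awk |
      awk -F';' -f format_checks.awk > report

Each stage is a separate process, so on a multi-CPU machine the stages can genuinely run in parallel, and the later awk programs see fewer and shorter lines.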

My two cents
Cheers,
ND [smile]

bigoldbulldog@hotmail.com
 
Thanks for the info, bigoldbulldog. I've also managed to reduce the run time by removing some "hidden" loops that were out of control. I've also tried gawk: for my script the improvement in performance was a whopping 260%! I tried nawk as well, but its performance was very similar to the standard awk.
I'll run some performance tests to check whether gawk can improve the performance of any type of awk script by these amounts. Thanks for all the help, and I'll keep you guys informed about the tests.
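For anyone who wants to repeat that kind of comparison, a simple approach (assuming all three interpreters are installed, and with validate.awk standing in for the real script) is to time each one on the same data:

    # Compare interpreters on identical input; discard the output so that
    # only the processing time is measured.
    for interp in awk nawk gawk; do
        echo "== $interp =="
        time $interp -f validate.awk bigfile > /dev/null
    done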


Thanks,
Romeu
 
