AWK performance


Romeu (MIS)
Jun 25, 2002, PT
In my earlier scripts I could add a bunch of loops and arrays without a significant impact on run time. In other words, the most time-consuming operation was scanning the file, and I could add more and more validations to each line with only a minor impact on performance (one script has more than 1000 lines of code and takes 120 seconds to scan a 320,000-line file).
In more recent scripts that no longer happens: each line validation consumes more and more time. A script of just over 500 lines is taking almost 1000 seconds to scan 2.5 million lines, which means its performance, relative to the amount of validation code, is roughly twice as bad as the earlier script's.
I cannot find anything fundamentally different about the two scripts. What kinds of operations can affect awk's performance by these amounts?
Thanks in advance for any help.
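For readers picturing the kind of script being described, here is a minimal, hypothetical sketch of a per-line validation program in that style (the field separator, field positions, rules and the name validate.awk are invented for illustration, not Romeu's actual code):

    # validate.awk - one pattern/action block per validation rule,
    # applied to every line of the input file
    BEGIN { FS = ";" }                          # assumed field separator
    NF < 5            { bad["fields"]++; next } # structural check
    $2 !~ /^[0-9]+$/  { bad["numeric"]++ }      # field 2 must be numeric
    length($3) > 40   { bad["toolong"]++ }      # field 3 length limit
    END { for (k in bad) printf "%-10s %d\n", k, bad[k] }

Run as: awk -f validate.awk datafile. Each extra rule of this shape adds only a constant amount of work per input line.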
 
From the numbers you give, it seems that both scripts are processing about 2500 lines of the input file per second. So to answer your question, I think that for most scripts, reading the input is by far the most time-consuming part of the job. CaKiwi
 
CaKiwi,

Yes, after re-reading my own text I have to agree that your answer is the only one that makes sense given what I wrote. My fault, sorry. I didn't explain properly that the more recent script takes increasingly more time to run as I add more and more validations to each line, in a more or less linear progression. The conclusion is that with the same number of validations (roughly the same number of code lines), it would still have roughly 50% poorer performance. What can affect awk's performance by these amounts? It seems that simply adding an element to an array has a great impact on the time it takes to loop over the array, and a simple IF adds significant processing time. That didn't happen in my earlier scripts (and they still run fast today), and I don't think my writing style has changed significantly. What can it be?
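One construct that behaves exactly as described (this is an assumption about the code, not something Romeu posted) is a loop over a growing array inside the main per-line block; the cost of that loop rises with every line read, so the total run time grows roughly with the square of the line count:

    # Slow pattern (hypothetical): the inner loop scans the whole array
    # on every input line, and the array keeps growing.
    {
        dup = 0
        for (k in seen) if (k == $1) { dup = 1; break }
        if (dup) dups++
        seen[$1] = 1
    }

    # Fast alternative: "in" is a single lookup, so the cost per line
    # stays constant no matter how large the array gets.
    {
        if ($1 in seen) dups++
        seen[$1] = 1
    }

    END { print dups, "duplicate keys" }

(Only one of the two main blocks would be used in practice; they are shown together for comparison.)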
Many thanks for any help you guys can provide me.
 
Is your script one big awk program?

If your timings need a boost, why not turn your script into several small awk scripts pipelined together? The separate processes can take advantage of pipelined processing and run in parallel. Unix tools like sed can be much faster, and simpler tools like cut can speed up the time devoted to a single operation. A hypothetical example follows below.
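As a purely illustrative example of that suggestion (the field numbers and script names are invented), one large awk program could be split into a pipeline like this:

    # Stage 1: cut keeps only the fields the checks actually need.
    # Stage 2: sed drops comment lines cheaply.
    # Stages 3-4: two smaller awk programs, one per group of validations.
    cut -d';' -f1,3,7 bigfile |
      sed '/^#/d' |
      awk -F';' -f numeric_checks.awk |
      awk -F';' -f format_checks.awk > report

Each stage is a separate process, so on a multi-CPU machine the stages can genuinely run in parallel, and the later awk programs see fewer and shorter lines.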

My two cents
Cheers,
ND [smile]

bigoldbulldog@hotmail.com
 
Thanks for the info, bigoldbulldog. I've also managed to reduce the run time by removing some "hidden" loops that were out of control. I've also tried gawk: for my script the improvement in performance was a whopping 260%! I tried nawk as well, but its performance was very similar to the standard awk.
I'll run some performance tests to check whether gawk can improve the performance of any type of awk script by these amounts. Thanks for all the help, and I'll keep you guys informed about the tests.
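For anyone who wants to repeat that kind of comparison, a simple approach (assuming all three interpreters are installed, and with validate.awk standing in for the real script) is to time each one on the same data:

    # Compare interpreters on identical input; discard the output so that
    # only the processing time is measured.
    for interp in awk nawk gawk; do
        echo "== $interp =="
        time $interp -f validate.awk bigfile > /dev/null
    done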


Thanks,
Romeu
 
