
awk parsing keeps getting slower and stalls :(

Status
Not open for further replies.

FrancisMMM

IS-IT--Management
Jun 3, 2022
Hello, I just wrote my second awk script and it keeps getting slower and slower :(

It parses requests from a Tomcat server log that contains 448478 of them (in a 246 MB file). Here is the time taken for each block of 10000 requests:

Code:
...10000 / 448478 (83 secs)
...20000 / 448478 (91 secs)
...30000 / 448478 (90 secs)
...40000 / 448478 (87 secs)
...50000 / 448478 (86 secs)
...60000 / 448478 (87 secs)
...70000 / 448478 (88 secs)
...80000 / 448478 (90 secs)
...90000 / 448478 (94 secs)
...100000 / 448478 (98 secs)
...110000 / 448478 (94 secs)
...120000 / 448478 (119 secs)
...130000 / 448478 (134 secs)
...140000 / 448478 (153 secs)
...150000 / 448478 (188 secs)
...160000 / 448478 (211 secs)
...170000 / 448478 (226 secs)
...180000 / 448478 (240 secs)
...190000 / 448478 (260 secs)
...200000 / 448478 (253 secs)
...210000 / 448478 (259 secs)


Here is the awk script:


Code:
awk -F'[][]' -v serv="$host" '
    BEGIN { cur="dummy" ; c=0 ; num="%06d" }
    {
        # new request (start line): increment this thread counter
        if ( $0 ~ / startstring /) {
            cur=$4 ;
            f[cur]++ ;
            c++;
            fn=serv"/"cur"-"sprintf(num,f[cur]) ;
        # other lines
        } else {
            if (length($4) > 4 ) {
                cur=$4 ;
                fn=serv"/"cur"-"sprintf(num,f[cur])
            }
            # last line
            if ( $0 ~ / endstring /) {
                print fn
            }
        }
        print $0 > fn
    }
    END { print "#TotalRequests="c > "/dev/stderr" }' "$hlog"

The script collects the log lines between the start and end strings into one file per request, then prints that file's name, which is piped to a bash script that runs a few small tests and removes the file.
$4 is something like "http-thread-89".
Sometimes it simply stalls, and I don't understand why.
The array f only ends up with about 200 entries.
And I don't think the bash script is the cause, since doing the whole job in bash alone is faster!
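To see why $4 ends up holding the thread name, here is a small sketch using made-up log lines. The real Tomcat log layout is not shown in the thread, so the format below (date, level in brackets, thread name in brackets) is an assumption, and thread_of is a hypothetical helper. With -F'[][]' every "[" and "]" is a field separator, so the text inside the second pair of brackets becomes $4. A continuation line with no brackets has only $1, so its $4 is empty, which the script's length($4) > 4 test appears to rely on to keep such lines in the current thread's file.

```shell
#!/bin/sh
# Hypothetical log lines: the real log format is not shown in the thread.
tagged='2022-06-03 10:15:42 [INFO] [http-thread-89] startstring GET /app'
untagged='    at org.example.Foo.bar(Foo.java:42)'

# thread_of (hypothetical helper): print field 4 when "[" and "]" are the
# field separators. For the tagged line the fields are:
#   $1="2022-06-03 10:15:42 "  $2="INFO"  $3=" "  $4="http-thread-89"
# A line without brackets has a single field, so $4 is the empty string.
thread_of() {
    printf '%s\n' "$1" | awk -F'[][]' '{ print $4 }'
}

thread_of "$tagged"     # prints: http-thread-89
thread_of "$untagged"   # prints an empty line
```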


Since I am a beginner, any help would be appreciated.

Cheers!


PS (edit): here are the timings for the same awk script with the bash script removed:

Code:
...10000 / 448478 (9 secs)
...20000 / 448478 (8 secs)
...30000 / 448478 (14 secs)
...40000 / 448478 (22 secs)
...50000 / 448478 (35 secs)
...60000 / 448478 (53 secs)
...70000 / 448478 (59 secs)


 
I think I found the cause: awk now uses 3% CPU instead of 100%. I needed to:

- close the output files once they are done
- free variables that are no longer needed
 
