
AIX 5.2 waiting for IO is high

Status
Not open for further replies.

ydegauquier

Technical User
Nov 6, 2007
4
BE
Hi,
I have P5 systems (520) running AIX 5.2 ML07 CSP.
The main applications are Oracle and Oracle Cluster.
The storage is a Hitachi AMS1000 with HDLM 5.9.

I'm surprised that monitoring the systems shows a high percentage of CPU time waiting for I/O (15% to 60%).

I have tried to find out which disk or device is responsible for that wait time, but I can't find it.

Can anybody give me some advice on finding the cause of that CPU usage?

Thanks in advance.
 
Have you looked at paging space?

vmstat 1

Is it paging?

Regards,
Khalid
 
The servers don't seem to have a paging problem:

System Configuration: lcpu=4 mem=15808MB
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
1 1 1530113 259351 0 0 0 14 344 0 2552 36388 5543 7 9 65 19
2 0 1530117 259347 0 0 0 0 0 0 3454 31085 7763 11 17 20 52
0 0 1530117 259347 0 0 0 0 0 0 3962 38073 9344 16 20 24 40
2 0 1535130 254332 0 0 0 0 0 0 3737 213891 8185 20 32 19 29
2 0 1530233 259228 0 0 0 0 0 0 3374 35896 7906 14 18 34 35
1 0 1530115 259345 0 0 0 0 0 0 3635 32308 8664 15 19 34 32
1 0 1530115 259345 0 0 0 0 0 0 3694 33220 8830 16 22 28 34
2 0 1530115 259343 0 0 0 0 0 0 4137 43539 9708 15 22 26 38
1 0 1530115 259343 0 0 0 0 0 0 3517 31680 8196 13 20 32 35
2 0 1530113 259345 0 0 0 0 0 0 3724 34632 8925 18 18 28 36
0 0 1530113 259342 0 0 0 0 0 0 3303 169889 7826 12 23 29 37
1 0 1530112 259343 0 0 0 0 0 0 3312 29999 7924 13 16 39 33
2 0 1530112 259343 0 0 0 0 0 0 3582 35967 8555 14 17 32 37
1 0 1530160 259295 0 0 0 0 0 0 3588 33969 8630 15 19 39 27
1 0 1530160 259295 0 0 0 0 0 0 4053 39747 9815 13 22 34 31
2 0 1530160 259295 0 0 0 0 0 0 3705 32701 8892 13 16 40 30
2 0 1530196 259258 0 0 0 0 0 0 4421 45365 10114 20 23 33 25
2 0 1530160 259294 0 0 0 0 0 0 3843 41289 8731 12 21 41 26
1 0 1530400 259054 0 0 0 0 0 0 2748 21242 6055 7 9 42 42
1 0 1530160 259294 0 0 0 0 0 0 2685 18344 5959 8 13 43 36
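
For anyone who wants to put numbers on output like this instead of eyeballing it, here is a minimal Python sketch that averages a few columns over the sample rows. It assumes the column layout shown in the header line above (r, b, pi, po and wa are looked up by name); the function name and the three-row sample are only for illustration, not a tool from this thread.

```python
# Sketch: average selected vmstat columns to check for paging and I/O wait.
# Assumes the AIX header line "r b avm fre re pi po fr sr cy in sy cs us sy id wa".

VMSTAT_SAMPLE = """\
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
1 1 1530113 259351 0 0 0 14 344 0 2552 36388 5543 7 9 65 19
2 0 1530117 259347 0 0 0 0 0 0 3454 31085 7763 11 17 20 52
0 0 1530117 259347 0 0 0 0 0 0 3962 38073 9344 16 20 24 40
"""


def summarize_vmstat(text, columns=("r", "b", "pi", "po", "wa")):
    """Return the average of each requested column over all sample rows."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column header starts with "r" and ends with the cpu columns.
        if fields[0] == "r" and "wa" in fields:
            header = fields
            continue
        # A sample row has one non-negative integer per header column.
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append([int(f) for f in fields])
    if not rows:
        raise ValueError("no vmstat sample rows found")
    # "sy" appears twice in the header, so only unambiguous names are used here.
    return {
        name: sum(row[header.index(name)] for row in rows) / len(rows)
        for name in columns
    }


if __name__ == "__main__":
    for name, value in summarize_vmstat(VMSTAT_SAMPLE).items():
        print(f"{name:>2}: {value:.1f}")
```

If pi and po average out to zero while wa stays high, as in these rows, paging is probably not where the wait is coming from.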
 
You might need to tune your disks if iostat, as victorv mentioned above, shows that one of them is the bottleneck.

You might need to tune the LTG size, pbufs or fsbufs, or you might need to migrate some lpps from one disk to another, less utilized disk.

Have a look at this link:


Regards,
Khalid
 
Oracle

Missing Index / Waiting for Lock?

Get an eval of Sarcheck @ sarcheck.com

Mike

"Whenever I dwell for any length of time on my own shortcomings, they gradually begin to seem mild, harmless, rather engaging little things, not at all like the staring defects in other people's characters."
 
I have/had the same problem on a similar configuration in a customer's environment. Also with Oracle, although it is not even started, and with external SCSI disks from Hitachi. The maximum throughput from these disks is 40 MB/s, although the attached SCSI adapter is built for a theoretical 320 MB/s. If I split the traffic across 8 disks, the maximum throughput is 5 MB/s. So, anyway, this is not enough. It leads to a problem similar to yours, but I have not found the answer yet.

BTW, there is no paging activity on this system, and all the CPU is in the iowait state. So there must be some button to push to boost the throughput....

Greetings

mad

Advanced Interactive eXecutable
 
My disks are on a Hitachi SAN (AMS1000).
iostat shows that some of my disks are quite busy.
But to me, busy disks only indicate activity on the system (the database); they don't explain the I/O wait (OK, many I/Os can cause iowait)...

Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk27 0.3 11.8 0.9 39727042 4352784
hdisk29 0.4 12.2 1.1 38710330 6881264
hdisk62 0.5 15.9 1.4 20104541 39266546
hdisk24 0.7 11.1 1.4 19189617 22091772
hdisk30 0.8 107.6 0.9 206248156 194358986
hdisk37 1.0 35.1 8.8 11695311 118949011
hdisk36 1.4 59.4 9.0 102115182 118949011
hdisk89 1.4 123.8 5.6 410338550 50765302
hdisk39 1.6 61.4 17.2 10127620 218487710
hdisk63 1.6 433.3 2.0 812084493 801455962
hdisk107 2.3 3.1 4.3 9765478 1816024
hdisk35 2.3 384.4 1.8 723080739 708535907
hdisk38 2.3 115.3 17.7 210725195 218487734
hdisk0 2.6 39.7 5.3 39495669 108233129
hdisk1 2.7 44.4 6.0 57003993 108227753
hdisk34 2.9 433.1 2.0 811874769 800889677
hdisk31 3.0 385.9 2.2 723844763 713150788
hdisk2 3.8 43.1 6.3 30909369 129589412
hdisk3 3.9 76.9 7.7 156886797 129579684
hdisk33 5.2 384.4 1.8 723124399 708382851
hdisk32 5.9 435.7 2.4 816495956 805922846
hdisk57 20.7 543.3 44.4 1881879242 141193420
hdisk56 21.4 554.3 46.8 1918308974 145795204
hdisk55 22.5 553.9 46.0 1910326630 152251708
hdisk54 24.5 555.3 46.0 1916538898 151282700
hdisk60 29.5 150.5 42.3 521956162 38481437
hdisk61 30.3 119.7 45.4 387194826 58444604
hdisk59 31.2 174.1 44.9 602600650 45773913
hdisk58 37.8 92.1 41.9 298903242 44243325
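
A list like this is easier to read once it is filtered and sorted. The Python sketch below is only an illustration: it assumes the columns are in the usual AIX iostat order (disk name, % tm_act, Kbps, tps, Kb_read, Kb_wrtn), and the function name and the 20% threshold are arbitrary choices, not recommended values.

```python
# Sketch: list the busiest disks from iostat-style lines.
# Assumed column order: name, % tm_act, Kbps, tps, Kb_read, Kb_wrtn.

IOSTAT_SAMPLE = """\
hdisk30 0.8 107.6 0.9 206248156 194358986
hdisk57 20.7 543.3 44.4 1881879242 141193420
hdisk58 37.8 92.1 41.9 298903242 44243325
"""


def busiest_disks(text, threshold=20.0):
    """Return (name, tm_act, kbps, tps) tuples with tm_act >= threshold, busiest first."""
    disks = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines and anything that is not an hdisk row.
        if len(fields) != 6 or not fields[0].startswith("hdisk"):
            continue
        tm_act, kbps, tps = (float(f) for f in fields[1:4])
        disks.append((fields[0], tm_act, kbps, tps))
    disks.sort(key=lambda d: d[1], reverse=True)
    return [d for d in disks if d[1] >= threshold]


if __name__ == "__main__":
    for name, tm_act, kbps, tps in busiest_disks(IOSTAT_SAMPLE):
        # Kbps / tps gives a rough average transfer size in KB.
        size = kbps / tps if tps else 0.0
        print(f"{name}: {tm_act:.1f}% busy, {tps:.1f} tps, ~{size:.1f} KB per transfer")
```

The Kbps/tps ratio is a quick hint about the I/O pattern: a disk that is busy with many small transfers points more towards random database access than towards a bandwidth limit.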
 
Are your users complaining? I don't see a heck of a lot of activity on that system: average run queue is 1, average wait queue is 1. There doesn't seem to be a whole lot this system can do but wait for I/O while it is waiting for other work to be thrown at it... Hence the big iowait percentages?


HTH,

p5wizard
 
There are no complaints from the users, so I decided to wait before investigating further.
The fact that the system is doing a lot of I/O doesn't tell me that it performs badly.
Finding a bottleneck and tuning a system is not really easy, and what I remember from my IBM training on AIX 5L performance and tuning is that it's better to leave AIX at its default values unless we know the real impact...
 
Wait IO can be a strange thing! It can sometimes indicate efficient code, particularly in a batch environment. For example, if a batch job processes each record in the database with minimal CPU time, then most of the time will legitimately be spent waiting for data from disk, so the CPU stats will show it mostly waiting for IO. Oddly, if the processing (CPU) stage were less efficient, it would show more CPU user time and less wait IO. It is important to remember that wait IO is an idle state of the CPU, and therefore CPU usage is not the issue when wait IO is high.
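
To make that concrete, here is a toy Python model. It assumes a single-threaded batch job on one otherwise idle CPU that alternates between processing a record and waiting synchronously for the next one from disk; the millisecond figures are made up for illustration and are not measurements from any system in this thread.

```python
# Toy model: share of time spent busy vs. waiting for I/O for a batch job
# that alternates CPU work and synchronous disk reads (hypothetical numbers).


def cpu_split(cpu_ms_per_record, io_ms_per_record):
    """Return (busy_percent, iowait_percent) for one CPU running only this job."""
    total = cpu_ms_per_record + io_ms_per_record
    if total <= 0:
        raise ValueError("per-record times must add up to a positive value")
    busy = 100 * cpu_ms_per_record / total
    wait = 100 * io_ms_per_record / total
    return busy, wait


if __name__ == "__main__":
    # Efficient code: 1 ms of CPU per record, 9 ms waiting for the disk.
    print("efficient code:   busy %.0f%%, iowait %.0f%%" % cpu_split(1, 9))
    # Less efficient code: 9 ms of CPU per record, same 9 ms disk wait.
    print("inefficient code: busy %.0f%%, iowait %.0f%%" % cpu_split(9, 9))
```

In this model the slower code shows less iowait (50% instead of 90%) even though each record takes longer, which is why a high wait IO figure on its own says little about whether the system is performing badly.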
 
Might be worth asking your DBA team to collect some stats on the database. I've got an app that is exactly as andy61 describes - batch processing, neither app server nor DB server is stressed; the app happens to perform over 600 updates to each record in a .5TB database, and the slowdown manifests as high iowait time.
 