hdisk0 at 100%, system very slow, commands have 10-15 second delay

Status
Not open for further replies.

strikelit

MIS
Sep 10, 2003
88
US
Good evening. What settings can I check to diagnose slow system response? I enter a command and it takes about 10-15 seconds to come up. The system is very slow. I ran "topas" and saw that the rootvg disks hdisk0 and hdisk7 were at 100% all the time. What can I do?
 
That is probably due to extensive paging activity. Could you post the topas screen here?


HTH,

p5wizard
 
Also run the following command for 1 minute to view the disk & file stats; use trcstop to stop

filemon -v -o /tmp/fmon.out -O all

Mike

"A foolproof method for sculpting an elephant: first, get a huge block of marble, then you chip away everything that doesn't look like an elephant."

 
Under paging space, it shows 65% free. I don't understand what else it could be.

Disk    Busy%   KBPS    TPS  KB-Read  KB-Writ
hdisk0  100.1  905.1  189.2    428.5    476.6
hdisk7  100.1  861.1  179.2    384.5    476.6

PAGING            MEMORY
Faults   6752     Real,MB     8192
Steals   4619     % Comp      49.8
PgspIn    202     % Noncomp   51.0
PgspOut   119     % Client     0.5
PageIn   4445
PageOut   406     PAGING SPACE
Sios     2306     Size,MB    10240
                  % Used      34.4
                  % Free      65.5
 
It's the PgspIn and PgspOut numbers that are killing you.

Please post the output of "vmstat 10 10".



Rod Knowlton

IBM Certified Advanced Technical Expert pSeries and AIX 5L
CompTIA Linux+
CompTIA Security+

 
After running filemon, this is what hdisk0 and hdisk7 show:

Most Active Physical Volumes
------------------------------------------------------------------------
  util  #rblk   #wblk    KB/s  volume       description
------------------------------------------------------------------------
  1.00  45039  299920  3769.8  /dev/hdisk7  N/A
  1.00  38714  297568  3675.0  /dev/hdisk0  N/A
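
One way to read the filemon numbers above is sketched below in Python. It assumes #rblk and #wblk count 512-byte blocks, and it computes each disk's write share plus the tracing interval implied by the KB/s column. The variable and function names are illustrative only.

```python
# Values copied from the filemon "Most Active Physical Volumes" report above.
# Assumption: #rblk and #wblk are counts of 512-byte blocks.
BLOCK_BYTES = 512

volumes = {
    "hdisk7": {"rblk": 45039, "wblk": 299920, "kb_per_s": 3769.8},
    "hdisk0": {"rblk": 38714, "wblk": 297568, "kb_per_s": 3675.0},
}


def summarize(stats):
    """Return (write share of all blocks, seconds implied by the KB/s figure)."""
    total_blocks = stats["rblk"] + stats["wblk"]
    write_share = stats["wblk"] / total_blocks
    total_kb = total_blocks * BLOCK_BYTES / 1024
    implied_seconds = total_kb / stats["kb_per_s"]
    return write_share, implied_seconds


summary = {name: summarize(stats) for name, stats in volumes.items()}
for name, (write_share, seconds) in summary.items():
    print(f"{name}: {write_share:.1%} writes, about {seconds:.0f} s of tracing implied")
```

Both disks come out write-dominated and imply nearly the same tracing interval, which would be consistent with heavy page-out traffic to rootvg, although the report alone does not prove that.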
 
# vmstat 10 10

System configuration: lcpu=4 mem=8192MB

kthr  memory       page                     faults         cpu
----- ------------ ------------------------ -------------- -----------
 r  b     avm  fre re  pi  po   fr    sr cy   in   sy   cs us sy id wa
 1 25 1563336  770  0 385  25 3484 15336  0 3403 6775 6096 14 12  0 75
 2 20 1563674 1070  0 227 199 2641  6284  0 2979 4636 4878 11  9  0 80
 1 21 1564279  711  0 146 327 3061  6258  0 3573 4356 5537 11 10  0 79
 1 18 1566891 1003  0 131 350 2048  7206  0 2391 3918 3159  7  7  1 86
 2 20 1563894 1022  0 189 342 2490  9742  0 3252 3994 6658 11 11  0 78
 2 20 1563159  977  0 371  37 3794  7402  0 3541 8652 7246 14 14  0 72
 1 17 1564475  867  0  69 450 1903  4528  0 2564 3212 3432  5  7  0 89
 1 15 1564884  467  0 125 408 1839  6803  0 2440 4523 2972  4  7  0 89
 2 16 1564807  417  0  55 580 1645  5076  0 2436 2256 3197  3  6  2 89
 1 13 1565799  960  0 120 412 2629  3697  0 3300 2927 5070  5  9  1 85

 
Also, you have 8GB of RAM, of which AIX is using about 4GB for file I/O buffering. You may need to look at tuning the vmtune/vmo settings (maxclient/maxperm/minperm).
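
The 4GB figure can be checked against the topas MEMORY panel posted earlier. A minimal Python sketch, assuming "% Noncomp" is the share of real memory holding file pages and "% Comp" the share holding program (computational) pages:

```python
# Figures from the topas MEMORY panel posted earlier in the thread.
real_mb = 8192       # Real,MB
pct_noncomp = 51.0   # % Noncomp (assumed: file pages)
pct_comp = 49.8      # % Comp (assumed: computational pages)

file_cache_mb = real_mb * pct_noncomp / 100
computational_mb = real_mb * pct_comp / 100

print(f"file cache:    {file_cache_mb:.0f} MB")
print(f"computational: {computational_mb:.0f} MB")
```

Roughly half of real memory was holding file pages at the time of the snapshot.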

What is the workload for this server?


HTH,

p5wizard
 
Your working set (memory used by programs) is a little over 6GB. Set maxperm% to 20 and you should see some improvement, although it will cut down on your disk cache. The program queue numbers don't look very good, either, but you can't properly analyze them until you get the paging under control.
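
For reference, the working-set estimate can be reproduced from the avm column of the posted vmstat output. This is a rough sketch that assumes 4 KB pages and uses the memory_frames value posted from vmo -a in this thread; the variable names are illustrative.

```python
PAGE_BYTES = 4096        # assumption: 4 KB pages
avm_pages = 1563336      # avm column, first posted vmstat sample
memory_frames = 2097152  # memory_frames as posted from vmo -a

working_set_mb = avm_pages * PAGE_BYTES / 2**20
real_mb = memory_frames * PAGE_BYTES / 2**20
share_of_ram = avm_pages / memory_frames

print(f"active virtual memory: {working_set_mb:.0f} MB of {real_mb:.0f} MB real")
print(f"share of real memory:  {share_of_ram:.1%}")
```

With about three quarters of real memory needed for program pages, a file cache allowed to grow toward 80% leaves little room before paging starts.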

If these processes are i/o intensive, you should also add more disks and move the data onto them.

Rod Knowlton

IBM Certified Advanced Technical Expert pSeries and AIX 5L
CompTIA Linux+
CompTIA Security+

 
Here are the correct vmo stats:
# vmo -a
cpu_scale_memp = 8
data_stagger_interval = n/a
defps = 1
force_relalias_lite = 0
framesets = 2
htabscale = -1
kernel_heap_psize = n/a
large_page_heap_size = n/a
lgpg_regions = n/a
lgpg_size = n/a
low_ps_handling = 1
lru_file_repage = 1
lru_poll_interval = 10
lrubucket = 131072
maxclient% = 80
maxfree = 1088
maxperm = 1586997
maxperm% = 80
maxpin = 1699712
maxpin% = 80
mbuf_heap_psize = n/a
memory_affinity = n/a
memory_frames = 2097152
memplace_data = n/a
memplace_mapped_file = n/a
memplace_shm_anonymous = n/a
memplace_shm_named = n/a
memplace_stack = n/a
memplace_text = n/a
memplace_unmapped_file = n/a
mempools = 1
minfree = 960
minperm = 396749
minperm% = 20
nokilluid = 0
npskill = 20480
npsrpgmax = 163840
npsrpgmin = 122880
npsscrubmax = 163840
npsscrubmin = 122880
npswarn = 81920
num_spec_dataseg = n/a
numpsblks = 2621440
page_steal_method = 0
pagecoloring = n/a
pinnable_frames = 1868964
pta_balance_threshold = n/a
relalias_percentage = 0
rpgclean = 0
rpgcontrol = 2
scrub = 0
scrubclean = 0
 
[tt]
vmo -p -o maxperm%=20 -o maxclient%=20
[/tt]

Should help with the paging.
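
To get a feel for what that change means in memory terms, here is a rough Python sketch. It assumes 4 KB pages and assumes that the frame limit reported by vmo -a scales linearly with the percentage. The posted values (minperm = 396749 at 20%, maxperm = 1586997 at 80%) are consistent with that assumption but do not prove it.

```python
PAGE_KB = 4  # assumption: 4 KB pages

# Posted from vmo -a: maxperm% = 80 corresponds to maxperm = 1586997 frames.
MAXPERM_FRAMES_AT_80 = 1586997


def scaled_limit(frames_at_known_pct, known_pct, new_pct):
    """Scale a frame limit linearly from one percentage to another (assumed)."""
    return frames_at_known_pct * new_pct / known_pct


current_cap_mb = MAXPERM_FRAMES_AT_80 * PAGE_KB / 1024
new_cap_frames = scaled_limit(MAXPERM_FRAMES_AT_80, 80, 20)
new_cap_mb = new_cap_frames * PAGE_KB / 1024

print(f"file cache cap at 80%: {current_cap_mb:.0f} MB")
print(f"file cache cap at 20%: {new_cap_mb:.0f} MB")
```

Under these assumptions the cap on file pages drops from a bit over 6 GB to roughly 1.5 GB, leaving the rest for program pages.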

Rod Knowlton

IBM Certified Advanced Technical Expert pSeries and AIX 5L
CompTIA Linux+
CompTIA Security+

 
Are you using async I/O?

Please post

lsattr -El aio0
lspv

Also what app is running on the machine?

Mike

"A foolproof method for sculpting an elephant: first, get a huge block of marble, then you chip away everything that doesn't look like an elephant."

 
How could anyone give any advice on tuning a server after looking at output from topas or vmstat? Nobody has a clue what applications run on the server or what the typical load averages, I/O, CPU usage, paging, etc. are.

Certainly it looks like the paging is causing the high I/O, based on the number of page-outs, the ratio of pages freed to pages scanned, and the number of blocked processes. But it is premature to have someone tune their system without knowing whether this is typical or whether there was a recent change in some application.
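
The indicators named above can be pulled out of the posted "vmstat 10 10" output with a few lines of Python. This is only a sketch: it averages all ten posted rows, including the first, which on many vmstat implementations is a since-boot summary rather than an interval sample. The names syscalls and sys are made up here to tell the two "sy" columns apart.

```python
# The ten "vmstat 10 10" samples posted earlier in the thread.
VMSTAT_ROWS = """
1 25 1563336 770 0 385 25 3484 15336 0 3403 6775 6096 14 12 0 75
2 20 1563674 1070 0 227 199 2641 6284 0 2979 4636 4878 11 9 0 80
1 21 1564279 711 0 146 327 3061 6258 0 3573 4356 5537 11 10 0 79
1 18 1566891 1003 0 131 350 2048 7206 0 2391 3918 3159 7 7 1 86
2 20 1563894 1022 0 189 342 2490 9742 0 3252 3994 6658 11 11 0 78
2 20 1563159 977 0 371 37 3794 7402 0 3541 8652 7246 14 14 0 72
1 17 1564475 867 0 69 450 1903 4528 0 2564 3212 3432 5 7 0 89
1 15 1564884 467 0 125 408 1839 6803 0 2440 4523 2972 4 7 0 89
2 16 1564807 417 0 55 580 1645 5076 0 2436 2256 3197 3 6 2 89
1 13 1565799 960 0 120 412 2629 3697 0 3300 2927 5070 5 9 1 85
""".strip().splitlines()

# Column order follows the vmstat header line.
COLUMNS = "r b avm fre re pi po fr sr cy in syscalls cs us sys id wa".split()

samples = [dict(zip(COLUMNS, map(int, row.split()))) for row in VMSTAT_ROWS]


def mean(column):
    """Average one column over all posted samples."""
    return sum(s[column] for s in samples) / len(samples)


avg_blocked = mean("b")    # threads waiting on I/O
avg_page_in = mean("pi")   # paging-space page-ins per second
avg_page_out = mean("po")  # paging-space page-outs per second
avg_io_wait = mean("wa")   # percent of CPU time spent in I/O wait
# Pages scanned per page freed: higher means the page stealer works harder.
scanned_per_freed = sum(s["sr"] for s in samples) / sum(s["fr"] for s in samples)

print(f"blocked threads: {avg_blocked:.1f}")
print(f"pi/po per second: {avg_page_in:.1f} / {avg_page_out:.1f}")
print(f"I/O wait: {avg_io_wait:.1f}%")
print(f"pages scanned per page freed: {scanned_per_freed:.2f}")
```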

More information is needed before any changes should be made.
 
That's why I asked what the workload is...

HTH,

p5wizard
 
Raw disk or JFS or JFS2? Based on the stuff you've shown, I'd guess JFS, so Rod's suggestions seem valid.

minperm%=10
maxperm%=20
maxclient%=20

But it might well need a closer look. Also, you might want to check whether the Oracle SGA needs tuning. Perhaps you also need more RAM...


HTH,

p5wizard
 
kHz said:
How could anyone give any advice on tuning a server after looking at output from topas or vmstat?

Quite easily. Didn't you just see me do it? :)

Seriously, though, it's a matter of triage. strikelit's server is thrashing madly, and it has been since the last boot (check the summary line of vmstat). This means there are unhappy users. In the state the machine is in right now my recommendations will improve performance, which will make happier users, which will give strikelit some breathing room.

It's about stopping the bleeding before determining how the wound was inflicted and what our course of treatment will be.

Rod Knowlton

IBM Certified Advanced Technical Expert pSeries and AIX 5L
CompTIA Linux+
CompTIA Security+

 
What changed though? Is this the first time it has ever happened? If so, then obviously something has changed in the application or some other change was introduced to create the problem.

All I am saying: It is hard for me to believe the server was running happily along and suddenly this problem started out of the blue. If this is common then, yes, certainly tune, but if this is a new problem, then find the root cause, which certainly isn't tuning.
 
kHz,

My unstated assumption (I know, I know) was that a sudden change of this magnitude would have sent strikelit running to the applications people working on the server, not to this forum.

I hear what you're saying, but even if this is sudden, if the root cause cannot be identified and resolved quickly (like walking over to a programmer's cube and saying "stop doing that"), the paging can at least be mitigated while investigating. It's not like my little vmo command causes an irreversible change to the machine. Even diagnosis, whatever tools you use, will benefit from the reduced paging.

I'd much rather send out an announcement to the user community that says "we're having some sort of problem, we've made an adjustment that should help some while we identify the cause" than one that says "we're having some sort of problem and will continue to until we find the root cause".

Rod Knowlton

IBM Certified Advanced Technical Expert pSeries and AIX 5L
CompTIA Linux+
CompTIA Security+

 
The following seems to have cured the performance issue. I ran smitty chgsys and disabled the Pre-520 tuning compatibility mode. I then ran the following commands:
vmo -p -o lru_file_repage=0
vmo -p -o maxperm%=80 -o maxclient%=80
I then created a temporary paging space, ran swapoff /dev/hd6 followed by swapon /dev/hd6, and removed the temporary paging space. This seemed to clear the problem.
 