Performance problem - (p670/aix5.2)


shoux (Technical User)
Hi experts!

topas shows that around 70% of CPU time is spent in WAIT. What does this mean? At present we are facing degraded performance, and the only thing I can do is restart the server. Currently about 400 users are on the system.

Thank you in advance

Regards
Shoux


Fri May 20 12:18:17 2005   Interval: 2

CPU:  Kernel 3.3   User 1.1   Wait 67.9   Idle 27.8

EVENTS/QUEUES:  Cswitch 803   Syscall 17183   Reads 4497   Writes 2736
                Forks 2   Execs 2   Runqueue 0.0   Waitqueue 8.5
FILE/TTY:       Readch 5641.0K   Writech 1116.8K   Rawin 76   Ttyout 19451
                Igets 1   Namei 1439   Dirblk 38

Network  KBPS   I-Pack  O-Pack  KB-In  KB-Out
en0      148.3  258.0   188.5   114.5  33.8
lo0      0.0    0.0     0.0     0.0    0.0

Disk     Busy%  KBPS    TPS    KB-Read  KB-Writ
hdisk2   100.0  3206.0  755.0  136.0    3070.0
hdisk0   25.5   393.2   63.0   0.0      393.2
hdisk1   23.0   359.2   55.0   0.0      359.2
dac0     0.0    0.0     0.0    0.0      0.0

PAGING:  Faults 933   Steals 650   PgspIn 0   PgspOut 75
         PageIn 33    PageOut 852  Sios 877
MEMORY:  Real,MB 8191   % Comp 25.4   % Noncomp 75.4   % Client 69.5
PAGING SPACE:  Size,MB 8192   % Used 3.0   % Free 96.9
NFS (calls/sec):  ServerV2 0   ClientV2 0   ServerV3 0   ClientV3 0

Name     PID     CPU%  PgSp   Owner
uvsh     45788   0.9   5.7    sb02
j2pg     6124    0.3   0.0    root
uvsh     174216  0.3   4.6    bt11
uvsh     108962  0.2   22.8   kn06
uvsh     145814  0.2   22.8   kl06


shoux
 
wait% indicates the percentage of time processes are waiting on an I/O to complete. This can be disk, tape, network (I believe) - anything that can block on I/O.

Since your run queue is so low (0), processes are getting the CPU for their computations fast enough. But with the wait queue at 8.5, processes are waiting for I/Os to complete.
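If you want to keep an eye on this outside of topas, a quick sketch using standard AIX commands (interval and count are just examples):

vmstat 2 10       # 'wa' column = I/O wait %, 'b' = threads blocked waiting on I/O
iostat -d 2 10    # per-disk %tm_act, Kbps and tps, to spot the busy disk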

Looking at the disks, hdisk2 appears to be very busy compared to the others: 3206 KBPS vs. 393 and 359 for the others, and it is 100% busy.

You should review what is on hdisk2 (lspv -l hdisk2); you could use lsof to check which processes are hitting the associated filesystems.

Look at your system error log to see if there are any problems reported for hdisk2.
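A rough sketch of those checks (lsof is not part of base AIX, so fuser is shown as an alternative; /uv1 is only an example mount point - use whatever lspv -l reports):

lspv -l hdisk2           # which logical volumes / mount points live on hdisk2
errpt | grep -i hdisk2   # any errors logged against the disk
fuser -cux /uv1          # processes with files open in a filesystem on that disk
lsof +D /uv1             # same idea with lsof, if it is installed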

That's all for now.
-glenn
 
In addition to the excellent advice ggauthier gave you, take a look at the filemon command. This command can tell you what is hammering hdisk2 so hard.
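A typical filemon run looks roughly like this (the output file name is just an example; filemon uses the trace facility, so don't forget trcstop):

filemon -o /tmp/fmon.out -O all   # trace logical files, VM segments, LVs and PVs
sleep 60                          # let it capture a representative minute of the load
trcstop                           # stop tracing; the report is written to /tmp/fmon.out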


Jim Hirschauer
 
The problem could be in disk access. How do you mount the filesystems? Try the cio (concurrent I/O) option of mount.
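A minimal sketch, assuming the filesystem is JFS2 and the application can cope with concurrent I/O (cio bypasses the file cache and relaxes inode locking, so it mainly suits databases - test before using it in production; the device and mount point are examples only):

mount -o cio /dev/lv01 /uv1       # example device/mount point only
# or make it permanent in the stanza in /etc/filesystems:  options = cio,rw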
 
By looking at your results, I'd guess it's hdisk2. It seems to be a hotspot. Something is "wait"ing its turn to write to hdisk2.
 
Do you have enough numfsbufs (JFS) or j2_nBufferPerPagerDevice (JFS2)?

Use vmstat -v and look at the "blocked with no fsbuf" counters - they may be increasing fast.
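To check and, if needed, raise them, something along these lines - the values are examples only, and ioo is the AIX 5.2 replacement for the old vmtune interface:

ioo -a | grep -E 'numfsbufs|j2_nBufferPerPagerDevice'   # current settings
ioo -p -o numfsbufs=1024                    # example value; takes effect for filesystems mounted afterwards
ioo -p -o j2_nBufferPerPagerDevice=1024     # JFS2 counterpart, example value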

HTH,

p5wizard
 
Is your JFS/JFS2 log also on hdisk2? Because that also causes traffic to that disk...
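To check where the log logical volumes sit, something like this (loglv00 is just an example name - take whatever shows up as jfslog/jfs2log):

lsvg -l datavg | grep -i log    # find the log LV(s) in the volume group
lslv -l loglv00                 # show which physical volume that log LV lives on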
 
To add a couple more observations.

Yes, the issue is hdisk2.
However, there is no single process consuming a high CPU percentage, so there seems to be no CPU or network bottleneck on the system. Therefore, in addition to the advice above, I would check the following:

1. Yes, filemon can hint at a potential application bug that loops on hdisk2.
2. Check the hdisk2 SCSI adapter settings - are they set to the maximum bus speed? (A command sketch for points 2-4 follows this list.)
3. Check in errpt that there are no SCSI/hdisk errors related to hdisk2.
4. Verify that the watermarks (high and low) are zero (use "lsattr -El sys0|grep mark").
5. Post the results of "lsdev -Csscsi" and "lsdev -Ccdisk".
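For points 2-4, a rough command sketch (scsi0 is only a placeholder - substitute the adapter that is actually the parent of hdisk2):

lsattr -El hdisk2                              # disk attributes (queue_depth, etc.)
lsattr -El scsi0                               # adapter settings, e.g. bus speed
errpt | grep -i hdisk2                         # quick scan for errors against hdisk2
errpt -a -N hdisk2 | more                      # detailed error report for that resource
lsattr -El sys0 | grep -E 'maxpout|minpout'    # high/low I/O watermarks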

Long live king Moshiach !
 
Thank you for all the responses!

Yes, hdisk2 makes up datavg. Below is the output as requested. Thanks.

[p670] >lsdev -ssscsi
cd0 Available 3A-08-00-5,0 16 Bit SCSI Multimedia CD-ROM Drive
hdisk0 Available 37-08-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 3s-08-00-8,0 16 Bit LVD SCSI Disk Drive
rmt0 Available 3A-08-00-6,0 SCSI 4mm Tape Drive
rmt1 Available 2k-08-00-0,0 IBM 3580 Ultrium Tape Drive
ses0 Available 2s-08-00-15,0 SCSI Enclosure Services Device
ses1 Available 37-08-00-15,0 SCSI Enclosure Services Device
ses2 Available 3b-08-00-15,0 SCSI Enclosure Services Device
ses3 Available 3s-08-00-15,0 SCSI Enclosure Services Device
smc0 Available 2k-08-00-1,0 IBM 3581 Tape Medium Changer
[p670] >lsdev -Ccdisk
hdisk0 Available 37-08-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 3s-08-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 3F-08-01 1742 (700) Disk Array Device


[p670] >vmstat -v
2097152 memory pages
1981068 lruable pages
623366 free pages
1 memory pools
214215 pinned pages
80.1 maxpin percentage
20.0 minperm percentage
80.0 maxperm percentage
59.4 numperm percentage
1178487 file pages
0.0 compressed percentage
0 compressed pages
48.8 numclient percentage
80.0 maxclient percentage
968739 client pages
0 remote pageouts scheduled
7097248 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
24813 filesystem I/Os blocked with no fsbuf
0 client filesystem I/Os blocked with no fsbuf
365730 external pager filesystem I/Os blocked with no fsbuf
[p670] > LABEL: CORE_DUMP_FAILED
IDENTIFIER: 45C7A35B

Date/Time: Fri May 20 18:01:44 WAUS
Sequence Number: 63211
Machine Id: 003044DB4C00
Node Id: SEAIB_SERVER
Class: S
Type: PERM
Resource Name: SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
INTERNAL SOFTWARE ERROR
SYSTEM RUNNING OUT OF PAGING SPACE

User Causes
USER GENERATED SIGNAL

Failure Causes
CORE DUMP FAILED - SEE A REASON CODE BELOW

Recommended Actions
DEFINE ADDITIONAL PAGING SPACE
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
11
USER'S PROCESS ID:
88920
REASON CODE
13
USER ID
1513
PROCESSOR ID
1
CORE FILE NAME
/app/AAA.0505/core
PROGRAM NAME
uvsh


[p670] >lsps -a
Page Space Physical Volume Volume Group Size %Used Active Auto Type
paging00 hdisk2 datavg 5632MB 2 yes no lv
hd6 hdisk0 rootvg 2560MB 4 yes yes lv
[p670] >
[p670] >lspv -l hdisk2
hdisk2:
LV NAME LPs PPs DISTRIBUTION MOUNT POINT
paging00 88 88 88..00..00..00..00 N/A
lvuvspool 100 100 00..100..00..00..00 /uvspool
lv03 459 459 00..216..128..115..00 /uv3
lv01 841 841 00..314..374..153..00 /uv1
lvapp 154 154 00..26..128..00..00 /app
lv02 459 459 00..102..128..229..00 /uv2
[p670] >lspv
hdisk0 003044db7bfc092a rootvg active
hdisk1 003044db40e5920a rootvg active
hdisk2 003044db411efccd datavg active
[p670] >
[p670] >lsattr -El sys0|grep mark
maxpout 0 HIGH water mark for pending write I/Os per file True
minpout 0 LOW water mark for pending write I/Os per file True
[p670] >





shoux
 
Looking at the above, I can think of two directions:

1. Since there is some paging-space activity on hdisk2, you could try to reduce unnecessary paging by setting your VMM tuning parameters as follows:

/usr/sbin/vmo -p -o maxclient%=30 -o maxperm%=30 -o minperm%=10
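Before changing anything it is worth noting the current values, roughly:

vmo -a | grep -E 'maxclient%|maxperm%|minperm%'   # record current settings so you can revert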

2. I had a case where a RAID array would go to a sustained 100% utilization for no apparent reason. In the end we worked around it by decreasing the disk RAID queue depth to 1:

To check:
lsattr -El hdisk2|grep queue_depth

Then the value can be decreased for a trial via smit.
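From the command line the equivalent would be roughly as follows (-P defers the change to the next boot, since the disk is in use; the value 1 is only the trial value suggested above):

chdev -l hdisk2 -a queue_depth=1 -P   # deferred attribute change; reboot to activate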


Long live king Moshiach !
 