Load on the system...

Status
Not open for further replies.

sunny1504

Technical User
Nov 30, 2005
77
US
Below is the output of prstat and mpstat. I want to understand the part of the prstat output that says "load averages: 1.12, 1.10, 1.04".

Q1) At what value do these numbers indicate that system usage is high, and what do they depend on?

Q2) This question concerns mpstat (please refer to the output pasted below): do I only need to look at usr, sys and idl to understand CPU utilization?

Q3) What is smtx in mpstat, and what does it indicate?

The only guidance I have found so far is: "if idle time (cpu id) is consistently 0 and the system time (cpu sy) is double the user time (cpu us), the system is facing a shortage of CPU resources".

Can someone please help me with these queries?

#prstat
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
1855 oracle 25M 5256K sleep 59 0 43:23:10 11% tnslsnr/3
2870 oracle 23M 7264K sleep 59 0 84:04:49 5.5% opmn/20
373 root 22M 21M sleep 59 0 1:13:25 0.3% mibiisa/7
20467 root 4632K 4296K cpu1 49 0 0:00:00 0.0% prstat/1
15448 oracle 851M 715M sleep 59 0 0:19:59 0.0% java/44
2424 oracle 703M 590M sleep 59 0 0:05:04 0.0% oracle/11
8754 oracle 441M 341M sleep 59 0 0:04:03 0.0% java/43
5604 oracle 37M 2632K sleep 59 0 0:00:00 0.0% httpd/3
5934 oracle 37M 2664K sleep 59 0 0:00:01 0.0% httpd/3
375 smmsp 4376K 600K sleep 59 0 0:00:00 0.0% sendmail/1
209 root 2304K 200K sleep 59 0 0:00:00 0.0% cron/1
280 root 3208K 448K sleep 59 0 0:00:00 0.0% htt_server/2
374 root 4408K 728K sleep 59 0 0:00:00 0.0% sendmail/1
203 root 3496K 984K sleep 59 0 0:00:01 0.0% syslogd/13
173 root 2216K 440K sleep 59 0 0:00:00 0.0% lockd/2
192 root 3768K 904K sleep 59 0 0:00:00 0.0% automountd/2
2337 oracle 701M 588M sleep 59 0 0:00:01 0.0% oracle/1
5583 oracle 37M 6776K sleep 59 0 0:00:03 0.0% httpd/1
174 daemon 2504K 584K sleep 59 0 0:00:00 0.0% statd/1
137 root 2376K 616K sleep 59 0 0:00:00 0.0% keyserv/3
160 root 2504K 1032K sleep 59 0 0:00:00 0.0% inetd/1
Total: 119 processes, 823 lwps, load averages: 1.12, 1.10, 1.04



# mpstat
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 218 0 282 480 327 2009 288 510 266 0 2856 9 16 1 74
1 225 0 264 454 182 2272 406 478 359 0 3444 9 15 1 74



System information
# uname -i
SUNW,Sun-Fire-V210

OS: Solaris 5.9, release 4/04
Memory size: 8192 MB
Two processors of 1336 MHz (SUNW,UltraSPARC-IIIi)

Cheers
Sunny D'Souza
 
Your system looks to be running OK; a load of about 1 across the 1-, 5- and 15-minute averages is quite acceptable in my opinion. I would run "prstat -a" for a more detailed breakdown, and "vmstat 2 20" at regular intervals to view the run queues. Ignore the first line of vmstat output (it is an average since boot) and look at the r, b and w columns.
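
The suggestion above can be sketched as a small filter: skip the vmstat headers and the since-boot line, and print only the run-queue columns. This is a sketch, not a definitive recipe; the heredoc stands in for live "vmstat 2 20" output, and the column positions assume a standard Solaris vmstat header.

```shell
# Sample data standing in for real "vmstat 2 20" output (abridged columns).
cat <<'EOF' > /tmp/vmstat.out
 kthr      memory            page
 r b w   swap  free  re  mf pi po
 1 0 0 14138728 1240088 96 466 350 55
 5 0 0 13503536 628568 137 950 2 36
12 0 0 13499008 622480 179 1407 2 38
EOF
# NR > 3 skips the two header lines and the first data line (the since-boot
# average); fields 1-3 are the r, b and w run-queue columns.
awk 'NR > 3 { print "r=" $1, "b=" $2, "w=" $3 }' /tmp/vmstat.out
```

On the real box you would pipe "vmstat 2 20" into the same awk filter instead of the sample file.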
 
The load average by itself is not a very useful number, and you would do best not to rely on it alone.

In the mpstat output, look at xcal (cross calls), csw (context switches), migr (thread migrations) and intr (interrupts), and make sure usr isn't over 70%, possibly 80%. You also don't want sys to outrun usr, because then your system is spending all its time in system calls and doesn't have any time to run the applications.
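
The two thresholds above (usr over roughly 80%, and sys catching up with usr) can be checked mechanically with a small awk filter. A sketch under assumptions: the 80% figure is the rule of thumb from this thread rather than an official limit, and the heredoc is sample data standing in for live "mpstat 5" output.

```shell
# Sample mpstat lines (header plus three data rows) standing in for live output.
cat <<'EOF' > /tmp/mpstat.out
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 218 0 282 480 327 2009 288 510 266 0 2856 9 16 1 74
0 470 26 1159 247 3 1123 306 143 53 54 3864 66 15 17 2
1 311 38 1068 901 629 19425 18086 94 45 2 32006 84 14 2 0
EOF
# Fields 13 and 14 are usr and sys in standard Solaris mpstat output.
awk '$1 != "CPU" {
    usr = $13; sys = $14
    if (usr > 80)   print "CPU " $1 ": usr=" usr "% (over 80%)"
    if (sys >= usr) print "CPU " $1 ": sys=" sys "% >= usr=" usr "%"
}' /tmp/mpstat.out
```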
 
Hi All,


Below is output from the live machine in production. Any suggestions on how I can improve performance?

I suspect that the system is out of CPU resources, and that the disks c3t8d0 and c3t9d0 are under tremendous load.
These two disks are located on a JBOD (3210..) connected to the system via a SCSI card.


bash-2.05# uname -a
SunOS ORAserv 5.9 Generic_118558-14 sun4u sparc SUNW,Sun-Fire-V210

The load average stays at least around the values shown below:
Total: 186 processes, 1008 lwps, load averages: 4.70, 3.36, 2.70


bash-2.05# sar -g 5 5
SunOS ORAserv 5.9 Generic_118558-14 sun4u 08/22/2006
13:03:24 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
13:03:29 0.40 0.40 0.40 0.00 0.00
13:03:34 0.00 0.00 0.00 0.00 0.00
13:03:39 0.20 0.20 0.20 0.00 0.00
13:03:44 0.00 0.00 0.00 0.00 0.00
13:03:49 0.20 0.20 0.20 0.00 0.00
Average 0.16 0.16 0.16 0.00 0.00


bash-2.05# mpstat 5
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 225 13 1836 309 63 2475 384 385 192 1 3798 19 19 11 52
1 228 15 2090 941 700 2390 344 373 210 1 4923 22 19 11 49
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 470 26 1159 247 3 1123 306 143 53 54 3864 66 15 17 2
1 431 23 1241 767 557 971 301 124 54 30 3558 71 15 11 3
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 377 46 1298 255 2 1167 319 124 82 19 5514 58 21 20 2
1 416 30 1292 885 616 1064 346 132 63 16 6671 71 14 13 3
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 166 28 1409 412 2 3165 1809 120 90 4 13171 75 23 2 0
1 376 25 1136 1087 700 1550 571 94 101 4 11307 86 14 1 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 419 39 1618 441 2 1769 606 115 90 3 10191 84 15 0 0
1 203 28 1132 872 536 2136 902 82 105 1 15923 81 19 1 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 447 58 1410 334 2 1352 461 104 44 3 11134 80 19 1 0
1 311 38 1068 901 629 19425 18086 94 45 2 32006 84 14 2 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 217 20 868 309 2 1902 601 124 40 2 10978 81 14 4 1
1 272 41 1322 788 561 2022 602 127 39 2 13406 83 13 4 0
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 455 18 10438 288 2 2087 601 109 54 1 12963 76 19 4 1
1 507 24 2728 793 541 2093 759 108 43 2 12107 77 18 5 1
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 430 7 728 235 2 1979 696 102 50 2 14712 83 15 1 0
1 694 38 3068 779 566 1703 636 75 48 2 9126 77 22 1 0


Below is the output of prstat -t:

NPROC USERNAME SIZE RSS MEMORY TIME CPU
47 liondev 6034M 2558M 7.1% 46:02:45 29%
61 oracle 37G 32G 90% 16:25:54 15%
14 lionbld 1519M 645M 1.8% 0:06:16 10%


Below is the output of iostat...

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
4.4 198.4 34.3 1216.4 0.0 10.5 0.0 51.5 0 91 d19
4.2 197.6 32.1 1201.8 0.0 8.8 0.0 43.7 0 81 d20
8.6 198.2 66.5 1216.2 0.0 11.8 0.1 57.0 1 99 d25
4.4 198.4 34.3 1216.4 0.0 10.4 0.0 51.5 0 91 c3t8d0
4.4 198.4 34.3 1216.4 0.0 10.4 0.0 51.5 0 91 c3t8d0s1
4.2 197.6 32.1 1201.8 0.0 8.8 0.0 43.6 0 80 c3t9d0
4.2 197.6 32.1 1201.8 0.0 8.8 0.0 43.6 0 80 c3t9d0s1

Below is the output of metastat:
d25 -m d19 d20 1
d19 1 1 c3t8d0s1 -h hsp001
d20 1 1 c3t9d0s1 -h hsp001
 
Hello Sunny1504,
What speed are the CPUs? How many databases are you running, and how many processes? Please also post vmstat output.
"prstat -t" only shows user CPU totalling 54%: busy but not busted in this snapshot, although it seems a process owned by liondev may be the culprit.

However, mpstat shows that you have totally run out of CPU power on this system. It could be that a V210 is (in my view) too small a machine for your requirements. Our site uses a V210 (single 1002 MHz) for development only, with V240s (2 x 1280 MHz) for dev & Oracle, and a live V440 (4 x 1280 MHz). It all depends on your workload of course, but it looks like you are under-powered, although memory is OK. You need to know which process is taking the most resources; monitor "prstat -a" to find out, or use the old Solaris command:
/usr/ucb/ps -aux|head -10
 
Nobody can tell you how to improve the performance of your server based on ONE snippet of information; to do so is asking for potential trouble for you. If you don't know and don't have the experience, then hire Sun or a consultant.

I say that over and over, yet some people just have to give recommendations based on 5 seconds of output, which is impossible to do reliably. I gave you some generalities to look for; now your job is to buy a book and read until you understand them, or to hire someone to do it for you.

You're going to trust a production server to information from an anonymous poster against whom you have no recourse if something goes badly? And I can almost guarantee you would get different answers from different people, so whose advice would you follow?
 
Hello Sunny1504,

What's the news? KHz is right regarding "one" snippet of info, but by all means look at your system over several days / weeks and busy periods until you have a better picture. And obviously don't jump off a bridge just because somebody has suggested that's the answer.

Keep posting though; most people are happy to help if they can.

Good Luck

Marrow
 
Hi Marrow,

I do agree with KHZ, but I am no expert on the performance side; I am now trying to understand the whole subject.

I have been observing this performance degradation for the last couple of days.
I do agree with you that the V210 may well be the problem.


I have two 1336 MHz processors (UltraSPARC-IIIi).
Memory size: 8 GB
OS: Solaris 5.9
There are currently three databases running.


I am pasting some outputs here..

bash-2.05# uptime
2:48pm up 8 day(s), 1:30, 8 users, load average: 3.89, 4.67, 4.48

bash-2.05# vmstat 10 5
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr m1 m1 m1 m1 in sy cs us sy id
1 0 0 14138728 1240088 96 466 350 55 55 0 3 4 4 1 1 1251 5305 4824 21 20 59
1 0 0 13527008 653904 332 776 60 33 33 0 0 0 0 1 1 666 9837 1781 46 12 42
5 0 0 13503536 628568 137 950 2 36 36 0 0 0 0 1 1 553 6838 1627 46 28 26
12 0 0 13499008 622480 179 1407 2 38 38 0 0 4 4 1 1 675 16479 2446 35 64 1
12 0 0 13499064 624168 73 454 0 3 3 0 0 0 0 0 0 1705 9970 2162 44 56 0




tin tout
0 425
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 d11
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 d12
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 d13
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 d14
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 d15
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 d16
0.1 0.0 0.8 0.0 0.0 0.0 0.0 9.2 0 0 d17
0.2 0.0 1.6 0.0 0.0 0.0 0.0 8.8 0 0 d18
0.2 39.4 1.6 366.2 0.0 0.4 0.0 10.1 0 23 d19
0.1 39.4 0.8 366.2 0.0 0.3 0.0 7.7 0 18 d20
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 d21
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 d22
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 d23
0.3 0.0 2.4 0.0 0.0 0.0 0.0 8.9 0 0 d24
0.3 39.4 2.4 366.2 0.0 0.4 0.1 11.0 0 24 d25
0.0 0.4 0.0 0.2 0.0 0.0 0.0 4.3 0 0 c1t0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d0s0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d0s1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d0s2
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d0s3
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d0s4
0.0 0.3 0.0 0.1 0.0 0.0 0.0 4.9 0 0 c1t0d0s5
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t0d0s6
0.0 0.1 0.0 0.0 0.0 0.0 0.0 2.7 0 0 c1t0d0s7
0.0 0.1 0.0 0.0 0.0 0.0 0.0 10.1 0 0 c1t1d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s2
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s3
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s4
0.0 0.1 0.0 0.0 0.0 0.0 0.0 10.1 0 0 c1t1d0s5
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s6
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c1t1d0s7
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
0.3 39.5 2.4 366.3 0.0 0.4 0.0 10.1 0 23 c3t8d0
0.1 0.0 0.8 0.0 0.0 0.0 0.0 9.1 0 0 c3t8d0s0
0.2 39.4 1.6 366.2 0.0 0.4 0.0 10.1 0 23 c3t8d0s1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t8d0s2
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t8d0s3
0.0 0.1 0.0 0.0 0.0 0.0 0.0 21.5 0 0 c3t8d0s6
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t8d0s7
0.3 39.5 2.4 366.3 0.0 0.3 0.0 7.7 0 18 c3t9d0
0.2 0.0 1.6 0.0 0.0 0.0 0.0 8.8 0 0 c3t9d0s0
0.1 39.4 0.8 366.2 0.0 0.3 0.0 7.7 0 18 c3t9d0s1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t9d0s2
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t9d0s3
0.0 0.1 0.0 0.0 0.0 0.0 0.0 11.9 0 0 c3t9d0s6
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t9d0s7
0.0 0.3 0.0 0.1 0.0 0.0 0.0 17.1 0 1 c3t10d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t10d0s0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t10d0s1
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t10d0s2
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t10d0s3
0.0 0.3 0.0 0.1 0.0 0.0 0.0 17.1 0 1 c3t10d0s6
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t10d0s7
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 rmt/0


bash-2.05# mpstat 10 4
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 232 13 2291 308 64 2472 408 363 184 1 5220 20 20 11 50
1 235 15 2757 943 708 2351 356 352 201 1 91 22 20 10 47
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 216 0 720 139 2 986 135 180 79 0 25585 14 21 2 64
1 194 0 1017 466 269 1000 188 181 89 0 50293 27 28 1 43
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 135 0 947 121 2 1734 129 349 146 0 6579 8 10 3 79
1 85 0 752 395 245 1784 84 337 152 0 6221 8 10 3 80
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 110 0 667 106 2 1733 105 221 80 0 7887 15 8 10 68
1 60 0 615 1074 930 1569 65 212 83 0 6869 15 8 10 67


bash-2.05# /usr/ucb/ps -aux | head -10
USER      PID  %CPU %MEM     SZ    RSS TT     S START    TIME     COMMAND
oracle    400   7.1  0.1  25592   4104 ?      O Aug 16  1252:00  /vg_anup1/app/orac
liondev  2010   2.1  0.2  37672  13272 ?      O Aug 16  3829:19  /export/home/anupd
lionweb  2058   0.6  2.5 250648 194280 ?      S Aug 16   163:41  /export/home/anupd
root        3   0.5  0.0      0      0 ?      S Aug 16   156:55  fsflush
root     1295   0.2  0.2  22552  16528 ?      S Aug 16    24:00  mibiisa -r -p 3285
root    14086   0.1  0.0   1168    864 pts/10 O 14:55:47   0:00  /usr/ucb/ps -aux
liondev  3377   0.1  3.1 356840 246712 ?      S Aug 16    13:06  /export/home/anupd
oracle    763   0.1  7.5 727256 602720 ?      S Aug 16     4:57  ora_ckpt_anupDEV1
root    13424   0.1  0.1   2736   2112 pts/10 S 14:33:44   0:00  -bash



I really appreciate your time..

Sunny1504
 
Hello Sunny1504,
Your vmstat shows jobs queueing in the run queue, which is an indication of a bottleneck in CPU power; your scan rates, swap and physical memory are all OK. The value of 12 is not, I think, the end of the world; this figure would need to be a lot higher, and consistently so, before you should really worry. Maybe vmstat should be run for a bit longer.

Unfortunately the mpstat and ps commands show nothing outrageous: the highest CPU process is at only 7%, so whatever drove the peak wasn't running (or got zero CPU) at the time of these snapshots. You need to keep looking, or set up a script that captures some stats over a period of time, until you find who or what is taking the power. "prstat -a" will always show the highest CPU consumer first. Do you know when the busy times actually are?
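
One way to capture stats over a period, as suggested above, is a small polling loop. A sketch only: the function name, the 12-sample/300-second figures and the log path are illustrative choices, not anything from this thread; on the Solaris box you would pass the real tool, e.g. "prstat -a 1 1".

```shell
# sample_loop CMD COUNT INTERVAL LOGFILE
# Appends COUNT timestamped snapshots of CMD's output to LOGFILE,
# sleeping INTERVAL seconds between snapshots.
sample_loop() {
    cmd=$1; count=$2; interval=$3; log=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        date >> "$log"           # timestamp each snapshot
        eval "$cmd" >> "$log"    # append the tool's output
        i=$((i + 1))
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
}
# e.g. one hour of 5-minute prstat snapshots (hypothetical invocation):
# sample_loop 'prstat -a 1 1' 12 300 /var/tmp/prstat.log
```

Comparing the timestamps in the log against the busy periods should show which process is taking the power.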
 
If it is running nothing but databases, then you will want to push memory usage close to 100%, or 90% if you don't want 100%. The reason is paging: you don't want the database to have to go to disk to get information. Look there first.

You have 2 processors, and you can have anywhere from 3 to 5 times the number of CPUs in the run queue before it becomes a problem, sustained over, say, a 5-second interval. Your vmstat shows two samples at 12, but what happened after that?
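
The rule of thumb above can be turned into a quick check. A sketch: NCPU=2 matches this V210, the 4x multiplier is an assumed middle of the 3-5x range, and the r values are the ones from the vmstat output in this thread.

```shell
NCPU=2                 # two UltraSPARC-IIIi CPUs on this V210
LIMIT=$((NCPU * 4))    # assumed middle of the 3-5x rule of thumb
# r values taken from the "vmstat 10 5" output earlier in the thread.
for r in 1 1 5 12 12; do
    if [ "$r" -gt "$LIMIT" ]; then
        echo "r=$r: over limit of $LIMIT"
    else
        echo "r=$r: ok"
    fi
done
```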

Your sr to fr ratio can be around 4:1, meaning the scanner examines 4 pages to free 1. If it were constantly at, say, 100:1, then you would have a potential problem.
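
The scan-to-free check above can be computed from captured vmstat columns. A sketch under assumptions: the first two data lines use the fr and sr values from the vmstat output earlier in the thread, while the last line is synthetic, added only to show a flagged case; the 4:1 threshold is the rule of thumb just described.

```shell
# fields: fr sr  (pages freed per second, pages scanned per second)
cat <<'EOF' | awk '{ r = ($1 > 0) ? $2 / $1 : 0
                     flag = (r > 4) ? "  <-- high scan-to-free ratio" : ""
                     printf "fr=%s sr=%s sr:fr=%.1f%s\n", $1, $2, r, flag }'
55 3
33 0
10 45
EOF
```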

Your usr is below 80% (or 70%), which is good, but your sys is equal to or greater than usr at times (not good), and sys is over 20% (also not good).

Look at intr and you will see that CPU 1 is taking far more interrupts than CPU 0. Also, the system call count (syscl) in one mpstat iteration was abnormally high compared to the others.

Adrian Cockcroft has a book on database performance tuning for Solaris. It would be a good investment for you to learn from; probably $35 or so.
 
I was mistaken: the book is "Configuring and Tuning Databases on the Solaris Platform" by Allan Packer, $40.27 on Amazon.
 