Hi all,
We are running Oracle with (I guess) 2 DBs or instances on an AIX 5.2 box with 4 CPUs (Power4, 1200 MHz each) and 16 GB RAM.
The system ran fine until part of the databases was transferred from an old Clariion CX700 (FC disks) to a brand new EMC DMX-3 (FC disks too)... or so they say. I have to believe that for now.
The problem shows up as follows when they run an online backup of their DBs:
- Up to 50 kthreads in the r-queue in vmstat, and about 5-7 in the b-queue.
- No paging space page-ins or page-outs occur.
- minperm% is 5 all the time.
- With lru_file_repage=0, lrud takes about 25%-60% CPU and blocks the rest of the system so badly that typing in a terminal/shell is nearly impossible or lags very badly.
- With lru_file_repage=1 and maxperm% at 50% or 80%, as soon as numperm% reaches 50% or 80% (whatever is currently set), lrud gets as busy as described above for lru_file_repage=0.
- The fre column in vmstat is about 300k pages while the problem occurs, so minfree and maxfree are not touched, I think.
- Traffic on the disks is up to 30 MB/sec and doesn't look bad.
- The disk arrays themselves are in good shape.
- The filesystems are jfs2 and have neither cio nor dio activated.
- There is a veeery slow rate
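To make the numperm%/maxperm% relation from the list above easier to follow over time, here is a minimal Python sketch that reads saved vmstat -v output and reports how far numperm is from maxperm, plus the fsbuf-blocked counter. The sample lines are copied from the vmstat -v output in this post; the function names parse_vmstat_v and perm_headroom are made up for illustration, and the script is meant to run wherever Python is available, not necessarily on the AIX box itself.

```python
# Sample lines copied from the `vmstat -v` output in this post.
SAMPLE = """\
 4194304 memory pages
 140234 free pages
 5.0 minperm percentage
 80.0 maxperm percentage
 79.6 numperm percentage
 79.9 numclient percentage
 80.0 maxclient percentage
 2740 filesystem I/Os blocked with no fsbuf
"""


def parse_vmstat_v(text):
    """Map each `vmstat -v` label to its numeric value."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # "<number> <label>"
        if len(parts) != 2:
            continue
        value, label = parts
        try:
            stats[label.strip()] = float(value)
        except ValueError:
            continue  # not a "<number> <label>" line, skip it
    return stats


def perm_headroom(stats):
    """Percentage points left before numperm reaches maxperm."""
    return stats["maxperm percentage"] - stats["numperm percentage"]


stats = parse_vmstat_v(SAMPLE)
client_headroom = stats["maxclient percentage"] - stats["numclient percentage"]
print(f"numperm headroom:   {perm_headroom(stats):.1f} points")
print(f"numclient headroom: {client_headroom:.1f} points")
print(f"fsbuf-blocked I/Os: {int(stats['filesystem I/Os blocked with no fsbuf'])}")
```

Feeding it periodic vmstat -v snapshots taken during the backup would show whether lrud gets busy at the moment the headroom drops to about zero.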
We AIX admins have no insight into what can be tuned on the Oracle side to improve this bad behaviour.
When the Oracle DBAs are not running their tests with that online backup, the system has very little traffic, but every now and then there is a kthread in the b column of vmstat; maybe this is a sign that something is wrong in general.
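To watch the r and b columns over a longer period without staring at the screen, a small Python sketch like the following could be run over saved vmstat -t -I output. The column order is taken from the vmstat header in this post (r b p, avm fre, fi fo pi po fr sr, in sy cs, us sy id wa, time); the names parse_vmstat, "syscalls" and "sys" are chosen here only to tell the two sy columns apart, and using lcpu=4 as the run-queue threshold is an assumption, not an official rule.

```python
LCPU = 4  # from "lcpu=4" in the vmstat header; used as a rough threshold

# The two "sy" columns are renamed to keep them apart (hypothetical names).
COLUMNS = ["r", "b", "p", "avm", "fre", "fi", "fo", "pi", "po", "fr", "sr",
           "in", "syscalls", "cs", "us", "sys", "id", "wa"]

# Four intervals copied from the `vmstat -t -I 1` output in this post.
SAMPLE = """\
6 2 0 847720 139620 1016 998 0 0 1665 5203 854 10912 1151 3 20 69 8 17:25:53
13 0 0 847664 139775 1165 1367 0 0 2462 4877 1085 9277 1075 2 80 6 12 17:25:54
34 1 0 847630 139920 1163 858 0 0 1132 1912 772 7819 541 1 80 1 18 17:25:57
45 2 0 848396 138791 1208 1364 0 0 2664 6237 827 14631 989 4 91 1 4 17:26:03
"""


def parse_vmstat(text):
    """Return one dict per data line; header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1:  # 18 numbers plus the timestamp
            continue
        try:
            row = dict(zip(COLUMNS, (int(f) for f in fields[:-1])))
        except ValueError:
            continue  # non-numeric field, e.g. a header line
        row["time"] = fields[-1]
        rows.append(row)
    return rows


rows = parse_vmstat(SAMPLE)
overloaded = [r for r in rows if r["r"] > LCPU]  # more runnable threads than CPUs
blocked = [r for r in rows if r["b"] > 0]        # threads waiting on I/O
avg_sys = sum(r["sys"] for r in rows) / len(rows)

print(f"intervals with r > {LCPU}: {len(overloaded)} of {len(rows)}")
print(f"intervals with b > 0: {len(blocked)} of {len(rows)}")
print(f"max r: {max(r['r'] for r in rows)}, average sys CPU: {avg_sys:.2f}%")
```

Run against output collected while the backup is not active, the b > 0 count would show how often blocked kthreads appear on the otherwise idle system.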
Here is some output with the current settings that might help to understand my description:
Code:
root@srdbhv05:/oracle> vmstat -t -I 1
System Configuration: lcpu=4 mem=16384MB
kthr memory page faults cpu time
-------- ----------- ------------------------ ------------ ----------- --------
r b p avm fre fi fo pi po fr sr in sy cs us sy id wa hr mi se
6 2 0 847720 139620 1016 998 0 0 1665 5203 854 10912 1151 3 20 69 8 17:25:53
13 0 0 847664 139775 1165 1367 0 0 2462 4877 1085 9277 1075 2 80 6 12 17:25:54
2 6 0 847668 139770 1029 875 0 0 2207 3606 774 23026 848 3 52 1 44 17:25:55
34 1 0 847630 139920 1163 858 0 0 1132 1912 772 7819 541 1 80 1 18 17:25:57
38 0 0 849085 138478 517 282 0 0 1093 1706 730 14130 656 1 74 17 8 17:25:58
4 1 0 848920 138638 259 792 0 0 901 1445 688 3528 477 1 70 19 11 17:25:59
17 6 0 847644 139918 1887 1970 0 0 3738 6398 1158 21265 1925 5 75 7 12 17:26:00
2 2 0 847806 139436 2 549 0 0 804 1405 667 10602 623 1 59 16 24 17:26:01
19 1 0 848161 139395 519 415 0 0 772 1351 574 8230 466 2 55 22 20 17:26:02
45 2 0 848396 138791 1208 1364 0 0 2664 6237 827 14631 989 4 91 1 4 17:26:03
11 6 0 848015 139544 1026 802 0 0 1673 4141 766 10809 628 2 62 5 32 17:26:04
3 2 0 847649 139692 132 790 0 0 1217 2296 682 21417 722 4 62 0 34 17:26:05
Topas Monitor for host: srdbhv05 EVENTS/QUEUES FILE/TTY
Wed Nov 21 17:27:44 2007 Interval: 1 Cswitch 1019 Readch 7337.8K
Syscall 7906 Writech 6134.5K
Kernel 74.2 |##################### | Reads 251 Rawin 0
User 5.1 |## | Writes 196 Ttyout 632
Wait 19.7 |###### | Forks 7 Igets 0
Idle 1.0 |# | Execs 5 Namei 315
Runqueue 7.4 Dirblk 0
Network KBPS I-Pack O-Pack KB-In KB-Out Waitqueue 1.6
en0 151.4 161.8 155.4 66.8 84.6
lo0 2.5 15.3 15.3 1.3 1.3 PAGING MEMORY
en1 0.3 3.8 1.3 0.2 0.1 Faults 2076 Real,MB 16383
Steals 4512 % Comp 21.0
Disk Busy% KBPS TPS KB-Read KB-Writ PgspIn 0 % Noncomp 76.6
skpower0 99.3 6792.4 300.6 6527.4 265.0 PgspOut 0 % Client 76.9
hdisk27 98.0 4484.1 205.1 4346.5 137.6 PageIn 1956
skpower2 95.5 3913.4 3.8 0.0 3913.4 PageOut 2357 PAGING SPACE
hdisk0 95.5 25.5 6.4 0.0 25.5 Sios 4771 Size,MB 6912
hdisk38 95.5 2935.0 24.2 0.0 2935.0 % Used 0.7
NFS (calls/sec) % Free 99.2
Name PID CPU% PgSp Owner ServerV2 0
lrud 143430 45.7 0.1 root ClientV2 0 Press:
oracle 1163354 22.0 6.8 oracle ServerV3 0 "h" for help
sshd 2404354 1.0 0.7 root ClientV3 0 "q" to quit
aioserve 1671306 1.0 0.1 root
oracle 794768 0.6 6.8 oracle
oracle 1212514 0.6 15.5 oracle
oracle 868558 0.6 6.8 oracle
root@srdbhv05:/oracle> vmstat -v
4194304 memory pages
4008879 lruable pages
140234 free pages
3 memory pools
266928 pinned pages
80.1 maxpin percentage
5.0 minperm percentage
80.0 maxperm percentage
79.6 numperm percentage
3194452 file pages
0.0 compressed percentage
0 compressed pages
79.9 numclient percentage
80.0 maxclient percentage
3206731 client pages
0 remote pageouts scheduled
0 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2740 filesystem I/Os blocked with no fsbuf
0 client filesystem I/Os blocked with no fsbuf
0 external pager filesystem I/Os blocked with no fsbuf
root@srdbhv05:/oracle> vmo -x
cpu_scale_memp,8,8,8,1,64,,B,
data_stagger_interval,161,161,161,0,4095,4KB pages,D,lgpg_regions
defps,1,1,1,0,1,boolean,D,
force_relalias_lite,0,0,0,0,1,boolean,D,
framesets,2,2,2,1,10,,B,
htabscale,n/a,-1,-1,-4,0,,B,
kernel_heap_psize,4096,4096,4096,4096,16777216,bytes,B,lgpg_size
large_page_heap_size,0,0,0,0,9223372036854775807,bytes,B,lgpg_size
lgpg_regions,0,0,0,0,,,B,lgpg_size
lgpg_size,0,0,0,0,16777216,bytes,B,lgpg_regions
low_ps_handling,1,1,1,1,2,,D,
lru_file_repage,1,1,1,0,1,boolean,D,
lru_poll_interval,10,0,10,0,60000,milliseconds,D,
lrubucket,131072,131072,131072,65536,,4KB pages,D,
maxclient%,80,80,80,1,100,% memory,D,maxperm%
maxfree,1088,128,1088,16,204800,4KB pages,D,minfree memory_frames
maxperm,3207102,,3207102,,,,S,
maxperm%,80,80,80,1,100,% memory,D,minperm% maxclient%
maxpin,3355444,,3355444,,,,S,
maxpin%,80,80,80,1,99,% memory,D,pinnable_frames memory_frames
mbuf_heap_psize,4096,4096,4096,4096,16777216,bytes,B,
memory_affinity,1,1,1,0,1,boolean,B,
memory_frames,4194304,,4194304,,,4KB pages,S,
mempools,1,1,1,1,256,,B,
minfree,960,1080,960,8,204800,4KB pages,D,maxfree memory_frames
minperm,200443,,200443,,,,S,
minperm%,5,20,5,1,100,% memory,D,maxperm%
nokilluid,0,0,0,0,4294967295,uid,D,
npskill,13824,13824,13824,1,1769471,4KB pages,D,
npswarn,55296,55296,55296,0,1769471,4KB pages,D,
num_spec_dataseg,0,0,0,0,,,B,
numpsblks,1769472,,1769472,,,4KB blocks,S,
pagecoloring,n/a,0,0,0,1,boolean,B,
pinnable_frames,3927417,,3927417,,,4KB pages,S,
pta_balance_threshold,n/a,50,50,0,99,% pta segment,R,
relalias_percentage,0,0,0,0,32767,,D,
soft_min_lgpgs_vmpool,0,0,0,0,90,%,D,lgpg_size
spec_dataseg_int,512,512,512,0,,,B,
strict_maxclient,1,1,1,0,1,boolean,D,
strict_maxperm,0,0,0,0,1,boolean,D,
v_pinshm,0,0,0,0,1,boolean,D,
vmm_fork_policy,0,0,0,0,1,boolean,D,
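Since the vmo -x list is long, here is a rough Python sketch that picks out the tunables whose current value differs from the default. It assumes the vmo -x column order is tunable,current,default,reboot,min,max,unit,type,dependencies; the sample rows are copied from the output above, and the function name non_default is made up for illustration.

```python
import csv

# Assumed column order of `vmo -x` (spreadsheet format).
FIELDS = ["tunable", "current", "default", "reboot", "min", "max",
          "unit", "type", "dependencies"]

# A few rows copied from the `vmo -x` output in this post.
SAMPLE = """\
lru_file_repage,1,1,1,0,1,boolean,D,
lru_poll_interval,10,0,10,0,60000,milliseconds,D,
maxfree,1088,128,1088,16,204800,4KB pages,D,minfree memory_frames
maxperm,3207102,,3207102,,,,S,
maxperm%,80,80,80,1,100,% memory,D,minperm% maxclient%
minfree,960,1080,960,8,204800,4KB pages,D,maxfree memory_frames
minperm%,5,20,5,1,100,% memory,D,maxperm%
"""


def non_default(text):
    """Return (tunable, current, default) for every changed tunable."""
    changed = []
    for row in csv.reader(text.splitlines()):
        if len(row) != len(FIELDS):
            continue  # not a complete vmo -x row
        t = dict(zip(FIELDS, row))
        # Rows without a default (e.g. maxperm) or with current "n/a"
        # cannot be compared, so they are skipped.
        if t["default"] == "" or t["current"] == "n/a":
            continue
        if t["current"] != t["default"]:
            changed.append((t["tunable"], t["current"], t["default"]))
    return changed


for name, current, default in non_default(SAMPLE):
    print(f"{name}: current={current} default={default}")
```

On the sample rows this lists lru_poll_interval, maxfree, minfree and minperm% as changed from their defaults.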
Thanks in advance!
laters
zaxxon