Big problem with Oracle 10g on AIX

Status
Not open for further replies.

zaxxon

MIS
Dec 12, 2001
226
DE
Hi all,

we are running Oracle with (I guess) 2 DBs or instances on an AIX 5.2 box with 4 CPUs (Power4, 1200 MHz each) and 16 GB RAM.
The system ran fine until part of the databases was transferred from an old Clariion CX700 (FC disks) to a brand new EMC DMX-3 (FC disks too), or so they say. I have to believe that for now.

The problem shows up as follows when they do an online backup of their DBs:
- Up to 50 kthreads in the r-queue in vmstat, with up to 5-7 in the b-queue.
- No paging space in/out occurs.
- minperm% is 5 all the time.
- With lru_file_repage=0, lrud takes about 25%-60% CPU and blocks the rest of the system so badly that typing in a terminal/shell is nearly impossible or lags very badly.
- With lru_file_repage=1 and maxperm% at 50% or 80%, once numperm% reaches whichever value is set, lrud gets as busy as described above for lru_file_repage=0.
- The fre column in vmstat is about 300k pages while the problem occurs, so minfree and maxfree are not touched, I think.
- Traffic on the disks is up to 30 MB/sec and doesn't look bad.
- The disk arrays themselves are in good shape.
- FS's are jfs2 and have no cio or dio activated.
- There is a veeery slow rate

We AIX admins have no insight into what can be tuned on the Oracle side to fix this behaviour.
When the Oracle DBAs are not running their tests with that online backup, the system has very low traffic, but every now and then there is a kthread in the b column of vmstat; maybe this is a sign that something is wrong in general.

Here is some output with the current settings that might help to understand my description:

Code:
root@srdbhv05:/oracle> vmstat -t -I 1
System Configuration: lcpu=4 mem=16384MB
  kthr      memory             page              faults          cpu        time
-------- ----------- ------------------------ ------------ ----------- --------
 r  b  p   avm   fre  fi  fo  pi  po  fr  sr   in   sy  cs us sy id wa hr mi se
 6  2  0 847720 139620 1016 998   0   0 1665 5203 854 10912 1151  3 20 69  8 17:25:53
13  0  0 847664 139775 1165 1367   0   0 2462 4877 1085 9277 1075  2 80  6 12 17:25:54
 2  6  0 847668 139770 1029 875   0   0 2207 3606 774 23026 848  3 52  1 44 17:25:55
34  1  0 847630 139920 1163 858   0   0 1132 1912 772 7819 541  1 80  1 18 17:25:57
38  0  0 849085 138478 517 282   0   0 1093 1706 730 14130 656  1 74 17  8 17:25:58
 4  1  0 848920 138638 259 792   0   0 901 1445 688 3528 477  1 70 19 11 17:25:59
17  6  0 847644 139918 1887 1970   0   0 3738 6398 1158 21265 1925  5 75  7 12 17:26:00
 2  2  0 847806 139436   2 549   0   0 804 1405 667 10602 623  1 59 16 24 17:26:01
19  1  0 848161 139395 519 415   0   0 772 1351 574 8230 466  2 55 22 20 17:26:02
45  2  0 848396 138791 1208 1364   0   0 2664 6237 827 14631 989  4 91  1  4 17:26:03
11  6  0 848015 139544 1026 802   0   0 1673 4141 766 10809 628  2 62  5 32 17:26:04
 3  2  0 847649 139692 132 790   0   0 1217 2296 682 21417 722  4 62  0 34 17:26:05




Topas Monitor for host:    srdbhv05             EVENTS/QUEUES    FILE/TTY
Wed Nov 21 17:27:44 2007   Interval:  1         Cswitch    1019  Readch  7337.8K
                                                Syscall    7906  Writech 6134.5K
Kernel   74.2   |#####################       |  Reads       251  Rawin         0
User      5.1   |##                          |  Writes      196  Ttyout      632
Wait     19.7   |######                      |  Forks         7  Igets         0
Idle      1.0   |#                           |  Execs         5  Namei       315
                                                Runqueue    7.4  Dirblk        0
Network  KBPS   I-Pack  O-Pack   KB-In  KB-Out  Waitqueue   1.6
en0     151.4    161.8   155.4    66.8    84.6
lo0       2.5     15.3    15.3     1.3     1.3  PAGING           MEMORY
en1       0.3      3.8     1.3     0.2     0.1  Faults     2076  Real,MB   16383
                                                Steals     4512  % Comp     21.0
Disk    Busy%     KBPS     TPS KB-Read KB-Writ  PgspIn        0  % Noncomp  76.6
skpower0 99.3   6792.4   300.6  6527.4   265.0  PgspOut       0  % Client   76.9
hdisk27  98.0   4484.1   205.1  4346.5   137.6  PageIn     1956
skpower2 95.5   3913.4     3.8     0.0  3913.4  PageOut    2357  PAGING SPACE
hdisk0   95.5     25.5     6.4     0.0    25.5  Sios       4771  Size,MB    6912
hdisk38  95.5   2935.0    24.2     0.0  2935.0                   % Used      0.7
                                                NFS (calls/sec)  % Free     99.2
Name            PID  CPU%  PgSp Owner           ServerV2       0
lrud         143430  45.7   0.1 root            ClientV2       0   Press:
oracle      1163354  22.0   6.8 oracle          ServerV3       0   "h" for help
sshd        2404354   1.0   0.7 root            ClientV3       0   "q" to quit
aioserve    1671306   1.0   0.1 root
oracle       794768   0.6   6.8 oracle
oracle      1212514   0.6  15.5 oracle
oracle       868558   0.6   6.8 oracle




root@srdbhv05:/oracle> vmstat -v
              4194304 memory pages
              4008879 lruable pages
               140234 free pages
                    3 memory pools
               266928 pinned pages
                 80.1 maxpin percentage
                  5.0 minperm percentage
                 80.0 maxperm percentage
                 79.6 numperm percentage
              3194452 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 79.9 numclient percentage
                 80.0 maxclient percentage
              3206731 client pages
                    0 remote pageouts scheduled
                    0 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2740 filesystem I/Os blocked with no fsbuf
                    0 client filesystem I/Os blocked with no fsbuf
                    0 external pager filesystem I/Os blocked with no fsbuf




root@srdbhv05:/oracle> vmo -x
cpu_scale_memp,8,8,8,1,64,,B,
data_stagger_interval,161,161,161,0,4095,4KB pages,D,lgpg_regions
defps,1,1,1,0,1,boolean,D,
force_relalias_lite,0,0,0,0,1,boolean,D,
framesets,2,2,2,1,10,,B,
htabscale,n/a,-1,-1,-4,0,,B,
kernel_heap_psize,4096,4096,4096,4096,16777216,bytes,B,lgpg_size
large_page_heap_size,0,0,0,0,9223372036854775807,bytes,B,lgpg_size
lgpg_regions,0,0,0,0,,,B,lgpg_size
lgpg_size,0,0,0,0,16777216,bytes,B,lgpg_regions
low_ps_handling,1,1,1,1,2,,D,
lru_file_repage,1,1,1,0,1,boolean,D,
lru_poll_interval,10,0,10,0,60000,milliseconds,D,
lrubucket,131072,131072,131072,65536,,4KB pages,D,
maxclient%,80,80,80,1,100,% memory,D,maxperm%
maxfree,1088,128,1088,16,204800,4KB pages,D,minfree memory_frames
maxperm,3207102,,3207102,,,,S,
maxperm%,80,80,80,1,100,% memory,D,minperm% maxclient%
maxpin,3355444,,3355444,,,,S,
maxpin%,80,80,80,1,99,% memory,D,pinnable_frames memory_frames
mbuf_heap_psize,4096,4096,4096,4096,16777216,bytes,B,
memory_affinity,1,1,1,0,1,boolean,B,
memory_frames,4194304,,4194304,,,4KB pages,S,
mempools,1,1,1,1,256,,B,
minfree,960,1080,960,8,204800,4KB pages,D,maxfree memory_frames
minperm,200443,,200443,,,,S,
minperm%,5,20,5,1,100,% memory,D,maxperm%
nokilluid,0,0,0,0,4294967295,uid,D,
npskill,13824,13824,13824,1,1769471,4KB pages,D,
npswarn,55296,55296,55296,0,1769471,4KB pages,D,
num_spec_dataseg,0,0,0,0,,,B,
numpsblks,1769472,,1769472,,,4KB blocks,S,
pagecoloring,n/a,0,0,0,1,boolean,B,
pinnable_frames,3927417,,3927417,,,4KB pages,S,
pta_balance_threshold,n/a,50,50,0,99,% pta segment,R,
relalias_percentage,0,0,0,0,32767,,D,
soft_min_lgpgs_vmpool,0,0,0,0,90,%,D,lgpg_size
spec_dataseg_int,512,512,512,0,,,B,
strict_maxclient,1,1,1,0,1,boolean,D,
strict_maxperm,0,0,0,0,1,boolean,D,
v_pinshm,0,0,0,0,1,boolean,D,
vmm_fork_policy,0,0,0,0,1,boolean,D,
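
For readers comparing a dump like this against the defaults, here is a minimal Python sketch that lists the tunables whose current value differs from the default. It assumes the `vmo -x` column order is name, current, default, reboot, min, max, unit, type, dependencies (that is a reading of the output, not something stated in the post), and it uses only a handful of rows copied from the dump. The names `FIELDS` and `changed_tunables` are made up for illustration.

```python
import csv
from io import StringIO

# A few rows copied verbatim from the `vmo -x` output above.
VMO_X = """\
lru_file_repage,1,1,1,0,1,boolean,D,
lru_poll_interval,10,0,10,0,60000,milliseconds,D,
maxclient%,80,80,80,1,100,% memory,D,maxperm%
maxfree,1088,128,1088,16,204800,4KB pages,D,minfree memory_frames
maxperm%,80,80,80,1,100,% memory,D,minperm% maxclient%
minfree,960,1080,960,8,204800,4KB pages,D,maxfree memory_frames
minperm%,5,20,5,1,100,% memory,D,maxperm%
strict_maxclient,1,1,1,0,1,boolean,D,
"""

# ASSUMPTION: column order of `vmo -x`, inferred from the dump.
FIELDS = ["name", "current", "default", "reboot",
          "min", "max", "unit", "type", "deps"]


def changed_tunables(text):
    """Return {name: (current, default)} where both exist and differ."""
    changed = {}
    for row in csv.DictReader(StringIO(text), fieldnames=FIELDS):
        cur, dflt = row["current"], row["default"]
        # Skip static rows (empty default) and "n/a" current values.
        if cur and dflt and cur != "n/a" and cur != dflt:
            changed[row["name"]] = (cur, dflt)
    return changed


result = changed_tunables(VMO_X)
for name, (cur, dflt) in result.items():
    print(f"{name}: current={cur} default={dflt}")
```

On this subset it reports lru_poll_interval, maxfree, minfree and minperm% as changed from their defaults.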


Thanks in advance!

laters
zaxxon
 
Hi Zaxxon,

I can see that your disks are above 95% utilized, so I think the problem is with the disks! (Your memory also needs to be better tuned: Oracle has its own caching, so there is no need for maxperm to be 80%. But that is not the problem for now.)

Have a look at the LTG size of your new disks' VG! Use this document for guidance:


More on AIX tuning for DBs:



Regards,
Khalid
 
Hi Khalid,
yes, the disks are somewhat busy, but there are 3 different kinds of disks: the built-in disks of the rootvg, the disks from the CX700, and the DMX-3. All of them show this high busy time, so I don't think it can be a problem with the disks themselves.
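
One way to look at the Busy% figures is to set them against how much data each disk actually moved. The Python sketch below does that for the five disk rows of the topas screen posted earlier, dividing KBPS by TPS to get an average KB per transfer. It is a rough illustration on a single one-second snapshot; the function name `kb_per_transfer` is made up here.

```python
# Disk rows copied from the topas screen posted earlier in the thread.
# Columns: name, Busy%, KBPS, TPS, KB-Read, KB-Writ
DISK_ROWS = """\
skpower0 99.3   6792.4   300.6  6527.4   265.0
hdisk27  98.0   4484.1   205.1  4346.5   137.6
skpower2 95.5   3913.4     3.8     0.0  3913.4
hdisk0   95.5     25.5     6.4     0.0    25.5
hdisk38  95.5   2935.0    24.2     0.0  2935.0
"""


def kb_per_transfer(text):
    """Return {disk: average KB moved per transfer} for each row."""
    sizes = {}
    for line in text.splitlines():
        name, busy, kbps, tps, kb_read, kb_writ = line.split()
        sizes[name] = float(kbps) / float(tps)
    return sizes


sizes = kb_per_transfer(DISK_ROWS)
for name, kb in sizes.items():
    print(f"{name:9s} {kb:8.1f} KB per transfer")
```

hdisk0 comes out at roughly 4 KB per transfer and about 25 KB/s in total while still showing 95.5% busy, so in this snapshot Busy% on its own says little about how much data a disk is moving.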

I have read several tuning guides for Oracle on AIX, but nothing has really helped so far. We didn't try everything, but our usual settings for DB servers are:
- minperm%=5
- maxperm%=80
- maxclient%=89
- lru_file_repage=0
- max aioservers at 100
- min aioservers at 5

This machine now has max aioservers at 400, which is 1600 overall with 4 CPUs, and is using 173.
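
The arithmetic behind those numbers, as a small Python sketch. It assumes, as the figures in the post imply, that the maxservers value applies per CPU; the function name is made up for illustration.

```python
def total_aio_servers(maxservers_per_cpu, cpus):
    """Upper bound on AIO servers if maxservers is counted per CPU."""
    return maxservers_per_cpu * cpus


limit = total_aio_servers(400, 4)
in_use = 173  # figure quoted in the post

print(limit)                      # 1600
print(f"{in_use / limit:.1%}")    # 10.8%
print(total_aio_servers(400, 5))  # 2000
```

So at this point the box was using only about a tenth of the AIO servers it was allowed to start.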

This is usually absolutely sufficient for them to run smoothly, since lrud runs only when needed and the 80% is somewhat secondary since AIX 5.2.
This DB server is kind of "special" and has no paging problem, as can be seen.
The VMM settings I have listed are just the current ones. As I said in my description, we tried a lot of different settings and none worked.
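
To see how close the file cache sits to its limits in the posted `vmstat -v` output, here is a minimal Python sketch. It assumes numperm% and numclient% are file pages and client pages as a share of lruable pages; that assumption reproduces the posted 79.6 and 79.9 if vmstat truncates rather than rounds. Variable and function names are illustrative only.

```python
# Values copied from the `vmstat -v` output posted earlier.
lruable_pages = 4008879
file_pages = 3194452
client_pages = 3206731
maxperm_pct = 80.0
maxclient_pct = 80.0


def pct_of_lruable(pages, lruable):
    """Pages expressed as a percentage of lruable memory (assumed formula)."""
    return 100.0 * pages / lruable


numperm = pct_of_lruable(file_pages, lruable_pages)
numclient = pct_of_lruable(client_pages, lruable_pages)

# How many 4 KB pages are left before client pages hit maxclient%.
headroom_pages = round(lruable_pages * maxclient_pct / 100) - client_pages

print(f"numperm   = {numperm:.2f}% (maxperm%   = {maxperm_pct})")
print(f"numclient = {numclient:.2f}% (maxclient% = {maxclient_pct})")
print(f"client headroom = {headroom_pages} pages")
```

With these figures the client pages are within a few hundred pages of the maxclient% limit, which fits the observation that lrud gets busy once numperm% reaches whatever limit is set.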

Just saw the last point with the "veeery slow rate"... I don't remember what I was about to write lol, sorry. Too many interruptions in between.

As for the blocked fsbufs, this value is not growing and has been tuned too. So no worries on this side.

laters
zaxxon
 
Did you check the LTG size of the new disks? It might be different from the old ones!

I'm not saying that you have a problem with memory, but on our machines the settings (recommended by an IBM consultant) are as follows:

lru_file_repage = 1
maxclient% = 8
maxperm% = 8
minperm% = 3

As he says, Oracle has its own file caching, so you don't need to set max/minperm high!

Regards,
Khalid
 
minperm% isn't high at 5%. maxperm% is high because lru_file_repage is 0. Several tuning guides say this is the usual way to do it since AIX 5.2. Never mind; as I said, I tried a lot and it doesn't help.
We noticed that the bad behaviour occurs when Oracle's RMAN backup is running.
With all applications on the box stopped, behaviour is normal, i.e. no stress with lrud when you copy lots of data from one disk to another.

I guess it points to the applications then, but in our company a lot of "political" interest and personal pride hinders investigation ;) Nvm, thanks so far.

laters
zaxxon
 
I believe it is an issue with the new storage configuration. You can check the following with the storage and AIX implementers, because it affects application performance greatly.

1) How many storage controllers are in the storage? What is the controller type? FC, SAS, SCSI?
2) If they say there are 2 x FC, how are the storage LUNs assigned to the controllers? Are they spread across different controllers to make the best use of them?
3) At the AIX system level, is any AIX striping configured so that reads/writes on a disk volume go to two or more physical volumes, keeping both controllers busy most of the time?

You can also compare the old storage configuration with the new one.

Gavin
 
Hi,

we are using Powerpath and have... 2 FC-Controllers handling the traffic.

Currently it is running very smoothly. We mounted the filesystems with the "cio" option and set "FILESYSTEMIO_OPTIONS=setall" in the init.ora, as advised in some of the tuning guides, and now it uses all the AIO servers it can get: 2000 (5 CPUs now and 400 maxservers). Almost no blocked kthreads, low I/O wait, and lrud has nothing to do. Strange, but it works.
It seems that more parallelisation for I/O is the answer, though we don't know yet why all this happened. It must have been something other than just putting some FS on another storage system.
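
To put numbers on a before/after comparison like this, one could summarise vmstat samples from both runs. The Python sketch below does it for the twelve "before" samples posted at the start of the thread; no "after" samples were posted, so none are invented here. The column labels in `COLS` are my own names for the `vmstat -I -t` columns (`sy_calls` for system calls, `sys` for system CPU%).

```python
# The twelve sample lines from the `vmstat -t -I 1` output in the first post.
BEFORE = """\
 6  2  0 847720 139620 1016 998   0   0 1665 5203 854 10912 1151  3 20 69  8 17:25:53
13  0  0 847664 139775 1165 1367   0   0 2462 4877 1085 9277 1075  2 80  6 12 17:25:54
 2  6  0 847668 139770 1029 875   0   0 2207 3606 774 23026 848  3 52  1 44 17:25:55
34  1  0 847630 139920 1163 858   0   0 1132 1912 772 7819 541  1 80  1 18 17:25:57
38  0  0 849085 138478 517 282   0   0 1093 1706 730 14130 656  1 74 17  8 17:25:58
 4  1  0 848920 138638 259 792   0   0 901 1445 688 3528 477  1 70 19 11 17:25:59
17  6  0 847644 139918 1887 1970   0   0 3738 6398 1158 21265 1925  5 75  7 12 17:26:00
 2  2  0 847806 139436   2 549   0   0 804 1405 667 10602 623  1 59 16 24 17:26:01
19  1  0 848161 139395 519 415   0   0 772 1351 574 8230 466  2 55 22 20 17:26:02
45  2  0 848396 138791 1208 1364   0   0 2664 6237 827 14631 989  4 91  1  4 17:26:03
11  6  0 848015 139544 1026 802   0   0 1673 4141 766 10809 628  2 62  5 32 17:26:04
 3  2  0 847649 139692 132 790   0   0 1217 2296 682 21417 722  4 62  0 34 17:26:05
"""

# Illustrative labels for the 18 numeric columns (the timestamp is dropped).
COLS = ["r", "b", "p", "avm", "fre", "fi", "fo", "pi", "po", "fr", "sr",
        "in", "sy_calls", "cs", "us", "sys", "id", "wa"]


def summarize(text):
    """Summarise run queue, blocked queue, CPU split and paging-space I/O."""
    rows = [dict(zip(COLS, map(int, line.split()[:18])))
            for line in text.splitlines() if line.strip()]
    n = len(rows)
    return {
        "samples": n,
        "max_r": max(row["r"] for row in rows),
        "max_b": max(row["b"] for row in rows),
        "avg_sys": sum(row["sys"] for row in rows) / n,
        "avg_wa": sum(row["wa"] for row in rows) / n,
        "paging_space_io": sum(row["pi"] + row["po"] for row in rows),
    }


summary = summarize(BEFORE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Running the same function over samples taken after the cio/setall change would show whether the run queue, system CPU% and I/O wait really dropped.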

Thanks for the help all.

laters
zaxxon
 
Take a look at enabling and tuning aio

Mike

"Whenever I dwell for any length of time on my own shortcomings, they gradually begin to seem mild, harmless, rather engaging little things, not at all like the staring defects in other people's characters."
 