What about page faults and free memory?


flpgdt (Technical User) - Dec 15, 2009
Hello,

I've been reading your forums for quite a while, and the great amount of information I find here always comes in handy. This time, however, I need some specific help...

I have a question about an AIX server that I'm failing to understand, as I'm new to its concept of memory management...

Straight to the point: I have a server that shows a high number of page faults even though it has plenty of free memory.

This server runs a file-reading-intensive program and an Oracle database. I have no serious performance problems so far, but these page faults started to worry me, as we plan to stuff some more tasks onto this 40GB RAM server.

I first went and did my homework reading about the AIX VMM (this is my first time with AIX servers) and got a glimpse of its peculiar way of paging everything, files and programs alike, and the way it uses a daemon to steal and clean pages whenever it runs short. Well... I come from Solaris, where a memory shortage causes page faults, page faults drive up the scan rate, and a high scan rate most likely means paging. So when I issue vmstat on a production server and see four-digit numbers in the 'sr' column, it feels... wrong...
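
For anyone following along, something like this should dump the page-replacement tunables I keep reading about (I'm assuming the stock vmo and grep on AIX 5.3 here, so treat it as a sketch rather than gospel):

[tt]
# list all VMM tunables and pick out the page-replacement ones
# (minperm%, maxperm%, maxclient%, lru_file_repage, ...)
vmo -a | grep -E 'perm|client|lru'
[/tt]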

Now, I'll paste my stats below... could someone give me more perspective on what I'm seeing?

System (from nmon startup):
[tt]

¦ 6 - CPUs currently ¦
¦ 6 - CPUs configured ¦
¦ 1900 - MHz CPU clock rate ¦
¦ PowerPC_POWER5 - Processor ¦
¦ 64 bit - Hardware ¦
¦ 64 bit - Kernel ¦
¦ Dynamic - Logical Partition ¦
¦ 5.3.7.1 ML07 - AIX Kernel Version ¦
[/tt]

[tt]
$ vmstat 10 5
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
3 1 6892117 4855669 0 1 0 543 3695 0 3333 26670 12425 82 5 13 0 2.30 91.9
5 0 6892111 4855614 0 0 0 439 3895 0 3245 74660 12366 77 13 10 0 2.37 94.9
3 0 6891110 4856689 0 0 0 635 4806 0 3129 44884 12170 80 6 14 0 2.29 91.4
4 0 6891517 4856185 0 0 0 504 4241 0 3208 41366 13178 80 6 14 0 2.29 91.5
3 0 6891105 4856693 0 0 0 388 2059 0 3162 27696 13502 82 5 12 0 2.33 93.1[/tt]

[tt]
$ vmstat -v
12582896 memory pages
11905361 lruable pages
4858699 free pages
1 memory pools
3055581 pinned pages
80.0 maxpin percentage
10.0 minperm percentage
20.0 maxperm percentage
19.9 numperm percentage
2379986 file pages
0.0 compressed percentage
0 compressed pages
19.9 numclient percentage
20.0 maxclient percentage
2379986 client pages
0 remote pageouts scheduled
5114 pending disk I/Os blocked with no pbuf
151149 paging space I/Os blocked with no psbuf
2484 filesystem I/Os blocked with no fsbuf
41094 client filesystem I/Os blocked with no fsbuf
8101 external pager filesystem I/Os blocked with no fsbuf
0 Virtualized Partition Memory Page Faults
0.00 Time resolving virtualized partition memory page faults[/tt]

nmon shot (memory and paging):
[tt]
¦ Memory ------------------------------------------------------------------------¦
¦ Physical PageSpace | pages/sec In Out | FileSystemCache ¦
¦% Used 62.6% 40.5% | to Paging Space 0.0 0.0 | (numperm) 18.7% ¦
¦% Free 37.4% 59.5% | to File System 0.0 207.8 | Process 21.6% ¦
¦MB Used 30779.8MB 7405.8MB | Page Scans 0.0 | System 22.3% ¦
¦MB Free 18372.1MB 10898.2MB | Page Cycles 0.0 | Free 37.4% ¦
¦Total(MB) 49151.9MB 18304.0MB | Page Steals 0.0 | ------ ¦
¦ | Page Faults 5474.0 | Total 100.0% ¦
¦------------------------------------------------------------ | numclient 18.7% ¦
¦Min/Maxperm 4651MB( 9%) 9301MB( 19%) <--% of RAM | maxclient 18.9% ¦
¦Min/Maxfree 960 1088 Total Virtual 65.9GB | User 35.7% ¦
¦Min/Maxpgahead 2 8 Accessed Virtual 27.0GB 41.0% Pinned 24.3% ¦
¦ ¦
¦ Paging-Space ------------------------------------------------------------------¦
¦ Volume-Group PagingSpace-Name Type LPs MB Used IOpending ¦
¦ rootvg hd6 LV 128 4096 60% 0 Active Auto ¦
¦ rootvg paging00 LV 126 4032 60% 0 Active Auto ¦
¦ rootvg paging01 LV 318 10176 25% 0 Active Auto ¦
¦--------------------------------------------------------------------------------¦
[/tt]

topas shot:
[tt]
Tue Dec 15 13:45:24 2009 Interval: 2 Cswitch 11920 Readch 3124.7K
Syscall 29015 Writech 2256.6K
Kernel 9.4 |### | Reads 703 Rawin 1
User 82.4 |######################## | Writes 326 Ttyout 238
Wait 0.0 | | Forks 3 Igets 0
Idle 8.2 |### | Execs 3 Namei 2381
Physc = 2.40 %Entc= 96.1 Runqueue 4.5 Dirblk 0
Waitqueue 0.0
Network KBPS I-Pack O-Pack KB-In KB-Out
en4 6243.7 5640.5 1053.0 6179.9 63.7 PAGING MEMORY
lo0 0.0 0.0 0.0 0.0 0.0 Faults 3226 Real,MB 49151
Steals 0 % Comp 42.2
Disk Busy% KBPS TPS KB-Read KB-Writ PgspIn 0 % Noncomp 18.8
hdisk3 2.0 1.7K 28.5 0.0 1.7K PgspOut 0 % Client 18.8
hdisk23 1.5 512.9 4.0 0.0 512.9 PageIn 3
hdisk14 0.5 172.3 35.1 0.0 172.3 PageOut 560 PAGING SPACE
hdisk2 0.0 0.0 0.0 0.0 0.0 Sios 563 Size,MB 18304
% Used 40.0
Name PID CPU% PgSp Owner NFS (calls/sec) % Free 60.0
java 1208454 68.5 144.4 util ServerV2 0
syncd 348408 4.8 0.5 root ClientV2 0 Press:
java 1364098 1.1 60.0 root ServerV3 0 "h" for help
topas 577692 0.0 2.0 util ClientV3 0 "q" to quit
[/tt]

I appreciate any feedback!

cheers!

f.
 
Your system looks fine to me from this snapshot. One thing, though: it isn't generally a good idea to have paging spaces of different sizes (here, two at roughly 4096MB and one at 10176MB).
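
If you did want to even them out at some point, the mechanics are roughly this (the LP count is only an illustration based on the 32MB LPs your nmon output shows; check lsps -a on your box before touching anything):

[tt]
lsps -a                 # list paging spaces with their sizes and %used
chps -d 190 paging01    # e.g. drop 190 x 32MB LPs from paging01 to bring it down to about 4096MB
[/tt]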

I'd use nmon to collect stats over a longer period, to get a better idea of what the server's doing.
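
Something along these lines gives you a day's worth of data (288 snapshots at 5-minute intervals; the .nmon file lands in the current directory and can be fed to the nmon analyser spreadsheet):

[tt]
nmon -f -t -s 300 -c 288    # -f write to file, -t include top processes, -s interval (secs), -c snapshot count
[/tt]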



Also well worth a read to understand tuning is Driving the Power of AIX by Ken Milberg.



Mike

"Whenever I dwell for any length of time on my own shortcomings, they gradually begin to seem mild, harmless, rather engaging little things, not at all like the staring defects in other people's characters."
 
Hey, thanks!

Yeah, I'd agree the system looks fine... as I wrote, I couldn't really say there's any performance impact. And after all, I can see now that all the page faults aren't causing a lot of I/O:

[tt]
kthr memory page faults cpu
-------- ----------- ------------------------ ------------ -----------------------
r b p avm fre fi fo pi po fr sr in sy cs us sy id wa pc ec
5 0 0 11640858 130272 3 176 0 0 0 0 1969 14780 8209 78 18 4 0 2.49 99.4
6 0 0 11640981 129758 3 216 0 0 0 0 2118 13378 6155 74 23 3 0 2.49 99.7
6 1 0 11640952 129454 0 863 0 0 0 0 2372 13326 4965 73 24 3 0 2.48 99.1
5 0 0 11641176 128860 12 133 0 0 0 0 1906 54047 6640 73 24 3 0 2.48 99.1
4 0 0 11640962 128620 11 208 0 0 0 0 2657 54259 8458 77 19 4 0 2.49 99.5
[/tt]


I just wish I could tell the exact reason for that big page fault figure.
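
I suppose something like vmstat -s would at least break the counter down into zero-filled, executable-filled and paging-space faults (assuming the counter names on 5.3 match what I've seen in the docs):

[tt]
# cumulative counters since boot; the fault and paging-space lines are the interesting ones here
vmstat -s | grep -E -i 'fault|paging space'
[/tt]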

I was reading some more and asking around on other forums, and it seems it comes down to a poorly configured VMM:

[tt]
80.0 maxpin percentage
10.0 minperm percentage
20.0 maxperm percentage
19.9 numperm percentage
[/tt]

While I can see that a 20% maxperm would drive my LRU daemon nuts trying to free more space in a memory-intensive environment, I can't really grasp how it causes so many page faults...
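
Not that I'm in a position to change anything, but for the record, the 5.3-era settings people usually quote would be applied with something like this (the values are the commonly recommended ones, not something I've validated for this box):

[tt]
# apply now and persist across reboots (-p); favour stealing file cache pages over computational pages
vmo -p -o minperm%=3 -o maxperm%=90 -o maxclient%=90 -o lru_file_repage=0
[/tt]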
 
Ahh, AIX VMM threads, my favourite... mostly because with each new thread I usually learn something.

My 2 cents on good URLs to read would be this Overview of AIX page replacement:

Now, as far as why you're getting so many faults... I've always noticed that my AIX servers (which are mostly database servers) have a relatively high number of page faults. The interesting part isn't the soft faults but the hard faults (the odio/s column in sar -r): the faults that actually result in an I/O, which is otherwise known as your cache hit/miss ratio. That's the number I've always concerned myself with.
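
For the record, the sort of thing I run to watch it is just this (5-second samples, three of them; the columns AIX gives you are slots, cycle/s, fault/s and odio/s):

[tt]
sar -r 5 3    # paging statistics: free paging-space slots, replacement cycles, faults and odio per second
[/tt]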

I recently started graphing this ratio in Cacti on several of our large systems, and it's actually a little bit fascinating. I'm not quite sure what to make of the data yet, but it's always interesting to see stats trended and graphed over a long period of time.
 
This is ok too


Mike

"Whenever I dwell for any length of time on my own shortcomings, they gradually begin to seem mild, harmless, rather engaging little things, not at all like the staring defects in other people's characters."
 
Hey!

Thanks for the documents! I was familiar with most of it already, but I didn't post it here because I wasn't sure whether linking to other forums was allowed :) Thanks anyway!

exsnafu, I'd agree with you. To be honest, though, I can't really touch the config flags. I'm more at the application/dev level, so I'm trying to figure out, from my point of view, what could be done.
For now, I can only report what I've found to be 'normal' according to the current settings, and forward the white book as a guideline.
Besides, these flags seem to have been changed already. I guess (and hope) the guys who did it had the reasons and knowledge for it.

A guy on the IBM forum following the same thread said that I probably won't be able to figure out the page faults without deep knowledge of what's running. That's somewhat... unsettling... I mean, does that mean that to fine-tune my memory manager I have to be a specialist in everything that runs on the server, plus the server itself? I'd prefer to believe in a JVM-like world, where I can monitor and test GC statistics and figure out my settings without having to know much about what the program does or how it does it...

(the thread: )
 
Quick question: are you using AIO?

aioo -a
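
Or, if you'd rather look at the device attributes, lsattr on the aio0 device should show the minservers/maxservers/maxreqs settings as well (just a guess at what's worth checking on your level):

[tt]
lsattr -El aio0    # legacy AIO settings: minservers, maxservers, maxreqs, fastpath, ...
[/tt]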



Mike

"Whenever I dwell for any length of time on my own shortcomings, they gradually begin to seem mild, harmless, rather engaging little things, not at all like the staring defects in other people's characters."
 