vmstat 3

Enhasmen · Jul 4, 2001

Hi,
We have 1GB of RAM and we don't understand the vmstat report. For instance, vmstat tells us that we have 110,000 active virtual pages and 150 free, but this doesn't match 1GB of RAM. What does "free" mean? Our system hung and I don't know why.

Thanks in advance.

aixqueen · Jul 4, 2001

What level of the OS are you on? Several system-hang problems were fixed
in maintenance levels, especially in TCP/IP, so the level is important.

General Info on System Hangs

Several things can cause a hang of the system, but it is important to try and
figure out what changed.

Did the system recently get updated? Are there now more users than before?
A new program? A new UPS with software? Additional users added?

Does it hang all the time? End of the month?

What happens to the console? Do all ports and telnet sessions hang? Is the
console still working? What was required to make it come back? A system
can hang for a number of reasons, including: lack of paging space, running out
of space in the root filesystem (/, /etc, or /dev), not enough resources or
mbufs, downlevel AIX software, not being at the latest recommended
maintenance level, memory leaks, failing hardware, etc.

What may happen when a system hang is related to paging
space:
Processes requesting additional memory are killed once the system runs low
on paging space. The system appears hung as new processes and telnet
connections are terminated. Error messages such as "Not enough memory" or
"Fork function failed" are generated.

1. Add paging space. To judge how much paging space is "enough",
run the lsps -s command often to get a feel for the %Used of the paging
space. Based on this percentage, a system at its maximum workload should
have no more than 80% of its paging space used.
Example output of the lsps -s command looks like the following:

Total Paging Space   Percent Used
      200MB               51%

Anything over 51% is suspect, and I would consider adding paging space.
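If you collect lsps -s output from a script, a minimal Python sketch like this one can pull out the numbers and compare them with a limit. It assumes the two-line layout shown above (header, then a line such as "200MB 51%"); the function names are made up for illustration, and the default limit is the 80% rule above (pass limit=51 if you prefer the stricter threshold).

```python
import re

def parse_lsps_s(output):
    """Return (total_mb, percent_used) from `lsps -s` text.

    Assumes a header line followed by a data line such as '200MB 51%'.
    """
    data_line = output.strip().splitlines()[-1]
    match = re.search(r"(\d+)MB\s+(\d+)%", data_line)
    if match is None:
        raise ValueError(f"unrecognised lsps -s line: {data_line!r}")
    return int(match.group(1)), int(match.group(2))

def paging_space_too_full(percent_used, limit=80):
    """True if usage exceeds the limit (80% at peak load, per the rule above)."""
    return percent_used > limit

sample = """Total Paging Space   Percent Used
      200MB               51%"""

total_mb, used = parse_lsps_s(sample)
print(total_mb, used, paging_space_too_full(used))  # 200 51 False
```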

2. Systems often have plenty of paging space (sometimes 3-4 times RAM)
and can still run out. This could be due to a memory leak, and the question then
is which process is causing it.
Discussed below are ways to find the process causing the memory
leak and the tools used to do it.

a. The ps vg command provides useful information. Here the data in the
column labeled SIZE is what you need. The SIZE column reports virtual memory
(paging space) usage per process, in 1KB units.
Sample output from ps vg | pg looks like the following:

  PID    TTY STAT      TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
    0      -    A     87:42    6    20     8    xx     0     0  0.1  0.0 swapper
    1      -    A    191:58   94   240   240    xx    25    28  0.3  0.0 /etc/init
  516      -    A  70228:47    0    16    20    xx     0     0 97.0  0.0 kproc
  774      -    A      5:53    1    24    28    xx     0     0  0.0  0.0 kproc
 1032      -    A     28:40    0    56    56    xx     0     0  0.0  0.0 kproc
 1866      -    A      0:00    0    24    20    xx     0     0  0.0  0.0 kproc
 2174  pts/1    A      2:55   31   420   544 32768   260   164  0.0  1.0 aixterm
 2454      -    A      1:32   62   272   224    xx    96    60  0.0  0.0 /usr/dt/b

Collect ps vg output at several points during the period in which
%Used from lsps -s grows to 99%, then examine the SIZE column for large
numerical increases. The leaking process will show an unusually large
increase in the amount of paging space it uses between
two ps vg readings.
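Comparing two ps vg snapshots by hand gets tedious. A minimal Python sketch of the comparison, assuming the column order shown above (PID TTY STAT TIME PGIN SIZE ... COMMAND) and a COMMAND with no embedded spaces, could look like this. The first snapshot reuses rows from the sample; the second snapshot is made up for illustration.

```python
def parse_ps_vg(text):
    """Map PID -> (SIZE in KB, COMMAND) from `ps vg` output."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the header line and blank lines
        table[int(fields[0])] = (int(fields[5]), fields[-1])
    return table

def size_growth(before, after):
    """(delta KB, PID, command) for PIDs in both snapshots, largest growth first."""
    growth = [
        (size - before[pid][0], pid, cmd)
        for pid, (size, cmd) in after.items()
        if pid in before
    ]
    return sorted(growth, reverse=True)

first = """  PID    TTY STAT      TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
    1      -    A    191:58   94   240   240    xx    25    28  0.3  0.0 /etc/init
 2174  pts/1    A      2:55   31   420   544 32768   260   164  0.0  1.0 aixterm"""
second = """  PID    TTY STAT      TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
    1      -    A    191:59   94   240   240    xx    25    28  0.3  0.0 /etc/init
 2174  pts/1    A      3:10   35  9000   900 32768   260   164  0.0  1.0 aixterm"""

for delta, pid, cmd in size_growth(parse_ps_vg(first), parse_ps_vg(second)):
    print(f"{pid:>6} {cmd:<10} {delta:+d} KB")
```

PIDs that exist in only one snapshot (new or exited processes) are skipped, since there is nothing to compare them against.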

There is a tool that creates delta reports of ps vg over any designated period
of time. The script is called ps_ and is located in /usr/sbin/perf/pmr. It is only
available in AIX 4.1.x and later, and it is not installed by default: the
fileset name is bos.perf.pmr, which can be installed from the install media. The ps_
script is run with the following syntax:
ps_ <#seconds to run>

It takes a ps vg snapshot at the beginning and end of a designated time
period and creates a delta report (final values minus initial values). The output
file for ps_ is called ps.sum and is created in /var/perf/tmp.
For example, a user notices that the %Used value from lsps -s rises from
40% to 80% in a few hours, eventually reaching 99% and freezing all activity
on the system. The user realizes that this is not normal and that there may
be a memory leak. Running ps_ 600 every half hour while
paging space is being consumed would most likely reveal the process
causing the memory leak. The following is a sample ps_ report (as seen
in ps.sum):

        DELTA  DELTA  DELTA  DELTA  DELTA  DELTA    BEFORE     AFTER
  PID    PGIN   SIZE    RSS    TRS    DRS      C      TIME      TIME  CMD
    0       0      0      0      0      0      0     10:58     10:58  swapper
    1       0      0      0      0      0     -1     71:31     71:31  init
  516       0      0      0      0      0      0  17136:33  17137:29  kproc
50328       1     78   -124      0   -124      1      0:00      0:00  ksh
50450       0      0      0      0      0      0      0:00      0:00  telnetd
50724       0    -20      0      0      0      0      0:29      0:29  ttsession
53746       0      0      0      0      0      0      0:00      0:00  ksh

From the DELTA SIZE column, we can see that PID 50328 allocated 78K of
paging space during the time ps_ was run. PID 50724, however, deallocated
20K of paging space during this time, and any process showing zero had no
net change in its paging space allocation.
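The same reading of ps.sum can be automated. This is a minimal Python sketch, assuming the column order in the sample above (PID, DELTA PGIN, DELTA SIZE, ..., CMD); the function name is hypothetical. It sorts each row into allocated, freed, or unchanged by DELTA SIZE in KB.

```python
def classify_size_deltas(ps_sum):
    """Split ps.sum rows into allocated / freed / unchanged by DELTA SIZE (KB)."""
    result = {"allocated": [], "freed": [], "unchanged": []}
    for line in ps_sum.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the two header lines
        pid, delta_size, cmd = int(fields[0]), int(fields[2]), fields[-1]
        if delta_size > 0:
            result["allocated"].append((pid, delta_size, cmd))
        elif delta_size < 0:
            result["freed"].append((pid, delta_size, cmd))
        else:
            result["unchanged"].append((pid, delta_size, cmd))
    return result

report = """        DELTA  DELTA  DELTA  DELTA  DELTA  DELTA    BEFORE     AFTER
  PID    PGIN   SIZE    RSS    TRS    DRS      C      TIME      TIME  CMD
    0       0      0      0      0      0      0     10:58     10:58  swapper
    1       0      0      0      0      0     -1     71:31     71:31  init
  516       0      0      0      0      0      0  17136:33  17137:29  kproc
50328       1     78   -124      0   -124      1      0:00      0:00  ksh
50450       0      0      0      0      0      0      0:00      0:00  telnetd
50724       0    -20      0      0      0      0      0:29      0:29  ttsession
53746       0      0      0      0      0      0      0:00      0:00  ksh"""

deltas = classify_size_deltas(report)
print(deltas["allocated"])  # [(50328, 78, 'ksh')]
print(deltas["freed"])      # [(50724, -20, 'ttsession')]
```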

b. Another tool that can be used to track a memory leak is svmon.
NOTE: PAIDE/6000 must be installed in order to use svmon (and others, such
as tprof, netpmon, and filemon). To check whether it is installed, enter: lslpp -l
perfagent.tools.

If you are on AIX 4.3.0 or later:
As root, enter the following command:
svmon -Pau 10 | more
This will list the top 10 memory consumers in decreasing order, the first
process being the largest consumer. The rest of the report shows memory
and paging space usage for each segment of each process.
Sample output looks like the following:

  Pid  Command   Inuse   Pin  Pgspace
13794  dtwm       1603     1      449

Pid: 13794
Command: dtwm

Segid  Type  Description              Inuse  Pin  Pgspace  Address Range
b23    pers  /dev/hd2:24849               2    0        0  0..1
14a5   pers  /dev/hd2:24842               0    0        0  0..2
6179   work  lib data                   131    0       98  0..891
280a   work  shared library text       1101    0       10  0..65535
181    work  private                    287    1      341  0..310:65277..65535
57d5   pers  code,/dev/hd2:61722         82    0        0  0..1

In each process report, find the items identified as work in the Type column and
as private in the Description column, and check how many 4KB
(4096-byte) pages are listed under the Pgspace column. This is the minimum
number of working pages this segment is using in all of virtual memory. A
Pgspace number that grows but never decreases may indicate a memory
leak.
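The "grows but never decreases" test is easy to apply to a series of svmon readings for one private segment. The following is a minimal Python sketch; the function names are hypothetical. The first reading (341) is the private segment's Pgspace from the sample above, and the later readings are made up for illustration.

```python
PAGE_KB = 4  # svmon reports Pgspace in 4KB (4096-byte) pages

def pgspace_kb(pages):
    """Convert an svmon Pgspace page count to kilobytes."""
    return pages * PAGE_KB

def looks_like_leak(samples):
    """True if Pgspace samples for one segment grew and never decreased."""
    if len(samples) < 2:
        return False
    never_decreased = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return never_decreased and samples[-1] > samples[0]

readings = [341, 360, 402, 455]
print(pgspace_kb(readings[-1]), looks_like_leak(readings))  # 1820 True
```

A single dip in the series makes the check return False, matching the "never decreases" wording; a real leak hunt would still need readings over a long enough period.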

3. The system may be reaching its maximum number of processes
allowed per user, or maxuproc. Depending on what maxuproc is set to (the default
is 40), if a user has already forked a number of processes equal to maxuproc,
the system will not allow that user to fork any more processes.
The maxuproc parameter can be increased via SMIT. Enter SMIT and go
through the panels System Environments and then Change /
Show Characteristics of the Operating System. The first line on this last
screen is maxuproc. Increasing this number by a conservative increment
(50-100 at a time) allows users to fork more processes, thus avoiding "Out
of memory" or "Cannot fork" messages.
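To see whether any user is close to the limit, you can count processes per user. This is a minimal Python sketch, assuming ps -ef style text with one header line and the owning user name in the first column; the function name, the margin, and the sample process list are hypothetical.

```python
from collections import Counter

def users_near_maxuproc(ps_ef_output, maxuproc=40, margin=5):
    """Return {user: process count} for users within `margin` of maxuproc."""
    counts = Counter()
    for line in ps_ef_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return {user: n for user, n in counts.items() if n >= maxuproc - margin}

# Made-up process list: 38 processes for one user, 1 for root.
lines = ["     UID   PID  PPID   C    STIME    TTY  TIME CMD"]
lines += [f"  oracle {1000 + i} 1 0 10:00:00 - 0:00 ora_worker" for i in range(38)]
lines += ["    root 1 0 0 10:00:00 - 0:01 /etc/init"]
sample = "\n".join(lines)

print(users_near_maxuproc(sample))  # {'oracle': 38}
```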

Check errpt -a | more for entries that may show that the system
is busy with tty overruns or tty hogs. Is there missing hardware?

df (are any filesystems full?)

Run no -a | more and look for "thewall". What is it set to?

Microcode has caused problems on some machines, so check for the latest
microcode for your machine. On the same site you can also check for the latest
patches for your AIX. Common filesets involved in machine hangs
are bos.up, bos.mp,
bos.net.tcp, and bos.rte.tty. I would make sure I had the latest and
greatest levels, with prereqs and coreqs, before I checked further.

Most patches require a quiet system with no one on it: you must be in
multiuser mode, but without users. Also, most patches require a reboot of the
system after they are applied.

You can run the instfix command to see which AIX maintenance level you are at:
instfix -ik 4320-02_AIX_ML
instfix -ik 4330-01_AIX_ML
If none of the 4210-0x_AIX_ML levels are found,
your AIX level is 4.2.1.0.

Get the latest patches and recommended maintenance levels for your
operating system.
Run lppchk -v and lppchk -c -m3 [fileset]. Any broken filesets?
Run diag -a.
Run errpt -a | more. Any errors?

Try to get a more accurate history of when these hangs started and
what changed. Hangs are really hard to track down. You can put scripts
in cron that check paging space and capture ps and other command output, to
see what is happening before the machine hangs. Applying the recommended
maintenance level and the latest microcode will also help.
===============================

aixqueen · Jul 4, 2001

http://techsupport.services.ibm.com...mode=9&documents=090605226914708&database=aix

VMSTAT memory
The information under the memory heading provides information about real
and virtual memory.

avm
The avm column gives the average number of pages allocated to paging space.
(In AIX, a page contains 4096 bytes of data.) When a process executes,
space for working storage is allocated on the paging devices (backing store).
This can be used to calculate the amount of paging space assigned to executing
processes. The number in the avm field divided by 256 will yield the number
of megabytes (MB), systemwide, allocated to page space.
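The divide-by-256 rule follows from the page size: 256 pages × 4096 bytes = 1MB. This short Python sketch applies it to the avm value from the original question (the function name is just for illustration).

```python
PAGE_SIZE = 4096  # bytes per AIX page

def pages_to_mb(pages):
    """Convert a vmstat page count (e.g. avm) to megabytes."""
    return pages * PAGE_SIZE / (1024 * 1024)  # the same as pages / 256

print(pages_to_mb(110_000))  # 429.6875  (avm from the original question)
print(pages_to_mb(262_144))  # 1024.0    (1GB of RAM is 262,144 pages)
```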

The lsps -a command also provides information on individual paging spaces.
It is recommended that enough paging space be configured on the system
so that the paging space used does not approach 100 percent. When
fewer than 128 unallocated pages remain on the paging devices, the
system will begin to kill processes to free some paging space.

Versions of AIX before 4.3.2 allocated paging space blocks for pages
of memory as the pages were accessed. On a large memory machine,
where the application set is such that paging is never or rarely
required, these paging space blocks were allocated but never needed.
AIX Version 4.3.2 implements deferred paging space allocation, in
which the paging space blocks are not allocated until paging is necessary,
thus helping to reduce the paging space requirements of the system.

The avm value in vmstat indicates the number of virtual memory
(working storage) pages that have been accessed but not necessarily
paged out. With the previous policy of "late page space allocation",
avm had the same definition. However, since the VMM
allocated paging space disk blocks for each working page that
was accessed, the number of allocated paging space blocks was equal to avm.
The reason for allocating the paging space blocks at the time the working
pages were accessed was so that, if the pages had to be paged out of memory,
there would be disk blocks available on the paging space logical volumes for
the in-memory pages to go to. On systems that never page out to paging space,
it is a waste of disk space to have as many paging space disk blocks as
there are memory pages. With the deferred policy, paging space disk blocks
are only allocated for the pages that actually need to be paged out.

The avm number will grow as more processes get started and/or
existing processes use more working storage. Likewise, the
number will shrink as processes exit and/or free working storage.

fre
The fre column shows the average number of free memory frames.
A frame is a 4096-byte area of real memory.
The system maintains a buffer of memory frames, called the free list,
that will be readily accessible when the VMM needs space. The nominal
size of the free list varies depending on the amount of real memory installed.
On systems with 64MB of memory or more, the minimum value (MINFREE)
is 120 frames. For systems with less than 64MB, the value is two
times the number of MB of real memory, minus 8. For example, a system
with 32MB would have a MINFREE value of 56 free frames.
If the fre value is substantially above the MAXFREE value (which is
defined as MINFREE plus 8), then it is unlikely that the system is thrashing
(continuously paging in and out). However, if the system is thrashing, you can
be sure that the fre value is small. Most UNIX operating systems, AIX included,
use nearly all available memory for disk caching, so
you need not be alarmed if the fre value oscillates between MINFREE
and MAXFREE.
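The MINFREE and MAXFREE rules above can be written as a small Python sketch, which also places a fre reading relative to them. The function names are hypothetical; the formulas are the ones quoted above. Being above MAXFREE or between the two values is not a problem by itself, while a small fre together with heavy pi/po activity is what points to thrashing.

```python
def minfree_frames(real_mem_mb):
    """MINFREE as described above: 120 frames at 64MB or more, else 2 * MB - 8."""
    if real_mem_mb >= 64:
        return 120
    return 2 * real_mem_mb - 8

def maxfree_frames(real_mem_mb):
    """MAXFREE as described above: MINFREE plus 8."""
    return minfree_frames(real_mem_mb) + 8

def fre_band(fre, real_mem_mb):
    """Place a vmstat fre reading relative to MINFREE and MAXFREE."""
    low, high = minfree_frames(real_mem_mb), maxfree_frames(real_mem_mb)
    if fre > high:
        return "above MAXFREE"
    if fre >= low:
        return "between MINFREE and MAXFREE"
    return "below MINFREE"

print(minfree_frames(32), maxfree_frames(32))  # 56 64
print(fre_band(150, 1024))                     # above MAXFREE
print(fre_band(0, 1024))                       # below MINFREE
```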

Enhasmen · Jul 4, 2001

Thank you, aixqueen,

The documentation explains it better than the man page, but I have a couple more questions. How can I find out how much real memory is free? Using the vmstat report, the numbers don't match: (avm * 4K) + (fre * 4K) <> real RAM.

When we run iostat and vmstat, we see that the system hdisk is 100% busy when the fre column of the vmstat report is 0. We think that is because paging space is on the system disk. Do you think this situation is normal?

Thanks in advance

aixqueen · Jul 4, 2001

My guess would be that if the system disk is 100% busy, it could be because the paging space is on that disk, and maybe part of your application and the root filesystems are there too. The JFS log is another thing that can make the system
really busy. If you are only using hd8 and you have added a lot of non-system
filesystems to rootvg, that one jfslog has to handle everything: the system
filesystems and your application filesystems.
===============
General rule of thumb
Correcting CPU, RAM and paging shortages:
CPU bound if vmstat us + sy is constantly >= 80%
If the run queue (r) is > 2.5 and CPU bound, you probably need another processor
If the run queue (r) is <= 2.5 and CPU bound, maybe there is a runaway process
Disk bound if vmstat wa is >= 40%
Lots of pi and po: you likely need more RAM
iostat shows a lot of activity on one disk: try to spread the data over multiple drives
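The vmstat rules of thumb above can be expressed as a small Python function applied to one interval of output (for example one line of vmstat 3). It is a sketch, not a tuning tool. The function name is made up, and paging_limit is a hypothetical cut-off for "lots of pi and po", because the rule does not give a number. Since the CPU rule says "constantly", you would only act on findings that repeat across many intervals.

```python
def rule_of_thumb(us, sy, wa, r, pi, po, paging_limit=10):
    """Apply the rules of thumb above to one vmstat interval.

    us, sy, wa: CPU percentages; r: run queue; pi, po: paging-space page-ins/outs.
    paging_limit is a hypothetical cut-off for "lots of pi and po".
    """
    findings = []
    if us + sy >= 80:
        if r > 2.5:
            findings.append("CPU bound, run queue > 2.5: probably need another processor")
        else:
            findings.append("CPU bound, run queue <= 2.5: maybe a runaway process")
    if wa >= 40:
        findings.append("disk bound")
    if pi + po > paging_limit:
        findings.append("lots of paging: likely need more RAM")
    return findings

print(rule_of_thumb(us=70, sy=15, wa=5, r=4, pi=0, po=0))
print(rule_of_thumb(us=10, sy=5, wa=50, r=1, pi=200, po=150))
```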
===========================
Check for paging activity by following the instructions in the "Memory
Bottlenecks" section. Paging to and from disk will contribute to the I/O load.

2. filemon

To find out what files, logical volumes, and disks are most active, run the
following command as root:

# filemon -u -O all -o /tmp/fmon.out; sleep 30; trcstop

After 30 seconds, when trcstop runs, the report is written to /tmp/fmon.out.
Check for the most active segments, logical volumes, and physical volumes in
this report.

Check for reads and writes to paging space to determine if the disk activity is
true application I/O or is due to paging activity.

Check for files and logical volumes that are particularly active. If these are on
a busy physical volume, moving some data to a less busy disk may improve
performance.

The Most Active Segments report lists the most active files by file system and
inode. The mount point of the file system and the inode of the file can be used
with the ncheck command to identify unknown files:

# ncheck -i <inode> <mount point>

This report is useful in determining if the activity is to a filesystem (segtype =
persistent), the JFS log (segtype = log), or to paging space (segtype =
working).

By examining the reads and read sequences counts, you can determine if the
access is sequential or random. As the read sequences count approaches
the reads count, file access is more random. The same applies to the writes
and write sequences.
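The reads versus read sequences comparison can be reduced to a ratio. This is a minimal Python sketch. The function names and the 0.5 threshold are hypothetical (the text only says the sequence count "approaches" the read count), and the counts in the example are made up.

```python
def sequence_ratio(ops, sequences):
    """Sequences per operation: near 1.0 means random, near 0 means sequential."""
    if ops == 0:
        return 0.0
    return sequences / ops

def describe_access(ops, sequences, threshold=0.5):
    # threshold is a hypothetical cut-off, not a filemon value
    return "mostly random" if sequence_ratio(ops, sequences) >= threshold else "mostly sequential"

print(describe_access(1000, 10))   # mostly sequential
print(describe_access(1000, 990))  # mostly random
```

The same functions work for writes and write sequences.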

Two great sites to help you:

http://www.rs6000.ibm.com/doc_link/en_US/a_doc_lib/aixbman/prftungd/2365ch7.htm#47667

http://www.rs6000.ibm.com/doc_link/en_US/a_doc_lib/aixbman/prftungd/2365ch8.htm#31232

====================
General Paging Space Tips
One Paging Space LV per PV
Avoid Putting a Paging Space LV on a Heavily Active PV
Make Each Paging Space LV Roughly Equal in Size
Do Not Extend a Paging Space LV Across Multiple PVs
For Best Performance, Allocate Paging Space LVs on PVs Attached to Separate Adapters
========================
jfslog
Journaled File System Log Size Issues
Another size-related issue is the size of the JFS log. In most instances, multiple
journaled file systems use a common log configured to be 4MB in size.
For example, after initial installation, all file systems within the root
volume group use logical volume hd8 as a common JFS log. The default
logical volume partition size is 4MB, and the default log size is one partition,
therefore, the root volume group normally contains a
4MB JFS log. When file systems exceed 2GB or when the total amount
of file system space using a single log exceeds 2GB, the default log
size may not be sufficient. In either case, the log sizes should be scaled
upward as the file system size increases. The JFS log is limited to a
maximum size of 256MB. Using multiple logs on different volumes can also improve
performance.
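The text says only that the log should be "scaled upward". This is a hypothetical Python sketch of one way to do that. It assumes, purely for illustration, that the 4MB default is scaled linearly at 4MB per 2GB of file system space served by the log. That ratio is an extrapolation from the defaults quoted above, not an IBM formula. The result is rounded up to whole 4MB partitions and capped at the 256MB maximum.

```python
import math

PARTITION_MB = 4        # default logical partition size (from the text)
DEFAULT_LOG_MB = 4      # default log: one 4MB partition (from the text)
MAX_LOG_MB = 256        # JFS log maximum (from the text)
GB_PER_DEFAULT_LOG = 2  # ASSUMPTION: one 4MB of log per 2GB of file system space

def suggested_jfslog_mb(total_fs_gb):
    """Hypothetical sizing: scale the 4MB default linearly, in whole partitions, capped at 256MB."""
    wanted_mb = DEFAULT_LOG_MB * max(1, math.ceil(total_fs_gb / GB_PER_DEFAULT_LOG))
    partitions = math.ceil(wanted_mb / PARTITION_MB)
    return min(partitions * PARTITION_MB, MAX_LOG_MB)

for size_gb in (1, 2, 3, 10, 1000):
    print(size_gb, "GB ->", suggested_jfslog_mb(size_gb), "MB")
```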

My brain is fried today... maybe others can chime in. Hopefully someone is not on
holiday today. Good luck.

ElgisRamon · Jul 5, 2001

Enhasmen, the avm column shows the size of your working set, that is, the memory your whole AIX system, including your programs, is using (whether it is in real RAM or in paging space). The fre column shows how many 4KB pages of RAM are free. Maybe your machine is so overloaded that there is no free real RAM.
I hope it helps...
Unix was made by and for smart people.
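To see why (avm * 4K) + (fre * 4K) does not add up to real RAM, this Python sketch works through the numbers from the original question. It assumes the definitions quoted earlier in the thread: avm counts working-storage pages whether they are in RAM or on paging space, and RAM frames in use also hold things avm does not count, typically cached file pages. The function name is made up for illustration.

```python
def memory_accounting(ram_mb, avm_pages, fre_frames, page_kb=4):
    """Compare vmstat avm and fre with installed RAM (all counts in 4KB pages).

    Returns (RAM frames, frames in use, frames in use minus avm).
    A positive last value is a lower bound on frames holding something other
    than working storage (typically file cache); a negative value means at
    least that many working pages are out on paging space.
    """
    ram_frames = ram_mb * 1024 // page_kb
    in_use = ram_frames - fre_frames
    return ram_frames, in_use, in_use - avm_pages

# Numbers from the original question: 1GB of RAM, avm 110,000, fre 150
print(memory_accounting(1024, 110_000, 150))  # (262144, 261994, 151994)
```

In that case, at least 151,994 frames (about 594MB) are doing something other than holding working storage, which fits the point that AIX uses nearly all spare memory for file caching.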
