AIX hanging 4

marecek2 · May 7, 2007

Hi,
I have an issue with an IBM pSeries server 9131-52A running AIX 5300-05-06.
From time to time I lose the ability to connect to it. It refuses all connections, and if I try to log in directly (using a CRT monitor connected to the graphics adapter plus a USB keyboard and mouse) I get only what looks like a frozen screen.
All I can do is shut it down by pressing the On/Off button and waiting for a countdown of about 4 seconds (twice).
This machine is running an Oracle database ...

Can you help me work out how to diagnose and then avoid this situation?

Best regards /Marek

stefanhei · May 7, 2007

Hi,
do you find anything in the error report (errpt)?

Is your paging space large enough? Depending on your configuration Oracle can be greedy about memory.

Stefan

TSch · May 8, 2007

Hi,

we had a similar problem on one of our pSeries machines ...

In our case we used the following workaround:

vmo -r -o vmm_mpsize_support=0
bosboot -a

If you're using LPARs and/or a VIO Server on your pSeries, you'll have to perform these steps on every LPAR as well as on every VIO Server.

There's also an APAR describing this problem. Just search the IBM site for IY90017 ...

I'm not sure whether this is exactly the problem you're having on your machine, but it sounds extremely similar to ours. So the procedure might help you ...

But if you try this be sure that NONE of your applications running on that machine require 64K pages because the command will DISABLE 64K PAGES !

By the way:

vmo -a

should provide you with the current settings, so you know what you might have to switch back to if anything goes wrong.

It can't hurt to make a mksysb backup before you change anything ...

Regards
Thomas

marecek2 · May 8, 2007

Hi Stefan and Thomas ...

uhm - I'm not a very experienced user (but the man pages can help).
I checked errpt and there are many entries like:
--------------
LABEL: PGSP_KILL
IDENTIFIER: C5C09FFA

Date/Time: Sun May 6 11:26:49 GDT 2007
Sequence Number: 524
Machine Id: 00013916D700
Node Id: oradb
Class: S
Type: PERM
Resource Name: SYSVMM

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SYSTEM RUNNING OUT OF PAGING SPACE

Failure Causes
INSUFFICIENT PAGING SPACE DEFINED FOR THE SYSTEM
PROGRAM USING EXCESSIVE AMOUNT OF PAGING SPACE

Recommended Actions
DEFINE ADDITIONAL PAGING SPACE
REDUCE PAGING SPACE REQUIREMENTS OF PROGRAM(S)

Detail Data
PROGRAM
oracle
USER'S PROCESS ID:
0
PROGRAM'S PAGING SPACE USE IN 1KB BLOCKS
0
-------------

Actual output from svmon is:

# svmon -G
               size      inuse       free        pin    virtual
memory       487424     483289       4135      70959     329397
pg space     131072      79012

               work       pers       clnt
pin           70953          0          6
in use       269837          0     213452

PageSize   PoolSize      inuse       pgsp        pin    virtual
s   4 KB          -     449641      79012      52911     295749
m  64 KB          -       2103          0       1128       2103

Can you help me with this?
What would be the best thing for me to do?

Thanks for your help and patience with me ;-)

khalidaaa · May 8, 2007

OH! you have 2G of RAM and only 500M of paging space!?!

You have to increase your paging space to at least the same amount of RAM you have!
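
As a rough check of those figures: svmon -G reports sizes in 4 KB pages, so the numbers posted above can be converted to megabytes with plain shell arithmetic. This is only a sketch; the variable names are made up for illustration and the values are copied from the svmon output earlier in the thread.

```shell
# Values copied from the svmon -G output earlier in the thread (4 KB pages).
page_kb=4
mem_pages=487424      # "memory" size
pgsp_pages=131072     # "pg space" size
pgsp_inuse=79012      # "pg space" inuse

# Convert page counts to megabytes and compute paging space utilisation.
mem_mb=$(( mem_pages * page_kb / 1024 ))
pgsp_mb=$(( pgsp_pages * page_kb / 1024 ))
pgsp_pct=$(( pgsp_inuse * 100 / pgsp_pages ))

echo "RAM: ${mem_mb} MB, paging space: ${pgsp_mb} MB, ${pgsp_pct}% in use"
```

That works out to roughly 1.9 GB of RAM against 512 MB of paging space, of which about 60% was in use when the snapshot was taken.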

First you have to check how many paging spaces you have! (I guess you have only hd6, but just to check, run this):

lsps -a

http://publib.boulder.ibm.com/infoc...topic=/com.ibm.aix.cmds/doc/aixcmds3/lsps.htm

Then you have to increase it using chps!

http://publib.boulder.ibm.com/infoc...topic=/com.ibm.aix.cmds/doc/aixcmds1/chps.htm

First you need to find the physical partition size of the volume group that holds your paging space. So for example (for hd6) you have to do this:

lsvg rootvg (look at the PP SIZE); if it is 256 MB, for example, then you have to do this to increase your paging space to 2G:

chps -s6 hd6
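
chps -s takes the number of logical partitions to add, not the final size, so the right count depends on the PP size and on how big hd6 already is. Here is a small sketch of that arithmetic, assuming the 256 MB PP size used as the example above and a 512 MB hd6 (which is what the svmon output suggests, if hd6 is the only paging space). The variable names are only illustrative.

```shell
pp_size_mb=256     # PP SIZE from lsvg rootvg (assumed example value)
current_mb=512     # current size of hd6, as shown by lsps -a (assumed)
target_mb=2048     # desired paging space size

# Number of logical partitions to add, rounded up to whole partitions.
add_lps=$(( (target_mb - current_mb + pp_size_mb - 1) / pp_size_mb ))

# Print the command rather than running it, so it can be reviewed first.
echo "chps -s ${add_lps} hd6"
```

With a different PP size (say 128 MB) the same target would need a different count, so check lsvg rootvg and lsps -a before copying the number.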

I hope this is useful.

For more info:

http://publib.boulder.ibm.com/infoc...ix.baseadmn/doc/baseadmndita/pagspacovrvw.htm

Regards,
Khalid

foobar13 · May 8, 2007

This commonly happens when you run out of sockets or processes. When you're out of sockets, you obviously can't find a free port to communicate with the box. When you're out of processes, forking stops, and it's not possible to get an sshd or telnetd forked to start a session.

You can run out of sockets when there are too many in the WAIT state. netstat -na should show you how many there are. This will happen when one end of the tcp session ends without telling the other end. This often happens to web servers which need to tune the WAIT timeouts.
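
To see how many sockets sit in each state without reading the whole list, the netstat -na output can be summarised with awk. This is a minimal sketch, assuming the TCP state is the sixth column (the usual netstat -an layout); count_tcp_states is a made-up helper name, not a system command.

```shell
# Count TCP sockets per state from netstat -an style input on stdin.
# Assumes the state (LISTEN, ESTABLISHED, TIME_WAIT, ...) is column 6.
count_tcp_states() {
    awk '$1 ~ /^tcp/ && NF >= 6 { n[$6]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# On a live system it would be used like this:
#   netstat -na | count_tcp_states
```

A large and growing TIME_WAIT count compared with ESTABLISHED would point towards the socket problem described here.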

Running out of processes can be fixed by upping the limit in /etc/security/limits for the user. I think there might be a kernel parm for it too (I'm not sitting in front of an AIX box just now).

Running out of memory can also be a problem, but don't forget that the kernel will start killing processes when memory gets too low; so you'll still be able to fork a shell (slowly though).

Other common reasons for hangs, or what appear to be hangs, are ypservers not responding, nfs servers being unavailable, and hangs in driver code for things like tape drives.

marecek2 · May 9, 2007

Hi Khalid & foobar,

Khalid - I increased the size of the paging space to 2 GB.
We will see how long Oracle stays up and running ;-)

Foobar - I checked with netstat -na - it gives me a big list ... I'm not so crazy as to paste it here ...
But if I want to tune the WAIT parameters, what exactly do I need to do? Do you mean tuning something in AIX? In the Oracle instance running on that AIX box? Or tuning the clients that connect to the database and push data into it?

Best regards / Marek

foobar13 · May 9, 2007

TCP sessions go through various states when setting up and tearing down. You can read about them in detail on Wikipedia or elsewhere. Basically, a TCP session can wind up in a TIME-WAIT state for several minutes. If you have a busy server with lots of short-lived connections initiated at the client, then you can run out of sockets. Oracle uses a listener to pass the connection to another process which then establishes a session back to the client, so the WAIT problem doesn't happen unless something's wrong with one of the processes. Plus, the sqlnet connections tend to be long-lived and over shorter-haul networks. The thing you need to tune is the amount of time the socket stays in the WAIT state on the server. This is an AIX kernel variable, the name of which escapes me. All Unices have it.

It's far more likely that you've run out of processes and can't fork. Normally this is logged by the kernel.
