
I need Process monitor AIX 433

Status
Not open for further replies.

jpn1

Technical User
Jul 9, 2007
34
US
We have an AIX 4.3.3 system that just goes offline. The network dies and so does the console, so there is no way of connecting to the box or to any apps running on it. IBM hardware support says everything is fine and that it must be an app process spawning out of control and eating all resources.

1. Is there some kind of log that will show me what is happening to the system when it goes offline?

2. Is there a monitor that will log my process tree, so that if it happens again I can see what was out of control? Or maybe even one that notifies me when a process reaches a certain threshold?

Thanks for any insight.
JNelson 214-797-8989
jeffrey.nelson@retalix.com
 
Memory exhaustion usually manifests itself as you describe: are you able to establish a TCP connection initially that gets dropped without any sort of visible connection? If so, that is consistent as well.

Is your paging space tuned appropriately for the memory size? If paging space is badly undersized, it can cause the same symptoms.

Apart from that, the best thing I can suggest is to run vmstat out to a log to see whether you're running out of memory, and analyze it after the next failure.
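One way to do the vmstat logging is sketched below. It is only a rough sketch under some assumptions: Python is not part of a stock AIX 4.3.3 install, so it assumes an interpreter is available (the same idea works as a ksh loop), and the names stamp, log_lines and main are made up for illustration. Each line from "vmstat 5" gets a timestamp and is forced to disk right away, so the last samples before a hang should survive the reset. Note that vmstat may buffer its output when writing to a pipe, in which case samples could arrive in batches.

```python
import os
import subprocess
import sys
import time


def stamp(line, now):
    """Prefix one line of vmstat output with a local timestamp."""
    ts = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return ts + " " + line.rstrip("\n") + "\n"


def log_lines(lines, out, clock=time.time):
    """Write each line with a timestamp and push it to disk immediately."""
    count = 0
    for line in lines:
        out.write(stamp(line, clock()))
        out.flush()
        try:
            # Ask the OS to commit the data so it survives a hard reset.
            os.fsync(out.fileno())
        except (AttributeError, OSError, ValueError):
            pass  # not a real file (for example an in-memory buffer)
        count += 1
    return count


def main(log_path, interval="5"):
    """Run vmstat at the given interval and append its output to log_path."""
    proc = subprocess.Popen(
        ["vmstat", interval],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    with open(log_path, "a") as out:
        log_lines(proc.stdout, out)


# Only starts logging when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Started with a log file path as its only argument and left running in the background, it keeps appending until the box goes down. After the next failure, look at the last few lines for free memory dropping and paging activity climbing.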
 
I have Big Brother running on this particular box and have an Active Desktop window pointing at it. All of a sudden I get a "page can't be displayed" in my window.

I open a browser and try to hit the Apache server running at the IP address and I get "Page can't be displayed".

Then I try to SSH to the box and the connection times out.

Then I ping the box and it doesn't reply. Then I go to the console attached to the system and it won't come up.

No msgs display on the front display and all the lights look normal.
So I push the orange reset button on the front once and the front display starts to show the codes. After about 2.5 minutes my display comes on with the AIX splash screen, and the little terminal window that shows the OS booting up starts.

Then the system is right back where it was. All the apps running, all databases running. It has done this twice in 8 days.
 
Does the /var/adm/messages file give you any clue as to what is happening before the box dies?

As for your second question, you could write a script to output a sorted ps aux list every 5 seconds or so. This might give you an idea of which process is growing out of control.
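A rough sketch of such a script follows. The assumptions: a Python interpreter is installed on the box (it is not there by default on AIX 4.3.3, and a ksh loop would do the same job), "ps aux" prints a header row containing %MEM and RSS columns, and the names top_processes and main are invented for this example. It picks the sort column by its header name rather than by position, and appends the top few rows to a log with a timestamp.

```python
import subprocess
import sys
import time


def top_processes(ps_output, key="%MEM", limit=10):
    """Return the `limit` ps rows with the largest value in column `key`."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return []
    # Find the sort column from the header instead of hard-coding a position.
    col = lines[0].split().index(key)
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        try:
            rows.append((float(fields[col]), ln))
        except (IndexError, ValueError):
            continue  # skip rows that do not parse
    rows.sort(key=lambda pair: pair[0], reverse=True)
    return [ln for _, ln in rows[:limit]]


def main(log_path, interval=5):
    """Append a timestamped snapshot of the biggest processes every interval."""
    while True:
        out = subprocess.check_output(["ps", "aux"], universal_newlines=True)
        # Reopen the log for each snapshot so it is closed (and flushed)
        # between samples.
        with open(log_path, "a") as log:
            log.write(time.strftime("=== %Y-%m-%d %H:%M:%S ===\n"))
            for row in top_processes(out):
                log.write(row + "\n")
        time.sleep(interval)


# Only starts logging when a log path is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Sorting on the header name only works cleanly for the leading numeric columns (%CPU, %MEM, SZ, RSS), because later columns such as the start time can contain spaces. Passing key="RSS" sorts on resident size instead of memory percentage. If the box is already starved, starting ps itself may fail, so the last snapshot in the log is the interesting one.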

 