
crash on rs6000 7043-140 5

Status
Not open for further replies.

antonioelderprado

IS-IT--Management
May 19, 2003
Hi,

I have an old 7043-140 running AIX 4.3.3. This machine keeps crashing: it stays up for 30 minutes at most and then just hangs. I ran the crash command, and the only strange thing I could see was under crash > stack, where there is an error: Frame pointer not valid 0452-765. Also, during bootup there is a message about not being able to fork(), but the machine comes up, stays on for approx. 30 min, and then hangs again. Any suggestions would be great.
P.S. I cannot upgrade this machine; it needs to stay on 4.3.3.

thank you

./antonio/.

./antonio elder prado/.
.\mountain view, ca\.
./bauru, sao paulo/.
 
Is the system crashing or hanging? If it is hard hanging what are you doing to reset it?

Also how are your filesystems? Do they have enough space available or is one filling up?

CA
 
hi cndcadams,

The machine hangs, with no activity at all. I can press any key and get no response. It also freezes the console.
The only way to bring it back up is to hard reset the machine.

The filesystems are all OK, below 80%.

strange...

thank you

./antonio/.

./antonio elder prado/.
.\mountain view, ca\.
./bauru, sao paulo/.
 
Sounds like a memory leak to me. There are many complaints about 4.3.3 with many different applications causing memory leaks.
Run oslevel -r to check your maintenance level.

Thanks

CA
 
hi

Here is the information:

oslevel -r -> does not produce anything
oslevel -> gives 4.3.3.0
bootinfo -r -> gives 786432
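
For reference, bootinfo -r on AIX reports real memory in kilobytes, so the figure above can be converted to megabytes with a little arithmetic. A minimal Python sketch (the function name kb_to_mb is only an illustrative choice):

```python
def kb_to_mb(kilobytes: int) -> float:
    """Convert a size in KB (the unit bootinfo -r prints) to MB."""
    return kilobytes / 1024


# Value reported by bootinfo -r in this thread.
print(kb_to_mb(786432))  # 768.0
```

So this machine has 768 MB of real memory.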

Interestingly enough, this machine was working fine until last week...

thank you for the help

./antonio/.

./antonio elder prado/.
.\mountain view, ca\.
./bauru, sao paulo/.
 
768 MB of RAM is extremely low for booting a machine...
If you can, try adding paging space (to compensate for your memory shortage), and check if the problem persists.

regards,

R.
 
From the IBM site:
512 MB minimum, 1 GB recommended
Swap = 2 x RAM

For your ML, try instfix -i | grep ML

Sounds like a program chewing up your paging space. Try single-user mode and see if that fixes the problem.
 
My $.02:

I've been running an old C10 and a couple of 40P's all with 128MB of RAM and they all perform just fine. These are both for development AND production.

We're not running X though and that may be the difference.

B
 
We also have a workstation that runs just fine with 256 MB, but when you try to boot a P520 with just 1 GB of memory, it won't boot due to a lack of memory.

regards,

R.
 
Hi guys,

Sorry for the late response. I will check all the suggestions next Tuesday when we come back from the holiday and will let you know (for now, just beer and BBQ).
I was talking with some people at the company, and they did some work on the compiler (we have VisualAge installed); this might be what is triggering the memory problem.
Hope you are all well,
thank you for your help, and if you come up with more clues to this, please send them over.

./antonio/.

./antonio elder prado/.
.\mountain view, ca\.
./bauru, sao paulo/.
 
Hi,

First, THANK YOU ALL FOR THE INFORMATION.

What fixed it was the maintenance level upgrade (suggested by cndcadams). Previously, when I ran oslevel -r, nothing came up, just the prompt; now, after installing the files from cndcadams's link, the output is 4330-04. The machine has been up for 1 h 40 min (this is a record).
It seems that installing all those *.bff files fixed the problem. Yes, the machine is still slow (this is expected considering the amount of memory in it).
Once again, thank you all for your help,

====
From today (Tuesday 07/05) after the maintenance level upgrade:
===
# oslevel -r
4330-04
# vmstat 2 5
kthr     memory           page           faults        cpu
----- ------------ ------------------ ------------ -----------
 r  b   avm    fre re pi po fr  sr cy  in   sy  cs us sy id wa
 2  1 19409    847  0  0  0 47 146  0 159 3133 200 13 21 47 18
 0  0 19409    846  0  0  0  0   0  0 113  217  35  0  2 98  0
 0  0 19409    846  0  0  0  0   0  0 115  110  37  0  0 99  0
 0  0 19409    846  0  0  0  0   0  0 116  107  35  0  1 99  0
 0  0 19409    846  0  0  0  0   0  0 115   92  35  0  0 99  0
#
===
from Friday: 07/01 with no upgrade on maintenance
===
 r  b   avm    fre re pi po fr  sr cy  in   sy  cs us sy id wa
 0  0 16159 157638  0  0  0  0   0  0 131  948  96  3  3 92  3
 0  0 16159 157637  0  0  0  0   0  0 116  308  54  3  0 96  0
 0  0 16159 157637  0  0  0  0   0  0 118  381  51  1  0 98  0
 0  0 16159 157637  0  0  0  0   0  0 115  384  51  1  1 98  0
 0  0 16159 157637  0  0  0  0   0  0 117  386  50  0  1 99  0
==== Fri Jul 1 11:26:23 CDT 2005 =====
kthr     memory           page           faults        cpu
----- ------------ ------------------ ------------ -----------
 r  b   avm    fre re pi po fr  sr cy  in   sy  cs us sy id wa
 0  0 16159 157638  0  0  0  0   0  0 131  946  95  3  3 92  3
 0  0 16159 157637  0  0  0  0   0  0 119  522  52  2  1 97  0
 0  0 16159 157637  0  0  0  0   0  0 118  412  54  0  0 99  0
 0  0 16178 157618  0  0  0  0   0  0 117  471  55  1  1 97  0
 0  0 16159 157637  0  0  0  0   0  0 117  387  49  0  0 99  0
==== Fri Jul 1 11:26:41 CDT 2005 =====
kthr     memory           page           faults        cpu
----- ------------ ------------------ ------------ -----------
 r  b   avm    fre re pi po fr  sr cy  in   sy  cs us sy id wa
 0  0 16159 157638  0  0  0  0   0  0 130  943  95  3  3 92  3
 0  0 16159 157636  0  0  0  0   0  0 116  363  51  2  0 98  0
 0  0 16159 157636  0  0  0  0   0  0 115  387  54  0  0 99  0
 0  0 16159 157636  0  0  0  0   0  0 118  357  59  2  1 96  0
 0  0 16159 157636  0  0  0  0   0  0 118  297  67  4  0 96  0
==== Fri Jul 1 11:26:59 CDT 2005 =====
kthr     memory           page           faults        cpu
----- ------------ ------------------ ------------ -----------
 r  b   avm    fre re pi po fr  sr cy  in   sy  cs us sy id wa
 0  0 16159 157637  0  0  0  0   0  0 130  941  95  3  3 92  3
 0  0 16159 157636  0  0  0  0   0  0 119  446  52  2  1 97  0
 0  0 16159 157636  0  0  0  0   0  0 118  424  53  0  0 99  0
 0  0 16159 157636  0  0  0  0   0  0 117  453  52  0  0 99  0
 0  0 16159 157636  0  0  0  0   0  0 116  388  49  0  0 99  0
==== Fri Jul 1 11:27:17 CDT 2005 =====
kthr     memory           page           faults        cpu
----- ------------ ------------------ ------------ -----------
 r  b   avm    fre re pi po fr  sr cy  in   sy  cs us sy id wa
 0  0 16159 157637  0  0  0  0   0  0 130  938  94  2  3 92  3
 0  0 16159 157636  0  0  0  0   0  0 117  673  54  2  1 97  0
#
===

I do not see a difference in the vmstat output.
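
One column does appear to differ between the two runs: fre. AIX vmstat reports avm and fre in 4 KB pages, so the samples can be compared in megabytes. Below is a rough Python sketch that uses one row from each day, copied from the output above. The column labels and the helper names (parse_row, pages_to_mb) are illustrative assumptions, not part of vmstat itself.

```python
# Column order of the vmstat rows quoted in this thread. The second "sy"
# (CPU system time) is renamed cpu_sy so it does not clash with the
# syscalls column of the same name.
COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
           "in", "sy", "cs", "us", "cpu_sy", "id", "wa"]

PAGE_SIZE_KB = 4  # avm and fre are counted in 4 KB pages


def parse_row(line: str) -> dict:
    """Split one vmstat data row into a dict keyed by column label."""
    values = [int(v) for v in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def pages_to_mb(pages: int) -> float:
    """Convert a count of 4 KB pages to megabytes."""
    return pages * PAGE_SIZE_KB / 1024


# One sample row from each day, as posted.
before = parse_row("0 0 16159 157637 0 0 0 0 0 0 116 308 54 3 0 96 0")
after = parse_row("0 0 19409 846 0 0 0 0 0 0 113 217 35 0 2 98 0")

for label, row in (("Friday, before upgrade", before),
                   ("Tuesday, after upgrade", after)):
    print(f"{label}: avm={pages_to_mb(row['avm']):.1f} MB, "
          f"fre={pages_to_mb(row['fre']):.1f} MB")
```

On these two rows, free memory works out to roughly 616 MB on Friday and about 3 MB on Tuesday. A low fre value is not necessarily a problem on its own, since AIX tends to fill otherwise free memory with cached file pages, but it is a visible difference between the two samples.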

===

I will keep you posted in case of any new event.

===
just while I was typing this message...



Monday: 11:31AM - CST.

It didn't take long for a new event:
THE MACHINE JUST CRASHED. At least it now takes 1 h 50 min to crash (before it was 20 min)...





./antonio/.


./antonio elder prado/.
.\mountain view, ca\.
./bauru, sao paulo/.
 
On the last reboot (the machine stayed up for 5 min), this was the topas output when it hung:

Sys Idle 99.5

Processes:
topas is the top process, with 0.5% and PgSp 0.5 MB

Events/queue:
cswitch 44
syscall 334
reads 5

file/tty
readch 1710
writech 82
ttyout 82
namei 6

memory
real,mb 767
%comp 13.7
%noncomp 6.2
client 0.5

paging space
size,mb 2048
%used 0.5
% free 99.4
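
To put the topas percentages in absolute terms, here is a small Python sketch using the figures above. It assumes %comp and %noncomp are percentages of real memory and that %used is a percentage of the paging space size; the variable and function names are illustrative only.

```python
# Figures copied from the topas output above.
real_mb = 767
comp_pct = 13.7
noncomp_pct = 6.2
paging_size_mb = 2048
paging_used_pct = 0.5


def pct_of(total: float, pct: float) -> float:
    """Return pct percent of total."""
    return total * pct / 100


# Assumption: computational + non-computational pages together
# approximate the real memory in use.
mem_in_use_mb = pct_of(real_mb, comp_pct + noncomp_pct)
paging_used_mb = pct_of(paging_size_mb, paging_used_pct)

print(f"memory in use: about {mem_in_use_mb:.0f} MB of {real_mb} MB")
print(f"paging space used: about {paging_used_mb:.0f} MB of {paging_size_mb} MB")
```

Under those assumptions, only around 150 MB of RAM and about 10 MB of paging space were in use at the moment of the hang, which does not look like memory exhaustion.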


The mystery continues,


thank you

./antonio/.

./antonio elder prado/.
.\mountain view, ca\.
./bauru, sao paulo/.
 
Hi,

Did you check the CPU fan?
I have had problems with a few CPU fans on 43P machines.
When the fan is dead, the machine crashes very often :)
 
Hi Mart1,

I didn't think about that, I will open the machine and check.

thank you

./antonio/.

./antonio elder prado/.
.\mountain view, ca\.
./bauru, sao paulo/.
 
Generally a hard hang is due to software causing a memory leak.

In the past I have also replaced memory that was causing a hard hang.

If possible, I would try to limit the memory and CPUs, depending on how much this system can hold. If you have another system identical to this one, I would swap the memory and CPU between the two machines and see how they run.

CA

 
Hi,

[PROBLEM SOLVED]

It seems that the problem was the CPU fan. I replaced it with the CPU fan from another machine, and so far this machine has been up for 21 hours.

I would like to express my gratitude for all the help that I got.

thank you : cndcadams, mart1, rmgbelgium, plamb and bobmfdc.

./antonio/.

./antonio elder prado/.
.\mountain view, ca\.
./bauru, sao paulo/.
 