System Crash on F40 Dual Processor AIX 5100-3

Status
Not open for further replies.

bladesman

Technical User
Aug 23, 2002
GB
Hi,
Has anyone had problems with intermittent system crashes on an F40 with 2 processors running AIX 5.1 maintenance level 3?
Memory is as follows:
2 x 64MB
4 x 128MB

As far as I am aware, all firmware is up to date.

The maintenance company has swapped out the memory, system planar and processors.

I have just started to rebuild from scratch and the system has crashed again.

Any ideas, anyone?

Thanks.

Bladesman
Certified Specialist pSeries AIX System Support.
 
Hi, try this and post the results:
---------------------------------
1.1 Analyzing system dump
If the customer complains that the system froze with 888 on the display, check errpt for an entry like this:
C0AA5338 0614145601 U S SYSDUMP SYSTEM DUMP

This means that a system dump occurred on 14 June 2001 at 14:56 (the errpt timestamp format is MMDDhhmmYY).
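
As a small illustration of how the timestamp column in that errpt line is read, here is a minimal sketch in POSIX shell. The function name decode_errpt_ts is made up for this example; it only splits the ten-digit MMDDhhmmYY string and does not call errpt itself.

```shell
# Hypothetical helper: split an errpt TIMESTAMP field (MMDDhhmmYY) into parts.
# On a live AIX system the entries could be listed with something like
# "errpt -j C0AA5338" (error identifier taken from the example above).
decode_errpt_ts() {
    ts=$1
    # Reject anything that is not exactly ten digits.
    case $ts in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "invalid timestamp: $ts" >&2; return 1 ;;
    esac
    mm=$(printf '%s' "$ts" | cut -c1-2)   # month
    dd=$(printf '%s' "$ts" | cut -c3-4)   # day
    hh=$(printf '%s' "$ts" | cut -c5-6)   # hour
    mi=$(printf '%s' "$ts" | cut -c7-8)   # minute
    yy=$(printf '%s' "$ts" | cut -c9-10)  # two-digit year
    echo "month=$mm day=$dd time=$hh:$mi year=$yy"
}

decode_errpt_ts 0614145601
# prints: month=06 day=14 time=14:56 year=01
```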

Run the following command to verify the status of the last system dump:

# sysdumpdev -L

0453-039

Device name: /dev/hd6
Major device number: 10
Minor device number: 2
Size: 63952384 bytes
Date/Time: Thu Jun 14 14:43:11 CST 2001
Dump status: 0
dump completed successfully
Dump copy filename: /var/adm/ras/vmcore.0
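
If you want to check this from a script rather than by eye, the two interesting fields can be pulled out of the output. This is only a sketch: dump_summary is a hypothetical helper name, and it assumes the "Dump status:" and "Dump copy filename:" labels appear exactly as in the output above.

```shell
# Hypothetical helper: summarize "sysdumpdev -L" output read from stdin.
dump_summary() {
    awk -F': *' '
        /^Dump status:/        { status = $2 }
        /^Dump copy filename:/ { file = $2 }
        END {
            # No status line means there is nothing to report.
            if (status == "") { print "no dump status found"; exit 1 }
            printf "status=%s file=%s\n", status, (file == "" ? "none" : file)
        }'
}

# Usage with part of the sample output from this post
# (on a real system: sysdumpdev -L | dump_summary).
dump_summary <<'EOF'
Device name: /dev/hd6
Size: 63952384 bytes
Dump status: 0
Dump copy filename: /var/adm/ras/vmcore.0
EOF
# prints: status=0 file=/var/adm/ras/vmcore.0
```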

Run the crash command on AIX 4.3.3/4.2.1, or the kdb command on AIX 5, to get a basic idea of the possible reasons for the system dump.
The crash subcommands (trace -k, thread -r, status 0) give a hint about the origin of the problem:

# cd /var/adm/ras
# crash vmcore.0

Using /unix as the default namelist file.

> trace -k
STACK TRACE:
0x2ff3b400 (excpt=edffff54:40000000:00001004:edffff54:00000106) (intpri=0)
IAR: .remove_e_list+38 (00032888): tweqi r7,0x0
LR: .e_block_thread+40c (00034424)
2ff3b010: .e_sleep_thread+4c (0003497c)
2ff3b060: .[nspdd]+4144 (016ba4e4)
2ff3b100: .[nspdd]+2de4 (016b9184)
2ff3b170: .[nspdd]+7e8 (016b6b88)
2ff3b1f0: .rdevioctl+140 (001b4344)
2ff3b260: .vnop_ioctl+1c (001c01d4)
2ff3b2a0: .vno_ioctl+144 (001d81d8)
2ff3b360: .common_ioctl+b0 (001e7894)
2ff3b3c0: .sys_call_ret+0 (00003a90)
IAR not in kernel segment.

> status 0

CPU TID TSLOT PID PSLOT STOPPED PROC_NAME
0 700f 112 6db0 109 yes pltDc

> thread -r
SLT ST TID PID CPUID POLICY PRI CPU EVENT PROCNAME FLAGS
2 r 205 204 0 FIFO 7f 78 wait
t_flags: sig_avail funnel kthread
3 r 307 306 1 FIFO 7f 78 wait
t_flags: sig_avail funnel kthread
4 r 409 408 2 FIFO 7f 78 wait
t_flags: sig_avail funnel kthread
5 r 50b 50a 3 FIFO 7f 78 wait
t_flags: sig_avail funnel kthread
112 r 700f 6db0 0 RR 40 0 pltDc
t_flags: local cdefer funnel

> proc -r

SLT ST PID PPID PGRP UID EUID TCNT NAME
2 a 204 0 0 0 0 1 wait
FLAGS: swapped_in no_swap fixed_pri kproc
3 a 306 0 0 0 0 1 wait
FLAGS: swapped_in no_swap fixed_pri kproc
55 a 37b8 2282 2282 200 200 1 X
FLAGS: swapped_in execed
112 a 7054 571a 25c8 200 200 1 expose
FLAGS: swapped_in no_swap fixed_pri ppnocldstop execed
122 a 7a14 1 744c 200 200 1 plateExp_dlg35
FLAGS: swapped_in orphanpgrp ppnocldstop execed

> q    (quits the crash command)
=================================================================
In this case trace -k shows a problem in the nspdd module (reached here through an ioctl call). thread -r and status 0 both point to the application process pltDc as responsible for the system dump (it was the last process running).
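
If the trace output is saved to a file, the bracketed module names can be counted quickly. The sketch below assumes the ".[name]+offset" frame format shown in the trace above; stack_modules is a hypothetical helper name, not a crash or kdb subcommand.

```shell
# Hypothetical helper: count bracketed module names in "trace -k" output
# read from stdin. Frames without brackets are ignored.
stack_modules() {
    sed -n 's/.*\.\[\([^]]*\)]+.*/\1/p' |
    awk '{ n[$0]++ } END { for (m in n) print m, n[m] }' | sort
}

# Usage with a few frames from the trace in this post.
stack_modules <<'EOF'
2ff3b010: .e_sleep_thread+4c (0003497c)
2ff3b060: .[nspdd]+4144 (016ba4e4)
2ff3b100: .[nspdd]+2de4 (016b9184)
2ff3b170: .[nspdd]+7e8 (016b6b88)
2ff3b1f0: .rdevioctl+140 (001b4344)
EOF
# prints: nspdd 3
```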


"Long live king Moshiach !"
 
Thanks for that info.

Once I get a clean system dump I'll give this a go.

I'll let you know how I get on.

Thanks

Bladesman
Certified Specialist pSeries AIX System Support.
 