
My server just panicked and rebooted

Status
Not open for further replies.

babeo

Technical User
Mar 30, 2000
398
CA
Hi,
My server just panicked and rebooted itself, and I cannot tell what the exact problem is. Could someone tell me what is wrong with my box? Does it involve some processes, or is it a hardware problem? (toro1 is our server name.) Also, is there a book or document on the Sun site that explains what the error message codes and abbreviations mean?

I have extracted some lines that need attention.

May 3 08:32:12 toro1 SUNW,UltraSPARC-II: [ID 132142 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data access at TL>0, errID 0x0009d391.133a2faf

May 3 08:32:13 toro1 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x10): 0x0000000a.00000000 *Bad* PSYND=0xff00

May 3 08:32:13 toro1 unix: [ID 674256 kern.notice] [AFT1] errID 0x0009d391.133a2faf UE Error(s)



May 3 08:32:13 toro1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000000.00000000
May 3 08:32:13 toro1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
May 3 08:32:13 toro1 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x10): 0x0000000a.00000000 *Bad* PSYND=0xff00
May 3 08:32:13 toro1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x00000000.0000cafe
May 3 08:32:13 toro1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.1046aea0
May 3 08:32:13 toro1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000000.00000000
May 3 08:32:13 toro1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.00000000
May 3 08:32:13 toro1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x00000300.0a7630c0
May 3 08:32:13 toro1 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000001 000002a10014b390 0000000000000003 0000000000000010
May 3 08:32:13 toro1 %l4-7: 0000000080000000 0000000000005508 0000000000005550 0000030000300800
May 3 08:32:13 toro1 genunix: [ID 723222 kern.notice] 000002a10014b2c0 SUNW,UltraSPARC-II:cpu_async_error+7ec (104590e8, 26007c90, 80300000, 1040daec, 0, 3af)
May 3 08:32:13 toro1 genunix: [ID 179002 kern.notice] %l0-3: 00000000000003af 0000000000400000 0000000000000000 0000000080300000
May 3 08:32:13 toro1 %l4-7: 000002a10014b390 0000000026007c80 0000000000400000 0000000000000001
May 3 08:32:13 toro1 genunix: [ID 723222 kern.notice] 000002a10014b4a0 unix:prom_rtt+0 (0, 2a10014bd40, 20, 3000194b518, 10400000, 3000005df90)
May 3 08:32:13 toro1 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000005 0000000000001400 0000004400001604 000000001013871c
May 3 08:32:13 toro1 %l4-7: 0000000000000000 0000000000000000 0000000000000006 000002a10014b550
May 3 08:32:13 toro1 genunix: [ID 723222 kern.notice] 000002a10014b5f0 unix:putnext+104 (30002ade360, 30003cef368, 20, 3000a941500, 0, 0)
May 3 08:32:13 toro1 genunix: [ID 179002 kern.notice] %l0-3: 000003000a763a40 000003000a763c00 0000030002ade360 0000030003c757e0
May 3 08:32:13 toro1 %l4-7: 0000000010495ba8 0000000000000000 0000000000000000 0000000000000000
May 3 08:32:13 toro1 genunix: [ID 723222 kern.notice] 000002a10014b6a0 ip:ip_rput_local+994 (0, 30002ac2c50, 30002ade360, 0, 3000a941500, 30002313cf8)
May 3 08:32:14 toro1 genunix: [ID 179002 kern.notice] %l0-3: 0000030002ac2be8 00000300022d4b50 000003000220ba38 0000030001926428
May 3 08:32:14 toro1 %l4-7: 000003000a941500 000000000afe0202 0000000000000000 000000000000ffff
May 3 08:32:14 toro1 genunix: [ID 723222 kern.notice] 000002a10014b790 ip:ip_rput+224 (30002313cf8, 30001926428, 3000220ba38, 30002313cf8, 300022d4b50, 3000a941500)
May 3 08:32:14 toro1 genunix: [ID 179002 kern.notice] %l0-3: 00000000000614a5 000003000220ba38 0000000000000000 0000030000b95e08
May 3 08:32:14 toro1 %l4-7: 0000030009132b40 0000000000000001 0000000000000001 9000000000000012
May 3 08:32:14 toro1 genunix: [ID 723222 kern.notice] 000002a10014b860 unix:putnext+1cc (300019adf80, 30001936b40, 3000220ba38, 3000a941500, 300019adf88, 300019adf80)
May 3 08:32:14 toro1 genunix: [ID 179002 kern.notice] %l0-3: 000003000220ba38 0000030001935ea8 0000030001992bf0 0000000000000000
May 3 08:32:14 toro1 %l4-7: 000000001018ce24 0000000000000000 0000000000000000 0000030000300940
May 3 08:32:14 toro1 genunix: [ID 723222 kern.notice] 000002a10014b910 qfe:qferead_dvma+344 (0, 30001992bf0, 3000030b530, 5400, 80, 30000306000)
May 3 08:32:14 toro1 genunix: [ID 179002 kern.notice] %l0-3: 000003000030b508 0000000000000000 000003000a941500 0000030000306ca0
May 3 08:32:14 toro1 %l4-7: 0000000000000476 0000030000300c00 0000030003c757e0 000002a100951ba0
May 3 08:32:14 toro1 genunix: [ID 723222 kern.notice] 000002a10014b9e0 qfe:qfe_intr+13c (5400,fffefc00, 54c8, 30000306428, 0, 30000306000)
May 3 08:32:14 toro1 genunix: [ID 179002 kern.notice] %l0-3: 000000007803c1d4 0000000000010000 0000000000000000 0000000000000000
May 3 08:32:14 toro1 %l4-7: 0000000080000000 0000000000005508 0000000000005550 0000030000300800
May 3 08:32:14 toro1 genunix: [ID 723222 kern.notice] 000002a10014ba90 pcipsy:pci_intr_wrapper+60 (1047cac0, 7d9, 30000070f08, 300021c8528, 0, 0)
May 3 08:32:15 toro1 genunix: [ID 179002 kern.notice] %l0-3: 0000000078039a54 00000300000801e8 0000000000000000 0000000000000000
May 3 08:32:15 toro1 %l4-7: 0000000000000000 0000000000000000 0000000000000000 0000000000000000

Thanks
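
For anyone reading a similar dump: a minimal Python sketch for pulling the interesting [AFT1]/[AFT2] lines out of a messages file. The regular expressions are inferred only from the sample lines in this post, not from Sun documentation, and the function names are made up for illustration.

```python
import re

# Field layout guessed from the log lines in this thread (an assumption).
UE_RE = re.compile(r"\[AFT1\] Uncorrectable Memory Error on CPU(\d+)")
ECACHE_RE = re.compile(
    r"\[AFT2\] E\$Data \((0x[0-9a-f]+)\): (0x[0-9a-f]+\.[0-9a-f]+)"
    r"(?: \*Bad\* PSYND=(0x[0-9a-f]+))?"
)

def ue_cpus(log_text):
    """Return the CPU numbers named in [AFT1] uncorrectable-error warnings."""
    return [int(m.group(1)) for m in UE_RE.finditer(log_text)]

def bad_ecache_lines(log_text):
    """Return (offset, data, psynd) for every E$Data line flagged *Bad*."""
    hits = []
    for line in log_text.splitlines():
        m = ECACHE_RE.search(line)
        if m and m.group(3) is not None:
            hits.append((m.group(1), m.group(2), m.group(3)))
    return hits

sample = """\
May 3 08:32:12 toro1 SUNW,UltraSPARC-II: [ID 132142 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data access at TL>0, errID 0x0009d391.133a2faf
May 3 08:32:13 toro1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
May 3 08:32:13 toro1 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x10): 0x0000000a.00000000 *Bad* PSYND=0xff00
"""

print(ue_cpus(sample))
print(bad_ecache_lines(sample))
```

This only filters the text; it does not say which DIMM or CPU module is at fault.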
 
Sounds like it's time to call Sun at 1-800-872-4786. Be prepared to give them the serial number off the machine or your service contract number.
 
Check with: /usr/platform/sun4u/sbin/prtdiag

Does it report a failure in the system?

If it reports no failure:

If you have more than one group of RAM installed, swap
the DIMMs in group 0 with those in group 1:

U0601 Group 0 Bank 1 00000000 - 1fffffff 2nd Dbl 16-31
U0701 Group 0 Bank 1 00000000 - 1fffffff 2nd Dbl 00-15
U0401 Group 0 Bank 0 00000000 - 1fffffff 1st Dbl 16-31
U0501 Group 0 Bank 0 00000000 - 1fffffff 1st Dbl 00-15


regards p




 




 
Thanks Apricot

When I checked my system, no failure was found. Here is the output; could you tell me which part represents the memory? Thanks again

System Configuration: Sun Microsystems sun4u Sun Enterprise 420R (2 X UltraSPARC-II 450MHz)
System clock frequency: 113 MHz
Memory size: 2048 Megabytes

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     1     1      450     4.0   US-II   10.0
 0     2     2      450     4.0   US-II   10.0


========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot  Name                              Model
---  ----  ----  ----  --------------------------------  ----------------------
 0   PCI    33     0   SUNW,qfe-pci108e,1001             SUNW,pci-qfe
 0   PCI    33     1   network-SUNW,hme
 0   PCI    33     1   SUNW,qfe-pci108e,1001             SUNW,pci-qfe
 0   PCI    33     2   SUNW,qfe-pci108e,1001             SUNW,pci-qfe
 0   PCI    33     3   scsi-glm/disk (block)             Symbios,53C875
 0   PCI    33     3   scsi-glm/disk (block)             Symbios,53C875
 0   PCI    33     3   SUNW,qfe-pci108e,1001             SUNW,pci-qfe
 0   PCI    33     5   TSI,gfxp                          GFXP

No failures found in System
===========================
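
In the output above, the only memory information is the "Memory size" line. If it helps to check this from a script, here is a rough Python sketch that reads saved prtdiag text and reports the memory size and whether the "No failures found in System" line is present. The function name and the idea of saving the output to a string are assumptions for illustration.

```python
import re

def summarize_prtdiag(text):
    """Pull the memory size and the failure summary out of prtdiag text."""
    mem = re.search(r"^Memory size:\s*(\d+)\s*Megabytes", text, re.MULTILINE)
    return {
        "memory_mb": int(mem.group(1)) if mem else None,
        "no_failures": "No failures found in System" in text,
    }

sample = """\
System clock frequency: 113 MHz
Memory size: 2048 Megabytes
No failures found in System
"""

print(summarize_prtdiag(sample))
```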


 
Run a stress test (RAM only) with SunVTS:

# /opt/SUNWvts/bin/sunvts &

Does it report a failure in the system now?

If it reports no failure:

Swap the DIMMs in group 0 with those in group 1
on the memory riser board, and boot in extended (diagnostic) mode (with a terminal on the serial port) with only this RAM group installed:

U0301 U1301 0 0000 0000 - 3fff ffff
U0302 U1302 0 0000 0000 - 3fff ffff
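
As a quick sanity check on the address ranges quoted in this thread, a small Python sketch that converts a range to a size. It assumes the two hex values are inclusive start and end physical addresses, which is a reading of the listing and not something the thread confirms.

```python
def range_size_mib(start_hex, end_hex):
    """Size in MiB of an inclusive address range given as hex strings."""
    start = int(start_hex.replace(" ", ""), 16)
    end = int(end_hex.replace(" ", ""), 16)
    return (end - start + 1) // (1024 * 1024)

print(range_size_mib("0000 0000", "3fff ffff"))  # 1024
print(range_size_mib("00000000", "1fffffff"))    # 512
```

Under that assumption, a range ending at 3fff ffff covers 1024 MB, half of the 2048 Megabytes reported by prtdiag.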

 