Sun Fire V440 keeps rebooting very often

sunny1504

Hi All,

There seems to be a problem with a Sun Fire V440 running Solaris 9 (SunOS 5.9): it keeps rebooting very often.

Below is a copy of /var/adm/messages.

My system rebooted at 04:56
************************************************
Jan 27 04:53:12 sundr unix: [ID 836849 kern.notice]
Jan 27 04:53:12 sundr ^Mpanic[cpu2]/thread=2a100003d40:
Jan 27 04:53:12 sundr unix: [ID 237512 kern.notice] bad kernel MMU miss at TL 2
Jan 27 04:53:12 sundr unix: [ID 100000 kern.notice]
Jan 27 04:53:12 sundr unix: [ID 293396 kern.notice] %tl %tpc %tnpc
%tstate %tt
Jan 27 04:53:12 sundr unix: [ID 785305 kern.notice] 1 0000000001180a58 000000
0001180a5c 4480001604 068
Jan 27 04:53:12 sundr unix: [ID 898499 kern.notice] %ccr: 44 %asi: 80 %cwp
: 4 %pstate: 16<PEF,PRIV,IE>
Jan 27 04:53:12 sundr unix: [ID 785305 kern.notice] 2 00000000010072b0 000000
00010072b4 9180001505 068
Jan 27 04:53:12 sundr unix: [ID 898499 kern.notice] %ccr: 91 %asi: 80 %cwp
: 5 %pstate: 15<PEF,PRIV,AG>
Jan 27 04:53:12 sundr unix: [ID 216294 kern.notice] %g0-3: 0000000000000000 0000
000000000002 0000003226b0a920 0000003226b0a920
Jan 27 04:53:13 sundr unix: [ID 600269 kern.notice] %g4-7: 0000000000000000 0000
000000000002 000000000140a580 0000000000000068
Jan 27 04:53:13 sundr unix: [ID 531632 kern.notice] Register window 5, caller cp
u_flt_in_memory+1c
Jan 27 04:53:13 sundr unix: [ID 454034 kern.notice] %o0-3: 0000000001491c00 0000
02a100000000 0000000000000031 0000000000000000
Jan 27 04:53:13 sundr %o4-7: 00000000000002a1 0000000000000000 000002a1000013a1
000000000117967c
Jan 27 04:53:13 sundr unix: [ID 960796 kern.notice] %l0-3: 0000000001180a58 0000
000001180a5c 0000004480001604 000000000102d9a8
Jan 27 04:53:13 sundr %l4-7: 000000001000000e 00000300004d17a8 000000000000000e
000002a100001c50
Jan 27 04:53:13 sundr unix: [ID 567577 kern.notice] %i0-3: 0000000000000000 0000
000001491c60 0000000000000007 0000000000000000
Jan 27 04:53:13 sundr %i4-7: 0000000010000000 0000000000000000 000002a1000014f1
0000000001178688
Jan 27 04:53:13 sundr unix: [ID 531632 kern.notice] Register window 4, caller er
rorq_dispatch+68
Jan 27 04:53:13 sundr unix: [ID 960796 kern.notice] %l0-3: 0000000001491b70 0000
000010000000 0000000000000001 000000200cf230a0
Jan 27 04:53:13 sundr %l4-7: 000000001000000e 0000000000000000 0000000000000000
000002a100003088
Jan 27 04:53:13 sundr unix: [ID 567577 kern.notice] %i0-3: 000002a100002238 0000
030001f30000 00000000000008a8 000002a1000020fd
Jan 27 04:53:13 sundr %i4-7: 000002a1000020f9 0000000000ff0000 000002a1000016b1
00000000010965b0
Jan 27 04:53:13 sundr unix: [ID 531632 kern.notice] Register window 3, caller cp
u_queue_events+b8
Jan 27 04:53:13 sundr unix: [ID 960796 kern.notice] %l0-3: 0000000000000000 0000
0300002c1488 0000000000000000 00000300004d1690
Jan 27 04:53:13 sundr %l4-7: 00000300026e6090 00000300004d17a8 0000000000000000
0000000000000109
Jan 27 04:53:13 sundr unix: [ID 567577 kern.notice] %i0-3: 00000300004d1690 0000
02a100002238 00000000000008a8 0000000000000000
Jan 27 04:53:13 sundr %i4-7: 0000000000000000 000003000007bf50 000002a100001761
00000000011792a4
Jan 27 04:53:13 sundr unix: [ID 531632 kern.notice] Register window 2, caller cp
u_log_and_clear_ce+140
Jan 27 04:53:13 sundr unix: [ID 960796 kern.notice] %l0-3: 0000000001491b70 0000
000018000000 0000000000000001 000000200cf230a0
Jan 27 04:53:13 sundr %l4-7: 0000000018000003 0000000000000000 0000000018000000
000002a100002238
Jan 27 04:53:13 sundr unix: [ID 567577 kern.notice] %i0-3: 000002a1000020f8 0000
0000014917d0 0000000001491400 00000300026e2a40
Jan 27 04:53:13 sundr %i4-7: 00000000e8d9c800 0000000000000000 000002a100001811
000000000117b874
Jan 27 04:53:13 sundr unix: [ID 531632 kern.notice] Register window 1, caller cp
u_disrupting_error+120
Jan 27 04:53:13 sundr unix: [ID 960796 kern.notice] %l0-3: 00000300026e2a40 0000
000018000003 0000000000000000 0000000018000000
Jan 27 04:53:13 sundr %l4-7: 00000300026e6090 0000000000000001 00000300026e2000
0000000001432204
Jan 27 04:53:13 sundr unix: [ID 567577 kern.notice] %i0-3: 000002a100002238 ffff
ffffffffffff 0000000000000054 0000000000000000
Jan 27 04:53:13 sundr %i4-7: 000000000100c7ec 0000000000000000 000002a100001921
0000000001174b3c
Jan 27 04:53:13 sundr unix: [ID 531632 kern.notice] Register window 0, caller kt
l0+48
Jan 27 04:53:13 sundr unix: [ID 960796 kern.notice] %l0-3: 0000000000000002 0000
000000000000 0000000080000000 000002a100d91ac8
Jan 27 04:53:13 sundr %l4-7: 0000000001497000 0000000001000000 00000300044b5518
0000000000002200
Jan 27 04:53:13 sundr unix: [ID 567577 kern.notice] %i0-3: 000002a100002bb0 0000
000000000000 0000000000000000 0000000000000000
Jan 27 04:53:13 sundr %i4-7: 0000000000000000 000000003b9aca00 000002a100002301
0000000001007640
Jan 27 04:53:13 sundr unix: [ID 531632 kern.notice] Register window 7, caller cp
u_flt_in_memory+1c
Jan 27 04:53:13 sundr unix: [ID 960796 kern.notice] %l0-3: 0000000000000007 0000
000000001400 0000000080001606 0000000001174a1c
Jan 27 04:53:13 sundr %l4-7: 0000000001438400 0000000000002200 000000000000000e
000002a100002bb0
Jan 27 04:53:13 sundr unix: [ID 567577 kern.notice] %i0-3: 0000000000000000 0000
000001491c60 0000000000000000 0000000000000000
Jan 27 04:53:13 sundr %i4-7: 0000000010000000 0000000000000000 000002a100002451
0000000001178688
Jan 27 04:53:13 sundr unix: [ID 100000 kern.notice]
Jan 27 04:53:13 sundr last message repeated 1 time
Jan 27 04:53:13 sundr genunix: [ID 672855 kern.notice] syncing file systems...
Jan 27 04:53:13 sundr unix: [ID 836849 kern.notice]
Jan 27 04:53:13 sundr ^Mpanic[cpu2]/thread=2a100003d40:
Jan 27 04:53:13 sundr unix: [ID 715357 kern.notice] panic sync timeout
Jan 27 04:53:13 sundr unix: [ID 100000 kern.notice]
Jan 27 04:53:13 sundr genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1t0d
0s1, offset 65536, content: kernel
Jan 27 04:53:13 sundr genunix: [ID 409368 kern.notice] ^M100% done: 66034 pages
dumped, compression ratio 3.08,
Jan 27 04:53:13 sundr genunix: [ID 851671 kern.notice] dump succeeded
Jan 27 04:55:43 sundr genunix: [ID 540533 kern.notice] ^MSunOS Release 5.9 Versi
on Generic_118558-19 64-bit
Jan 27 04:55:43 sundr genunix: [ID 943905 kern.notice] Copyright 1983-2003 Sun M
icrosystems, Inc. All rights reserved.
Jan 27 04:55:43 sundr Use is subject to license terms.
Jan 27 04:55:43 sundr genunix: [ID 678236 kern.info] Ethernet address = 0:3:ba:6
4:ea:5f
Jan 27 04:55:43 sundr unix: [ID 389951 kern.info] mem = 8388608K (0x200000000)
Jan 27 04:55:43 sundr unix: [ID 930857 kern.info] avail mem = 8225464320
Jan 27 04:55:43 sundr rootnex: [ID 466748 kern.info] root nexus = Sun Fire V440
Jan 27 04:55:43 sundr rootnex: [ID 349649 kern.info] pcisch2 at root: SAFARI 0x1
e 0x600000
Jan 27 04:55:43 sundr genunix: [ID 936769 kern.info] pcisch2 is /pci@1e,600000
Jan 27 04:55:43 sundr rootnex: [ID 349649 kern.info] pcisch0 at root: SAFARI 0x1
c 0x600000
Jan 27 04:55:43 sundr genunix: [ID 936769 kern.info] pcisch0 is /pci@1c,600000
Jan 27 04:55:43 sundr rootnex: [ID 349649 kern.info] pcisch1 at root: SAFARI 0x1
d 0x700000
Jan 27 04:55:43 sundr genunix: [ID 936769 kern.info] pcisch1 is /pci@1d,700000
Jan 27 04:55:43 sundr rootnex: [ID 349649 kern.info] pcisch3 at root: SAFARI 0x1
f 0x700000
Jan 27 04:55:43 sundr genunix: [ID 936769 kern.info] pcisch3 is /pci@1f,700000
Jan 27 04:55:43 sundr pcisch: [ID 370704 kern.info] PCI-device: ide@d, uata0
Jan 27 04:55:43 sundr genunix: [ID 936769 kern.info] uata0 is /pci@1e,600000/ide
@d
Jan 27 04:55:43 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2 (mpt0):
Jan 27 04:55:43 sundr Rev. 7 LSI, Inc. 1030 found.
Jan 27 04:55:43 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2 (mpt0):
Jan 27 04:55:43 sundr mpt0 supports power management.
Jan 27 04:55:52 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2 (mpt0):
Jan 27 04:55:52 sundr mpt0 Firmware version v1.3.27.0
Jan 27 04:55:52 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2 (mpt0):
Jan 27 04:55:52 sundr mpt0: IOC Operational.
Jan 27 04:56:13 sundr pcisch: [ID 370704 kern.info] PCI-device: scsi@2, mpt0
Jan 27 04:56:13 sundr genunix: [ID 936769 kern.info] mpt0 is /pci@1f,700000/scsi
@2
Jan 27 04:56:13 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2,1 (mpt1)
:
Jan 27 04:56:13 sundr Rev. 7 LSI, Inc. 1030 found.
Jan 27 04:56:13 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2,1 (mpt1)
:
Jan 27 04:56:13 sundr mpt1 supports power management.
Jan 27 04:56:13 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2,1 (mpt1)
:
Jan 27 04:56:13 sundr mpt1 Firmware version v1.3.27.0
Jan 27 04:56:13 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2,1 (mpt1)
:
Jan 27 04:56:13 sundr mpt1: IOC Operational.
Jan 27 04:56:16 sundr pcisch: [ID 370704 kern.info] PCI-device: scsi@2,1, mpt1
Jan 27 04:56:16 sundr genunix: [ID 936769 kern.info] mpt1 is /pci@1f,700000/scsi
@2,1
Jan 27 04:56:16 sundr scsi: [ID 193665 kern.info] sd0 at uata0: target 0 lun 0
Jan 27 04:56:16 sundr genunix: [ID 936769 kern.info] sd0 is /pci@1e,600000/ide@d
/sd@0,0
Jan 27 04:56:16 sundr scsi: [ID 193665 kern.info] sd1 at mpt0: target 0 lun 0
Jan 27 04:56:16 sundr genunix: [ID 936769 kern.info] sd1 is /pci@1f,700000/scsi@
2/sd@0,0
Jan 27 04:56:16 sundr scsi: [ID 193665 kern.info] sd2 at mpt0: target 1 lun 0
Jan 27 04:56:16 sundr genunix: [ID 936769 kern.info] sd2 is /pci@1f,700000/scsi@
2/sd@1,0
Jan 27 04:56:16 sundr scsi: [ID 193665 kern.info] sd3 at mpt0: target 2 lun 0
Jan 27 04:56:16 sundr genunix: [ID 936769 kern.info] sd3 is /pci@1f,700000/scsi@
2/sd@2,0
Jan 27 04:56:17 sundr scsi: [ID 193665 kern.info] sd4 at mpt0: target 3 lun 0
Jan 27 04:56:17 sundr genunix: [ID 936769 kern.info] sd4 is /pci@1f,700000/scsi@
2/sd@3,0
Jan 27 04:56:21 sundr swapgeneric: [ID 308332 kern.info] root on /pseudo/md@0:0,
0,blk fstype ufs
Jan 27 04:56:21 sundr pcisch: [ID 370704 kern.info] PCI-device: isa@7, ebus0
Jan 27 04:56:21 sundr genunix: [ID 936769 kern.info] ebus0 is /pci@1e,600000/isa
@7
Jan 27 04:56:21 sundr rootnex: [ID 349649 kern.info] mc-us3i0 at root: SAFARI 0x
0 0x0 ...
Jan 27 04:56:21 sundr genunix: [ID 936769 kern.info] mc-us3i0 is /memory-control
ler@0,0
Jan 27 04:56:21 sundr rootnex: [ID 349649 kern.info] mc-us3i1 at root: SAFARI 0x
1 0x0 ...
Jan 27 04:56:21 sundr genunix: [ID 936769 kern.info] mc-us3i1 is /memory-control
ler@1,0
Jan 27 04:56:21 sundr rootnex: [ID 349649 kern.info] mc-us3i2 at root: SAFARI 0x
2 0x0 ...
Jan 27 04:56:21 sundr genunix: [ID 936769 kern.info] mc-us3i2 is /memory-control
ler@2,0
Jan 27 04:56:21 sundr rootnex: [ID 349649 kern.info] mc-us3i3 at root: SAFARI 0x
3 0x0 ...
Jan 27 04:56:21 sundr genunix: [ID 936769 kern.info] mc-us3i3 is /memory-control
ler@3,0
Jan 27 04:56:21 sundr ebus: [ID 521012 kern.info] power0 at ebus0: offset 0,800
Jan 27 04:56:21 sundr genunix: [ID 936769 kern.info] power0 is /pci@1e,600000/is
a@7/power@0,800
Jan 27 04:56:21 sundr ebus: [ID 521012 kern.info] rmc_comm0 at ebus0: offset 0,3
e8
Jan 27 04:56:21 sundr pcisch: [ID 370704 kern.info] PCI-device: pmu@6, pmubus0
Jan 27 04:56:21 sundr pcisch: [ID 370704 kern.info] PCI-device: gpio@80000000, p
mugpio1
Jan 27 04:56:21 sundr genunix: [ID 936769 kern.info] pmugpio1 is /pci@1e,600000/
pmu@6/gpio@80000000
Jan 27 04:56:21 sundr pseudo: [ID 129642 kern.info] pseudo-device: rmclomv0
Jan 27 04:56:21 sundr genunix: [ID 936769 kern.info] rmclomv0 is /pseudo/rmclomv
@0
Jan 27 04:56:21 sundr rmclomv: [ID 758372 kern.notice] Hardware watchdog enabled
Jan 27 04:56:21 sundr ebus: [ID 521012 kern.info] su0 at ebus0: offset 0,3f8
Jan 27 04:56:21 sundr genunix: [ID 936769 kern.info] su0 is /pci@1e,600000/isa@7
/serial@0,3f8
Jan 27 04:56:21 sundr ebus: [ID 521012 kern.info] su1 at ebus0: offset 0,2e8
Jan 27 04:56:21 sundr genunix: [ID 936769 kern.info] su1 is /pci@1e,600000/isa@7
/serial@0,2e8
Jan 27 04:56:21 sundr unix: [ID 270833 kern.info] cpu3: UltraSPARC-IIIi (portid
3 impl 0x16 ver 0x24 clock 1062 MHz)
Jan 27 04:56:22 sundr unix: [ID 270833 kern.info] cpu0: UltraSPARC-IIIi (portid
0 impl 0x16 ver 0x24 clock 1062 MHz)
Jan 27 04:56:22 sundr unix: [ID 721127 kern.info] cpu 0 initialization complete
- online
Jan 27 04:56:22 sundr unix: [ID 270833 kern.info] cpu1: UltraSPARC-IIIi (portid
1 impl 0x16 ver 0x24 clock 1062 MHz)
Jan 27 04:56:22 sundr unix: [ID 721127 kern.info] cpu 1 initialization complete
- online
Jan 27 04:56:22 sundr unix: [ID 270833 kern.info] cpu2: UltraSPARC-IIIi (portid
2 impl 0x16 ver 0x24 clock 1062 MHz)
Jan 27 04:56:22 sundr unix: [ID 721127 kern.info] cpu 2 initialization complete
- online
Jan 27 04:56:22 sundr pcisch: [ID 370704 kern.info] PCI-device: usb@a, ohci0
Jan 27 04:56:22 sundr genunix: [ID 936769 kern.info] ohci0 is /pci@1e,600000/usb
@a
Jan 27 04:56:22 sundr pcisch: [ID 370704 kern.info] PCI-device: usb@b, ohci1
Jan 27 04:56:22 sundr genunix: [ID 936769 kern.info] ohci1 is /pci@1e,600000/usb
@b
Jan 27 04:56:25 sundr genunix: [ID 408822 kern.info] NOTICE: ce0: no fault exter
nal to device; service available
Jan 27 04:56:25 sundr genunix: [ID 611667 kern.info] NOTICE: ce0: xcvr addr:0x01
- link up 100 Mbps full duplex
Jan 27 04:56:25 sundr genunix: [ID 454863 kern.info] dump on /dev/md/dsk/d3 size
16386 MB
Jan 27 04:56:41 sundr pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
Jan 27 04:56:41 sundr genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo
@0










Thanks
Sunil D'Souza
 
It looks like CPU2 is panicking because of an MMU (Memory Management Unit) miss. You can try shutting down CPU2 to see if that stops the reboots.
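One way to check whether the panics keep naming the same CPU is to tally them from /var/adm/messages. Here is a minimal Python sketch, assuming the panic header and its reason sit on consecutive lines as in the log above. The names `tally_panics`, `PANIC_RE` and `TAG_RE` are made up for illustration and are not part of any Solaris tool.

```python
import re
from collections import Counter

# Matches the panic header, e.g. "panic[cpu2]/thread=2a100003d40:"
PANIC_RE = re.compile(r"panic\[cpu(\d+)\]/thread=")
# Strips the syslog prefix up to and including "[ID nnnnnn kern.notice]"
TAG_RE = re.compile(r"^.*?\[ID \d+ kern\.notice\]\s*")


def tally_panics(lines):
    """Count (cpu, reason) pairs; the reason is the line after each panic header."""
    lines = list(lines)
    counts = Counter()
    for i, line in enumerate(lines):
        m = PANIC_RE.search(line)
        if m and i + 1 < len(lines):
            reason = TAG_RE.sub("", lines[i + 1]).strip()
            counts[(int(m.group(1)), reason)] += 1
    return counts


# Sample lines copied from the log posted above.
sample = """\
Jan 27 04:53:12 sundr ^Mpanic[cpu2]/thread=2a100003d40:
Jan 27 04:53:12 sundr unix: [ID 237512 kern.notice] bad kernel MMU miss at TL 2
Jan 27 04:53:13 sundr ^Mpanic[cpu2]/thread=2a100003d40:
Jan 27 04:53:13 sundr unix: [ID 715357 kern.notice] panic sync timeout
""".splitlines()

for (cpu, reason), n in sorted(tally_panics(sample).items()):
    print(f"cpu{cpu}: {reason} x{n}")
```

To run it against the real file, pass `open("/var/adm/messages")` instead of `sample`. If later panics name a different CPU, that would weaken the case for a single bad CPU.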
 
It looks like a problem with cpu2. If the system is under warranty or maintenance contract, then get the hardware supplier/maintainer to sort it out. If not, then I would remove cpu2, put cpu3 in cpu2's slot and reboot. If it stays up then you have found a 'suspect' cpu card and you need to decide whether to carry on with 3 of the 4 cpu cards or buy a replacement.

I hope that helps.

Mike
 
Good idea Mike... I'm curious though; under our support contract for Enterprise class systems we are not allowed to touch the hardware or support for issues like this becomes null and void; is that not always the case? Is the V440 not Enterprise class, or do you just have a different kind of support agreement? Or do you have certification that allows you to touch the hardware?

Annihilannic.
 
Hi Annihilannic,

I was only suggesting 'opening the box' if there was no warranty or maintenance contract in place. There are different maintenance contracts: some with the manufacturer (who will only deal with their own equipment) and some with "third-party" maintainers (who will service a range of different manufacturers' equipment). I agree, some maintenance contracts can be made null and void if the hardware is 'tampered' with, and I certainly was not suggesting that anyone should.
No, I do not have "certification that allows you to touch the hardware". I would normally only 'open a box' if it was out of warranty and had no maintenance contract and was broken.

Regards.

Mike

 
Jan 26 16:53:11 sundr SUNW,UltraSPARC-IIIi: [ID 409465 kern.info] [AFT3] errID 0
x00006396.052dcb6c Above Error detected by protected Kernel code
Jan 26 16:53:11 sundr that will try to clear error from system
Jan 27 04:53:12 sundr unix: [ID 836849 kern.notice]
Jan 27 04:53:12 sundr ^Mpanic[cpu2]/thread=2a100003d40:
Jan 27 04:53:12 sundr unix: [ID 237512 kern.notice] bad kernel MMU miss at TL 2
Jan 27 04:53:12 sundr unix: [ID 100000 kern.notice]
Jan 27 04:53:12 sundr unix: [ID 293396 kern.notice] %tl %tpc %tnpc
%tstate %tt
Jan 27 04:53:12 sundr unix: [ID 785305 kern.notice] 1 0000000001180a58 000000
0001180a5c 4480001604 068
Jan 27 04:53:12 sundr unix: [ID 898499 kern.notice] %ccr: 44 %asi: 80 %cwp
: 4 %pstate: 16<PEF,PRIV,IE>
Jan 27 04:53:12 sundr unix: [ID 785305 kern.notice] 2 00000000010072b0 000000
00010072b4 9180001505 068
Jan 27 04:53:12 sundr unix: [ID 898499 kern.notice] %ccr: 91 %asi: 80 %cwp
: 5 %pstate: 15<PEF,PRIV,AG>
Jan 27 04:53:12 sundr unix: [ID 216294 kern.notice] %g0-3: 0000000000000000 0000
000000000002 0000003226b0a920 0000003226b0a920
Jan 27 04:53:13 sundr unix: [ID 600269 kern.notice] %g4-7: 0000000000000000 0000
000000000002 000000000140a580 0000000000000068
Jan 27 04:53:13 sundr unix: [ID 531632 kern.notice] Register window 5, caller cp
u_flt_in_memory+1c





Jan 27 04:59:22 sundr genunix: [ID 454863 kern.info] dump on /dev/dsk/c1t0d0s1 s
ize 16386 MB
Jan 27 04:59:22 sundr savecore: [ID 570001 auth.error] reboot after panic: bad k
ernel MMU miss at TL 2
Jan 27 04:59:22 sundr savecore: [ID 624313 auth.error] not enough space in /var/
crash/sundr (506 MB avail, 519 MB needed)

Jan 27 04:59:22 sundr syslog: [ID 522582 daemon.notice] /usr/sbin/pmconfig: /etc
/power.conf line (18) failed to convert mount point /dev/md/dsk/d0 to prom name
Jan 27 04:59:23 sundr sendmail[232]: [ID 801593 mail.crit] NOQUEUE: SYSERR(root)
: /etc/mail/sendmail.cf: line 83: fileclass: cannot open '/etc/mail/local-host-n
ames': No such file or directory



Jan 27 09:22:25 sundr pcisch: [ID 284024 kern.warning] WARNING: uncorrectable er
ror detected by pci0 (safari id 00000000.0000001e) during
Jan 27 09:22:25 sundr DVMA read transaction
Jan 27 09:22:25 sundr pcisch: [ID 475334 kern.info] Transaction was a block
operation.
Jan 27 09:22:25 sundr pcisch: [ID 956438 kern.info] dvma access, Memory safa
ri command, address 00000020.0cf23020, owned_in not asserted.
Jan 27 09:22:25 sundr pcisch: [ID 863403 kern.info] AFSR=48000000.9e000000 A
FAR=00000020.0cf23020,
Jan 27 09:22:25 sundr quad word offset 00000000.00000002, Memory Module <C2/P0
/B0: B0/D0 B0/D1> port id 30.
Jan 27 09:22:25 sundr pcisch: [ID 545677 kern.info] mtag 0, mtag ecc syndrom
e 0





Jan 27 09:22:25 sundr pcisch: [ID 308334 kern.info] secondary error from DVM
A read transaction
Jan 27 09:22:25 sundr SUNW,UltraSPARC-IIIi: [ID 601449 kern.warning] WARNING: [A
FT1] Corrected memory (FRC) Event detected by CPU2 at TL=0, errID 0x00000ea3.d88
82d10
Jan 27 09:22:25 sundr AFSR 0x00000000.18001f07<FRC,FRU> AFAR 0x00000020.0cf2
30a0 INVALID
Jan 27 09:22:25 sundr Fault_PC 0x0 Esynd 0x0107 INVALID J_AID f INVALID
Jan 27 09:22:25 sundr SUNW,UltraSPARC-IIIi: [ID 794190 kern.warning] WARNING: [A
FT1] Uncorrectable memory (FRU) Event detected by CPU2 at TL=0, errID 0x00000ea3
.d8882d10
Jan 27 09:22:25 sundr AFSR 0x00000000.18001f07<FRC,FRU> AFAR 0x00000020.0cf2
30a0 INVALID
Jan 27 09:22:25 sundr Fault_PC 0x0 Esynd 0x0107 J_AID f
Jan 27 09:22:25 sundr SUNW,UltraSPARC-IIIi: [ID 895894 kern.notice] [AFT1] errID
0x00000ea3.d8882d10 Two Bits were in error
Jan 27 09:22:25 sundr unix: [ID 836849 kern.notice]
Jan 27 09:22:25 sundr ^Mpanic[cpu1]/thread=2a10000bd40:
Jan 27 09:22:25 sundr unix: [ID 261965 kern.notice] Fatal PCI UE Error
Jan 27 09:22:25 sundr unix: [ID 100000 kern.notice]
Jan 27 09:22:25 sundr last message repeated 1 time
Jan 27 09:22:25 sundr genunix: [ID 672855 kern.notice] syncing file systems...
Jan 27 09:22:55 sundr unix: [ID 836849 kern.notice]
Jan 27 09:22:55 sundr ^Mpanic[cpu1]/thread=2a10000bd40:
Jan 27 09:22:55 sundr unix: [ID 715357 kern.notice] panic sync timeout
Jan 27 09:22:55 sundr unix: [ID 100000 kern.notice]
Jan 27 09:22:55 sundr genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1t0d
0s1, offset 65536, content: kernel
Jan 27 09:22:58 sundr scsi: [ID 107833 kern.warning] WARNING: /pci@1f,700000/scs
i@2 (mpt0):
Jan 27 09:22:58 sundr unknown ioc_status = 3
Jan 27 09:22:58 sundr scsi: [ID 107833 kern.notice] scsi_state = 0, transfer
count = 0, scsi_status = 0
Jan 27 09:23:01 sundr scsi: [ID 107833 kern.notice] /pci@1f,700000/scsi@2 (mpt0)
:
Jan 27 09:23:01 sundr got external SCSI bus reset.
Jan 27 09:23:01 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2 (mpt0):
Jan 27 09:23:01 sundr Log info 11070000 received for target 2.
Jan 27 09:23:01 sundr scsi_status=0, ioc_status=804b, scsi_state=8
Jan 27 09:23:01 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2 (mpt0):
Jan 27 09:23:01 sundr Log info 11070000 received for target 2.
Jan 27 09:23:01 sundr scsi_status=0, ioc_status=804b, scsi_state=8
Jan 27 09:23:01 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2 (mpt0):
Jan 27 09:23:01 sundr Log info 11070000 received for target 3.
Jan 27 09:23:01 sundr scsi_status=0, ioc_status=804b, scsi_state=8
Jan 27 09:23:01 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2 (mpt0):
Jan 27 09:23:01 sundr Log info 11070000 received for target 2.
Jan 27 09:23:01 sundr scsi_status=0, ioc_status=804b, scsi_state=8
Jan 27 09:23:01 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2 (mpt0):
Jan 27 09:23:01 sundr Log info 11070000 received for target 3.
Jan 27 09:23:01 sundr scsi_status=0, ioc_status=804b, scsi_state=8
Jan 27 09:23:01 sundr scsi: [ID 365881 kern.info] /pci@1f,700000/scsi@2 (mpt0):
Jan 27 09:23:01 sundr Log info 11070000 received for target 2.


Jan 27 09:23:01 sundr mpt_check_task_mgt: Task 4 failed. ioc status = 4a targe
t= 0
Jan 27 09:23:01 sundr md_stripe: [ID 641072 kern.warning] WARNING: md: d17: writ
e error on /dev/dsk/c1t3d0s3
Jan 27 09:23:01 sundr md_stripe: [ID 641072 kern.warning] WARNING: md: d16: writ
e error on /dev/dsk/c1t2d0s3
Jan 27 09:23:01 sundr last message repeated 1 time
Jan 27 09:23:01 sundr md_stripe: [ID 641072 kern.warning] WARNING: md: d17: writ
e error on /dev/dsk/c1t3d0s3
Jan 27 09:24:50 sundr genunix: [ID 540533 kern.notice] ^MSunOS Release 5.9 Versi
on Generic_118558-19 64-bit
Jan 27 09:24:50 sundr genunix: [ID 943905 kern.notice] Copyright 1983-2003 Sun M
icrosystems, Inc. All rights reserved.
Jan 27 09:24:50 sundr Use is subject to license terms.
Jan 27 09:24:50 sundr genunix: [ID 678236 kern.info] Ethernet address = 0:3:ba:6
4:ea:5f
Jan 27 09:24:50 sundr unix: [ID 389951 kern.info] mem = 8388608K (0x200000000)
Jan 27 09:24:50 sundr unix: [ID 930857 kern.info] avail mem = 8225464320
Jan 27 09:24:50 sundr rootnex: [ID 466748 kern.info] root nexus = Sun Fire V440
Jan 27 09:24:50 sundr rootnex: [ID 349649 kern.info] pcisch2 at root: SAFARI 0x1
e 0x600000





Jan 27 09:25:28 sundr unix: [ID 270833 kern.info] cpu3: UltraSPARC-IIIi (portid
3 impl 0x16 ver 0x24 clock 1062 MHz)
Jan 27 09:25:29 sundr unix: [ID 270833 kern.info] cpu0: UltraSPARC-IIIi (portid
0 impl 0x16 ver 0x24 clock 1062 MHz)
Jan 27 09:25:29 sundr unix: [ID 721127 kern.info] cpu 0 initialization complete
- online
Jan 27 09:25:29 sundr unix: [ID 270833 kern.info] cpu1: UltraSPARC-IIIi (portid
1 impl 0x16 ver 0x24 clock 1062 MHz)
Jan 27 09:25:29 sundr unix: [ID 721127 kern.info] cpu 1 initialization complete
- online
Jan 27 09:25:29 sundr unix: [ID 270833 kern.info] cpu2: UltraSPARC-IIIi (portid
2 impl 0x16 ver 0x24 clock 1062 MHz)
Jan 27 09:25:29 sundr unix: [ID 721127 kern.info] cpu 2 initialization complete
- online
Jan 27 09:25:29 sundr pcisch: [ID 370704 kern.info] PCI-device: usb@a, ohci0
Jan 27 09:25:29 sundr genunix: [ID 936769 kern.info] ohci0 is /pci@1e,600000/usb
@a
Jan 27 09:25:29 sundr pcisch: [ID 370704 kern.info] PCI-device: usb@b, ohci1
Jan 27 09:25:29 sundr genunix: [ID 936769 kern.info] ohci1 is /pci@1e,600000/usb
@b
Jan 27 09:25:32 sundr genunix: [ID 408822 kern.info] NOTICE: ce0: no fault exter
nal to device; service available
Jan 27 09:25:32 sundr genunix: [ID 611667 kern.info] NOTICE: ce0: xcvr addr:0x01
- link up 100 Mbps full duplex
Jan 27 09:25:32 sundr genunix: [ID 454863 kern.info] dump on /dev/md/dsk/d3 size
16386 MB
Jan 27 09:25:34 sundr pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
Jan 27 09:25:34 sundr genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo
@0
Jan 27 13:03:08 sundr sshd[148]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.
Jan 27 13:04:27 sundr sshd[150]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.
Jan 27 13:05:46 sundr sshd[152]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.
Jan 27 13:06:56 sundr sshd[157]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.
Jan 27 13:08:02 sundr savecore: [ID 346688 auth.error] initial dump header corru
pt
Jan 27 13:08:02 sundr genunix: [ID 454863 kern.info] dump on /dev/dsk/c1t0d0s1 s
ize 16386 MB
Jan 27 13:08:02 sundr pseudo: [ID 129642 kern.info] pseudo-device: tod0
Jan 27 13:08:02 sundr genunix: [ID 936769 kern.info] tod0 is /pseudo/tod@0
Jan 27 13:08:03 sundr pseudo: [ID 129642 kern.info] pseudo-device: pm0
Jan 27 13:08:03 sundr genunix: [ID 936769 kern.info] pm0 is /pseudo/pm@0
Jan 27 13:08:03 sundr syslog: [ID 522582 daemon.notice] /usr/sbin/pmconfig: /etc
/power.conf line (18) failed to convert mount point /dev/md/dsk/d0 to prom name
Jan 27 13:08:03 sundr sendmail[347]: [ID 801593 mail.crit] NOQUEUE: SYSERR(root)
: /etc/mail/sendmail.cf: line 83: fileclass: cannot open '/etc/mail/local-host-n
ames': No such file or directory
Jan 27 13:08:08 sundr sshd[159]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.
Jan 27 13:08:09 sundr sshd[496]: [ID 800047 auth.error] error: Bind to port 22 o
n :: failed: Address already in use.
Jan 27 13:08:09 sundr sshd[496]: [ID 800047 auth.crit] fatal: Cannot bind any ad
dress.
Jan 27 13:08:09 sundr pseudo: [ID 129642 kern.info] pseudo-device: vol0
Jan 27 13:08:09 sundr genunix: [ID 936769 kern.info] vol0 is /pseudo/vol@0
Jan 27 13:08:10 sundr pcisch: [ID 370704 kern.info] PCI-device: SUNW,XVR-100@2,
pfb0
Jan 27 13:08:10 sundr genunix: [ID 936769 kern.info] pfb0 is /pci@1e,600000/SUNW
,XVR-100@2
Jan 27 13:08:10 sundr pfb: [ID 604756 kern.info] pfb#0: 1280x1024, rev 5159.0
Jan 27 13:08:13 sundr pseudo: [ID 129642 kern.info] pseudo-device: devinfo0
Jan 27 13:08:13 sundr genunix: [ID 936769 kern.info] devinfo0 is /pseudo/devinfo
@0
Jan 27 13:09:28 sundr sshd[183]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.
Jan 27 13:10:38 sundr sshd[185]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.
Jan 27 13:11:49 sundr sshd[187]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.
Jan 27 13:12:59 sundr sshd[189]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.
Jan 27 13:14:17 sundr sshd[191]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.
Jan 27 13:15:32 sundr sshd[193]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.
Jan 27 13:16:45 sundr sshd[198]: [ID 800047 auth.crit] fatal: Timeout before aut
hentication for 10.1.195.254.







**********************************************************************************************
Jan 29 15:48:12 sundr SUNW,UltraSPARC-IIIi: [ID 895151 kern.info] [AFT2] E$Data
(0x30) 0x00000000.00000000 0x00000000.00000000 ECC 0x000
Jan 29 15:48:12 sundr SUNW,UltraSPARC-IIIi: [ID 929717 kern.info] [AFT2] D$ data
not available
Jan 29 15:48:12 sundr SUNW,UltraSPARC-IIIi: [ID 181091 kern.info] [AFT3] errID 0
x0000b22a.7f7ef78c Above Error detected by protected Kernel code
Jan 29 15:48:12 sundr that will try to clear error from system
Jan 29 15:48:12 sundr SUNW,UltraSPARC-IIIi: [ID 702111 kern.warning] WARNING: [A
FT1] Uncorrectable remote memory/cache (RUE) Event detected by CPU1 Privileged D
ata Access at TL=0, errID 0x0000b22a.806899b4
Jan 29 15:48:12 sundr AFSR 0x00100001<PRIV,RUE>.82000000<RCE> AFAR 0x0000002
0.0cf230a0
Jan 29 15:48:12 sundr Fault_PC 0x1027f6c J_REQ 2
Jan 29 15:48:12 sundr C2/P0/B0: B0/D0 B0/D1 (applicable only if correspondin
g FRU Event also logged)
Jan 29 15:48:12 sundr SUNW,UltraSPARC-IIIi: [ID 986054 kern.info] [AFT2] errID 0
x0000b22a.806899b4 E$tag PA=0x00000010.80023080 does not match AFAR=0x00000020.0
cf23080
**********************************************************************************************


**********************************************************************************************
(0x30) 0x840a8002.8778b401 0xc40fbf77.84108003 ECC 0x0d0
Jan 29 15:48:12 sundr SUNW,UltraSPARC-IIIi: [ID 929717 kern.info] [AFT2] D$ data
not available
Jan 29 15:48:12 sundr SUNW,UltraSPARC-IIIi: [ID 600490 kern.info] [AFT3] errID 0
x0000b22a.806899b4 Above Error detected by protected Kernel code
Jan 29 15:48:12 sundr that will try to clear error from system
Jan 29 15:48:12 sundr SUNW,UltraSPARC-IIIi: [ID 746938 kern.warning] WARNING: [A
FT1] Uncorrectable memory (FRU) Event detected by CPU2 at TL=0, errID 0x0000b22a
.8069384c
Jan 29 15:48:12 sundr AFSR 0x00000000.18000203<FRC,FRU> AFAR 0x00000020.0cf2
30a0 INVALID
Jan 29 15:48:12 sundr Fault_PC 0x10377d0 Esynd 0x0003 J_AID 1
Jan 29 15:48:12 sundr SUNW,UltraSPARC-IIIi: [ID 619886 kern.notice] [AFT1] errID
0x0000b22a.8069384c Two Bits in error, likely from WDU/WBP
Jan 29 15:48:12 sundr SUNW,UltraSPARC-IIIi: [ID 520982 kern.info] NOTICE: [AFT0]
Corrected remote memory/cache (RCE) Event detected by CPU1 at TL=0, errID 0x000
0b22a.806899b4
Jan 29 15:48:12 sundr AFSR 0x00100001<PRIV,RUE>.82000000<RCE> AFAR 0x0000002
0.0cf230a0 INVALID
Jan 29 15:48:12 sundr Fault_PC 0x1027f6c J_REQ 2 INVALID
**********************************************************************************************


Jan 30 03:48:12 sundr SUNW,UltraSPARC-IIIi: [ID 683671 kern.info] NOTICE: [AFT0]
Corrected remote memory/cache (RCE) Event detected by CPU3 at TL=0, errID 0x000
0d974.d83378a0

 
Jan 27 09:22:25 sundr unix: [ID 836849 kern.notice]
Jan 27 09:22:25 sundr ^Mpanic[cpu1]/thread=2a10000bd40:
Jan 27 09:22:25 sundr unix: [ID 261965 kern.notice] Fatal PCI UE Error


Hi All,

Please have a look at the new logs; this time they show CPU1.
The system is untouched; nothing has been changed.

It seems there might be a problem with memory... not sure.
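To see whether the errors keep pointing at the same memory location, the fault addresses (AFAR) in the messages can be tallied. This is a rough Python sketch, not a diagnostic tool. It assumes the log has been pasted with lines wrapped at 80 columns as above, so it first rejoins any line that does not start with a syslog timestamp. The names `unwrap`, `tally_afar` and the regexes are made up for illustration. The INVALID flag printed by the kernel is kept next to each address rather than interpreted.

```python
import re
from collections import Counter

# A syslog line starts with a timestamp such as "Jan 27 09:22:25 "
STAMP_RE = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")
# Matches "AFAR 0x00000020.0cf230a0" and "AFAR=00000020.0cf23020", with an
# optional trailing INVALID flag
AFAR_RE = re.compile(r"AFAR[ =](?:0x)?([0-9a-f]{8})\.([0-9a-f]{8})(\s+INVALID)?")


def unwrap(lines):
    """Rejoin wrapped fragments onto the timestamped line they belong to."""
    out = []
    for line in lines:
        if STAMP_RE.match(line) or not out:
            out.append(line)
        else:
            out[-1] += line
    return out


def tally_afar(lines):
    """Count (address, flagged_valid) pairs across all AFAR fields."""
    counts = Counter()
    for line in unwrap(lines):
        for hi, lo, invalid in AFAR_RE.findall(line):
            counts[(int(hi + lo, 16), not invalid)] += 1
    return counts


# Sample lines copied (still wrapped) from the logs above.
sample = """\
Jan 27 09:22:25 sundr pcisch: [ID 863403 kern.info] AFSR=48000000.9e000000 A
FAR=00000020.0cf23020,
Jan 27 09:22:25 sundr AFSR 0x00000000.18001f07<FRC,FRU> AFAR 0x00000020.0cf2
30a0 INVALID
Jan 29 15:48:12 sundr AFSR 0x00100001<PRIV,RUE>.82000000<RCE> AFAR 0x0000002
0.0cf230a0
""".splitlines()

for (addr, valid), n in sorted(tally_afar(sample).items()):
    print(f"{addr:#x} valid={valid} x{n}")
```

In the logs posted here the addresses cluster around 0x20.0cf230xx, and the messages name the same memory module (C2/P0/B0: B0/D0 B0/D1) on both Jan 27 and Jan 29.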

Can anyone help?

Thanks
Sunil D'souza
 
I'd be ringing Sun around about now; do you have a support contract?

Annihilannic.
 
Sunny,
Do you have the power save feature enabled? I found some information stating that certain types of memory didn't handle the power save feature very well and caused the workstation to panic when it was idle.

But I agree with Annihilannic: if this is a new box with support, I'd be on the phone with Sun.
 
Thanks for your suggestion. I have asked someone to speak with Sun, and I guess it should be resolved.

Annihilannic and Bfitzmai, thank you very much; I really appreciate your time.
 
sunny1504;

Make sure to let us know what they have you do.

Thanks

CA
 
thread60-1137920 may be of interest. This is not a CPU problem if it is similar to that thread.
 
I had the same problem with my V440 running Solaris 8: it would reboot unexpectedly with no kernel error messages. I found that it was a hardware issue affecting some V440s. Please verify whether it is hardware related by checking whether you get the following messages from ALOM:

Fatal Error Reset
SC Alert: Host System has Reset

Since I didn't bother upgrading the motherboard, I set ce1 as my primary interface. Hope this helps.




Sun(sm) Alert Notification

* Sun Alert ID: 101548 (formerly 57618)
* Synopsis: Sun Fire V440 and Netra 440 Systems Using a Specific
Networking Configuration may Unexpectedly Reset
* Category: Availability
* Product: Sun Fire V440 Server, Netra 440 Server
* BugIDs: 5039862
* Avoidance: Hardware, Workaround
* State: Resolved
* Date Released: 12-Aug-2004
* Date Closed: 29-Sep-2005
* Date Modified: 14-Jan-2005, 10-Mar-2005, 29-Sep-2005

1. Impact

Under certain conditions using a specific network configuration, the Sun
Fire V440 or Netra 440 system may experience an unexpected reset and reboot.

2. Contributing Factors

This issue can occur in the following releases:

SPARC Platform

* Sun Fire V440
* Netra 440

This issue only occurs when there is system bus signal activity
coincident with a specific PCI bus signal activity occurring on the first
onboard Ethernet interface. Under Solaris this is typically logical
device "ce0", and physically this is the ethernet RJ45 connector NET 0.

3. Symptoms

If the described issue occurs, the system resets, and the following
error message appears on the console.

Fatal Error Reset
SC Alert: Host System has Reset

The system then reboots. No core files are generated, and the reset
output will not be logged to the "/var/adm/messages" file.

If it is suspected that the system is experiencing this issue, change
the OBP variables as follows to provide more verbose output in the event
of another occurrence.

Note: The OBP settings below are only recommended to verify whether the
system is experiencing this issue and should not be used long term. Once
the failure is verified, then the parameters should be set back to their
original values (make a note of these before changing). The settings
below provide more verbose output:

diag-switch? true
post-trigger none
obdiag-trigger none
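
Since the note above asks you to record the original values before changing anything, here is a minimal Python sketch that writes out the OBP "setenv" lines for both directions. It only formats text to type at the "ok" prompt; it does not talk to the system. The function names are my own, and the original values must be the ones your own system reported, not anything assumed here.

```python
# Diagnostic values exactly as listed in the alert.
VERBOSE_SETTINGS = {
    "diag-switch?": "true",
    "post-trigger": "none",
    "obdiag-trigger": "none",
}


def setenv_lines(settings):
    """Format OBP 'setenv <name> <value>' lines in dict insertion order."""
    return ["setenv {} {}".format(name, value) for name, value in settings.items()]


def restore_lines(original_values):
    """Build the lines that put back the values you noted beforehand.

    Raises KeyError if a value was never recorded, rather than guessing it.
    """
    return setenv_lines({name: original_values[name] for name in VERBOSE_SETTINGS})
```

Calling setenv_lines(VERBOSE_SETTINGS) gives the three lines to enter for the test, and restore_lines() with your recorded values gives the lines to enter once the failure has been verified.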

When the parameters above are set, the error message will include some
additional information indicating the reset reason as "PBM FATAL", with
a PCI IO-Bridge register output similar to:

Fatal Error Reset
SC Alert: Host System has Reset

@(#)OBP 4.10.10 2003/08/29 06:25 Sun Fire V440
Clearing TLBs
Loading Configuration
Membase: 0000.0033.0000.0000
MemSize: 0000.0000.4000.0000
Init CPU arrays Done
Init E$ tags Done
Setup TLB Done
MMUs ON
Scrubbing Tomatillo tags... 0 1
Block Scrubbing Done
Find dropin, Copying Done, Size 0000.0000.0000.5ca0
PC = 0000.07ff.f000.4c88
PC = 0000.0000.0000.4d28
Find dropin, (copied), Decompressing Done, Size 0000.0000.0006.6700
ttya initialized
System Reset: (PBM FATAL)
JBUS-PCI bridge
JBUS-PCI bridge
slave Error Register: 8000000000001000
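
If you capture the console output to a file, a rough Python sketch like the one below can flag the signature described above. The marker strings and the "System Reset: (...)" line come from the alert text; the function names are mine, and a match is only a hint that the box is worth raising with Sun, not a diagnosis.

```python
import re

# Console strings quoted in the alert.
RESET_MARKERS = ("Fatal Error Reset", "SC Alert: Host System has Reset")
# Matches e.g. "System Reset: (PBM FATAL)" and captures the reason text.
RESET_REASON = re.compile(r"System Reset:\s*\((?P<reason>[^)]+)\)")


def scan_console_log(text):
    """Return (markers_found, reasons) for captured console text.

    `text` is whatever console output you saved; nothing here talks
    to ALOM or to the host itself.
    """
    markers_found = [m for m in RESET_MARKERS if m in text]
    reasons = [m.group("reason") for m in RESET_REASON.finditer(text)]
    return markers_found, reasons


def looks_like_pbm_fatal_reset(text):
    """Heuristic only: both console markers plus a 'PBM FATAL' reason."""
    markers_found, reasons = scan_console_log(text)
    return len(markers_found) == len(RESET_MARKERS) and "PBM FATAL" in reasons
```

Without the verbose OBP settings the reason line is not printed, so looks_like_pbm_fatal_reset() will return False even on an affected system; in that case check scan_console_log() for the two markers alone.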


Solution Summary
4. Relief/Workaround

To work around the described issue, use the steps provided below:

1a) If the application only requires a single network port, use only the
second onboard Ethernet interface, net1 (ce1).

OR

1b) If the application requires multiple network ports, install a PCI
ethernet card in any available PCI slot. Choosing to place the card into
a 33MHz slot (Slot 0, 1 and 3) may lower performance relative to using
the card in a 66MHz slot (Slot 5, 2 or 4). Slot 5 is preferred.

2) To ensure the onboard net0 port (ce0) is not accessed inadvertently
in a manner that could trigger this issue (e.g. by SunVTS), it is highly
recommended that the ce0 interface be completely disabled. Because of
Solaris instance numbering, it is also recommended that this be done
after the initial Solaris installation, to ensure net1 is assigned the
ce1 instance instead of ce0.

To completely disable onboard net0 (ce0) from the system, use the
following commands to install an NVRAM script at the OBP "ok" prompt:

ok nvedit
0: probe-all install-console banner
1: " /pci@1c,600000/network@2" $delete-device drop
2:
^C
Type "Ctrl-C" to exit nvedit as shown above. Then continue with:
ok nvstore
ok setenv use-nvramrc? true
use-nvramrc? = true
ok reset-all

After the system resets, net0 (ce0) should not be visible by OBP (i.e.
you should not see a path to net0 [/pci@1c,600000/network@2] when you
run "show-devs" from OBP). And the net0 (ce0) device should not be seen
by Solaris (e.g. prtconf or prtpicl commands).
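
To double-check the result, you could paste the "show-devs" output into a file and run something like this Python sketch over it. It assumes the output lists one device path per line, and the function name is my own; the device path is the one quoted in the alert.

```python
NET0_PATH = "/pci@1c,600000/network@2"  # net0 path quoted in the alert


def net0_still_present(show_devs_output, path=NET0_PATH):
    """True if the net0 node (or a child of it) appears in the device list.

    Compares whole paths line by line, so a different node whose name
    merely starts with the same characters does not count as a match.
    """
    for line in show_devs_output.splitlines():
        node = line.strip()
        if node == path or node.startswith(path + "/"):
            return True
    return False
```

If this returns True after the reset-all, the NVRAM script probably did not run; check that use-nvramrc? is really set to true.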

Note: Additional information is available through normal support channels.


5. Resolution

Hardware remediation options are available. Please contact your local
Sun Services representative and reference this document.


Change History
14-Jan-2005:

* Updated Contributing Factors and Resolution sections

10-Mar-2005:

* Updated Impact and Relief/Workaround sections

29-Sep-2005:

* State: Resolved
 