SCSI Transport Failed / Disk not responding

angelo23 · Sep 19, 2005

Hello,

I need some assistance with error messages that keep popping up in /var/adm/messages and on the console.

I have been getting the following messages:
------------------------------------------------------
Sep 18 16:00:00 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@4,0 (sd4):
Sep 18 16:00:00 a12 unix: SCSI transport failed: reason 'incomplete': retrying command
Sep 18 16:00:04 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@4,0 (sd4):
Sep 18 16:00:04 a12 unix: disk not responding to selection
Sep 18 16:00:04 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@4,0 (sd4):
Sep 18 16:00:04 a12 unix: disk not responding to selection
Sep 18 16:00:07 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@4,0 (sd4):
Sep 18 16:00:07 a12 unix: disk not responding to selection
Sep 18 16:00:10 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@4,0 (sd4):
Sep 18 16:03:12 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@4,0 (sd4):
Sep 18 16:03:12 a12 unix: disk not responding to selection
Sep 18 16:04:17 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0):
Sep 18 16:04:17 a12 unix: Disconnected tagged cmds (1) timeout for Target 4.0
Sep 18 16:04:17 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@4,0 (sd4):
Sep 18 16:04:17 a12 unix: SCSI transport failed: reason 'reset': retrying command
Sep 18 16:04:17 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@4,0 (sd4):
Sep 18 16:04:17 a12 unix: SCSI transport failed: reason 'timeout': retrying command
Sep 18 16:04:20 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@4,0 (sd4):
Sep 18 16:04:20 a12 unix: Error for Command: write(10) Error Level: Retryable
Sep 18 16:04:20 a12 unix: Requested Block: 2163631 Error Block: 114964
Sep 18 16:04:20 a12 unix: Vendor: SEAGATE Serial Number: 9951A10112
Sep 18 16:04:20 a12 unix: Sense Key: Unit Attention
Sep 18 16:04:20 a12 unix: ASC: 0x29 (<vendor unique code 0x29>), ASCQ: 0x2, FRU: 0xcc
Sep 18 16:04:20 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0 (sd3):
Sep 18 16:04:20 a12 unix: Error for Command: write(10) Error Level: Retryable
Sep 18 16:04:20 a12 unix: Requested Block: 4097536 Error Block: 4097536
Sep 18 16:04:20 a12 unix: Vendor: IBM Serial Number: 9825162766
Sep 18 16:04:20 a12 unix: Sense Key: Unit Attention
Sep 18 16:04:20 a12 unix: ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Sep 18 17:15:30 a12 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@5,0 (sd5):
Sep 18 17:15:30 a12 unix: Error for Command: write Error Level: Retryable
Sep 18 17:15:30 a12 unix: Requested Block: 16 Error Block: 16
Sep 18 17:15:30 a12 unix: Vendor: SEAGATE Serial Number: 9711571955
Sep 18 17:15:30 a12 unix: Sense Key: Unit Attention
Sep 18 17:15:30 a12 unix: ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x2
Sep 18 19:15:02 a12 root: Solstice Backup Savegroup: (info) starting Default (with 1 client(s))

-------------------------------------------------------
These messages are appearing on our Sun SPARCstation 20 running Solaris 2.6 (I'm pretty sure). I have a few internal drives and 2 external drives that are daisy-chained along with an external CD-ROM drive. All of the external drives are on controller 0, which is the one that resides on the main board. I replaced the SCSI cables, thinking they might be bad. I got these same messages about a week ago as well, but after replacing the SCSI cables and running fsck on the file systems I received no error messages at all during the week, until this past weekend when they started again. Any suggestions or ideas would be appreciated.

Thanks

marrow · Sep 19, 2005

It appears you have a disk problem. Your fsck may have fixed it, but only temporarily. You could run fsck again; I assume you rebooted afterwards?
Also, "iostat -En" is a useful command for checking disk drives for unrecoverable errors.

cndcadams · Sep 19, 2005

From looking at your errors, I would point to sd4 as your issue. Look through all your previous messages files for any errors; generally the drive that complained first will be the culprit, unless it is the controller, which in your case I do not think it is. sd3 and sd5 were probably affected by the bus resets caused by sd4.

Run grep WARNING /var/adm/messages* to get all errors from the drives. Also check dmesg to see if the drives complained at boot.
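
To find which device complained first, the WARNING lines can be tallied per device instance. The following is only a rough sketch: the function name warnings_by_device is made up for illustration, and it assumes a POSIX awk (on older Solaris releases that probably means nawk rather than the default awk) and the log layout shown in the first post.

```shell
#!/bin/sh
# Hypothetical helper: summarise kernel WARNING lines per device instance.
# Reads syslog-format text on stdin and prints one line per device, in order
# of first appearance, with the first timestamp seen and the warning count.
warnings_by_device() {
    awk '
        / unix: WARNING: / {
            # The instance name is the last field, e.g. "(sd4):" possibly
            # followed by carriage-return debris; keep only the name.
            dev = $NF
            sub(/\).*$/, "", dev)
            sub(/^\(/, "", dev)
            if (!(dev in first)) {
                first[dev] = $1 " " $2 " " $3
                order[++n] = dev
            }
            count[dev]++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s first=%s count=%d\n", order[i], first[order[i]], count[order[i]]
        }
    '
}
```

Usage would be something like cat /var/adm/messages.1 /var/adm/messages.0 /var/adm/messages | warnings_by_device, feeding the oldest file first so that "first" really means earliest. The device at the top of the output is the one to look at first.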

Also check iostat -En as Marrow pointed out.

thanks

CA

Annihilannic · Sep 19, 2005

Also try an 'analyze / read' in format on the disk in question; it will do a non-destructive analysis of the entire disk and attempt to relocate any bad blocks.

Annihilannic.

angelo23 · Sep 20, 2005

Hello All,

Thanks for all the suggestions and comments. I did run fsck again and again, and I also rebooted. Every now and then when running fsck on the root filesystem (/) and on /usr I get the message "File System State in Superblock is Wrong: fix it". I always answer "Yes" and run fsck again on those file systems, and it usually goes through fine. But it seems like every other time it pops up with that superblock message.
I also ran the "iostat -En" command. However, I ran it after running fsck, so it didn't really come up with any disk errors. See below:
----------------------
c0t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST31200W SUN1.05 Revision: 8724 Serial No: 00828554
RPM: 5400 Heads: 14 Size: 1.05GB <1051287552 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c0t3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: IBM Product: DDRS34560SUN4.2G Revision: S98E Serial No: 9825162766
RPM: 7200 Heads: 16 Size: 4.29GB <4292075520 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c0t4d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST39103LCSUN9.0G Revision: 034A Serial No: 9951A10112
RPM: 7200 Heads: 27 Size: 9.06GB <9055065600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c0t5d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE Product: ST19171W SUN9.0G Revision: 0776 Serial No: 9711571955
RPM: 7200 Heads: 27 Size: 9.06GB <9055065600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c0t6d0 Soft Errors: 0 Hard Errors: 3 Transport Errors: 0
Vendor: TOSHIBA Product: XM-4101TASUNSLCD Revision: 1755 Serial No: 06/24/95
RPM: 0 Heads: 0 Size: 18446744073.71GB <-8589934591 bytes>
Media Error: 0 Device Not Ready: 2 No Device: 1 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

rmt/0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SUN Product: DLT4000 Revision: CD50 Serial No: QP
---------------------------------
I might have to wait until I get the SCSI transport failed / disk not responding messages again and then run iostat -En to see what it reports.
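
When re-running iostat -En after the next batch of errors, a small filter can pick out the devices with non-zero counters instead of reading every block by eye. This is a sketch only: the function name iostat_errors is invented here, and it assumes the summary-line layout seen in the output above ("<device> Soft Errors: N Hard Errors: N Transport Errors: N") and a POSIX awk.

```shell
#!/bin/sh
# Hypothetical helper: list devices whose soft, hard, or transport error
# counters are non-zero. Reads "iostat -En" output on stdin.
iostat_errors() {
    awk '
        # Match only the per-device summary line; its fields are fixed:
        # $1 device, $4 soft count, $7 hard count, $10 transport count.
        $2 == "Soft" && $5 == "Hard" && $8 == "Transport" {
            if ($4 + $7 + $10 > 0)
                printf "%s soft=%d hard=%d transport=%d\n", $1, $4, $7, $10
        }
    '
}
```

Usage would be iostat -En | iostat_errors. Against the output above it would flag only c0t6d0, the Toshiba CD-ROM. The counters are, as far as I know, cumulative since the last boot, so a reboot clears them; that would explain why sd4 shows nothing here.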

I also ran the "analyze / read" command under format. I ran it on sd4 and root, and it came up with "Total of '0' defective blocks repaired".

Once again, as of right now the server seems to be fine and is not producing any more error messages. But it did the same last week until the end of the week, so I'm sure it will happen again. Any more suggestions would be appreciated. When it does happen again, I might just try replacing the sd4 hard drive with another one, as I have very good backups from ufsdump and Solstice Networker.

Thanks

marrow · Sep 20, 2005

When I had a disk error recently I also kept running fsck, but the error returned. Sun advised me to umount the file system in question (don't forget to run a tar or something similar first), then delete the f/s and directories and recreate it with newfs, and finally mount it and restore the contents.
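
The sequence described above can be written out as a dry run that only prints the commands, so they can be reviewed before anything destructive is typed. This is a sketch under assumptions: the function name rebuild_plan and the example slice, mount point, and dump target are all hypothetical, and ufsdump/ufsrestore are used in place of tar since ufsdump backups were mentioned earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print (do NOT run) the backup / newfs / restore
# sequence for one UFS slice. All three arguments are placeholders.
rebuild_plan() {
    slice=$1   # e.g. c0t4d0s6 (assumed slice name)
    mnt=$2     # e.g. /export/data (assumed mount point)
    dump=$3    # e.g. /dev/rmt/0 or a file on another disk
    cat <<EOF
ufsdump 0f $dump /dev/rdsk/$slice
umount $mnt
newfs /dev/rdsk/$slice
mount /dev/dsk/$slice $mnt
cd $mnt && ufsrestore rf $dump
EOF
}
```

For example, rebuild_plan c0t4d0s6 /export/data /dev/rmt/0 prints the five commands in order. Note that / and /usr cannot be unmounted on a running system, so for those file systems the same steps would have to be done after booting from other media.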
