Disk health & error checking

Status
Not open for further replies.

hcclnoodles

IS-IT--Management
Jun 3, 2004
123
GB

Hi there,

Just a quick question. I have been getting various soft and hard errors on disks on various machines (as I'm sure we all do). Other than checking "iostat -En" for the number and frequency of errors, and "prtdiag" for general hardware status, what else can I do to investigate disk errors without taking the box down to run diagnostics? These boxes are live and downtime is almost impossible. Any additional tools or tips would be great. For example, I have this error generated on a SPARC box:

Error for Command: write(10) Error Level: Retryable
scsi: [ID 107833 kern.notice] Requested Block: 3603180 Error Block: 36031808
scsi: [ID 107833 kern.notice] Vendor: SEAGATE Serial Number: 0327A23Y8
scsi: [ID 107833 kern.notice] Sense Key: Hardware Error
scsi: [ID 107833 kern.notice] ASC: 0x19 (defect list error), ASCQ: 0x0, FRU: 0x2

To me this looks serious, and "iostat" is reporting 1 hard error, but where do I go from here? What additional tools can I use?
 
hcclnoodles;

What you should really be concerned about is /var/adm/messages and messages.*, plus the dmesg output. If you only see the error once or twice you can probably ignore it, unless the system is crashing and the drive is a boot drive. Personally, I do not fool around with boot drive errors.

If you are seeing the errors occurring over multiple days, or bus resets due to the error on the drive, I would replace it.
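To put a number on "multiple days", here is a rough Python sketch that counts error lines per day in a messages file. It assumes each line starts with the usual syslog-style timestamp (for example "Jun  3 10:12:01 host ...") and that the string "Error for Command" marks an error; the function name and the marker default are mine, for illustration only.

```python
import re
from collections import Counter

# Assumed syslog-style prefix, e.g. "Jun  3 10:12:01 host ..."
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s")

def errors_per_day(lines, marker="Error for Command"):
    """Return a Counter mapping 'Mon D' to the number of marker lines that day."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        m = STAMP.match(line)
        if m:
            # Normalise the day so "Jun  3" and "Jun 3" count together.
            counts[f"{m.group(1)} {int(m.group(2))}"] += 1
    return counts
```

You could feed it an open file, e.g. errors_per_day(open("/var/adm/messages")), and repeat for the rotated messages.* files. If more than one day shows up in the result, that is the pattern I would treat as a reason to replace the drive.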

Sun has a tool called SunVTS, but since this is a production box you would have to find a window when you could run it, as it will be doing disk reads. You can also run format, choose the drive, then select analyze and then read to check the drive.

Also be aware that I have seen drives pass SunVTS and the format analyze read test, then fail a couple of weeks later when used with Veritas.

You need to make sure that this drive error is not a symptom of an earlier error. I generally look 20 to 30 lines above the posted error to see if anything looks strange.
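If you want to do that check without paging through the file by hand, something like this Python sketch could pull out the lines leading up to each error. The marker string, the 30-line window, and the function name are my own assumptions, not part of any Sun tool.

```python
from collections import deque

def errors_with_context(lines, marker="Error for Command", before=30):
    """Yield (context, error_line) for each line containing marker.

    context is a list of up to `before` lines that came just before the error.
    """
    recent = deque(maxlen=before)  # rolling window of the preceding lines
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            yield list(recent), line
        recent.append(line)
```

For example, looping over errors_with_context(open("/var/adm/messages")) and printing each context followed by the error line gives you the same view I look at manually. Anything odd in the context (resets, timeouts, a different device complaining first) suggests the disk error may be a symptom rather than the cause.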

Thanks;

CA
 
Agree with CA, but the "these boxes are live and downtime is almost impossible" attitudes of some management seem to ignore the fact that if failure does occur because of lack of maintenance, then it becomes more of a problem to them!

I don't mind people who aren't what they seem. I just wish they'd make their mind up.

Alan Bennett.
 