L180 - Seagate LTO G1 drives - 2 new drive errors

Status
Not open for further replies.

andyman1070

IS-IT--Management
Jan 22, 2004
We have an L180 with 5 Seagate LTO G1 Viper 220 SCSI drives that has been in place for about a year, along with a Crossroads 3400 fibre-to-SCSI bridge. Backup software is TSM 5.1.5, which is very stable on Windows 2000 Advanced Server. STK has been assisting for 8 weeks - all drives, cables, and the bridge have been replaced. We attempted 3 different revs of firmware on the drives and stepped the library firmware down to 3.03.02 (I think). All the obvious things were also checked (HBA, fibre, etc.). Persistent I/O errors still continue and tapes frequently get stuck in drives. All suspect media has been removed from circulation. STK is working with Seagate on the two issues discovered so far: 1) LOCATE errors, based on the SENSE info, and 2) a "no end of marker on tape" issue. In practice this means that any random tape can get stuck in any random drive during any READ activity (especially during TSM space reclamation). STK claims upcoming drive firmware will fix the issue, but it's been killing us for several weeks.

Has anyone else seen this? If so, what was done to correct it?

Thanks in advance for your feedback! andyman1970
 
Have you tried upgrading the firmware on the Crossroads? This is normally where I have issues.
 
Yes, the Crossroads firmware has been upgraded.
 
The specific error from the TSM activity log is:
ANR8302E I/O error on drive DRIVE5 (mt5.0.0.2) (OP=READ, Error Number=1117, CC=205, KEY=FF, ASC=FF, ASCQ=FF, SENSE=**NONE**, Description=SCSI adapter failure). Refer to Appendix D in the 'Messages' manual for recommended action.

This error can be consistently repeated by running space reclamation.
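
For anyone tracking how often this happens per drive, here is a minimal Python sketch that pulls the fields out of ANR8302E lines and counts them by drive and operation. The regular expression is modelled only on the single message quoted above, so other TSM levels may format the line differently; the function names `parse_anr8302e` and `errors_per_drive` are made up for illustration and are not part of TSM.

```python
import re
from collections import Counter

# Pattern based on the one ANR8302E line quoted in this thread (an assumption:
# other TSM versions may lay the message out differently).
ANR8302E = re.compile(
    r"ANR8302E I/O error on drive (?P<drive>\S+) \((?P<device>[^)]+)\) "
    r"\(OP=(?P<op>\w+), Error Number=(?P<errno>\d+), CC=(?P<cc>\w+), "
    r"KEY=(?P<key>\w+), ASC=(?P<asc>\w+), ASCQ=(?P<ascq>\w+), "
    r"SENSE=(?P<sense>.*?), Description=(?P<description>[^)]*)\)"
)


def parse_anr8302e(line):
    """Return the message fields as a dict of strings, or None if no match."""
    match = ANR8302E.search(line)
    return match.groupdict() if match else None


def errors_per_drive(lines):
    """Count ANR8302E errors per (drive, operation) pair."""
    counts = Counter()
    for line in lines:
        record = parse_anr8302e(line)
        if record:
            counts[(record["drive"], record["op"])] += 1
    return counts
```

Feeding it the lines of an exported activity log (for example `errors_per_drive(open(path))`, where `path` is wherever you saved the export) gives a quick view of whether the errors cluster on one drive or are spread across all of them.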
 
Hi Andyman,

Did you resolve this issue? If you did, could you please let me know what the problem was? We have the same configuration except we are using an IBM 2108 router. The router was working well with IBM Gen1 drives; when we installed the L180 with Seagate LTO Gen1 drives we started getting drive and library errors.
 
No, despite a recent firmware fix provided by Seagate via STK, this is still an issue.
 
howlec,
Since you are using a 2108 SAN DATA GATEWAY, have you flushed the database since adding your new drives to it?

Also, every time you reboot the server you will have to go into the 2108 and flush the database so that it will remap all your devices correctly.

My advice to you would be to get rid of it and use either a Crossroads or an STK SN SCSI/FIBRE bridge.
 
I've had similar problems using Seagate Viper drives and L180s. It's been a number of different issues on a number of sites, and the fixes included:

* Replace cables
* Replace barcode stickers.
* Firmware revisions on both Library and drive.
* Firmware levels on Bridges and switch.
* Drive config within software (multihosted drives in Veritas, media servers seeing the drives in different orders after the L180 rebooted).

More often than not the problem was caused by the multihosted drives not being configured correctly.

Does your backup software have a robot test program (like robtest in Veritas)?

When the config was incorrect, I found when using robtest that what the robot thinks is drive 1 and what the backup software thinks is drive 1 differ. Does that make sense? Difficult to explain...
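
To make the drive-numbering mismatch concrete, here is a small Python sketch that compares two views of the same drives. It assumes you can write down, by hand or from your tools, which physical drive (identified here by a serial number) each side believes sits behind each drive number; the function name and the serial numbers are hypothetical and do not come from robtest or any backup product.

```python
def find_drive_mismatches(robot_view, software_view):
    """Compare two {drive_number: serial} mappings.

    Returns {drive_number: (robot_serial, software_serial)} for every
    drive number where the two views disagree. A serial of None means
    that side has no drive configured under that number.
    """
    mismatches = {}
    for number in sorted(set(robot_view) | set(software_view)):
        robot_serial = robot_view.get(number)
        software_serial = software_view.get(number)
        if robot_serial != software_serial:
            mismatches[number] = (robot_serial, software_serial)
    return mismatches


# Hypothetical example: drives 1 and 2 are swapped in the software config.
robot = {1: "SN-A", 2: "SN-B", 3: "SN-C"}
software = {1: "SN-B", 2: "SN-A", 3: "SN-C"}
print(find_drive_mismatches(robot, software))
```

Any drive number that shows up in the result is one where a mount request from the backup software would land on a different physical drive than the robot loads.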
 
I agree with comtec17. The IBM 2108 is OEMed by IBM from Pathlight Technology; you're better off with Crossroads routers.
The StorageTek L180 is a very reliable library. If you have Seagate LTO Gen 1 drives on the same interface as the library, separate them for fault diagnosis. You will more than likely find the drive failures are affecting library operation.
We have found Seagate drives to be the least reliable of the three LTO manufacturers. If possible get rid of the Seagate drives and replace them with IBM. You can fiddle with firmware levels as much as you like but at the end of the day the quickest fix is going to be drive replacement.
 
What brand of media are you using? Have you recently changed? The Seagate LTO-1 drives will only work reliably with certain brands of tape. Maxell and Fuji are fine, and are recommended by Seagate.

I had problems with Sony and EMTEC tapes - although it is possible that these companies could now be OEMing different brands. HP tapes were also fine, but these were quite obviously made by either Maxell or Fuji, I can't remember which.

IBM drives worked fine with the same tapes, even the same pieces that had previously failed with the Seagate drives.

This issue took up weeks of my time in 2002. The final resolution was to reject the Sony tapes and replace them with Maxell and Fuji. Changing to IBM drives would also have worked. My environment was STK L700 and L180 with Legato NetWorker 6.1.2.

I know other people here have been suggesting all manner of SAN issues, but if these are media errors my reply is relevant.

 
Thanks to all who have added to this thread. After much nudging, STK and Seagate have identified two core issues with the Seagate Viper 220 firmware. One of those issues/errors forces STK to look at our TSM activity logs. The specific sense data will usually indicate a Locate error. Prior to our latest drive firmware update, this Locate error would time out the communication to the drive, and Tivoli would then take the drive offline. We are still experiencing the issue, but the firmware has diminished the severity (i.e. tapes aren't getting stuck in drives anymore). We've also cycled out most/all of our suspect media - those tapes that were subjected to bad firmware in the past, which effectively made them unusable. Interestingly enough, we still get the error on our newest tapes (Maxell LTO1). Avoid the Seagate drives at all costs, as STK is not willing to provide timely fixes to these chronic issues.
 
Hi Andyman,

We were facing the same problem with our setup some time back. We found the tapes were causing the problem. Before replacing the tapes we had done all the firmware upgrades and downgrades and changed cables, bridges, routers, and whatnot. We got a tool from the STK support team, which isolated the problem. Be sure you use Maxell media.

regards
SB
 
Hi..

Firmware v. 1603 corrects this issue, but all your tapes have to be relabeled if you want them to work properly. If you cannot do this, the problem disappears gradually with this new STK firmware for Seagate LTO Gen 1 tape drives.

 
STK has finally fessed up - the drive firmware is the root cause of the problem. Despite initial industry claims, LTO drives DO need to be cleaned on a regular schedule. After much back-and-forth with STK, we are now error free on a nightly basis.

** Warning: if you are in the Midwest US, watch your STK SEs closely. It's clear there are competency issues which will only complicate the diagnosis of systemic problems **

Next time I'll go with an end-to-end IBM solution.

Thanks to all who provided feedback on this thread.
 
Lastly, kudos to Tivoli Storage Manager for being such a robust product!! Despite all of the hardware failures we encountered and the millions of drive errors, we were always able to restore data on demand.
 