
tapes come back as unrecognizable media

Status
Not open for further replies.

kodiakid

IS-IT--Management
Aug 30, 2005
20
US
This problem has been going on for over a month. My customer is in an HP SAN environment that connects to an STK L40 via a fibre/SCSI router. The two tape drives are both Certance LTO. The library and drives are all on the latest firmware.

So many things have gone wrong I have no idea where to begin. Last night when the backups were run only one tape drive was being used and about ten tapes came back as unrecognizable media. This has been an ongoing problem. Before that issue they were getting NT SCSI port errors. I have replaced every tape drive in the library and the SCSI cables, and run full diags along the fibre to router to SCSI to tape/library chain, and I cannot recreate the error. At one point they were running okay: they could get backups with two tape drives but could not restore with one tape drive. CA ran some tests with them and thought that it might be the tape drive, even though this drive has passed full diags. To accommodate CA I replaced the drive, and then last night they got all the unrecognizable media errors. The library has no errors whatsoever. The router has no errors. I've disabled all the HP services and disabled the drivers in Windows.

I am out of ideas, please help.
 
I feel your pain. CAI has left me high and dry many a time.

1. This may seem very simplistic, but try to clean each tape unit no less than twice.

2. Try installing another backup solution, e.g. Veritas, that will give you full functionality for a 30-day trial period. That will really determine whether the problem is software based. If it is, you have every right to call CAI and escalate the issue as high as possible.

3. Try upgrading to the latest version of ARCserve if you haven't already.

Let me know if this helps.
 
1. Having the user clean the tape drives.
2. Customer does not want to do this just yet. Escalated the issue as high as we can at CA.
3. On the latest version with all patches.
 
in the \log directory

I would also suggest running through the suggestions in this document if you have not already - it is pretty comprehensive:


Also check the media brand being used with the drive. I have seen some pretty strange things happen with the STK libraries when you don't use media off their recommended list (have no idea why though).

Have you taken a look at the tape drive logs using the STK software? Were any errors (hard or soft) logged against any of the drives?
 
I have run full diags on all the hardware using SCSI Toolbox and looked at all the logs. The L40 has four slots available for tape drives. The top two were occupied at first and the top drive was failing, so I moved the top drive to the third slot, and then the fibre/SCSI router was not able to detect both drives. I switched out all the cables and still nothing. This led me to believe that slot two, though the drive there was functioning normally, was shorting something out. So I moved both drives to slots 3 and 4. I ran two test backups of approx. 3GB and 4GB and both ran fine. So as of right now I would say the SCSI port on slot two, which is either a data-out or termination port, is faulty and disrupting the entire SCSI chain. If the restores go well and the backups run fine overnight, I would put the blame on that SCSI port.

i forgot to pull the tape.log
 
Moving the drives about in the library would normally require that you re-run the library configuration (nothing to do with ARCserve) and then re-run ARCserve device configuration afterwards - this might explain why the bridge was not able to detect the drive.

Also, some libraries expect the first drive bay to always be filled and it may cause other issues if it is not.
 
Correct you are! I did reinit the library and the bridge and had the bridge probe the SCSI buses again. Sorry I didn't make that clear in the preceding post. Using slots 2 and 3, the bridge could still not see the drive in slot 3. Then when I moved the drives to slots 3 and 4 the bridge was able to see everything perfectly fine. Ran device config and everything was fine.

I know I've seen the STK 9714 have issues when the bays aren't populated correctly, but not the L-series. I appreciate the help though. If you've got any more suggestions please keep them coming :)
 
Last night my customer was able to back up 21 of 23 servers successfully. The two that failed got 'NT SCSI Port' errors.
 
Are all of these being backed up through the same server across the network, or each of them backed up locally across the SAN?

I would basically be looking at all the servers you have participating in the SAN and making sure the configuration (at both OS level and ARCserve level) is the same. This is particularly important for the tape drive drivers in the OS: for some OS-level tape drivers, it only takes one machine connected to the SAN having the drivers loaded for it to reserve the drive and cause problems for all the other machines trying to use those resources.

Are the machines you are backing up, all at the same build/version? Do you get a network timeout at the point of failure as well as the SCSI errors? Do you get any indications of any ARCserve database problems ?

You should also try (certainly as a test) running a Quick Erase Plus on your tapes before backup (especially if you have problems restoring or reading the tape header on restore), just in case any previous configuration issues have caused the tape header to become bad in some way (just formatting the tape won't erase the tape header).

Are any of the servers participating in the SAN communicating through a firewall at all?

BTW the problem with drive bays is not vendor specific - it can happen on a lot of different models.
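
The "same configuration on every SAN member" check can be sketched roughly as below. This is only an illustration in Python: the inventory dictionary, the server names and the setting names (`os_tape_driver`, `arcserve_build`) are all made up, and collecting the real values from each machine is left out. It takes the most common value of each setting as the baseline and reports any server that differs.

```python
from collections import Counter


def find_config_outliers(inventory):
    """Return {server: {setting: (actual, majority)}} for servers that
    differ from the most common value of any setting."""
    # Collect every setting name seen on any server.
    settings = set()
    for cfg in inventory.values():
        settings.update(cfg)

    # Baseline = the most common value of each setting across all servers.
    baseline = {}
    for key in settings:
        counts = Counter(cfg.get(key) for cfg in inventory.values())
        baseline[key] = counts.most_common(1)[0][0]

    # Report every server whose value differs from the baseline.
    outliers = {}
    for server, cfg in inventory.items():
        diffs = {k: (cfg.get(k), v) for k, v in baseline.items() if cfg.get(k) != v}
        if diffs:
            outliers[server] = diffs
    return outliers


# Hypothetical inventory, gathered by hand or by script from each SAN member.
inventory = {
    "srv01": {"os_tape_driver": "disabled", "arcserve_build": "A"},
    "srv02": {"os_tape_driver": "disabled", "arcserve_build": "A"},
    "srv03": {"os_tape_driver": "enabled", "arcserve_build": "A"},
}

print(find_config_outliers(inventory))
```

With the sample data only srv03 is reported, because its OS tape driver is enabled while the other two have it disabled. A majority baseline is just a convenience; if you know what the correct configuration is, compare against that instead.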
 
Can I ask you a question that seems silly? Did this start right after a Microsoft update? I've been having this problem with unrecognized media for a while. We use BEB 10.5 in a SAN environment and we have two libraries, an STK L40 and an L80. I also noticed two other errors in the BrightStor log, E3719 and E604, the latter of which indicates some sort of SCSI sense error. What I also notice is that my backups fail at the same time every day like clockwork, causing all the media that were running backups to become unrecognized media. Here's what I've been noticing: every time there's some sort of Microsoft update these errors pop up again. The way I've gotten around it is that once all the updates are applied to all my servers that have BrightStor or BrightStor agents, I make sure all of these servers are rebooted within one or two nights. Then the error goes away until updates are applied again. It's a theory and it sounds funny, but that is the way I've fixed it. No patch CA has given me has fixed the unrecognized media problem. Just rebooting all servers that have BrightStor after an MS patch or security update has solved it for me. Again, that solves it until the next MS update or security patch.
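
For anyone wanting to confirm the "fails at the same time every day" pattern from the logs, here is a minimal Python sketch. The log line layout (timestamp, then error code, then message) is an assumption for illustration and is not the real BrightStor log format, so the regular expression would need adjusting. Only the two error codes mentioned above are counted.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layout: "YYYY-MM-DD HH:MM:SS E<code> <message>" (hypothetical).
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(E\d+)\b")


def failures_by_hour(lines, codes=("E3719", "E604")):
    """Count occurrences of the given error codes per hour of day."""
    hours = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(2) in codes:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            hours[ts.hour] += 1
    return hours


# Made-up sample lines in the assumed layout.
sample = [
    "2005-09-01 02:14:07 E3719 hypothetical message text",
    "2005-09-02 02:15:31 E604 hypothetical message text",
    "2005-09-02 13:00:00 E1234 unrelated",
    "2005-09-03 02:13:55 E3719 hypothetical message text",
]

print(failures_by_hour(sample).most_common(3))
```

If nearly all the hits land in one hour, that hour is worth comparing against scheduled jobs and update windows on the other SAN members.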
 
We corrected the unrecognized media error and attribute the fix to a patch; which one I forget exactly, because we put so many on. We also had the SCSI error that occurs at the same time every day. One of the servers is querying the library; which one we do not know yet. We don't have a SAN sniffer available, so we have to parse the logs on the SAN switches and see if we can determine which HBA is giving us the trouble. We have the same MS update problem you're having. After a patch was applied, some services we had disabled would re-enable. Also, before MS updates we would disable the library and drives entirely in Device Manager (CA uses its own drivers), and after the update they would re-enable. There are about 30 servers in the SAN, and if this happens to just one of them, it causes everything, or most of it, to fail.
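
Parsing the switch logs for the offending HBA could look something like this rough Python sketch. Switch log formats vary by vendor, so the sample lines and the keyword are invented for illustration; the only firm assumption is that each relevant line contains the HBA's WWN written as eight colon-separated hex bytes. It counts how often each WWN appears on lines containing the keyword.

```python
import re
from collections import Counter

# A WWN written as eight colon-separated hex bytes, e.g. 10:00:00:00:c9:aa:bb:01.
WWN_RE = re.compile(r"\b(?:[0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2}\b")


def count_events_by_wwn(lines, keyword):
    """Count WWNs seen on log lines that contain the keyword (case-insensitive)."""
    counts = Counter()
    for line in lines:
        if keyword.lower() in line.lower():
            for wwn in WWN_RE.findall(line):
                counts[wwn.lower()] += 1
    return counts


# Made-up log lines; real switch logs will look different.
sample = [
    "Sep 01 02:14:07 switch1 port 4: inquiry from 10:00:00:00:c9:aa:bb:01",
    "Sep 01 02:14:09 switch1 port 7: login from 10:00:00:00:c9:aa:bb:02",
    "Sep 02 02:15:30 switch1 port 4: Inquiry from 10:00:00:00:C9:AA:BB:01",
]

for wwn, n in count_events_by_wwn(sample, "inquiry").most_common():
    print(wwn, n)
```

The WWN with the highest count around the failure window is the first HBA to check, and it can then be matched to a server from the HBA inventory.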
 