Mount problems with Quantum ATL tape library

Status
Not open for further replies.

Edmee

IS-IT--Management
Dec 4, 2000
AU
We recently installed a Quantum/ATL M2500 SDLT 220 tape library on a Windows 2000 Server SP2 machine. The backup software is ARCserve 2000 Advanced with SP4 installed. As far as we can tell, all the latest patches/SPs are installed for all hardware and software. The server is a Dell PowerEdge 2650 with an Adaptec 39160 SCSI card, which is supported by ARCserve.

The problem is that we often have mount problems in ARCserve, with all slots shown as dismounted. Mounting the slots results in the message "Mount failed. Library door open or ..." Only switching off the library and server seems to fix this.

Halfway through a successful mount the following error message appears: "SCSI LUN bad, ha index 1, device 0, lun 0". After a few minutes this LUN is recovered and all slots mount without any problems.

There seems to be some kind of communication problem between the tape library and ARCserve, but nothing we have tried so far has made an improvement. We have tried: installing the latest patches, switching SCSI cards, switching channels, modifying the Quantum iomgr.ini file... Any help would be very much appreciated. Please let me know if you need any more info.

Regards,

Edmee de Klerk
Fluor Australia
Perth, WA

Edmee de Klerk
Technical Support
Fluor Australia
 
Check the ABSL order in Device Configuration. See thread at thread478-335389
 
Thanks for that. I will give it a go and let you know if I have any more problems.

Edmee

Edmee de Klerk
Technical Support
Fluor Australia
 
Changing the ABSL order did not solve the problem. We are still having exactly the same problems as mentioned before.

Edmee de Klerk
Technical Support
Fluor Australia
 
Here is some background into the continuing problem we are experiencing in getting the Backup System to function continuously without errors.

After the Quantum ATL M2500 was installed, it worked fine for 2 weeks. In the weeks that followed we began to experience the following:

Failure to mount: sometimes none of the slots mounted, other times most slots mounted but 2 or 3 stayed dismounted. Trying to mount them produced the message "Failure to mount, door open...".
We then noticed LUN errors: "SCSI LUN Bad, ha index x, device x, LUN 0".
The device list would not update dynamically when tapes were moved. We had to dismount and mount, which often led to the mount problems mentioned above.
Mounts succeeded only intermittently.
We contacted Craig Tamlin, who advised that the cause was possibly the ARCserve software.
We opened a service ticket with Computer Associates (CA).
CA asked for the SCSI card details. We reported that Windows 2000 Device Manager showed the card as an Adaptec 3960D. CA advised that this card is not supported by ARCserve and closed the ticket.
We checked further: the physical model is the 39160, which is supported by ARCserve. According to Adaptec, the 39160 and 3960D are the same card. CA reopened the ticket. We updated the Adaptec driver, and the physical model now shows correctly in Device Manager.
We then contacted Simon Tippet, as suggested by Craig Tamlin, and were asked to change the cables and try different channels on the 2 SCSI cards.
We changed the cables and tried both channels on a single card, then channel A on each of the two SCSI cards, then channel B, all with similar results: LUN errors and mounting issues.
We then applied the latest patches to the SCSI card and to ARCserve, including SP4.
We changed IOMGR.ini to try to stop the LUN errors. With the tape busy toleration time set to 360 sec, errors still appeared. With it set to 0 to disable error reports, errors still appeared. We then set it to 3600 sec. The error occurs 10 to 15 minutes into a successful mount process. We also changed the checksparetapes time to once a week at 5:45 am instead of once a day at 2:00 pm.
LUN errors still appeared; however, the library did not experience any mount or dismount issues for the next week.
ARCserve Device Manager reported unreadable media. The tapes are new.
Monday 2/09/02 - "Failure to mount, door is open.." after dismounting all slots and trying to mount. Turning off the server and tape library is the only way around this problem.
Bar code serial numbers on some tapes are not reported in the BLO??? way but have generic labels such as 1000001??. We therefore cannot tell which tape is which except by counting slots: e.g. if ARCserve shows the Friday backup in slot 53, we go to the tape library and count slots until we reach 53. This is the only way to determine where a particular tape is.
Tuesday 3/9/02 - More unreadable media errors. We changed the ABSL (Adapter/Bus/SCSI/LUN) order, since a post in the forum said this would solve the unreadable media error. It seemed to make things worse. We are now experiencing Windows NT SCSI Port errors in ARCserve, as well as "The device \Device\SCSI\ADPU160m2, did not respond within the time out period" errors in Event Viewer's system log.
Error appeared on the tape library: Error code 26C3: Drive 4 has not responded to multiple times to unload.
Drive statistics on the tape library - drive fetches bad: 9.
Changed the iomgr.ini tape busy toleration time back to 180.
Switched off the server and tape library - slots not mounting.


We are desperate to get this working, as you can imagine.

Edmee de Klerk
Technical Support
Fluor Australia
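
Since the LUN errors above are reported with varying adapter and device numbers, it may help to tally them before swapping more hardware. The following is only a rough sketch: the message text is taken from the posts above, but the sample lines, their timestamps, and the function name `tally_lun_errors` are hypothetical, and the real ARCserve log layout is not known here.

```python
import re
from collections import Counter

# Pattern built from the message quoted in this thread. It also tolerates
# the "SCSCI" spelling in case pasted text contains it. The surrounding
# log layout is an assumption, so the pattern only looks for the message.
LUN_BAD = re.compile(
    r"SCSC?I LUN bad,\s*ha index\s*(\d+),\s*device\s*(\d+),\s*lun\s*(\d+)",
    re.IGNORECASE,
)


def tally_lun_errors(lines):
    """Count 'LUN bad' messages per (ha index, device, lun)."""
    counts = Counter()
    for line in lines:
        match = LUN_BAD.search(line)
        if match:
            counts[tuple(int(g) for g in match.groups())] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "09:12:01 SCSCI LUN bad, ha index 1, device 0, lun 0",
    "09:14:30 Mount failed. Library door open",
    "09:20:11 SCSI LUN Bad, ha index 1, device 0, LUN 0",
    "09:31:45 SCSI LUN bad, ha index 0, device 4, lun 0",
]

counts = tally_lun_errors(sample)
for address, n in counts.most_common():
    print(address, n)
```

If one (ha index, device, lun) address dominates the tally, that points at a single device or cable run; errors spread evenly across addresses would suggest something shared, such as the card, the bus termination, or the software.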
 
Did CA request tape log debugging to be enabled, and at what level of detail? That should determine where the problem is (a communication issue, etc.).

Have you tried disabling any other services that may be trying to talk to the library (Removable Storage Service, any Dell or Quantum agents, the medium changer/tape drives in CompMgmt)?

I'm not familiar with that equipment. Did it come pre-wired? Does it have a back-plane? Can you verify that the cabling is correct internally in the library?

Is the robot configured with the proper # of drives/slots (not always correct from factory)?

 
In answer to your questions:

Yes, Tape log debugging was enabled (detail - both screen and file). However, nothing much was determined from that. I will try it again today and see if we have any more luck.
The Quantum tape library (M2500) did not come pre-wired but was set up by Secure Data, a well-known storage consultancy in Australia. I can't verify that the cabling is correct internally and, to be honest, would not know how to test for that. The library was set up, wiring, drives/slots and all, by people who do this for a living, so we assumed the setup was correct. And the library performed brilliantly for the first 2 weeks, until the mount problems and SCSI LUN bad errors started.
I have disabled the Removable Storage service and looked for any Dell or Quantum agents running, but none of this fixed any of the problems.
As I said, I will try the tape debugging log again and post my results tomorrow.
Edmée de Klerk
Technical Support
Fluor Australia
 
Were any changes made to the backups (adding a job, scheduling, etc) after the two weeks that would have caused the library to use more drives at the same time?

Can you determine the SCSI ID/LUN of each of the drives, and what the library thinks they are? (Drive 1=Lun 1, etc.)? Then make the ABSL order in ARCserve device config match that.

If it is still under warranty by SecureData, I would have a different tech come out to verify that everything is set up properly, and maybe provide the information above.
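
One way to carry out the comparison suggested above is to write down both views of the addressing and diff them. This is a minimal sketch only: every address below is made up for illustration, and `absl_mismatches` is a hypothetical helper, not anything provided by ARCserve or Quantum. The real values would have to be read from the library's front panel and from ARCserve Device Configuration.

```python
from typing import Dict, List, Tuple

# (adapter, bus, scsi_id, lun)
Absl = Tuple[int, int, int, int]


def absl_mismatches(library: Dict[str, Absl],
                    configured: Dict[str, Absl]) -> List[str]:
    """Describe every device whose address differs between the two views."""
    problems = []
    for name in sorted(set(library) | set(configured)):
        lib = library.get(name)      # None if the library does not list it
        cfg = configured.get(name)   # None if the backup software does not
        if lib != cfg:
            problems.append(f"{name}: library={lib} configured={cfg}")
    return problems


# Hypothetical addresses: what the library reports ...
library_view = {
    "changer": (1, 0, 0, 0),
    "drive1": (1, 0, 1, 0),
    "drive2": (1, 0, 2, 0),
}
# ... versus what the backup software has configured (drives swapped).
configured_view = {
    "changer": (1, 0, 0, 0),
    "drive1": (1, 0, 2, 0),
    "drive2": (1, 0, 1, 0),
}

report = absl_mismatches(library_view, configured_view)
for line in report:
    print(line)
```

An empty report would mean the two views agree and the ABSL order is unlikely to be the cause.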
 
Had a look at the tape log. The following error keeps appearing: Logical Unit not ready, cause not reportable.
I have to agree with you that it is most likely a hardware error. Narrowing down where the error comes from exactly is going to have to be a process of elimination I guess... sigh.
Thanks for all your help!

Edmée de Klerk
Technical Support
Fluor Australia
 
I am getting the same error that you are getting. I am using a Dell PowerEdge 2500 with Win2k SP3, ARCserve 2k SP4, an ADIC Scalar 100 LTO with a single drive, and an Adaptec 39160/3960D. My backups are working and tapes are moving back and forth through the library without problems, but my backup speeds are not very fast: 130 Meg/min. How fast are you able to back up data, and do you think this problem could affect backup speed?
 
For Oboy: Try using the oldest MS Generic 1999 Adaptec 3960d drivers. This fixed my speed issue.

Edmee:
I also have an M2500 but use Backup Exec, and had a similar problem with an M1500. Every time I inventoried an empty slot, it would produce an error code on the library screen. I have another M1500 that doesn't error out when it inventories empty slots. The only difference is the library firmware. Quantum updated it this week, we got more errors about drives not initializing, and they decided to exchange the entire library. It is scheduled to be replaced on Monday.
 
ATL bought a British company named M4, and that is where the M series comes from. As always, when you buy something cheap(er), even if it says ATL on it, the quality is just not the same.
Anyway - I also think this is a hardware issue (albeit a weird one). The Logical Unit not Ready/Cause not reportable message is not necessarily an error. If you would like, I could take a look at the tape.log that you sent CA.
In the meantime I suggest going into your SCSI card's BIOS, disabling the domain feature, and dropping the transfer rate to something less than 160MB/s (try 40 for now).
Good luck
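
To put the numbers in this thread side by side: the only backup speed reported here is the 130 Meg/min from the ADIC setup earlier in the thread, while the suggested bus setting is 40 MB/s. The arithmetic below is a rough illustration, assuming "Meg" means megabytes and ignoring protocol overhead and multiple drives sharing a bus. No throughput figure was given for the M2500 itself.

```python
def mb_per_min_to_mb_per_s(rate_mb_per_min: float) -> float:
    """Convert a backup rate from MB/min to MB/s."""
    return rate_mb_per_min / 60.0


reported_rate = 130.0   # MB/min, from the post earlier in the thread
suggested_bus = 40.0    # MB/s, the reduced SCSI transfer rate suggested above

observed = mb_per_min_to_mb_per_s(reported_rate)
headroom = suggested_bus / observed

print(f"observed: {observed:.2f} MB/s")
print(f"bus limit is about {headroom:.1f}x the observed rate")
```

On these figures a single stream at that speed would use only a small fraction of a 40 MB/s bus, so lowering the transfer rate as a test should not by itself slow such a backup.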
 