Media Errors 2

Status
Not open for further replies.

workingassets

IS-IT--Management
Apr 9, 2002
7
0
0
US
I am running a GFS rotation scheme and backing up about 400 GB of data. The job starts, formats the first tape, finishes it, formats the second tape (sequence #2), and then I get a variety of media errors:
E3855 - Unable to position media, E3719 - Unable to write to media, E6096 - Media Error, E3712 - Unable to close session.

After the job fails I can reformat that second tape and rerun the remaining files, so the tape is not bad as the software claims. If I try to rerun the job on the sequence #2 tape without formatting it first, the job fails with a media error.

I am using a Compaq MSL5026. I have run Library and Tape Tools, the firmware on both drives is up to date, and all of the hardware tests have passed. I just applied SP3 to BrightStor.

Certainly 50% of my tapes cannot be bad. "The New HP" says it's Computer Associates, Computer Associates says it's the library, and of course I am tearing my hair out.

Does anyone know how to fix this?

Thanks.

Nico
 
Media Error is a class of SCSI sense code errors. There are many different types of media errors.

The next step is to find out what type of media error it is.

To do this, enable the tape log and run the backup (see the ARCserve FAQ if you need help enabling the tape log).

With BEB the write commands are not logged so the log will not be very large.

Here is an example of a failure to write media error.
18:42:21 =>ABSL:6050 [WRITE ] 0a 01 00 00 01 00 00 00 00 00
18:42:21 =>ABSL:6050 [WRITE ] 0a 01 00 00 01 00 00 00 00 00
<WRITE >, Sense Data as Follows:
SENSE ABSL:6050 f0 00 03 00 00 08 31 16 00 05 52 bc 0c
EX SENSE ABSL:6050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SENSE ABSL:6050 Media Error [03]
EX SENSE ABSL:6050 Write Error [0c, 00]

In this case the Tape Engine issued a Write command and the drive failed to complete it. The Tape Engine then requested the SCSI sense codes from the drive, and those codes tell us that the drive failed to write. (I cut out the date/time from the beginning of each line to make it fit a little better.)
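For anyone who wants to read the raw bytes rather than the interpreted lines, here is a minimal Python sketch of how that SENSE / EX SENSE pair maps onto fixed-format SCSI sense data. It assumes the SENSE line carries bytes 0-12 and the EX SENSE line continues from byte 13, which is consistent with the "[0c, 00]" shown above. The function name and the dictionary layout are my own, not anything from ARCserve.

```python
# Sense key names from the SCSI standard (subset, for illustration).
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0x7: "Data Protect",
    0x8: "Blank Check",
    0xB: "Aborted Command",
}


def decode_sense(sense_line: str, ex_sense_line: str) -> dict:
    """Decode fixed-format sense bytes split across a SENSE / EX SENSE pair.

    Assumption: the SENSE line holds bytes 0-12 and the EX SENSE line
    continues at byte 13, as the tape log excerpts in this thread suggest.
    """
    data = bytes.fromhex(sense_line + " " + ex_sense_line)
    if len(data) < 14:
        raise ValueError("need at least 14 sense bytes")
    response_code = data[0] & 0x7F  # the top bit is the VALID flag
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: {response_code:#x}")
    key = data[2] & 0x0F  # sense key is the low nibble of byte 2
    return {
        "deferred": response_code == 0x71,  # 0x71 marks a deferred error
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "Unknown"),
        "asc": data[12],   # additional sense code
        "ascq": data[13],  # additional sense code qualifier
    }


result = decode_sense(
    "f0 00 03 00 00 08 31 16 00 05 52 bc 0c",
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
)
print(result["sense_key_name"], hex(result["asc"]), hex(result["ascq"]))
```

For the example above this gives sense key 3 (Medium Error) with ASC 0c and ASCQ 00, matching the interpreted lines in the log.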

So we need to see what those Sense Codes are in your case.
 
Here are the codes. Hopefully this will shed some light.

[2312] ABSL:2010 CMD:a Returning status[2]
[2312] 08/14 08:33:08 ABSL:2010 [REQUEST SENSE ] 03 00 00 00 40 00 00 00 00 00
[2312] <WRITE >, Sense Data as Follows:
[2312] SENSE ABSL:2010 f0 00 03 00 00 03 43 16 00 03 0e aa 0c
[2312] EX SENSE ABSL:2010 00 00 00 00 00 80 03 a3 00 00 10 65 01 91 9e b4
[2312] SENSE ABSL:2010 Media Error [03]
[2312] EX SENSE ABSL:2010 Write Error [0c, 00]
[2312] ABSL:2010 (SCSI WRITE)
[2312] (03) *** Medium Error ***
[2312] E6012 Write error - head sync error during write (0C 00)
[2312] E6092 Fatal - Uncorrectable error - E6092 Fatal - Uncorrectable error
[2172] 08/14 08:33:33 ABSL:2010 [REQUEST SENSE ] 03 00 00 00 20 00 00 00 00 00
[2172] SENSE ABSL:2010 70 00 00 00 00 00 00 16 00 00 00 00 00
[2172] EX SENSE ABSL:2010 00 00 00 00 00 80 03 a3 00 00 10 65 01 91 9e b4
[2172] DRV:2 Tape Remaining Capacity :105282256 KB
[2280] 08/14 08:33:33 ABSL:2010 [LOG SENSE ] 4d 00 72 00 00 00 00 00 58 00
[2280] LDN:2 ClientGetCompressionRatio: 1.660000
[2124] Calling DisConnectFromTape()
[2124] LDN:2 DisConnectFromTape
[2124] LDN:2 COMPAQ 100GB SDLT LDRV 3333 Log Sense Not Supported.
[2124] [ 2124 ]Registering tapedrv errors in database 6683968
[2124] [ 2124 ]Finished Registering tapedrv errors in database 6683984
[2124] [ 2124 ]Registering tape drive usage time in database 6683984
[2124] [ 2124 ]Finished Registering tape drive usage time in database 6684000
[2124] 08/14 08:33:33 ABSL:2010 [SPACE -1 FM ] 11 01 ff ff ff 00 00 00 58 00
[2124] ABSL:2010 CMD:11 Returning status[2]
[2124] 08/14 08:33:33 ABSL:2010 [REQUEST SENSE ] 03 00 00 00 40 00 00 00 58 00
[2124] <SPACE >, Sense Data as Follows:
[2124] SENSE ABSL:2010 f1 00 03 00 00 00 00 16 00 03 0e aa 15
[2124] EX SENSE ABSL:2010 02 00 00 00 00 80 03 a3 00 00 10 65 01 91 9e b4
[2124] SENSE ABSL:2010 Media Error [03]
[2124] EX SENSE ABSL:2010 Positioning Error Detected By Read Of Medium [15, 02]
[2124] ABSL:2010 (SCSI SPACE 16777215 FILE MARKS)
[2124] (03) *** Medium Error ***
[2124] E6024 Positioning error detected by read, space, or locate (15 02)
[2124] E6092 Fatal - Uncorrectable error - E6092 Fatal - Uncorrectable error
[2124] 08/14 08:33:33 ABSL:2010 [REWIND ] 01 00 00 00 00 00 00 00 58 00
[2124] ...in ChangerReturnTapeToSlot
[2124] ChangerReturnTapetoSlot(): Saving that we need to mark Slot 8 NOT in use later, tf:18002121
[2124] ...in ChgReturnTapeToSlot
[2124] LDN:2 Returning tape to Slot :8.
[2124] ...in ChgDriveElementStatus
[2124] ...in ChgReadElementStatus
[2124] LDN:1 Read Elem Stat:Starting Element [480] Buffer size [128] # of Element [1]
[2124] 08/14 08:34:03 ABSL:2000 [READ ELEMENT STAT ] b8 04 01 e0 00 01 00 00 00 80
[2124] szElementAddress [0x01][0xffffffe0] szSrcStEleAddr [0x00][0x28] chSIByte[0xffffff80]
[2124] SValid byte is valid, platter is inverted:0, slot address:40
[2124] ...in ChgMoveMedium
[2124] LDN:1 ChgMoveMedium(), from address:480 to address:40 Flip:0 FM:0.
[2124] LDN:1 Moving Tape from Library Drive LDN:2 to Slot:8
[2124] LDN:1 Leaving tape in drive, from slot address:40
[2124] ...in SaveChangerConfiguration
[2124] Successfully Flushed data to Changer Config file.
[2124] Successfully Flushed data to Changer Config file.
[2124] Successfully Flushed data to Changer Config file.
[2124] Successfully Flushed data to Changer Config file.
[2172] Calling DisConnectFromGroup:GROUP1
[2172] Calling DisConnectFromTape()
[2172] UnLockGroupEnum:GROUP1 NOT Clearing Sem! Already Done!
[2172] DRV:2 Cleaning information is NOT supported
[2172] ...in ChangerDisConnectFromDrive
[2172] ...in ChgReturnChangerDrive
[2172] LDN:1 Return Library Drive Associated Group:[GROUP1]
[2172] LDN:1 Return Library Drive Set drive LDN:2 with Group:[GROUP1]
[2172] LDN:1 Unlock LDN:2 Total number of free drives 2
[2172] ChangerDisConnectFromDrive(): Mark Slot 8 NOT in use, tf:10002101
[2172] UnLocking Group [GROUP1]
[2280] Calling DisConnectFromTape()
[2280] DestroyHandle [00CC0764]: Active Jobs 2
[2280] Total Jobs ACTIVE 3
[2280] CREATEJOBHANDLE [00CC0440]: JobType - 8, ServerType - Local, Client - ASMGR@GROVER
[1964] Calling DisConnectFromTape()
[1964] DestroyHandle [00CC0440]: Active Jobs 2
[1964] Total Jobs ACTIVE 3
[1964] CREATEJOBHANDLE [00CC0440]: JobType - 8, ServerType - Local, Client - ADMIN_STUB
[1964] Client Set DebugLevel: 0
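A side note on reading the log above: the line "SCSI SPACE 16777215 FILE MARKS" looks alarming, but the count field of a SPACE(6) command is a 24-bit two's complement number, so ff ff ff is -1, i.e. space backward over one filemark (which matches the "[SPACE -1 FM ]" label). A rough Python sketch of that decoding follows. It assumes the first six logged bytes are the CDB, and the helper name is mine.

```python
SPACE_CODES = {
    0: "blocks",
    1: "filemarks",
    2: "sequential filemarks",
    3: "end-of-data",
}


def decode_space6(cdb_hex: str):
    """Decode a SPACE(6) CDB given as hex text.

    Assumption: only the first six bytes of the logged line are the CDB;
    any trailing bytes printed by the log are ignored.
    """
    cdb = bytes.fromhex(cdb_hex)[:6]
    if len(cdb) < 6 or cdb[0] != 0x11:  # 0x11 is the SPACE(6) opcode
        raise ValueError("not a SPACE(6) CDB")
    code = cdb[1] & 0x07  # low three bits select what is being spaced over
    # Bytes 2-4 are a signed 24-bit count; negative means backward.
    count = int.from_bytes(cdb[2:5], "big", signed=True)
    return SPACE_CODES.get(code, "other"), count


print(decode_space6("11 01 ff ff ff 00 00 00 58 00"))
```

So the drive was asked to back up over the last filemark after the failed write, and that positioning request is what returned the second Medium Error.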
 
I've seen this before. Even though neither the hardware nor Windows reported any messages, it was caused by faulty drives. After replacing the drives everything worked fine.
Sometimes it helps to clean the drives, or to put in a brand new tape, let the drive read the media, and take it out again.

regards

 
I agree with Cyklops.

Clean the heads, try a new tape and if that does not do it then it is probably a drive problem.

By the way, since the sense codes are produced by the tape drive and just recorded by the Tape Engine, this log can be shown to tech support for the drive. I have found that their level 1 support usually has no idea what the sense codes are but can send the log to a specialist who can read it.

For really tough problems, some drives can provide a diagnostic dump containing a history of the drive's usage. This can only be done with the help of their support, but it is very effective at diagnosing elusive problems.
 
I have seen this problem before. We have two of the MSL5026 changers. We have had CA involved, as well as a Compaq engineer who has been on site several times over the past two months. Our errors have been subsiding over the past couple of weeks, but we still have no firm solution or answers to our problems.

If you could, send me an email (mrtoledo@email.com) with your HP/Compaq case numbers. I would like to give Compaq/HP the ticket numbers of other people having the same types of issues. I have seen a good number of people on this board having these problems, yet HP insists there is no problem with their units and CA insists there is no problem with ARCserve/BrightStor. HP also insists they do not have other users having these types of problems. I would like to provide them with case numbers to help them understand the scope of the issue at hand.
 
This has been great. My problem still isn't solved, but at least I know that other people are feeling the same frustration. I have sent this thread to the tech at HP who was working on my case.

Thanks.
 
Hi,

I've had the same problem 3 times now. I just had a new drive installed (the 3rd one) and I'm still getting the errors mentioned, even on brand new tapes. Exabyte has asked me to run the cleaning tape through the drive 10 times and then see what happens.

 
We too have been having the same problems, with Compaq denying any issues with the MSL5026 and ARCserve. Below are the call references. They have replaced everything apart from the library controller card.

If you get any admission of this issue and/or a resolution, can you let me know?
 
Just a follow-up: we are still having issues, only they have gotten worse. I have had two good backups in the past month. Every day I see failed backups with error codes E3719 unrecoverable data error, E609 Media Error, and E3855 unable to position media.

Everything, drives and tapes, passes Library and Tape Tools, so Compaq says there is nothing wrong with the library. I have reinstalled BrightStor just to make sure that nothing was corrupt in the installation.

I am also getting the same type of errors with NTBackup: "unable to write to media"... bad data?

The problem is that the job does not fail in the same location every time.

Any other thoughts?

 
I am having this issue too, with a Dell PowerVault 122T. The problem seems to occur on large files when my network is under stress. 99% of the time it is during brick-level mailbox backups. It happened again last night, while I was also scanning mailboxes (after a virus update). I am convinced it is a network/timing issue in which the BrightStor software assumes it has lost communication with the drive, so it is not the SCSI card, the drive, or the tapes themselves. I have a case open with BrightStor; the only advice I have so far is to be on the latest patches, which I am. The annoyance for me is that I have to physically power the drive off and back on and then reboot the server, which means that if I am off-site I have to come in and do the manual reset!
 
Check your SDLT firmware! We had the same issue, and it plagued us for two months. We have 3 MSL5052s2 (SDLT320) and were getting the same "Failed to close session" errors and numerous media errors. We thought our drive firmware was up to date, but I noticed on the CA website that the certified firmware version for our drives was v75 (4B4B) and we had v52 (3434) installed. We recently updated the firmware and the problems have subsided. For some reason v75 never showed up in the LT&T software (which was just recently updated to v3.4). I manually FTPed the .frm file, put it in the firmware directory, and performed the update. The release notes for v75 addressed many of the issues we were having, and the HP tech "HIGHLY" recommended we upgrade as soon as possible.
One thing I've noticed with the SDLT hardware is how picky it is about firmware.
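In case the two ways of writing the version are confusing: in both pairs quoted above, the four-character code is a hex byte repeated, and that byte in decimal is the "v" number (0x4B = 75, 0x34 = 52). A small Python sketch for comparing an installed code against a certified one follows, assuming that relationship holds for other SDLT firmware codes as well, which I have not confirmed. The function names are made up for illustration.

```python
def version_from_code(code_hex: str) -> int:
    """Return the decimal version for a firmware code such as '4B4B'.

    Assumption: the first byte of the code, read as hex, is the version
    number. This fits the two codes quoted in this thread only.
    """
    return int(code_hex[:2], 16)


def needs_update(installed_code: str, certified_code: str) -> bool:
    """True if the installed firmware is older than the certified one."""
    return version_from_code(installed_code) < version_from_code(certified_code)


print(version_from_code("4B4B"))     # 75
print(version_from_code("3434"))     # 52
print(needs_update("3434", "4B4B"))  # True
```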

Hope this helps.
 
A lot of people are reporting that they have seen the same errors.

Unless you get a tape log and compare it to the one above, there is no way to know whether it is the "same" error or not.

I've said this many times already: many failures, even those that fall into the SCSI Media Error and Hardware Error sense code range, can be caused by tape and library device drivers being loaded. The HP/Compaq Storage Agents are another cause of many failures.

So beware of diagnosing your problem from the activity log. It is a high-level log and cannot tell you what is really going on. You have to view the tape log (as above) to see what is really happening.
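To make that comparison less painful, here is a rough Python sketch that pulls the interpreted sense lines out of a tape log and counts each distinct failure, so two logs can be compared side by side. It assumes the line layout shown in the excerpts earlier in this thread ("SENSE ABSL:nnnn text [kk]" followed by "EX SENSE ABSL:nnnn text [asc, ascq]"); other Tape Engine versions may format these lines differently, and the function name is my own.

```python
import re
from collections import Counter

# "SENSE ABSL:2010 Media Error [03]" (but not the "EX SENSE" lines)
KEY_RE = re.compile(r"(?<!EX )SENSE (ABSL:\d+) (.+?) \[([0-9a-fA-F]{2})\]\s*$")
# "EX SENSE ABSL:2010 Write Error [0c, 00]"
ASC_RE = re.compile(
    r"EX SENSE (ABSL:\d+) (.+?) \[([0-9a-fA-F]{2}), ([0-9a-fA-F]{2})\]\s*$"
)


def summarize_sense(lines):
    """Count (device, sense key text, ASC text, asc, ascq) combinations."""
    counts = Counter()
    last_key = {}  # most recent sense key text seen per device
    for line in lines:
        m = ASC_RE.search(line)
        if m:
            dev, text, asc, ascq = m.groups()
            key_text = last_key.get(dev, "unknown")
            counts[(dev, key_text, text, asc.lower(), ascq.lower())] += 1
            continue
        m = KEY_RE.search(line)
        if m:
            dev, text, _code = m.groups()
            last_key[dev] = text
    return counts


sample = [
    "[2312] SENSE ABSL:2010 f0 00 03 00 00 03 43 16 00 03 0e aa 0c",
    "[2312] SENSE ABSL:2010 Media Error [03]",
    "[2312] EX SENSE ABSL:2010 Write Error [0c, 00]",
    "[2124] SENSE ABSL:2010 Media Error [03]",
    "[2124] EX SENSE ABSL:2010 Positioning Error Detected By Read Of Medium [15, 02]",
]
for failure, n in summarize_sense(sample).items():
    print(n, failure)
```

If two people's summaries show different ASC/ASCQ pairs, they are not chasing the same error, whatever the activity log says.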
 
Problem solved….

We had the same problem and HP gave us a few tricks to solve it.

Our environment is as follows: Windows 2000 servers with SP4 and all the required drivers, and 2 Compaq TL892 mini libraries attached to a Modular Data Router (MDR). The MDR is connected to a Compaq FC switch, to which our servers are also connected. We are now using ARCserve 9, but we also had some weird problems with version 8 (ARCserve 2000).

The way we usually solved this kind of problem was to remove ARCserve completely from the affected server, reboot, and reinstall ARCserve with all the patches.

I tried that kind of solution last week but it fixed only half of the problem… If I was using my second TL892, the backup was ok but using the first one gave the error E3719 that I had previously.

So here is how we solved it. First, I stopped the Windows service called « Removable Storage » and set it to « Manual » instead of « Automatic » on all our servers running ARCserve. Second, on the FC switches (we have 2) we defined multiple zones that act like VLANs. In each zone I put only 2 connections: the connection for the MDR and the connection for one server running ARCserve. We did that last week and all our weekend backups were OK.

I hope this helps everyone.
 
Hello
One way to get rid of these errors may be to disable the drivers in Device Manager, for the tape drives as well as the changer. BAB or BEB can work without the W2K driver.
Kind regards
Gery
 
I'm having similar problems with my MSL5026 on a DL380 running Windows 2003 and Backup Exec. The problems started when I replaced a drive and added new tapes to the library at the same time. It took a while to sort things out, but it turned out to be bad tapes! The original library came with Compaq tapes; out of approximately 40 tapes, 2 or 3 had hard write errors. When we added 20 new tapes, 10 of the 20 had hard errors and would cause all kinds of strange errors on the MSL5026. At the root of most of the errors were CRC errors, but we saw everything from drives going offline to tapes getting stuck in the drives. When I went back to the original tapes everything cleared up. I sent 9 of the bad tapes back to Quantum, just got them back, and have the same problems again. I can't even get an erase to work on the new tapes. Since these are Quantum drives you would think that Quantum tapes would work best, but that doesn't seem to be the case here. If anyone has any ideas I would love to hear them.
 
I had similar tape errors on the same MSL5026 and discovered I was using the Compaq tape drivers. ARCserve sometimes has a problem with these. Disable the Compaq drivers and see what happens.
 
Thanks,

Actually, the Quantum drivers had been loaded. I'm not sure when that happened, but when I reloaded the Veritas drivers I was able to use all but one of the tapes.
 
I have had this issue several times before. It is a driver issue with Adaptec (or the Compaq Adaptec chip). What you must do is replace the SCSI driver with the newest one from Adaptec, not the one from Microsoft (the built-in Windows driver). If you have a Compaq server, update the CIM to version 7 on Fibre systems.
Disable the Removable Storage Manager and deactivate the tape tools from Compaq. Normally replacing the driver is enough, but the old CIM and the Removable Storage Manager sometimes interfere as well.

I hope it helps you. I fixed 10 backup servers with this issue this way.
 