Media emergency: could not position tape

Sunboxez · Dec 3, 2001

Hello All,
I am having problems with restores from tape using the NW User GUI. During the recover, NW Admin shows:
media emergency: could not position volume x to file y, record z
Followed by:
media info: could not read record z on LTO Ultrium tape x

Sometimes the restore continues, other times the errors exceed the device max.consec.errors and the drive is disabled, and the restore hangs.

Originally we thought it was bad media, but we've used brand new media from FUJI with different lot/batch numbers and get the same problem. During the backup there are no SCSI timeouts and no msgs in the event logs, so why is restore such a problem?
Legato thinks it's the hardware, STK thinks it's Legato. I would really love to hear from someone else with an NT/STK/LTO environment.

Server:Compaq Proliant DL380R 2xP3-933 512MB RAM 50GBHD
OS: NT4 Sp6a SCSI: 2x Adaptec-2940U2B LVD
Networker 6.1 for NT Build 186
Lib: StorageTek L40
Drives: x3 Seagate LTO Viper LVD Firmware:1400

You are all my last hope before mngmt ditches this backup solution

bommer · Dec 3, 2001

If you haven't already, I would have STK replace at least one of your tape drives and try a backup and restore from it. The only other thing left is cabling and connections. We don't have LTO drives, so I can't vouch for them, but our STK guys use a program called SCSI toolbox to check our DLT7000 drives for errors. If the drives are on Legato's compatibility list they should work with Networker.

Sunboxez · Dec 5, 2001

bommer,
We've had STK in to replace 2 drives and we still get the same msgs. On top of that, we ran SCSI toolbox and tested 500KB and 2GB write-read w/verify on all 3 drives, and that worked fine. Then I went to Seagate to get the tape diag tool for NT and tested all 3 drives @ 5MB and 100GB write-read; all 3 passed. I'm running a test right now to see if all 3 drives can backup/restore using NTbackup. If all 3 drives pass this test also, I have to believe this is a Legato problem. Do you agree? All of the hardware I've listed is in the compatibility matrices for Legato and STK, and the only time we see tape errors is in Legato..... I hear Veritas has a nice solution

Thanx for your input

Guest_imported · Dec 6, 2001

This is a bug. Call Legato tech support.

bommer · Dec 10, 2001

Sounds like Legato. Maybe Networker and LTO don't work well together. I don't know anything about the LTO drives. Wish I could suggest something else.

Sunboxez · Dec 13, 2001

It's definitely a Legato problem. We now have the scanner command successfully reading every SSID that fails from the GUI using save set recover, recover, or directed recover. This means the data is valid, so I can keep my job, but the indexes are screwy. Lgto had me blow away the indexdb and mediadb and recreate them; the problem still exists after backup/recovery. Now the lgto specialists are on the case, and they're leaning towards physical tape blocksize issues. I wish this was worked out before they stamp "certified" on these backup solutions. BTW, coming from DLT to LTO, just think of LTO as a DLT that holds 200GB and writes @ 12-14MB/s. From a setup/operational standpoint there's really no difference....except maybe that DLT works!

jgarmer · Dec 13, 2001

Have you tried to upgrade to 6.1.1?

joe

jgarmer · Dec 13, 2001

Check this out. This was in the release notes for 6.1; apparently this may die on a verify of your backups as well.

Media Position Errors Encountered When Auto Media Verify Is Enabled
When you enable the Auto Media Verify option for a volume pool, the NetWorker
software verifies the data written to volumes from the pool during the save operation.
(NetWorker software reads a record of data written to the media and compares it to the
original record.) NetWorker software verifies the media when a volume becomes full
or is no longer needed for saving data.
To read previously written data, nsrmmd repositions the volume. However, nsrmmd
does not always locate the data on the first attempt.
The following messages might appear in NetWorker Administrator:
media warning: /dev/rmt2.1 moving: fsr 15: I/O error
media emergency: could not position jupiter.007 to file 44, record 16
No action is required. NetWorker software continues to attempt to find the proper
position.
If the NetWorker software finds the correct position, then media verification succeeds
and a successful-completion message appears:

Sunboxez · Dec 13, 2001

I have seen that entry in tech dialog, but we're not using auto media verify. In fact, the lgto techs recommend against using the feature with any build. (Quality, huh?) They have also recommended upgrading to 6.1.1, but they have me doing a few more tests first.

Sunboxez · Dec 21, 2001

Hello All,

******* F. Y. I *********

We've come to some form of resolution with this case. Legato has determined this to be a bug with 6.X. The problem was thought to have been resolved in 5.X but has resurfaced. The temporary workaround is to disable enterprise or power edition networker by creating a file called 'noimmediate', with no file extension, in the nsr\debug folder. This effectively stages all index entries to disk and writes all of the indexes after the backup group completes. You lose some BU performance due to increased I/O overhead, but this will get your backups and restores working.

Thanks for all the input - Larry

maglub · Jan 9, 2002

Thanks a lot!

I have the exact same problem on a Compaq proliant server with one Seagate LTO in a L20 library.

Wonder when Legato will release a patch for this.

I'll try the workaround.

Regards,
//magnus

Sunboxez · Jan 10, 2002

Don't thank me yet.....

We found out a few days ago that the positioning error has resurfaced even while the noimmediate mode is in use. BTW, don't forget to stop/start the services after creating the noimmediate file. Please let me know if you continue to see the problem. Also, could you please provide me with your server/SCSI controller specs? You're the first person to have the same issue as me (Hooray, I'm not going crazy), so I would like to know as much about your experience as possible.

Rgds,

Larry

tyoung · Jan 21, 2002

I've encountered the same problem running NW 6.1.1 on Solaris 8. In fact, I opened a case with Legato support on it in November. At that time, the engineer took a look at my st.conf and said that he thought some flags might be missing, so he suggested revising that file. Since then, I've continued to get the same error messages I did previously, on multiple NSR servers, so I just re-opened the case last week.

Sunboxez · Jan 21, 2002

tyoung,

What's your hardware configuration? Are you using LTO drives?

tyoung · Feb 11, 2002

Sunboxez,
My hw conf is as follows:

Solaris 8 running on Sun E420R
ATL P3000 tape library (8 x DLT7000; 326 slots)
NetWorker 6.1.1.

JimTaylor · Feb 12, 2002

We're using 6.0.1. The problem starts as noted below in a cut from daemon.log and results in the drive being hung until a refresh is done. When we do fulls on the weekend, one to three drives usually end up in this failed state.

02/10/02 18:43:31 nsrd: media warning: /dev/rmt/9mbn writing: Bad file number, at file 18 record 636
02/10/02 18:43:31 nsrd: media notice: 9840 tape 200260 on /dev/rmt/9mbn is full
02/10/02 18:43:31 nsrd: media notice: 9840 tape 200260 used 10 GB of 20 GB capacity
02/10/02 18:44:23 nsrd: media notice: Volume "200260" on device "/dev/rmt/9mbn": Block size is 32768 bytes not 262144 bytes. Verify the device configuration. Tape positioning by record is disabled.
02/10/02 18:45:31 nsrd: media warning: /dev/rmt/9mbn reading: fsr 630 read: Bad file number
02/10/02 18:45:31 nsrd: media emergency: could not position 200260 to file 18, record 632
02/10/02 18:46:43 nsrd: media warning: /dev/rmt/9mbn reading: fsr 630 read: Bad file number
02/10/02 18:46:43 nsrd: media emergency: could not position 200260 to file 18, record 632
02/10/02 18:47:51 nsrd: media warning: /dev/rmt/9mbn reading: fsr 630 read: Bad file number
02/10/02 18:47:51 nsrd: media emergency: could not position 200260 to file 18, record 632
02/10/02 18:47:51 nsrd: media warning: /dev/rmt/9mbn moving: fsf 18: Bad file number
02/10/02 18:49:06 nsrd: media warning: /dev/rmt/9mbn reading: fsr 630 read: Bad file number
02/10/02 18:49:07 nsrd: media emergency: could not position 200260 to file 18, record 632
02/10/02 18:49:07 nsrd: media warning: verification of volume "200260", volid 1725934081 failed, can not read record 632 of file 18 on 9840 tape 200260
02/10/02 18:49:07 nsrd: media notice: verification of volume "200260", volid 1725934081 failed, volume is being marked as full.
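
For anyone who wants to pull the same two kinds of lines out of their own daemon.log, here is a rough sketch. The sample lines are copied from the excerpt above; the function names position_errors and blocksize_mismatches are made up for illustration and are not NetWorker tools. It assumes the message wording matches what is shown in this thread.

```shell
#!/bin/sh
# Sample lines copied from the daemon.log excerpt in this thread.
# In practice, point LOG at your own daemon.log instead.
LOG=$(mktemp)
trap 'rm -f "$LOG"' EXIT
cat > "$LOG" <<'EOF'
02/10/02 18:44:23 nsrd: media notice: Volume "200260" on device "/dev/rmt/9mbn": Block size is 32768 bytes not 262144 bytes. Verify the device configuration. Tape positioning by record is disabled.
02/10/02 18:45:31 nsrd: media warning: /dev/rmt/9mbn reading: fsr 630 read: Bad file number
02/10/02 18:45:31 nsrd: media emergency: could not position 200260 to file 18, record 632
02/10/02 18:46:43 nsrd: media emergency: could not position 200260 to file 18, record 632
EOF

# Count positioning failures, grouped by "volume to file N, record M".
position_errors() {
    awk '/media emergency: could not position/ {
             sub(/.*could not position /, "")
             count[$0]++
         }
         END { for (k in count) print count[k], k }' "$1"
}

# Print "volume found-blocksize expected-blocksize" for each mismatch notice.
blocksize_mismatches() {
    grep 'Block size is' "$1" |
        sed 's/.*Volume "\([^"]*\)".*Block size is \([0-9]*\) bytes not \([0-9]*\) bytes.*/\1 \2 \3/'
}

position_errors "$LOG"
blocksize_mismatches "$LOG"
```

If a volume shows up in both lists, that at least suggests the positioning failures and the block size mismatch are related on that tape.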

Sunboxez · Feb 12, 2002

Jim,

The difference we seem to have is that my blocksize in Legato is 64K and all my drives are set to write with a 64K blocksize. From your daemon log it looks like you have some tapes that were formatted on another device, or with different settings, at 32K. Your 9840 tapes are supposed to be written with a 256K blocksize, so anything more or less might cause you problems. If Legato support can get your device config back to 256K, see if you still get the same positioning errors. If yes, then we have the same problem.

When your setup was working pre-6.02, did you change your config at the st.conf/jb_config level? Did your blocksize change automatically after you upgraded? Or did the same tapes you used to use simply stop working? One thing for sure is that if your blocksize doesn't match, then your filemarks will be disabled, so you will see those positioning errors. Do all 3 of your drives write the same blocksize? You could try using:

mt -f /dev/rmt/9mbn stat | more
This should show your default and max blocksizes. If you mount a tape in the drive first and then run the command, you should see the blocksize of the tape under the max blocksize or appended to the end of the output.

We did have issues with tapes in our Solaris env (NW 5.5.1 bld115 / Storedge L3500 w/ (4) DLT7K drives), but it was a problem with the st.conf. I think we had improper spacing/quotes/double-quotes in the file. After we made the necessary changes, we've only had problems in NT.

Sorry for so many questions but I'm trying to find where our problems are similar. Please let me know any info you find helpful, or any new scoops you come across.

Thanks,

Larry

ionix · Feb 14, 2002

I have the same problem with 6.1.1 (Solstice backup 6.1.Build.186) with an ADIC FastStor LTO (IBM drive in a 7-slot autochanger).

The networker config shows the blocksize as 64K, but the device shows 1024 (presumably bytes) in response to the "mtinfo" utility provided by alan@metadigm.co.uk (http://www.metadigm.co.uk/support/tapediag.shtml).

This program suggests an st.conf entry for the drive, which has the correct blocksize encoded into it. However, although backups write with no errors, restores fail while attempting to seek to the correct record on the tape, which is not surprising if Networker is seeking to record 200 using (200x64K) rather than (200x1024b).
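
To put numbers on that, here is a quick arithmetic sketch. It assumes, as the paragraph above does, that the distance to a record scales with the block size; record_offset is just a made-up helper name, and nothing here talks to a tape drive.

```shell
#!/bin/sh
# Byte distance covered by skipping a number of records at a given block size.
# $1 = record number, $2 = block size in bytes
record_offset() {
    echo $(( $1 * $2 ))
}

RECORD=200
echo "record $RECORD at 64K blocks: $(record_offset $RECORD 65536) bytes"
echo "record $RECORD at 1K blocks:  $(record_offset $RECORD 1024) bytes"
```

That is 13107200 bytes against 204800 bytes, a factor of 64, so the two views of "record 200" would land in very different places on the tape.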

Testing this using "dd", the session below shows the results: (warning - this does not make pretty reading)

> csh -v testtape
echo "Reading 1024 byte blocks..."
Reading 1024 byte blocks...
dd if=/dev/rmt/0 bs=1024 of=/tmp/tapedump count=10
10+0 records in
10+0 records out
> ls -al /tmp/tapedump
-rw-r--r-- 1 pmurphy it 10240 Feb 14 15:25 /tmp/tapedump
> od -c /tmp/tapedump | head -15
0000000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0000160 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 006 \0 001 \0 \0
0000200 \0 \0 \0 005 h 025 \0 001 < h 025 \0 \0 003 \0 \0
0000220 \n \n 013 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 H
0000240 \0 \0 \0 002 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000260 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000300 \0 \0 \0 8 \0 007 004 ` \0 \0 \0 \0 < h 025 \0
0000320 \0 \0 \0 \0 @ * | \0 \0 001 \0 \0 \0 \0 \0 005
0000340 h 025 \0 001 < h 025 \0 \0 003 \0 \0 \n \n 013 001
0000360 \0 \0 \0 \b f i j i . 0 0 1 \0 \0 \0 \0
0000400 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000420 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 , \0 \0 \0 001
0000440 \0 \0 \0 013 v o l u m e p o o l \0
0000460 \0 \0 \0 001 \0 \0 \0 007 D e f a u l t \0
>rm /tmp/tapedump

>echo "Reading 64K blocks..."
Reading 64K blocks...
>dd if=/dev/rmt/0 bs=65536 of=/tmp/tapedump count=10
0+1 records in
0+1 records out
>ls -al /tmp/tapedump
-rw-r--r-- 1 pmurphy it 32768 Feb 14 15:25 /tmp/tapedump
>od -c /tmp/tapedump | head -15
0000000 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
0000160 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 006 \0 001 \0 \0
0000200 \0 \0 \0 005 h 025 \0 001 < h 025 \0 \0 003 \0 \0
0000220 \n \n 013 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 H
0000240 \0 \0 \0 002 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000260 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000300 \0 \0 \0 8 \0 007 004 ` \0 \0 \0 \0 < h 025 \0
0000320 \0 \0 \0 \0 @ * | \0 \0 001 \0 \0 \0 \0 \0 005
0000340 h 025 \0 001 < h 025 \0 \0 003 \0 \0 \n \n 013 001
0000360 \0 \0 \0 \b f i j i . 0 0 1 \0 \0 \0 \0
0000400 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000420 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 , \0 \0 \0 001
0000440 \0 \0 \0 013 v o l u m e p o o l \0
0000460 \0 \0 \0 001 \0 \0 \0 007 D e f a u l t \0

In summary, you can read 1K blocks and it works. If you try to read 64K blocks, it bombs out after 32K and reports an incomplete block. The data returned is the same. Therefore, Networker is at fault for attempting to read 64K blocks from a device which uses a 1K block size. :-{
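
If the "0+1 records in" line looks odd, this rough sketch shows how dd reports a read that comes back shorter than the requested bs. It uses a 32768-byte regular file as a stand-in, NOT a tape device, so it only illustrates the dd bookkeeping and says nothing about how the drive itself behaves. The helper name read_with_bs is made up.

```shell
#!/bin/sh
# Stand-in source: a 32768-byte regular file (not a tape).
SRC=$(mktemp)
OUT=$(mktemp)
trap 'rm -f "$SRC" "$OUT"' EXIT
dd if=/dev/zero of="$SRC" bs=32768 count=1 2>/dev/null

# Read up to 10 records of the given size, then print how many bytes arrived.
# dd writes its "N+M records in/out" summary to stderr.
read_with_bs() {
    dd if="$SRC" of="$OUT" bs="$1" count=10
    wc -c < "$OUT" | tr -d ' '
}

read_with_bs 1024    # dd reports 10+0: ten full 1024-byte records
read_with_bs 65536   # dd reports 0+1: one partial record, 32768 bytes
```

In dd's summary the number before the plus sign counts full-size records and the number after it counts partial ones, so 0+1 means a single read returned less than 64K.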

Paul.

Sunboxez · Feb 14, 2002

Ionix,

Are you able to use the native mt command to compare the results with mtinfo? If so, I would like to know what block size Networker thinks it's using on those drives. Can you run mt -f /dev/rmt/Xcbn stat | more and see what numbers you get with a tape mounted in the drive? I thought the 64K blocksize was standard for all LTO Ultrium drives. Do you know the specific model of your IBM drives? Has Legato support been of any help?

Please advise,

Larry
