HAS THIS EVER HAPPENED TO YOU? Your backup job asks for a new, blank tape when one is not actually required to complete the job – but you don't know why it isn't selecting the next tape in the Media Pool that DOES have room on it?
If it has, PLEASE LET ME KNOW ASAP so we can pool our requests to support and ask the developers to fix the bug in the code.
Details:
This happens when a job using a Media Pool selects an ALMOST FULL tape from that pool. When the job tries to span to the next tape, it requires a tape with the same random ID and the next sequence number. If no such tape can be found, it asks for a new, blank tape, and if one is available the job completes on it. Then (here's the kicker) the next job starts and runs into the same issue. The result of this software bug is that EVERY JOB using that Media Pool requests a NEW, BLANK TAPE when it tries to span off the first, almost full tape. So you end up with a pile of tapes each holding only a small amount of data, instead of filling tapes to capacity before selecting the next blank one. Huge problem! It has wasted thousands of dollars for us!
The job cannot use a tape in the same Media Pool with room on it UNLESS that tape has the same random ID and the next sequence number. The random ID is assigned at the moment a job pulls a blank tape into use. The user has no control over the random ID, nor can we tell the job to use a tape with any random ID in the Media Pool.
Interestingly, when we pull the almost full tapes out of the library and leave only tapes that have some data but are not near capacity in the Media Pool, all the jobs WILL complete with NO ISSUES – using non-blank tapes with different random IDs in the same Media Pool. It is only when a job spans that it requires the same random ID. So if you have a couple of almost full tapes in the front (first slots) of your Media Pool, and no blank tapes behind them, the jobs will REQUIRE NEW BLANK TAPES even though you may have loads of free GBs on tapes sitting in the library in the same Media Pool.
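To make the behavior concrete, here is a minimal sketch of the tape-selection logic as I understand it from observing the jobs. All the names, fields, and structure below are made up for illustration – this is NOT actual Brightstor code, just a model of the observed behavior:

```python
# Hypothetical model of the observed spanning behavior.
# Field names ("random_id", "seq", "blank") are invented for illustration.

def pick_spanning_tape(tapes, current_tape):
    """When a job spans off current_tape, the software appears to accept
    only (a) a tape with the SAME random ID and the NEXT sequence number,
    or (b) a blank tape, which is assigned a new random ID at that moment.
    Partially used tapes with other random IDs are skipped."""
    wanted_seq = current_tape["seq"] + 1
    # First choice: same media set (same random ID, next sequence number).
    for t in tapes:
        if t["random_id"] == current_tape["random_id"] and t["seq"] == wanted_seq:
            return t
    # Otherwise: demand a blank tape, even if other tapes have room.
    for t in tapes:
        if t["blank"]:
            t["random_id"] = "NEW-ID"  # random ID assigned at first use
            t["seq"] = 1
            t["blank"] = False
            return t
    return None  # no blank tape -> job fails asking for one

# Pool: one almost-full tape, one tape with plenty of room (different
# random ID), and one blank tape.
pool = [
    {"random_id": "A1", "seq": 1, "blank": False},  # almost full
    {"random_id": "B7", "seq": 1, "blank": False},  # lots of free space
    {"random_id": None, "seq": 0, "blank": True},   # blank
]
chosen = pick_spanning_tape(pool, pool[0])
print(chosen["random_id"])  # the blank tape is consumed, not B7
```

Every job that spans off the almost full "A1" tape repeats this: the partially used "B7" tape is never considered, so each job burns a fresh blank tape.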
Below I’ve copied the email I sent to support regarding an issue with CA Arcserve Brightstor backup 11.5 SP2. According to support, neither SP3 nor the new version 12 will fix the issue we’ve run into. Support will not pass a request for change on to the developers until they have “researched” the issue to see whether anyone else has experienced it.
Because support was unable to figure out the issue at all (I figured it out myself and told them what the problem was and why it occurred), my guess is that even if someone else has had the problem, they may not have known THIS ISSUE CAUSED IT. Support would not have identified it as a software problem, since they told me that what happened on my jobs is just “how the software works” – the usual “by design” answer you get when there is a bug. Right now, the feedback I’m hearing is that it is not a significant enough issue to warrant developer time and budget to address.
So please help if you can.
mfreeman@anl.gov
HERE IS AN EXCERPT FROM ONE OF MY EMAILS TO CA SUPPORT THAT PINPOINTS THE ISSUE MORE:
Hello,
Wednesday night we had the same issue with spanning, so most jobs failed. Yesterday I pulled all tapes in the media pool with over 500 GB of data on them out of the library, and last night all backups ran successfully again.
At what point does Arcserve Brightstor decide a tape is full and should not be written to anymore?
Is it a set number out of 500 GB, or a percentage, or does it compare the estimated job size with the space it "sees" as available on the tape? How does the software evaluate when to stop using a particular tape within the media pool? Knowing this would tell us if and when we need to pull tapes to prevent this issue from happening again, rather than reacting to failed jobs. It still requires some babysitting, but at least we would have a workaround so that we don't waste so many tapes.
Had I known what the issue really was, I never would have put in the 16 blank tapes. Rather, I would have pulled the full tapes from the media pool so that the jobs would complete on the normal number of tapes - the incrementals usually hit 2 tapes per night. But because neither you nor I knew that the software was hitting an almost full tape and requiring a new blank tape to complete _each_ job that hit it, we ended up with 16 tapes, each holding only the small amount of data from one job that spanned off the almost full tape.
We wasted 16 tapes @ $135.00 each = $2,160.00. So rather than throwing away the $2k or losing the ability to recover data for that period, we now have to extend the time before the next full backup is run so that the tapes can be used efficiently. This is not an ideal way to operate our business.
Thanks for answering the above questions. It will help us to limp along with the software until a change can be made that will address the issue.