HAS THIS EVER HAPPENED TO YOU? Your backup job asks for a new, blank tape when one is not actually required to complete the job – but you don't know why it isn't selecting the next tape in the Media Pool that DOES have room on it?
If it has, PLEASE LET ME KNOW ASAP so we can pool our requests to support and ask the developers to fix the bug in the code.
Details:
This happens when a job using a Media Pool selects an ALMOST FULL tape from that pool. When the job tries to span to the next tape, it requires a tape with the same random ID and the next sequence number. If no such tape can be found, it asks for a new, blank tape, and if one is available the job completes on it. Then (here's the kicker) the next job starts and runs into the same issue. The result of this software bug is that EVERY JOB using that Media Pool requests a NEW, BLANK TAPE when it tries to span off the first, almost full tape. So you end up with a pile of tapes each holding only a small amount of data, instead of filling tapes to capacity before selecting the next blank one. Huge problem! It has wasted thousands of dollars for us!
The job cannot use a tape in the same Media Pool with room on it UNLESS that tape has the same random ID and the next sequence number. The random ID is assigned at the moment a job pulls a blank tape into use. The user has no control over the random ID, nor can we tell the job to use a tape with any random ID in the Media Pool.
Interestingly, when we pull the almost full tapes out of the library and leave only tapes that have some data but are not near capacity in the Media Pool, all the jobs WILL complete with NO ISSUES – using non-blank tapes with different random IDs in the same Media Pool. It is only when a job spans that it requires the same random ID. So if you have a couple of almost full tapes in the front (first slots) of your Media Pool, and no blank tapes behind them, the jobs will REQUIRE NEW BLANK TAPES even though you may have loads of free GBs on tapes sitting in the library in the same Media Pool.
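To make the behavior concrete, here is a minimal sketch of the tape-selection logic as I understand it from observing the jobs. All the names, fields, and structure below are made up for illustration – this is NOT actual Brightstor code, just a model of the observed behavior:

```python
# Hypothetical model of the observed spanning behavior.
# Field names ("random_id", "seq", "blank") are invented for illustration.

def pick_spanning_tape(tapes, current_tape):
    """When a job spans off current_tape, the software appears to accept
    only (a) a tape with the SAME random ID and the NEXT sequence number,
    or (b) a blank tape, which is assigned a new random ID at that moment.
    Partially used tapes with other random IDs are skipped."""
    wanted_seq = current_tape["seq"] + 1
    # First choice: same media set (same random ID, next sequence number).
    for t in tapes:
        if t["random_id"] == current_tape["random_id"] and t["seq"] == wanted_seq:
            return t
    # Otherwise: demand a blank tape, even if other tapes have room.
    for t in tapes:
        if t["blank"]:
            t["random_id"] = "NEW-ID"  # random ID assigned at first use
            t["seq"] = 1
            t["blank"] = False
            return t
    return None  # no blank tape -> job fails asking for one

# Pool: one almost-full tape, one tape with plenty of room (different
# random ID), and one blank tape.
pool = [
    {"random_id": "A1", "seq": 1, "blank": False},  # almost full
    {"random_id": "B7", "seq": 1, "blank": False},  # lots of free space
    {"random_id": None, "seq": 0, "blank": True},   # blank
]
chosen = pick_spanning_tape(pool, pool[0])
print(chosen["random_id"])  # the blank tape is consumed, not B7
```

Every job that spans off the almost full "A1" tape repeats this: the partially used "B7" tape is never considered, so each job burns a fresh blank tape.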
Below I’ve copied the email I sent to support regarding an issue with CA Arcserve Brightstor backup 11.5 SP2. According to support, neither SP3 nor the new version 12 will fix the issue we’ve run into. Support will not pass a request for change on to the developers until they have “researched” the issue to see whether anyone else has experienced it.
Because support was unable to figure out the issue at all (I figured it out myself and told them what the problem was and why it occurred), my guess is that even if someone else has had the problem, they may not have known THIS ISSUE CAUSED IT. Support would not have identified it as a software problem, since they told me that what happened on my jobs is just “how the software works” – the usual “by design” answer you get when there is a bug. Right now, the feedback I’m hearing is that it is not a significant enough issue to warrant developer time and budget to address.
So please help if you can.
mfreeman@anl.gov
HERE IS AN EXCERPT FROM ONE OF MY EMAILS TO CA SUPPORT THAT PINPOINTS THE ISSUE MORE:
Hello,
Wednesday night we had the same issue with spanning, so most jobs failed. Yesterday I pulled all tapes in the media pool with over 500 GB of data on them out of the library, and last night all backups ran successfully again.
At what point does Arcserve Brightstor decide a tape is full and should not be written to anymore?
Is it a set number out of 500 GB, or a percentage, or does it compare the estimated job size with the space it "sees" as available on the tape? How does the software evaluate when to stop using a particular tape within the media pool? Knowing this would tell us if and when we need to pull tapes to prevent this issue from happening again, rather than reacting to failed jobs. It still requires some babysitting, but at least we would have a workaround so that we don't waste so many tapes.
Had I known what the issue really was, I never would have put in the 16 blank tapes. Rather, I would have pulled the full tapes from the media pool so that the jobs would complete on the normal number of tapes - the incrementals usually hit 2 tapes per night. But because neither you nor I knew that the software was hitting an almost full tape and requiring a new blank tape to complete _each_ job that hit it, we ended up with 16 tapes, each holding only the small amount of data from one job that spanned off the almost full tape.
We wasted 16 tapes @ $135.00 each = $2,160.00. So rather than throwing away the $2k or losing the ability to recover data for that period, we now have to extend the time before the next full backup is run so that the tapes can be used efficiently. This is not an ideal way to operate our business.
Thanks for answering the above questions. It will help us to limp along with the software until a change can be made that will address the issue.