Arcserve r11.5 SP3 job going on Hold...

ITNV47 · Aug 24, 2007

Microsoft Windows Server 2003 R2
Standard Edition
Service Pack 2

Tape Backup Device = HP DLT VS80

Brightstor ARCserve Backup r11.5 SP3 (Build 4405)

The Symptom showed up after the install of Windows 2003 uprgrade to R2 on the server, the daily tape backup job will run at the set time but after about 5 minutes goes to HOLD status. (Same result even if using a manual set job) In the activity log it just makes the entry about "Status changed from Active to Hold" with no other error reference. The entry in the log just before the Active to Hold line after about 3 1/2 minutes, makes reference to the Backup Session on Media <the name>. All previous log entries look normal.

I tried reinitializing the databases several times and that didn't help, then I uninstalled Arcserve, cleared the database folder and then rebooted and reinstalled. Everything looks normal as the job goes to active status and begins the backup, but then it goes on Hold after about 5 minutes passes The DB Pruning job runs normally everyday without an error.

The tape device appears to be working fine, it will format and eject a tape and display tape info.
I also have this running on a Windows Server 2000 with the same exact symptom. Plus build 4402 had the same symptom so I tried 4405.

Any insight will be greatly appreciated.......

davidmichel · Aug 27, 2007

don't know.

Put the Job and Tape Engines into debug and perhaps the logs will give a clue.
ck
BrightStor.log
Tape.log
JobQueue.log

ITNV47 · Aug 27, 2007

Test Job started at 08/27/2007 - 14:30:00......appears to run normal until it appears on HOLD after about 5 minutes... I have taken sections from each log and placed on this post. I abbrevated the tape log down to a section that I suspect might hold the key to the problem.

The "[0980] DaemonSuspend: Stuck in CountOneQueue(FULL)" line appears many times repeating over and over followed by a periodic Write line just like the one listed below. I truncated the previous entries and chose to show just the last entry for that line. I am researching now to see if this is some kind of memory error...

BrightStor.log

[08/27/2007-14:32:13 ,1,14,9,0,-1,2,18,0,0] ASDB_BeginSESSION is called(Media=11/01/06 12:36 PM, ID=61413, Session=9, TapeSeq=1)
[08/27/2007-14:32:13 ,1,14,9,0,-1,2,18,0,0] Insert UUID to the database.(Media=11/01/06 12:36 PM, ID=61413, Session=9, TapeSeq=1, EC=0)
[08/27/2007-14:32:14 ,1,14,9,0,-1,2,18,0,0] Using asynchronous I/O. [8, 524288]
[08/27/2007-14:35:10 ,1,0,0,0,-1,2,3,0,0] [JOBQUEUE]: Status changed from Active to Hold [Job No: 2] [Description: ]

JobQueue.log

08/27/07 14:35:10 [jobeng[3856][5788]: QSI.C[4635]] (CRITICAL): _QSIMaintainQueue_Ex(): Change the Job status to HOLD and last result to CRASHED for Job=2, taskID: 304
08/27/07 14:35:10 [jobeng[3856][5788]: QSI.C[4646]] (CRITICAL): _QSIMaintainQueue_Ex(): __QSIModifyJob(): Called for Job#: 2, taskID: 0, ret: 0
08/27/07 14:35:10 [jobeng[3856][5788]: JOBENG.C[3098]] (ERROR): JobEngineThread:: Found crashed jobs.
08/27/07 14:35:10 [jobeng[3856][5788]: PurgeCopyMain.c[2587]] (ERROR): UpdateDBStagingRecordsForCrashedDMJobs:: Trying to load jobid = ulJobID.

Tape.log

"........multiple occurences of the following line...truncated for this posting.........................."
2007/08/27 14:35:15 [0980] DaemonSuspend: Stuck in CountOneQueue(FULL)
2007/08/27 14:35:15 [0860] =>ABSL:3060 [WRITE ] 0a 01 00 00 01 00 00 00 00 00 tt[20]
2007/08/27 14:35:15 [0980] DaemonSuspend: Stuck in CountOneQueue(FREE)
2007/08/27 14:35:15 [0980] DaemonSuspend: Stuck in CountOneQueue(FREE)
2007/08/27 14:35:15 [0980] SemC Semaphore NOT Released [ 33c] for [298]
2007/08/27 14:35:15 [0980] SemC Semaphore NOT Released [ 35c] for [298]
2007/08/27 14:35:15 [0980] SemC Semaphore NOT Released [ 360] for [298]
2007/08/27 14:35:15 [0980] SemC Semaphore NOT Released [ 314] for [298]
2007/08/27 14:35:15 [0980] SemC Semaphore NOT Released [ 328] for [298]
2007/08/27 14:35:15 [0980] Call FreeSharedMemory(jobid=14,dcb=4,sesid=0)...
2007/08/27 14:35:15 [0980] FreeSharedMemory(): Release share memory successfully
2007/08/27 14:35:15 [0980] ClientEndSession: LDN:4 Verify Block Positon is not at ZERO.
2007/08/27 14:35:15 [0980] =>ABSL:3060 [READ POSITION ] 34 00 00 00 00 00 00 00 00 00 tt[6]
2007/08/27 14:35:15 [0980] DRV:4 Current block #: [000688d6] [428246]
2007/08/27 14:35:15 [0980] =>ABSL:3060 [WRITE FILE MARK ] 10 00 00 00 01 00 00 00 00 00 tt[20]
2007/08/27 14:35:17 [0f78] Attempting to Lock device [4] timeout=[0]
2007/08/27 14:35:17 [0f78] SemC request Semaphore Error [ 21c] for[0], Timeout [0]
2007/08/27 14:35:18 [0980] GetGCBHandleFromGroupName(): GroupName [GROUP0]
2007/08/27 14:35:18 [0980] ...in DisConnectFromTape for JobID[14]
2007/08/27 14:35:18 [0980] DRV:[4] DisConnectFromTape leaveTapeAtEnd[0]
2007/08/27 14:35:18 [0980] DRV:4 [HP DLT VS80 583B] Log Sense Not Supported.
2007/08/27 14:35:18 [0980] The current drive usage time is 00:05:10
2007/08/27 14:35:18 [0980] Drive[4] Usage Time for this job 00000310
2007/08/27 14:35:18 [0980] DB: DBUpdateTape: Tapename[11/01/06 12:36 PM]
2007/08/27 14:35:18 [0980] DB: Updating tape info in database.
"..................remaining entries appeared to be normal.........truncated from this posting.............."

davidmichel · Aug 28, 2007

I'm not sure, but...

Perhaps the Tape Engine is having a problem getting exclusive use of the drive.
Check the OS Device Manager to see if the tape drive is enabled or disabled. If it is enabled, disable it, restart, and try the job again.

Based off the job queue log there might be a problem in/with the queue.
do cstop & close the Manager
rename the queue 00000001.qsd
do cstart (the queue is automatically recreated)
create a new job and test it

Run a Count or Copy job.
Since they do not use the Tape Engine it might help identify the source of the problem.

ITNV47 · Sep 10, 2007

Job going on Hold Fix:

Checked the exception log below

CA\Brightstore ARCserve Backup\Log\tskept04.log

Task Engine Exception Handler Ver 1.0
Exception Time : Mon Sep 10 12:47:54 2007
Exception Cause : EXCEPTION_ACCESS_VIOLATION (attempting to read)
Exception Flags : Noncontinuable exception
Exception Address : 01f91138h
DLL/Exe that caused GP: C:\Program Files\CA\SharedComponents\BrightStor\asrbckup.dll

***

Ran cstop.bat to stop all services...

Then renamed the CA\SharedComponents\Brightstor\asrbckup.dll to asrbckup.dll.old

Ran cstart.bat to restart all services...

Ran the tape job and successfully completed the tape run!!!

Thank you for your input... davidmichel

davidmichel · Sep 10, 2007

glad I was of some help

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Arcserve r11.5 SP3 job going on Hold...

ITNV47

Technical User

davidmichel

Technical User

ITNV47

Technical User

davidmichel

Technical User

ITNV47

Technical User

davidmichel

Technical User

Similar threads

Part and Inventory Search

Sponsor