
Job crashes without error on ARCserve 2000

Status
Not open for further replies.

maartenn

IS-IT--Management
Jun 3, 2003
10
BE
Hi,

We have many sites, each with separate backup servers and clients. Each site has its own backup server that backs up itself and 1 to 10 clients.
I often get a crash on a job and nothing is in the job log, database log or tape log. The only recurring message is:
Replicated From : ttings\<username>\Local Settings\Application Data\Microsoft on Server
Often the line is truncated at the username or at the end.

It seems that the job crashes just after starting the next session (usually a new drive).

Thanks in advance,

Maarten Nevelsteen


A reboot of the server often solves the problem, as does splitting the whole job into smaller jobs that each contain one logical drive.
Does anyone have any ideas on how to solve this problem?
 
Put the Job Engine in debug and then check the activity log.

Please explain this a little more.
"It seems that the job crashes just after starting the next session (usually a new drive)"

I don't understand what you mean by a new drive. Or are you talking about when the job spans to a new tape?

What type of tape device is being used?
 
The backup is done to HP Ultrium 1 drives.
The backup seems to crash just after starting a new session within the complete backup job, e.g. when starting the backup of drive D: (session #2) after drive C: (session #1) has been backed up.
Sometimes the following workarounds work:
1) Create a separate job for each session, so that each job contains only one session. (If the server has 3 logical disks [C:, D: and E:], then I create 4 jobs [1 extra for System State].) This sometimes solves the problem, but this workaround is not workable when I have to back up 10 servers daily.

2) If that does not work, another workaround is to create the job with the option 'use Lock Mode if deny writes fails'.
But this workaround does not always work either (and it cannot be used during business hours).

The last thing I then try is a reboot, which solves the problem for a few days.
 
Try to:
1-Stop the Job Engine
2-Rename the job queue folder 000000001.qsd
3-Start the Job Engine
4-Create new jobs and see if that solves your problem
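
Step 2 above can be sketched in Python as follows. This is only an illustration: the location of the queue folder is not given in this thread, so it is passed in as a parameter, and the function name and the ".old" suffix are made up for the example. Stopping and starting the Job Engine (steps 1 and 3) are left as manual steps.

```python
from pathlib import Path


def set_aside_queue(queue_path: Path, suffix: str = ".old") -> Path:
    """Rename the job queue folder instead of deleting it, so it can be restored.

    queue_path is the full path to 000000001.qsd; where that lives depends on
    the installation and is an assumption left to the caller.
    """
    if not queue_path.exists():
        raise FileNotFoundError(queue_path)
    target = queue_path.with_name(queue_path.name + suffix)
    if target.exists():
        # Refuse to overwrite a copy set aside by an earlier attempt.
        raise FileExistsError(target)
    queue_path.rename(target)
    return target
```

Run it only while the Job Engine is stopped. Renaming rather than deleting keeps the old queue available in case the jobs need to be restored.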


 
I have recreated the jobs, but I still have the same problems.
What I find very strange is the message:
"Replicated From: ttings\administrator\Local Settings\Application Data\Microsoft\Windows on Server"
This line keeps coming back in the activity/job log of the crashed job, and I can't find this message on servers that are running OK. Does anyone know what this means, and why the message is truncated?

I'll post a small extract from the different logs, taken from the time frame in which such a job crashes.


-Activity log extract:

[ASDB] Initiate to Begin to Backup Database.
20040211 200415 [ASDB] Begin to Backup Database.
20040211 200415 66 Backup ARCserve Database...
20040211 200415 66 The volume migration status = 0. (0=Off, 1=On)
20040211 200415 66 disk C: is not MSCS shared disk.
20040211 200415 66 Profile Image Path = C:\Documents and Settings\arcserveservice
20040211 200415 66 Profile Image Path = C:\Documents and Settings\arcserveservice
20040211 200416 66 Creating temp file BCK*.*
20040211 200416 66 Creating temp file TRK*.*
20040211 200416 66 point select:1, filter: disk 0 node:1
20040211 200416 66 Source Directory: C:\Program Files\ComputerAssociates\ARCserve\DATABASE
20040211 200416 66 Replicated From: ttings\administrator\Local Settings\Application Data\Microsoft\Windows on Server
20040211 200416 66 Back up Session 3 on Media WE2
20040211 200416 66 Session flag: 7c04
20040211 200416 66 ----------------------------------------------------------------
20040211 200416 66 Writing BackupExtendedSessionHeader...
20040211 200416 66 Wrote file header for extended session header. Stream data follows.
20040211 200416 66 Writing drive capacity stream to tape...
20040211 200416 66 Wrote stream header for drive capacity
20040211 200416 66 Wrote drive capacity data to stream.
20040211 200416 66 Wrote padding to stream.
20040211 200416 66 Done. Wrote drive capacity stream to tape.
20040211 200416 66 Closing extended session header, no errors.
20040211 200416 66 Wrote stream trailer for extended session header. Stream closed.
20040211 200416 66 Wrote file trailer for extended session header.
20040211 200416 66 ----------------------------------------------------------------
20040211 200416 66 Creating temp file C*.*
20040211 200501 [ASDBAPI] Current thread user:arcserveservice.

Tapelog extract :
[3700] LDN:1 At Start of New Session: 3
[3700] 02/11 20:04:16 ABSL:2030 [READ POSITION ] 34 00 00 00 00 00 00 00 00 00
[3700] Current block #: 00001a81
[4212] Total Physical Memory 804802560
[4212] Avail Physical Memory 154693632
[4212] Avail Page File 933974016
[4212] Total Avail memory 1088667648
[4212] Total Buffers returned 3776
[4212] LDN:1 Daemon chuncks to use:3776
[4212] LDN:1 ReadShots:8
[4212] LDN:1 WriteShots:4
[4212] Start WriteDeamon TID:3720
[3700] Connection terminated for job 00476a5c
[3700] Ending Session in DestroyJobHandle, Job [00476A5C]
[3700] LDN:1 LogSense called for TapeAlert
[3700] 02/11 20:05:01 ABSL:2030 [LOG SENSE ] 4d 00 2e 00 00 00 00 01 44 00
[3700] Calling DisConnectFromTape()
[3700] LDN:1 DisConnectFromTape
[3700] 02/11 20:05:01 ABSL:2030 [LOG SENSE ] 4d 00 42 00 00 00 00 00 40 00
[3700] Getting Log Sense Recovered write errors.
[3700] LDN:1 HP Ultrium 1-SCSI E2BD recovered 0 error for writing..
[3700] [ 3700 ]Updating tapeinfo in database 1360923078
[3700] [ 3700 ]Finished Updating tapeinfo in database 1360936359
[3700] [ 3700 ]Registering tape drive usage time in database 1360936375
[3700] [ 3700 ]Finished Registering tape drive usage time in database 1360936843
[3700] 02/11 20:05:14 ABSL:2030 [REWIND ] 01 00 00 00 00 00 00 00 40 00
[3700] Calling DisConnectFromGroup:GROUP0
[3700] Calling DisConnectFromTape()
[3700] UnLockGroupEnum:GROUP0 NOT Clearing Sem! Already Done!
[3700] UnLocking Group [GROUP0]
[3700] DestroyHandle [00476A5C]: Active Jobs 1
[3492] 02/11 20:05:23 ABSL:2030 [TEST UNIT READY ] 00 00 00 00 00 00 00 00 40 00
[3492] 02/11 20:05:29 ABSL:2030 [TEST UNIT READY ] 00 00 00 00 00 00 00 00 40 00
[3492] 02/11 20:05:35 ABSL:2030 [TEST UNIT READY ] 00 00 00 00 00 00 00 00 40 00
[3492] 02/11 20:05:41 ABSL:2030 [TEST UNIT READY ] 00 00 00 00 00 00 00 00 40 00
[3492] LDN:1 LogSense called for TapeAlert

Job log extract :
Job ID........................ 66
Workstation................... AU002S020
Source........................ C:\Program Files\ComputerAssociates\ARCserve\DATABASE
Replicated From............... ttings\administrator\Local Settings\Application Data\Microsoft\Windows
On Server.....................
Target........................ WE2, ID FD90, Sequence #1
Session....................... 3
Start Time.................... 11/02/04 8:04 PM
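
For anyone who wants to check many servers for the same pattern, here is a minimal Python sketch that scans activity log lines for the "Replicated From" message and reports the last entry written for each job. The line layout (date, time, job ID, message) is only inferred from the extract above, and the function names are made up for the example.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the activity log extract:
#   YYYYMMDD HHMMSS <job id> <message>
LINE_RE = re.compile(r"^(\d{8}) (\d{6}) (\d+) (.*)$")


def parse_line(line):
    """Return (timestamp, job_id, message), or None for lines in another layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")
    return stamp, int(m.group(3)), m.group(4)


def summarize(lines, needle="Replicated From"):
    """Collect entries containing `needle` and the last entry seen per job."""
    hits = []
    last_by_job = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # e.g. the [ASDB] lines, which carry no job ID
        stamp, job_id, message = parsed
        if needle in message:
            hits.append(parsed)
        last_by_job[job_id] = (stamp, message)
    return hits, last_by_job


# A few lines copied from the extract above.
SAMPLE = r"""
20040211 200415 [ASDB] Begin to Backup Database.
20040211 200416 66 Source Directory: C:\Program Files\ComputerAssociates\ARCserve\DATABASE
20040211 200416 66 Replicated From: ttings\administrator\Local Settings\Application Data\Microsoft\Windows on Server
20040211 200416 66 Back up Session 3 on Media WE2
20040211 200416 66 Creating temp file C*.*
20040211 200501 [ASDBAPI] Current thread user:arcserveservice.
"""

if __name__ == "__main__":
    hits, last_by_job = summarize(SAMPLE.splitlines())
    for stamp, job_id, message in hits:
        print(stamp, job_id, message)
    for job_id, (stamp, message) in last_by_job.items():
        print("last entry for job", job_id, "at", stamp, ":", message)
```

On the sample lines, the last entry for job 66 is "Creating temp file C*.*" at 20:04:16, which is where the job goes silent in the extract.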
 
SP5 is not installed, because we have a large backup organisation (44 backup servers over 44 sites in 20 countries), and a rollout of SP5 is not yet planned.
(I know I should, but I cannot find any patch within SP5 that addresses my problem directly.) I'm very curious what the message "Replicated From: ttings\administrator\Local Settings\Application Data\Microsoft\Windows on Server"
means and why it is truncated, because when I search for info on it, I can't find it anywhere.
 
Not sure, but it might help to know during which session it happens. For instance, does it happen during the System State session?
 
The example logs are from a session where it tries to back up the ARCserve database. But I have also seen the phenomenon when backing up the C: drive, the D: drive or System State.
Does no one know anything about the "Replicated From: ttings\administrator\Local Settings\Application Data\Microsoft\Windows on Server" message?

Thanks in advance,
 
Well for what it is worth there are no errors in the tape log.

However this is strange:
Connection terminated for job 00476a5c

Something came along and broke the connection between the Tape Engine and the drive, or at least that is what it looks like from this.

Make sure the OS tape device driver is not running. This is done in the OS Device Manager by disabling the tape drive and then restarting the system.
Next disable the Removable Storage service.
Next check Control Panel for HP or Compaq Management Agents. I think that is what it is called. Go to properties and then select SCSI Information and then click remove.

Hopefully one of those will clear this up for you.
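
To spot the same "Connection terminated" event in other tape logs, a rough Python sketch like the one below may help. It assumes the layout seen in the tape log extract in this thread (a thread ID in brackets, optionally followed by an MM/DD HH:MM:SS timestamp), and the function name is made up for the example. For each termination it records the nearest timestamps logged by the same thread before and after the event.

```python
import re

# Assumed layout, inferred from the tape log extract in this thread:
#   [thread id] [MM/DD HH:MM:SS ...] message
THREAD_RE = re.compile(r"^\[(\d+)\]\s*(.*)$")
STAMP_RE = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2})\b")
MARKER = "Connection terminated for job"


def find_terminations(lines):
    """Return one dict per termination, with timestamps from the same thread."""
    events = []
    last_stamp = {}  # thread id -> most recent timestamp seen
    pending = {}     # thread id -> events still waiting for a later timestamp
    for line in lines:
        m = THREAD_RE.match(line.strip())
        if m is None:
            continue
        thread, rest = m.group(1), m.group(2)
        stamp = STAMP_RE.match(rest)
        if stamp:
            last_stamp[thread] = stamp.group(1)
            for event in pending.pop(thread, []):
                event["next_stamp"] = stamp.group(1)
            continue
        if MARKER in rest:
            event = {
                "thread": thread,
                "job": rest.rsplit(" ", 1)[-1],
                "prev_stamp": last_stamp.get(thread),
                "next_stamp": None,
            }
            events.append(event)
            pending.setdefault(thread, []).append(event)
    return events


# A few lines copied from the tape log extract.
SAMPLE = """
[3700] LDN:1 At Start of New Session: 3
[3700] 02/11 20:04:16 ABSL:2030 [READ POSITION ] 34 00 00 00 00 00 00 00 00 00
[4212] Start WriteDeamon TID:3720
[3700] Connection terminated for job 00476a5c
[3700] Ending Session in DestroyJobHandle, Job [00476A5C]
[3700] 02/11 20:05:01 ABSL:2030 [LOG SENSE ] 4d 00 2e 00 00 00 00 01 44 00
"""

if __name__ == "__main__":
    for event in find_terminations(SAMPLE.splitlines()):
        print(event)
```

On the sample lines, the termination falls between 20:04:16 and 20:05:01 on thread 3700, matching the gap visible in the activity log.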
 
We were not able to solve this problem.
The problem went away when we upgraded to BEB 10.5 SP1. We have had almost no database problems since then.
 