Network connection to Remote Agent is lost

Status
Not open for further replies.

dmain1970

MIS
Feb 6, 2004
10
GB
I am backing up 4 remote servers (2 Windows 2000 servers and 2 NT4 servers) from our Backup Exec 9.1 media server overnight. When I check the job log the next morning, the backup has failed while backing up the last server on the list, our Windows 2000 Domain Controller, with the error "A timeout occurred waiting for completion of media server data processing", and the Job Completion Status is "Final error: 0xa00084f8 - The network connection to the Backup Exec Remote Agent has been lost. Please check for network errors." No verify is done because the backup fails at this point.

Why would Backup Exec lose the network connection on this particular server (bearing in mind this server is at the same site as the media server)?
 
This sounds like a connectivity issue rather than a Backup Exec issue.

You say you have 4 REMOTE servers. How remote are they? What is your connection to the 2k DC? How fast is the connection? Is it continuous?

I would also check the event logs on both the server running Backup Exec and the DC itself. This will provide more information if there really is a connectivity error.

If there are any errors or warnings in your event logs, let us know the Event IDs etc. and we should be able to solve this one.





Andrew Ogden
Development Engineer
Manchester, UK
 
There shouldn't be a problem with the network connection - all 100Meg local connections through switches from the media server to the W2K domain controller. Only 1 of the remote servers is at another site, over a 2Meg leased line and no problems backing that up.

Event ID 34113 in the Application Log on the media server, basically saying that the network connection to the Remote Agent has been lost.

No connectivity errors on the Domain Controller log files.
 
Got it working again - restarted the domain controller and the backup ran completely last night.

 
Good news!

It sounds like the Backup Exec agent on the DC needed restarting.



Lido
Development & Reporting
UK
 

Below are extracts of the comments that I sent to Veritas today on this error.

Has anyone got any thoughts other than restarting the services?

I hope that a restart of the service has solved your problem, LidoDeJesolo? I found out otherwise.

I have found this error to affect only one server. It started off once or twice every other month; now it's 2 - 4 times a week. I am moving the share points and altering the login scripts accordingly tomorrow.

It's causing a lot of disruption, with the client box crashing when a backup is taken from it. It freezes and I can't do anything with it!!

Any thoughts ?

p.s. I have read the other threads in the forums :)

P.


Hi,

I am receiving the following error in Backup Exec for Windows Servers Version 9.00 Rev. 4454

************************************************************************************************
Error category : Resource Errors
Error code : a00084f8 HEX
Error description : A timeout occurred waiting for data from the agent during operation shutdown.
************************************************************************************************

The backup job is scheduled for 0500 in the morning. I have moved it to the last in a series of 16 jobs (this job used to be scheduled for 2130, but it once killed the following eight jobs by crashing the Backup Exec engine on the server. I can't afford for this to happen, so I moved it to the end of the queue).

This is the usual output from the job:

Backed up 14410 files in 1569 directories.
Processed 3,427,371,664 bytes in 10 minutes and 38 seconds.
Throughput rate: 307 MB/min


More frequently, though, I have been receiving the error stated above along with the following:

Backed up 6733 files in 1147 directories.
Processed 1,474,702,320 bytes in 37 minutes and 7 seconds.
Throughput rate: 37.9 MB/min


As you can see, the rate drops roughly 8x (from 307 to 37.9 MB/min) and it is not a full backup (it only gets a little under halfway through: 6733 of 14410 files).
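To make the comparison concrete, here is a minimal sketch that recomputes both throughput figures from the byte counts and durations in the two job summaries. It assumes Backup Exec's "MB" means 2**20 bytes, which is not stated in the logs but reproduces the reported rates.

```python
# Figures copied from the two job summaries above.
MIB = 1024 * 1024  # assumption: Backup Exec's "MB" is 2**20 bytes


def mb_per_min(total_bytes, seconds):
    """Average throughput in MB/min over the given duration."""
    return total_bytes / seconds * 60 / MIB


good_rate = mb_per_min(3_427_371_664, 10 * 60 + 38)  # successful run
bad_rate = mb_per_min(1_474_702_320, 37 * 60 + 7)    # failed run

slowdown = good_rate / bad_rate
files_done = 6733 / 14410                    # share of files backed up
bytes_done = 1_474_702_320 / 3_427_371_664   # share of bytes backed up

print(f"good run:   {good_rate:.1f} MB/min")
print(f"failed run: {bad_rate:.1f} MB/min")
print(f"slowdown:   {slowdown:.1f}x")
print(f"files: {files_done:.0%}, bytes: {bytes_done:.0%}")
```

Under that assumption the slowdown works out to about 8x rather than 10x, and the failed run covers a little under half of the files and bytes of a normal run.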

Looking at the performance logging I ran on the server being backed up, the problem occurred at 05:06: the last entries for Processor Time and Network Load are at 02/18/2004 05:06:01. The server being backed up actually *FREEZES* at this point - THE ONLY WAY TO RESTART IS TO POWER DOWN

I have enabled debug mode on both the client and the server. The client end of things dies at a certain time into the job, usually after about 6 mins or so (1.5 GB data transmitted) - as stated, THE ONLY WAY TO RESTART IS TO POWER DOWN

Client Log
**********

successful job
--------------

a0c 2/17/2004 12:08:28: Allocated 10 buffers, size 32768 bytes, total used: 328520
a0c 2/17/2004 12:08:28: TF_OpenSet()
a0c 2/17/2004 12:08:28: SetupFormatEnv( fmt=0 )
a0c 2/17/2004 12:08:28: End od TF_OpenSet() ret_val = 0, num buffers = 10
a0c 2/17/2004 12:08:28: Informational: Local share path F: used to populate the System Protected File table
a0c 2/17/2004 12:19:02: TF xfer time = 633 seconds.
a0c 2/17/2004 12:19:02: WRITE: tpreceive_fail_count = 31554
a0c 2/17/2004 12:19:02: WRITE: waiting_on_buffers_count = 31545
a0c 2/17/2004 12:19:02: WRITE: buffers_written_count = 105046
a0c 2/17/2004 12:19:02: TF_CloseSet()
a0c 2/17/2004 12:19:02: FreeFormatEnv( cur_fmt=0 )
a0c 2/17/2004 12:19:02: Detach from \\server1\F$
a0c 2/17/2004 12:19:02: TF_FreeDriveContext( 2FDD48 )
a0c 2/17/2004 12:19:02: TF_FreeTapeBuffers: from 10 to 0 buffers
a0c 2/17/2004 12:19:02: Job Stop(0) - Tue Feb 17 12:19:02 2004

failed job
----------

a2c 2/18/2004 5:01:30: Allocated 10 buffers, size 32768 bytes, total used: 328520
a2c 2/18/2004 5:01:30: TF_OpenSet()
a2c 2/18/2004 5:01:30: SetupFormatEnv( fmt=0 )
a2c 2/18/2004 5:01:30: End od TF_OpenSet() ret_val = 0, num buffers = 10
a2c 2/18/2004 5:01:30: Informational: Local share path F: used to populate the System Protected File table

^^^ last entry


From the Server logs I obtain the following


Server Log
**********


successful job
--------------

9c8 17/02/2004 12:08:28: OpenListenSocket: Media server IP address: 678ce186
9c8 17/02/2004 12:08:28: OpenListenSocket: Media server port: 6507
9c8 17/02/2004 12:08:28:
dataStartBackup: ndmpSendRequest returned: 0x0, 0
9c8 17/02/2004 12:19:02: TF_NDMPGetResult(): MediaServer thread done, returning TFLE 0
9c8 17/02/2004 12:19:02: NDMPEngine::MessagePumpAndWaitForResults(): TF_NDMPGetResult() returned 0
9c8 17/02/2004 12:19:03: data halted: SUCCESSFUL
9c8 17/02/2004 12:19:03: NDMPEngine: Shutting down.
9c8 17/02/2004 12:19:05: WriteEndSet( 1 ) returning 0
9c8 17/02/2004 12:19:07: WriteEndSet( 1 ) returning 0
9c8 17/02/2004 12:19:07: WriteEndSet( 0 ) returning 0
9c8 17/02/2004 12:19:07: HARDWARE COMPRESSION ===> Setting compression off.
9c8 17/02/2004 12:19:08: TF_CloseSet
9c8 17/02/2004 12:19:45: RewindDrive mover ret = 0 (0x0)
9c8 17/02/2004 12:19:45: ret_val = 0
9c8 17/02/2004 12:19:45: TAPEALERT: Get TapeAlert Flags Return Code = 0X0
9c8 17/02/2004 12:19:45: TAPEALERT: TapeAlert Device Flag = 0X0
9c8 17/02/2004 12:19:45: TAPEALERT: TapeAlert Changer Flag = 0X0
9c8 17/02/2004 12:19:45: TF_FreeDriveContext( 1D74FC0 )
9c8 17/02/2004 12:19:45: TF_FreeTapeBuffers: from 2 to 0 buffers



failed job
----------

60c 18/02/2004 05:01:30: OpenListenSocket: Media server IP address: 678ce186
60c 18/02/2004 05:01:30: OpenListenSocket: Media server port: 2a0f
60c 18/02/2004 05:01:30:
dataStartBackup: ndmpSendRequest returned: 0x0, 0
60c 18/02/2004 05:06:26: ERROR: ndmpcSendRequest->connection error
60c 18/02/2004 05:06:26: ERROR: ndmpSendRequest failed:
60c 18/02/2004 05:06:26: NDMPEngine: NDMP control connection lost.
540 18/02/2004 05:15:04: DeviceManager: timeout event fired
540 18/02/2004 05:15:04: DeviceManager: processing pending requests
540 18/02/2004 05:15:04: DeviceManager: going to sleep for 900000 msecs
540 18/02/2004 05:30:04: DeviceManager: timeout event fired
540 18/02/2004 05:30:04: DeviceManager: processing pending requests
540 18/02/2004 05:30:04: DeviceManager: going to sleep for 900000 msecs
60c 18/02/2004 05:36:26: NDMPEngine::MessagePumpAndWaitForResults(): TF_NDMPGetResult() timer elapsed!
60c 18/02/2004 05:36:26: ERROR: ndmpcSendRequest->connection error
60c 18/02/2004 05:36:26: ERROR: ndmpSendRequest failed:
60c 18/02/2004 05:38:28: WriteEndSet( 1 ) returning 0
60c 18/02/2004 05:38:30: WriteEndSet( 1 ) returning 0
60c 18/02/2004 05:38:30: WriteEndSet( 0 ) returning 0
60c 18/02/2004 05:38:30: HARDWARE COMPRESSION ===> Setting compression off.
60c 18/02/2004 05:38:37: TF_CloseSet
60c 18/02/2004 05:39:04: RewindDrive mover ret = 0 (0x0)
60c 18/02/2004 05:39:04: ret_val = 0
60c 18/02/2004 05:39:04: TAPEALERT: Get TapeAlert Flags Return Code = 0X0
60c 18/02/2004 05:39:04: TAPEALERT: TapeAlert Device Flag = 0X0
60c 18/02/2004 05:39:04: TAPEALERT: TapeAlert Changer Flag = 0X0
60c 18/02/2004 05:39:04: TF_FreeDriveContext( 1D74FC0 )
60c 18/02/2004 05:39:04: TF_FreeTapeBuffers: from 2 to 0 buffers
60c 18/02/2004 05:39:04: FreeFormatEnv( cur_fmt=0 )
540 18/02/2004 05:45:04: DeviceManager: timeout event fired
540 18/02/2004 05:45:04: DeviceManager: processing pending requests
540 18/02/2004 05:45:04: DeviceManager: going to sleep for 900000 msecs
540 18/02/2004 06:00:04: DeviceManager: timeout event fired
540 18/02/2004 06:00:04: DeviceManager: processing pending requests
540 18/02/2004 06:00:04: DeviceManager: going to sleep for 900000 msecs
540 18/02/2004 06:15:04: DeviceManager: timeout event fired

--------------------------------------------------------------------------------------------------------
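The timestamps in the failed-job server log suggest that most of the 37 minutes was spent waiting rather than transferring data. The sketch below parses three lines copied from that log and measures the two intervals. It is only an illustration: the assumption that every byte in the job summary was sent before the control connection dropped is mine, not something the log states.

```python
from datetime import datetime

# Three lines copied from the failed-job server log above.
LOG_LINES = [
    "60c 18/02/2004 05:01:30: OpenListenSocket: Media server port: 2a0f",
    "60c 18/02/2004 05:06:26: NDMPEngine: NDMP control connection lost.",
    "60c 18/02/2004 05:36:26: NDMPEngine::MessagePumpAndWaitForResults(): "
    "TF_NDMPGetResult() timer elapsed!",
]


def parse_stamp(line):
    """Return the timestamp of a server log line (dd/mm/yyyy hh:mm:ss)."""
    _thread, date, time = line.split(" ", 3)[:3]
    return datetime.strptime(f"{date} {time.rstrip(':')}", "%d/%m/%Y %H:%M:%S")


started, lost, timed_out = (parse_stamp(line) for line in LOG_LINES)

active_seconds = (lost - started).total_seconds()      # data was flowing
waiting_seconds = (timed_out - lost).total_seconds()   # engine waiting on the agent

# Assumption: all 1,474,702,320 bytes in the job summary were transferred
# before the control connection was lost.
effective_rate = 1_474_702_320 / active_seconds * 60 / (1024 * 1024)

print(f"active: {active_seconds:.0f} s, waiting: {waiting_seconds:.0f} s")
print(f"effective rate while active: {effective_rate:.0f} MB/min")
```

On these numbers the transfer ran for about five minutes at a rate close to the normal 307 MB/min, then the engine waited exactly 1800 seconds. That wait matches the value 1800 quoted below for Data Connection Flush Timeout Seconds, although whether that particular setting drives this timer is a guess. If so, the low reported throughput is a side effect of the timeout and not a slow network.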

Job Log (Failed)

- <joblog>
<job_log_version version="1.0" />
- <header>
<filler>======================================================================</filler>
<server>Job server: backupserver</server>
<name>Job name: 0500 server1 F$</name>
<start_time>Job started: 18 February 2004 at 05:00:04</start_time>
<type>Job type: Backup</type>
<log_name>Job Log: BEX02716.xml</log_name>
<filler>======================================================================</filler>
</header>
- <media_drive_and_media_info>
Drive and media information from media mount:
<drive_name>Drive Name: HP DAILY 80</drive_name>
<media_label>Media Label: W032_Tuesday</media_label>
<media_guid>Media GUID: {CB9C7E93-9BA3-46DA-ACEE-F4169180BAF4}</media_guid>
<media_overwrite_date>Overwrite Protected Until: 10/03/2004 04:09:25</media_overwrite_date>
<media_append_date>Appendable Until: 31/12/9999 00:00:00</media_append_date>
<media_set_target>Targeted Media Set Name: DLTWeekly</media_set_target>
</media_drive_and_media_info>
- <backup>
<filler>======================================================================</filler>
<title>Job Operation - Backup</title>
<append_or_overwrite>Media operation - append.</append_or_overwrite>
<compression>Hardware compression enabled.</compression>
<filler>======================================================================</filler>
<msgtitle_pre_jobstart>Starting Pre Job Command < net stop mcshield ></msgtitle_pre_jobstart>
- <set>
<set_resource_name>\\server1\F$</set_resource_name>
<tape_name>Family Name: "Media created 17/02/2004 19:30:05"</tape_name>
- <volume>
<display_volume>Backup of "\\server1\F$ "</display_volume>
</volume>
<description>Backup set #11 on storage media #1 Backup set description: "0500 server1 F$"</description>
<backup_type>Backup Type: COPY - Back Up Files</backup_type>
<start_time>Backup started on 18/02/2004 at 05:01:30.</start_time>
<info>Network control connection is established between backupserver:3879 <--> server1:10000</info>
<info>Network data connection is established between backupserver:3882 <--> server1:2516</info>
<end_time>Backup completed on 18/02/2004 at 05:38:37.</end_time>
- <summary>
<misc>Backed up 6733 files in 1147 directories.</misc>
<new_processed_bytes>Processed 1,474,702,320 bytes in 37 minutes and 7 seconds.</new_processed_bytes>
<vlm_hist_rateformat2>Throughput rate: 37.9 MB/min</vlm_hist_rateformat2>
</summary>
<filler>----------------------------------------------------------------------</filler>
</set>
</backup>
- <footer>
<filler>======================================================================</filler>
<end_time>Job ended: 18 February 2004 at 09:08:50</end_time>
<engine_completion_status>Job completion status: Failed</engine_completion_status>
<filler>======================================================================</filler>
<completeStatus>6</completeStatus>
<errorCode>Final error code: a00084f8 HEX</errorCode>
<errorDescription>Final error description: A timeout occurred waiting for data from the agent during operation shutdown.</errorDescription>
<errorCategory>Final error category: Resource Errors</errorCategory>
</footer>
</joblog>

--------------------------------------------------------------------------------------------------------
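One detail that helps when lining up the server log with the job log: the hex port in the failed-job server log (2a0f) looks like the job log's data connection port (backupserver:3882) with its two bytes swapped. The sketch below checks that reading. The helper names are hypothetical, and the byte-order interpretation is an inference from these two logs, not documented Backup Exec behaviour. The same uncertainty applies to the IP address, so both byte orders are shown.

```python
import socket
import struct


def swap16(hex_port):
    """Read a 4-digit hex string as a 16-bit number with its bytes swapped."""
    (value,) = struct.unpack("<H", bytes.fromhex(hex_port))
    return value


def dotted(hex_addr, reverse=False):
    """Render an 8-digit hex string as a dotted IPv4 address."""
    raw = bytes.fromhex(hex_addr)
    return socket.inet_ntoa(raw[::-1] if reverse else raw)


# Failed job: the server log says port 2a0f, the job log says backupserver:3882.
print(int("2a0f", 16))   # straight hex reading
print(swap16("2a0f"))    # byte-swapped reading

# Successful job: the server log says port 6507 (its job log is not in the post).
print(swap16("6507"))

# Media server IP address 678ce186, in both possible byte orders.
print(dotted("678ce186"))
print(dotted("678ce186", reverse=True))
```

The byte-swapped reading of 2a0f gives 3882, which agrees with the job log, so the listen socket opened at 05:01:30 is most likely the data connection that was later lost.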

So far :-

I have followed numerous forum posts and have changed some registry settings to increase the timeout period for communication between the agent and the server. No change; I still get the error.

Settings were as follows :-

------

Symptom:
The error: "A timeout occurred waiting for data from the agent during operation shutdown" is returned when performing a backup operation with Backup Exec 9.0 for Windows Servers.
Exact Error Message:

a00084f8 HEX - A timeout occurred waiting for data from the agent during operation shutdown.

Solution:
This issue occurs when the timeout period expires for the Remote Agent for Windows Servers (RAWS).

To correct this problem, increase the timeout periods as follows:

1. Open regedit or regedt32 on the Backup Exec media server.
2. Increase the value of the following keys:
Set the registry value HKEY_LOCAL_MACHINE/Software/VERITAS/Backup Exec/Agent Browser/TCPIP/Expire Time to 1200 (Decimal)
Set the registry value HKEY_LOCAL_MACHINE/Software/VERITAS/Backup Exec/Engine/Agents/Data Connection Flush Timeout Seconds to 1800 (Decimal)
Set the registry value HKEY_LOCAL_MACHINE/Software/VERITAS/Backup Exec/Engine/Agents/NDMP Connect Open Time Out Seconds to 300 (Decimal)
Set the registry value HKEY_LOCAL_MACHINE/Software/VERITAS/Backup Exec/Engine/Agents/Notify Data Halted Time Out Seconds to 300 (Decimal)
Set the registry value HKEY_LOCAL_MACHINE/Software/VERITAS/Backup Exec/Network/TCPIP/Disconnect Delay to 1500 (Decimal)
Set the registry value HKEY_LOCAL_MACHINE/Software/VERITAS/Backup Exec/Network/TCPIP/WorkBufferSize to 32768 (Decimal)
Set the registry value HKEY_LOCAL_MACHINE/Software/VERITAS/Backup Exec/Engine/NTFS/Restrict Anonymous Support to 1. Create the value if necessary.
3. Stop all Backup Exec Services
4. Start up Backup Exec Services

------
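For anyone applying the same settings to more than one media server, the values quoted above can be rendered as a .reg file instead of being typed into regedit by hand. This is a rough sketch only: it assumes every value is a REG_DWORD and that the target machine accepts the "Windows Registry Editor Version 5.00" format, neither of which the article states. Check the output before importing it, and restart the Backup Exec services afterwards as in steps 3 and 4.

```python
# Values quoted in the Veritas article above, grouped by registry key.
BASE = r"HKEY_LOCAL_MACHINE\Software\VERITAS\Backup Exec"
SETTINGS = {
    r"Agent Browser\TCPIP": {"Expire Time": 1200},
    r"Engine\Agents": {
        "Data Connection Flush Timeout Seconds": 1800,
        "NDMP Connect Open Time Out Seconds": 300,
        "Notify Data Halted Time Out Seconds": 300,
    },
    r"Network\TCPIP": {"Disconnect Delay": 1500, "WorkBufferSize": 32768},
    r"Engine\NTFS": {"Restrict Anonymous Support": 1},
}


def build_reg(settings):
    """Render the settings as .reg text (REG_DWORD assumed for every value)."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for subkey, values in settings.items():
        lines.append(f"[{BASE}\\{subkey}]")
        for name, number in values.items():
            # .reg files store DWORDs as 8 hex digits, so 1800 becomes 00000708.
            lines.append(f'"{name}"=dword:{number:08x}')
        lines.append("")
    return "\n".join(lines)


reg_text = build_reg(SETTINGS)
print(reg_text)
```

Keeping the decimal values in one table makes it easy to compare them with the article and to confirm that each server has the same settings.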

Moved the server onto the same local subnet as the backup server. This is to rule out switching problems. We back up seven servers in the same subnet without a problem.

Ruled out any potential problems with running Anti Virus alongside backup.

Ruled out media problems

Ruled out any potential clash from ports being open at the same time (I have not tested this in practice, only checked it from the logs)
 
Hi all,

Anyone got any thoughts on the above? Any help / advice would be great :)

LidoDeJesolo, has the error reoccurred yet ?

Cheers,

P.
 
I would try upgrading to Backup Exec 9.1 - this may solve your problems.
 
Apologies pandazoo

It was dmain1970 who had the problem. I was just suggesting what the problem may have been.

I agree with dmain1970. I think there was a patch included for your problem in the Backup Exec update.

Lido
Development & Reporting
UK
 
Many thanks Guys.

I will obtain the patch.

Would you happen to have a URL, dmain1970, for the patch info on this error?

Cheers P.
 
Hi Dmain1970!

I am experiencing exactly the same issue as you. Did you manage to solve it? Are you still experiencing it?
Would appreciate very much your feedback.

Thanks
Pierre


 
I still get this error occasionally, and always when backing up the same server. Haven't managed to solve it, but it's only our Domain Controller, so there's no data on it.
 
Having the same problem too, and that same server has been very problematic. The Remote Agent keeps crashing on it as well, about once a day. The server is not a domain controller and has been restarted countless times. But here is the thing: I ran a Full backup on it, then a Copy backup. The Full went just fine, but the Copy backup failed and gave me that error.

I am using v 9.1.
 