
9.1 Agent / Server Problems.....

Status
Not open for further replies.

pandazoo

Technical User
Jun 11, 2003
37
GB
Hi,

Anyone got any ideas on the following ... I'm running out :(

Veritas Media Server 9.1 SP1 - Agent v9.1
Media Server Win Server 2003 (backup to disk)
Agent on Server 2000

Sometimes the backup job will run but most of the time the agent will just crash on the client and the backup job will fail with the following error :-


-----
Failed
Final Error: 0xa000fe30 - A communications failure has occured

The connection to the target system has been lost. Backup Set cancelled.
A communications failure occured on the backup device.
-----

There are NO firewall rules affecting either system at present. (The backup used to work fine; however, that was to tape and under Veritas 9.)


Things I have done / troubleshot

1. Telnet to agent on port 10000 - ok
2. Checked and rechecked the login accounts - ok
3. Checked share access - ok
4. Disabled Second Network adapter on server being backed up
5. Checked switch to see if any packets are being dropped - none
6. Searched Veritas's site - the only article relating to this error indicates problems with Exchange (Exchange is not installed on this server; it is purely a file server)
7. Checked all media is accessible etc.
8. Set new share paths
9. Used Veritas troubleshooting tools (TCP/IP) - ok
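
The telnet check in step 1 can be scripted so it is easy to repeat from the media server while a job is failing. A minimal sketch in Python, assuming only that the agent listens on TCP port 10000 as stated above; the function name `can_connect` is made up for illustration and is not part of any Veritas tool:

```python
import socket


def can_connect(host: str, port: int = 10000, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False
```

For example, `can_connect("VK-KARNAK")` would test the agent host named in the server debug log. A successful connect only shows that something is listening on the port; it says nothing about whether the agent survives the backup session.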

Below are some debug logs (looks like a communications error)

-----------------------------------------------------------
Server

ca8 30/09/2004 15:35:42: ENGSCRPT: IN - oldDeviceName = \\*\*
ca8 30/09/2004 15:35:42: ENGSCRPT: OUT - newName = \\*\*
ca8 30/09/2004 15:35:42: NDMP version 3 connection CONNECTED
ca8 30/09/2004 15:35:45: ERROR: ndmpcSendRequest->connection error
ca8 30/09/2004 15:35:45: ERROR: ndmpSendRequest failed:
ca8 30/09/2004 15:40:46: ERROR: ndmpcSendRequest(): timeout waiting for reply from control connection
ca8 30/09/2004 15:40:46: ERROR: ndmpSendRequest failed:
ca8 30/09/2004 15:40:46: SetupNDMPConnection: connectOpen failed on server VK-KARNAK.
a64 30/09/2004 15:43:44: Shutting down RPC listener
a64 30/09/2004 15:43:44: Cancelling any pending device requests
a64 30/09/2004 15:43:44: Shutting down JobEngine

-----------------------------------------------------------

Agent

c94 9/30/2004 15:35:41: Successfully impersonated vk-karnak\ramesis
c94 9/30/2004 15:35:41: WhoAmI( ) reports: \



--->> here the agent dies on the client being backed up. It has to be restarted manually.

The only thing that I can think of doing is changing the port range of the agents to a high band - just in case this is the problem.

If anyone has any other ideas / troubleshooting which I have missed that would be great...

P.


 
There is more information in the remote agent debug log. Look for NDMPconnect, or search the remote debug log for "port". If the remote agent service is crashing, then that's another problem. When your media server hits that server for backup, is it killing the remote agent service? What's running on the remote machine, SQL? Do you use IDR?

First thing I would do is set up the port range from 30000-40000 in Backup Exec and run a backup. Run the backup during the intended hours. There may be something going on in the network at night that's causing the problem. Looks like you have done everything else as far as troubleshooting steps go.
The above applies only if you get that error while the remote agent service is started and is not stopping.

If the backup is stopping the remote agent service, then that's another issue. Something you're backing up on the machine is probably killing the remote agent. Look for a drwtsn32.log on the server. The date of the file should basically tell you if the remote agent is creating a Dr. Watson log. There are two hotfixes for 9.1, for Exchange and IDR, that involve the remote agent stopping: Hotfix 33 and 27.
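
Searching the debug logs for the terms suggested above can be done with a few lines of Python. This is only a sketch: the keyword choice (`ndmp` anywhere, `port` as a whole word so that "reports" does not match) and the function name `find_hits` are assumptions, and the log path has to be supplied for your own install.

```python
import re
from typing import Iterable, List

# Match any NDMP-related token, or "port" as a standalone word
KEYWORDS = re.compile(r"ndmp|\bport\b", re.IGNORECASE)


def find_hits(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention NDMP or a port, without trailing newlines."""
    return [line.rstrip("\r\n") for line in lines if KEYWORDS.search(line)]
```

Typical use would be `find_hits(open(path, errors="replace"))`, where `path` points at the remote agent debug log.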
 
Steve,

Many thanks - I will investigate further :)

Andy.
 
Hi,

The problem is back - this time on a different Server.

Hotfixes 33 and 27 do not apply - the server is just a file server with No Exchange or IDR.

The problems started on Friday and happened three times over the weekend (I restarted the remote agent remotely on each occasion - it failed with the same error).

DrWatson logs on the server indicate the remote agent failing with the following :-

The application, , generated an application error The error occurred on 10/10/2004 @ 16:53:54.102 The exception generated was c0000005 at address 00D52498 (xdr_string)


More detailed parts of the Dr. Watson log are :-

----------------------------------------------------------

Application exception occurred:
App: (pid=376)
When: 10/10/2004 @ 16:53:54.102
Exception number: c0000005 (access violation)

----------------------------------------------------------

*----> Stack Back Trace <----*

FramePtr ReturnAd Param#1 Param#2 Param#3 Param#4 Function Name
01AFFF2C 7C573B50 00000228 FFFFFFFF 00000000 0143B580 ntdll!NtWaitForSingleObject
01AFFFB4 7C57438B 01627EA0 00000003 00000000 01627EA0 kernel32!WaitForSingleObject
01AFFFEC 00000000 00000000 00000000 00000000 00000000 kernel32!TlsSetValue

State Dump for Thread Id 0x974

eax=00000000 ebx=01631920 ecx=ffffffff edx=05380000 esi=00000001 edi=00000001
eip=00d52498 esp=0156fb6c ebp=01620dd4 iopl=0 nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246


function: xdr_string
00d52484 5d pop ebp
00d52485 b801000000 mov eax,0x1
00d5248a 5b pop ebx
00d5248b c3 ret
00d5248c 3bf1 cmp esi,ecx
00d5248e 7414 jz xdr_ndmp_log_file_request+0x54 (00d57ba4)
00d52490 57 push edi
00d52491 8bfe mov edi,esi
00d52493 83c9ff or ecx,0xff
00d52496 33c0 xor eax,eax
FAULT ->00d52498 f2ae repne scasb es:00000001=??
00d5249a f7d1 not ecx
00d5249c 49 dec ecx
00d5249d 5f pop edi
00d5249e 894c2414 mov [esp+0x14],ecx ss:01d89a53=????????
00d524a2 eb04 jmp NdmpPvalSet::operator[]+0xbb8 (00d5ada8)
00d524a4 894c2414 mov [esp+0x14],ecx ss:01d89a53=????????
00d524a8 8d442414 lea eax,[esp+0x14] ss:01d89a53=????????
00d524ac 50 push eax
00d524ad 55 push ebp
00d524ae e8ad760000 call xdr_ndmp_usn_set_info_reply (00d59b60)
00d524b3 83c408 add esp,0x8


----------------------------------------------------------
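
When the agent crashes repeatedly, it can help to compare the exception code, address and function across Dr. Watson entries. A rough Python sketch, assuming the summary line always has the shape quoted above; the regular expression and the name `parse_summary` are illustrative only:

```python
import re
from typing import Optional, Tuple

# Matches e.g. "exception generated was c0000005 at address 00D52498 (xdr_string)"
SUMMARY = re.compile(
    r"exception generated was\s+([0-9A-Fa-f]+)\s+at address\s+([0-9A-Fa-f]+)\s+\((\w+)\)"
)


def parse_summary(text: str) -> Optional[Tuple[str, str, str]]:
    """Return (exception code, fault address, function name), or None if absent."""
    match = SUMMARY.search(text)
    if match is None:
        return None
    code, address, function = match.groups()
    # Normalise case so entries from different crashes compare equal
    return code.lower(), address.upper(), function
```

If every crash reports the same function, that points at one repeatable trigger rather than random instability.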

If anyone can offer any advice that would be great.

Many thanks,

P.


 
At what point of the backup of the remote server does this happen?
Can you try to find a pattern to this? E.g. does it happen on the same directory?

What type of backup are you doing, i.e. Full Backup resetting the archive bit, or Full Backup using modified time?

Do you have Progress Indicators turned on (Tools, Options, Preferences)? If you do, turn them off.
 
Hi,

Thanks for the replies.

I have uninstalled version 9.1 and installed version 9.0 4454.

It took a day, but from my experience so far it has been worth it.

Backup to Disk went through at 100% last night and the file is compressing onto disk at 2:1 (it was not compressing properly under v9.1).

In my experience I found 9.1 very unstable, with a lot of "one-off" problems that occurred for no particular reason.

Many thanks to all that tried to help along the way :)

P. (a happy Veritas 9.0 user (for now .... !!!))



 
The problem might be that the files on the remote server have Unicode characters in the file name. I have a server that my Mac users store files on. The remote agent would crash when it encountered a Macintosh file name like "***IMPORTANT WORK\FOLDER". I have resorted to stuffing the files before backup and have had much success. StuffIt will also not handle Unicode characters - I simply rename the directory.

I did evaluate UNIBAC with success, although the remote agent is three times slower on backup.
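
To find candidate file names before a backup runs, a directory tree can be scanned for names that are not plain ASCII or that contain characters Windows normally reserves. This Python sketch rests on an assumption, namely that such names are what trips the agent; the helper names `is_suspect` and `suspect_paths` are made up for illustration.

```python
import os
from typing import Iterator

# Characters Windows does not allow in ordinary file names
WINDOWS_RESERVED = set('\\/:*?"<>|')


def is_suspect(name: str) -> bool:
    """True if the name has non-ASCII or Windows-reserved characters."""
    return (not name.isascii()) or any(ch in WINDOWS_RESERVED for ch in name)


def suspect_paths(root: str) -> Iterator[str]:
    """Yield the full path of every suspect file or directory under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if is_suspect(name):
                yield os.path.join(dirpath, name)
```

Running `list(suspect_paths(share_root))` against the share used by the Mac users gives a list of names to rename or exclude; `share_root` is whatever path applies on your server.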
 
Hi from the dark place where Veritas 9.1 is running. We have Windows 2003 and Veritas Backup Exec 9.1, Lotus Domino 6.52, two Dell PowerVault 120T tape libraries, and PERC2 controllers on one server connecting to external Dell PowerVault 210S disk drives. We can only get a backup to work if we reboot the system and start it manually. We are trying to use the Lotus Database Option. Jobs stay in pre-processing forever.
1350 11/10/2004 12:49:57: BackupJob: medium successfully mounted
1350 11/10/2004 12:49:57: Media Label: DLT000042
1350 11/10/2004 12:49:57: Media GUID: {78CBEEEC-5784-4152-AA2F-4C5907BB4339}
1350 11/10/2004 12:49:57: Overwrite Protected Until: 12/31/9999 12:00:00 AM
1350 11/10/2004 12:49:57: Appendable Until: 12/31/9999 12:00:00 AM
1350 11/10/2004 12:49:57: TAPEALERT: Get TapeAlert Flags Return Code = 0X0
1350 11/10/2004 12:49:57: TAPEALERT: TapeAlert Device Flag = 0X0
1350 11/10/2004 12:49:57: TAPEALERT: TapeAlert Changer Flag = 0X0
1350 11/10/2004 12:49:57: TAPEALERT: Get TapeAlert Flags Return Code = 0X0
1350 11/10/2004 12:49:57: TAPEALERT: Get TapeAlert Flags Return Code = 0X0
1350 11/10/2004 12:49:57: ERROR: ndmpcSendRequest->connection error
1350 11/10/2004 12:49:57: ERROR: ndmpSendRequest failed:
1350 11/10/2004 12:49:57: SetupNDMPConnection: connectOpen failed on server AOMAIL02B.
1350 11/10/2004 12:49:57: TAPEALERT: Get TapeAlert Flags Return Code = 0X0
1350 11/10/2004 12:49:57: TAPEALERT: TapeAlert Device Flag = 0X0
1350 11/10/2004 12:49:57: TAPEALERT: TapeAlert Changer Flag = 0X0
1350 11/10/2004 12:49:57: TAPEALERT: Get TapeAlert Flags Return Code = 0X0
1350 11/10/2004 12:49:57: TAPEALERT: Get TapeAlert Flags Return Code = 0X0
aac 11/10/2004 12:49:57: DeviceManager: incoming event fired
aac 11/10/2004 12:49:57: DeviceManager: processing pending requests
aac 11/10/2004 12:49:57: DeviceManager: going to sleep for 61000 msecs
1350 11/10/2004 12:49:57: Job thread terminating
aac 11/10/2004 12:50:58: DeviceManager: timeout event fired

Any ideas?
 
Hi,

I'm not too sure about your problem.

All I can say is since my post over a month ago I have been running Veritas 9.0 Rev 4454 with no problems over two Veritas Backup Exec Servers.

Andy.
 
Double-check with Dell, but I don't think the tape device should be hung off a PERC (RAID) controller.

JB
 