Veritas 9 - A communications failure between BE9 & Remote Agents 1

Status
Not open for further replies.

Zebed00

Technical User
Apr 8, 2002
6
0
0
GB
Hi,

I'm wondering if anyone can help on this one please. We've recently installed Veritas Backup Exec 9.0 in a mixed Windows NT/2000 Domain. We are receiving the following failures randomly on a nightly basis.

Completed status: Failed
Final error code: a00084f9 HEX
Final error description: A communications failure has occurred between the Backup Exec job engine and the remote agent.

Final error category: Resource Errors

We've made sure that all the NICs are set to 100 Mbit and Full Duplex (as recommended by Veritas' TID) and we've ensured that NIC Teaming is switched off. The weirdest part is that sometimes it happens on one of the jobs and other times on a completely different backup set.

Please, if anyone has received this error in the past and found a solution, would they please let me know.

Thanks very much in advance,

:)
 
I too am getting the same error message (a00084f8 HEX) while attempting to back up my remote servers. They say backing up remote servers over the WAN is not a good idea, but I have no choice. One of our remote sites is on a 100mb OC3 fiber MAN connection, so I don't think this should be an issue. I have done the registry changes and run the beengine and the remote piece in debug mode. The debug mode locks up and never aborts. At the request of Veritas support, I am in the process of running a utility called TDImon, which captures packets for display and analysis. Doing this took so much control and resource that nobody could log on at the remote site. Wits' end is coming soon.
Good luck to all and if I get it resolved I will certainly share it with you.
 
Hi all. Just an update on our remote agent problems...

Our problem was not with the dual NIC teaming or the switch port settings, as Veritas had suggested to me. We needed to make switch ACL changes to allow communication in both directions across our VLANs. When I did the install and started testing, I could communicate with the remote server to do the agent install upstream, but all jobs would fail with the comm error when the agent tried to send the data downstream. To keep the port range fairly granular rather than open it up totally, and even though we weren't dealing with a firewall in this case, we opted to try the recommended 'Firewall' port settings from the Admin Guide: 50 ports for the media server and 25 for the remote agent. Now that the ACL changes are in, I have the NIC teaming reinstalled and the backups are working perfectly. Both the media server and the remote server running the agents are dual NIC'd, running in 'failover' mode on both ends. We left the switch port settings at 100/Full Duplex, however.
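For anyone wanting to check the same kind of one-way filtering, here is a minimal Python sketch, not anything supplied by Veritas. It tries a plain TCP connection to each port in a list and reports whether the connection opened, was actively refused, or got no response at all (which is what a silently dropping ACL usually looks like). The function names, the host name and the port numbers in the usage comment are all hypothetical. Since the failure described above only showed up in one direction, the check would need to be run from the media server towards the remote server and again from the remote server towards the media server.

```python
import socket


def classify_port(host, port, timeout=3.0):
    """Try one TCP connection to host:port and describe what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # handshake completed, something is listening
    except ConnectionRefusedError:
        return "refused"           # a reset came back (from the host or a device in the path)
    except socket.timeout:
        return "no response"       # consistent with traffic being silently dropped
    except OSError as exc:
        return f"error: {exc}"     # e.g. name resolution failure, network unreachable


def scan(host, ports, timeout=3.0):
    """Classify every port in `ports` and return {port: result}."""
    return {port: classify_port(host, port, timeout) for port in ports}


# Hypothetical usage (host name and ports are placeholders, not values from this thread):
#   for port, result in scan("remote-server.example", range(12000, 12005)).items():
#       print(port, result)
```

A port reported as "refused" still shows that packets travel both ways on that path. A run of "no response" results in one direction only is the pattern that would point at an ACL or similar filter.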

One thing to watch out for is the account that you use for remote backups: make sure you use 'system account' for the remote agent service login info. I made the mistake of using a service account with Administrator privs as the account for the remote backup. The system account is a local administrator of the remote server and can therefore access any file on the remote box, whereas a service account that you set up may not have the desired permissions on the files that you want to back up. Files that are not backed up because the remote agent can't access them come back as an error too.
 
I finally got it working for my remote sites. I went through the registry changes, assigning it to a specific NIC, setting the switches, routers and NICs to 100 full duplex, on and on and on..... What I found was a router that was dropping packets because it did not have enough RAM for buffering. When the buffer filled, it would simply drop packets. This is OK for most IP traffic, but BE 9.0 uses the Network Data Management Protocol, which is very unforgiving when it comes to dropped packets. So if you are getting the communications error while trying to back up remote sites, do some really heavy analyzing of your equipment en route to those sites. The performance is well worth the effort when coming from BE 7.3.
 
I had a customer that recently installed Veritas Backup Exec 9.0. Since that time, they would routinely get an error backing up one of their Windows 2000 servers. The error would remain the same, but it would happen on different servers from night to night. After we had fought with Veritas for a good length of time, they finally admitted that there was a communication issue with their product. It seems that the Media Server would attempt to talk to the Remote Agents on a port that it already had open but had not closed. To resolve this issue, they had me make the following changes on the Media Server.

Bring up the Backup Exec console. Once this is up, select Tools -> Options. In the Options window, select Network. Check the boxes for Enable Media Server TCP Dynamic Range and Enable Remote Agent TCP Dynamic Range, and modify the ports used for each of these. Under Enable Media Server TCP Dynamic Range enter 12000 - 15000. Under Enable Remote Agent TCP Dynamic Range use 16000 - 19000. These ranges may need to be increased if you are backing up more than one client at a time.
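As a quick sanity check on the two ranges quoted above, the following Python sketch parses them, counts how many ports each one covers, and confirms that they do not overlap. It does not talk to Backup Exec at all. The `PortRange`, `parse_range` and `overlaps` names are made up for this illustration, and the idea that the two ranges should stay disjoint is an assumption on my part rather than something the post states.

```python
from typing import NamedTuple


class PortRange(NamedTuple):
    first: int
    last: int  # inclusive

    def size(self) -> int:
        """Number of ports covered by the range."""
        return self.last - self.first + 1


def parse_range(text: str) -> PortRange:
    """Parse 'first - last' (spaces optional) into a validated PortRange."""
    first_text, last_text = text.split("-")
    first, last = int(first_text), int(last_text)
    if not (1 <= first <= last <= 65535):
        raise ValueError(f"invalid port range: {text!r}")
    return PortRange(first, last)


def overlaps(a: PortRange, b: PortRange) -> bool:
    """True if the two inclusive ranges share at least one port."""
    return a.first <= b.last and b.first <= a.last


# The values given in the post above.
media_server = parse_range("12000 - 15000")
remote_agent = parse_range("16000 - 19000")

print("media server ports:", media_server.size())
print("remote agent ports:", remote_agent.size())
print("ranges overlap:", overlaps(media_server, remote_agent))
```

The same helpers can be reused with whatever wider ranges you settle on if you back up several clients at once.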

 
I was having the same issues as everyone else here. After weeks and weeks of recommendations from Veritas, I was finally able to narrow the problem down to a single server. Since I still could not back up this one server, I started stripping out anything that could possibly "confuse" BE 9.0.

Network Load Balancing seemed to be the problem. I have not found out why yet, but I will, and I'll post again.
 
I received the TFLE_PROGRAMMER_ERROR1 error on a random basis. This is what I did: applied the Veritas hotfix2 (9.0.4454) and the latest device driver, made sure the firmware was up to date (Quantum SDLT320), ran the Quantum wizard, and made sure the SCSI driver was up to date. That ruled out all possibilities for error except for the media itself. However, I continued to have random errors; sometimes the media would work and sometimes it wouldn't. So, I ran a scheduled backup job on a new tape. I ran the SAME scheduled backup job on the same tape and received the error. I erased the tape and ran the SAME scheduled job again, and it worked. I have found that if I erase the tapes, all backup jobs are successful...
I guess this means that BE9.0 writes something to the tape that future backups might not like... For me, nothing will guarantee a successful scheduled backup except for starting with clean media.
 
So, to increase the timeout periods (per Zebed00 above), is the value 1200 (decimal) in seconds, and therefore equal to 20 minutes?
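On the arithmetic only: this thread does not confirm which unit the setting uses, so the short Python sketch below simply assumes seconds and does the conversion. It also prints the hexadecimal form of the same number, in case the value is entered through an editor that offers both decimal and hexadecimal views (the "(Decimal)" in the question suggests it might be).

```python
value = 1200  # the decimal value quoted in the question

# Assumption: the value is a number of seconds (not confirmed in this thread).
minutes, seconds = divmod(value, 60)
print(f"{value} s = {minutes} min {seconds} s")   # prints "1200 s = 20 min 0 s"

# The same value as it would appear in a hexadecimal view.
print(hex(value))                                 # prints "0x4b0"
```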
 
TFLE_PROGRAMMER_ERROR1 is hitting me too, for no apparent reason. It seems to be completely undocumented by Veritas. I opened a case with them and got some rubbish back about autoloader drivers, despite explaining clearly that I was backing up to a backup-to-disk folder on a NAS box. Deeply unsatisfactory customer service, I say, and not for the first time with them either.
 
I know that every time I have had this problem it has been related to a security issue: the service on the backup server didn't have access to the drive on the resource server. Another thing I do regularly is reboot the backup server. I do that about once a week and it seems to resolve a lot of the issues that I have. A lot of my problems seem to be linked to the Dell 35f SCSI <-> Fibre bridge.
 
Veritas have finally posted an explanation of this particular one, dated 27th Feb this year:
It seems TFLE_PROGRAMMER_ERROR1 is an error message that is itself given in error (?!?!); upgrading to 9.1 makes it go away, and the correct message is given instead.

Why do my Veritas products always start inexplicably failing just when a new version becomes available??? Has anyone else noticed this?
 
This new version failed inexplicably but then was explained.

After upgrading to BackupExec 9.1, half our media came up with an infinite overwrite when it should have been available in four weeks. I became suspicious when, sorted by date, it showed up dispersed among the overwritable media.

It turns out the media is mislabeled with infinite overwrite and a hotfix is required.

If you have this problem, don't waste any time troubleshooting. Get the hotfix from Veritas.




 