Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!


Something is causing all active backup jobs to simultaneously fail

Status
Not open for further replies.

ksetia

Technical User
May 7, 2002
Hi,

We are currently facing a problem where something is causing all active backup jobs to simultaneously fail with status 50. When this happens, a core file is generated by the bpsched binary, and we then have to restart all the failed backups.

The funny thing is that it is happening on both our NetBackup 3.4 and 4.5 setups. Both master servers are Sun E450s running Solaris 8 with 4 x 480MHz CPUs and 2GB of memory. The tape library for NetBackup 3.4 is a StorageTek L700e, and NetBackup 4.5 uses a Sun L1800.

Has anybody out there faced a problem of this nature? Your input in resolving this issue will be greatly appreciated.

Thanks in advance.

Regards,
Kevin Setia
 
As per the manual's entry for this error:

Status code 50: client process aborted
The client backup aborted. One case in which this code appears is when a NetBackup master or media server is shut down or rebooted while a backup or restore is in progress.

Try the following:



1. Enable detailed debug logging:

* Create a bpbkar debug log directory (UNIX or Windows only).

* Create a bpcd debug log directory (this log is created automatically on Macintosh clients.)

* On UNIX clients, add the VERBOSE option to the /usr/openv/netbackup/bp.conf file.

* On PC clients, increase the debug or log level as explained in the debug log topics in Chapter 3 of the Troubleshooting Guide.

2. Retry the operation and examine the resulting logs.

3. On UNIX clients, check for core files in the / directory.

4. On UNIX clients, check the system log (/usr/adm/messages on Solaris) for system problems.

5. This problem can sometimes be due to a corrupt binary.

On UNIX clients, use the UNIX sum command to check the bpcd, bpbkar, and tar binaries, located in /usr/openv/netbackup/bin on the client. Reinstall them if they are not the same as in the client directory under /usr/openv/netbackup/client on the server.

On UNIX, run the NetBackup Configuration Validation Utility (NCVU) for the associated NetBackup clients. Note the client software checks in section two.

On a Windows client, check the bpinetd.exe, bpcd.exe, bpbkar32.exe, and tar32.exe executables located in the install_path\NetBackup\bin folder on the client. Reinstall the client if these executables are not the same size as on other Windows clients, are not at the same release level, or do not have the same NetBackup patches applied as other Windows clients.
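Step 1 above can be scripted on a UNIX client. This is a minimal sketch, assuming the default /usr/openv/netbackup layout with legacy debug logs under logs/<process> and a bare VERBOSE line in bp.conf; check both against the Troubleshooting Guide for your release. The function name is hypothetical.

```python
from pathlib import Path
from typing import List

# Assumed default UNIX install root; pass another path (e.g. a scratch dir) to try it safely.
NETBACKUP_ROOT = Path("/usr/openv/netbackup")


def enable_client_debug_logging(root: Path = NETBACKUP_ROOT) -> List[Path]:
    """Create the bpbkar and bpcd debug log directories and add VERBOSE to bp.conf.

    Returns the log directories, all of which exist after the call.
    """
    log_dirs = []
    for name in ("bpbkar", "bpcd"):
        log_dir = root / "logs" / name
        # Legacy NetBackup debug logging is switched on by the directory existing.
        log_dir.mkdir(parents=True, exist_ok=True)
        log_dirs.append(log_dir)

    bp_conf = root / "bp.conf"
    text = bp_conf.read_text() if bp_conf.exists() else ""
    # Treat both "VERBOSE" and "VERBOSE = n" as already present; append only if neither is.
    has_verbose = any(
        line.split("=")[0].strip() == "VERBOSE" for line in text.splitlines()
    )
    if not has_verbose:
        if text and not text.endswith("\n"):
            text += "\n"  # avoid gluing VERBOSE onto the last existing entry
        bp_conf.write_text(text + "VERBOSE\n")
    return log_dirs
```

Running it twice is harmless: the directories already exist and VERBOSE is not appended a second time.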


Suggestion: install the latest FP or MP (depending on your route), as this will replace any possibly corrupt binaries, among other fixes.
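Before reinstalling, the binary comparison from step 5 can be sketched in Python. This swaps the UNIX sum command for a SHA-256 digest (so the values will not match sum output) and assumes you have copied the reference binaries from the client directory under /usr/openv/netbackup/client on the server to somewhere the script can read. `compare_binaries` and its directory arguments are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable

# Binaries named in step 5 for UNIX clients.
BINARIES = ("bpcd", "bpbkar", "tar")


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large binaries don't load into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_binaries(client_bin: Path, reference_dir: Path,
                     names: Iterable[str] = BINARIES) -> Dict[str, str]:
    """Return 'ok', 'differs', or 'missing' for each binary name."""
    result = {}
    for name in names:
        local, ref = client_bin / name, reference_dir / name
        if not local.is_file() or not ref.is_file():
            result[name] = "missing"
        elif file_digest(local) != file_digest(ref):
            result[name] = "differs"  # candidate for reinstall
        else:
            result[name] = "ok"
    return result
```

Anything reported as "differs" or "missing" is a reinstall candidate under step 5.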
 
Thanks PGPhantom for your contribution. However, all the above possibilities have already been looked into and ruled out, so we are at a loss as to what might be the cause. We are still liaising with the vendor, but they have not been much help.
 
What NIC do you have on your master? I have seen cases where the driver for the master's NIC acts up and the master momentarily loses connectivity, causing everything to abort. In our environment we had a similar issue with a few servers that had Broadcom NICs on W2K. We updated the drivers and, voila, the problem was gone.
 
The NICs we are using are the standard hme and ge. Our machine is a Sun E450 running Solaris 8 with 2GB of memory. NetBackup is version 3.4.

How do I go about checking the version?
 
Do a "netstat -i" on your box and make sure you have zero ierrs and zero oerrs. Anything other than zero means your interface under load has either a cable issue or a port negotiation issue. This would cause all backups to drop.
The solution is to hardcode the network port and the NIC to 1000/Full Duplex.
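A small parser makes this check repeatable, for example before and after a load test. It is a sketch that assumes the Solaris `netstat -i` column headers (Name, Ipkts, Ierrs, Opkts, Oerrs, Collis, Queue) and skips rows whose field count does not match the header, such as the IPv6 section; other platforms print different columns. It also reports Ierrs as a percentage of Ipkts, which is one way to put a non-zero count in context. The function names are hypothetical.

```python
from typing import Dict, List


def parse_netstat_i(output: str) -> List[Dict[str, str]]:
    """Turn netstat -i text into one dict per interface row, keyed by the first header line."""
    lines = [l for l in output.splitlines() if l.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # e.g. the IPv6 section, which has a different column set
        rows.append(dict(zip(header, fields)))
    return rows


def interfaces_with_errors(output: str) -> Dict[str, Dict[str, float]]:
    """Report interfaces whose Ierrs or Oerrs counters are non-zero."""
    report = {}
    for row in parse_netstat_i(output):
        ierrs, oerrs, ipkts = int(row["Ierrs"]), int(row["Oerrs"]), int(row["Ipkts"])
        if ierrs or oerrs:
            report[row["Name"]] = {
                "Ierrs": ierrs,
                "Oerrs": oerrs,
                "ierr_pct": 100.0 * ierrs / ipkts if ipkts else 0.0,
            }
    return report
```

Feed it the text captured from `netstat -i`; an empty result means every parsed interface shows zero ierrs and oerrs.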
 
Just saw a reference to this: there is a bug in FP6/MP6 that affects some systems. There is a patch on Veritas's site, but the actual resolution listed is to upgrade to Enterprise 5.0.
 
Hi rugby01,

We checked the system with "netstat -i" and got zero oerrs but non-zero ierrs. The collision and queue columns show zero. How would one interpret this?

Our existing setup is already configured for full duplex on both the hme and ge interfaces.

=================================================
Hi PHPhantom,
Our server version is
HARDWARE SOLARIS
VERSION NetBackup 3.4GA
RELEASEDATE Tue Jun 20 03:04:00 CDT 2000

while the client version is
NetBackup-Solaris2.6 3.4patchNB_34_4

The latest patch installations are:
Installation of patch NB_JAV_34_4 completed Tue Sep 9 19:17:31 SGT 2003 Rev. 1.64.
Installation of patch NB_34_4 completed Fri Nov 7 05:46:50 SGT 2003 Rev. 1.64.

We are actually at our wits end with this problem. Upgrading to NetBackup 5 is a major step that will need management approval ... and you know how hard that will be. =)

Thank you all for the suggestions provided... really appreciate it.
 
I'm guessing your network switch port is set to autonegotiate. Call your network person and have them hard-set the port to 100/Full. Then put a load on the interface to see whether the errors increase. You should be able to FTP files from /tmp to another Sun machine's /tmp. The /tmp file system is memory-backed (tmpfs), so disk speed is not the bottleneck and the NIC can reach 70-80% of capacity. If there's a physical problem or configuration error, you will see the error counts in netstat -i start to increase. I used to create a dummy file via dd if=/dev/zero of=/tmp/junk (you can also use an input of /dev/dsk/c0t0..); let that run for a minute and you should get a pretty large file to transmit. NOTE! Make sure you don't overfill /tmp on either server doing the test.
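The dd step can also be done with a guard against overfilling /tmp. This sketch writes zeros (the same content dd would copy from /dev/zero) and refuses to start if the write would leave less free space than you ask it to keep; on Solaris, /tmp space comes out of virtual memory, so leaving headroom matters. `make_test_file`, `keep_free_bytes`, and the 1 MiB default chunk are hypothetical choices.

```python
import shutil
from pathlib import Path


def make_test_file(path: Path, size_bytes: int, keep_free_bytes: int,
                   chunk: int = 1 << 20) -> bool:
    """Write size_bytes of zeros to path unless that would leave < keep_free_bytes free.

    Returns True if the file was written, False if the space check refused.
    """
    free = shutil.disk_usage(path.parent).free
    if free - size_bytes < keep_free_bytes:
        return False  # would eat into the headroom on this filesystem
    zeros = bytes(chunk)  # one reusable block of zero bytes
    remaining = size_bytes
    with path.open("wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
    return True
```

For example, `make_test_file(Path("/tmp/junk"), 512 * 1024 * 1024, keep_free_bytes=256 * 1024 * 1024)`, then FTP /tmp/junk to the other host while watching `netstat -i`, and delete the file afterwards.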
:O)
Cheers...
 
Which ports on a Windows client server allow connectivity to it from a master server?

Thomas Hannah - JEA
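As a starting point, a plain TCP connect test run from the master shows whether a given client port is reachable at all. The sketch assumes bpcd's conventional port 13782; confirm it in the services file on both hosts, since other NetBackup daemons listen on other ports. `port_open` and the host name in the usage line are hypothetical.

```python
import socket

# Conventional bpcd port; verify against the services file on master and client.
BPCD_PORT = 13782


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False
```

Usage: `port_open("winclient01", BPCD_PORT)`. A False result points at a firewall, a stopped service, or name resolution rather than at NetBackup itself.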
 
Ksetia, just curious: do you have enough memory on these boxes?

I've seen status 50 errors when we've run out of virtual memory and the system kills jobs in an attempt to free memory.
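To check this on Solaris, `swap -s` prints a one-line summary of virtual memory (swap) use, and sampling it while the backups run shows whether it is close to exhausted. This sketch assumes the summary has the form "total: Nk bytes allocated + Nk reserved = Nk used, Nk available"; the function names are hypothetical.

```python
import re
from typing import Dict

# Assumed format of the Solaris `swap -s` summary line.
SWAP_RE = re.compile(
    r"(?P<allocated>\d+)k bytes allocated\s*\+\s*(?P<reserved>\d+)k reserved\s*=\s*"
    r"(?P<used>\d+)k used,\s*(?P<available>\d+)k available"
)


def parse_swap_s(output: str) -> Dict[str, int]:
    """Return the allocated/reserved/used/available figures in KB."""
    m = SWAP_RE.search(output)
    if m is None:
        raise ValueError("unrecognised swap -s output")
    return {k: int(v) for k, v in m.groupdict().items()}


def swap_used_pct(output: str) -> float:
    """Percentage of virtual swap in use: used / (used + available)."""
    s = parse_swap_s(output)
    total = s["used"] + s["available"]
    return 100.0 * s["used"] / total if total else 0.0
```

Logging `swap_used_pct` every minute or so during the backup window would show whether usage climbs toward 100% around the time the jobs fail.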



Ryan


 