
NDMP backups failing with error 99

Status
Not open for further replies.
Jan 28, 2004
I have an NDMP client which currently does not back up; the job ends with error 99:

03/12/2004 10:29:54 <backupserver> <ndmpclient> backup of client <ndmpclient> exited with
status 99 (NDMP backup failure)

--> The client does authenticate with the set_ndmp_attr -verify <ndmpclient> command:
Connecting to host "<ndmpclient>" as user "ndmp"...
Waiting for connect notification message...
Opening session--attempting with NDMP protocol version 3...
Opening session--successful with NDMP protocol version 3
host supports TEXT authentication
Logging in using TEXT method...
Host info is:
host name "server_2"
os type "DartOS"
os version "Symmetrix Network File Storage.T.2.2.62.1"
host id "abc1997"
Login was successful
Host supports LOCAL backup/restore
Host supports 3-way backup/restore

--> I have copied the schedule from a similar ndmpclient and limited the backup to a single pathname, /export7, which is what I am trying to back up. The path does exist on the ndmpclient.

--> The media library seems to be fine: it successfully backs up 3 other ndmpclients which are configured in a similar manner, and the bpbrm logs do not complain.

--> bpsched seems to log the following (with verbose set to 5)

10:29:54.739 [5393] <2> bpcr_authenticate_connection: no authentication required
10:29:54.739 [5393] <2> bind_on_port_addr: bound to port 646
10:29:54.743 [5295] <16> bpsched: scheduler exiting - NDMP backup failure (99)
10:29:55.469 [5335] <2> ?: exiting with status 99
10:29:55.471 [5335] <2> job_monitoring: ACK disconnect
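
For anyone sifting through a verbose-5 bpsched log like the one above, a small filter can pull out the higher-severity lines. This is only a sketch in Python: the line layout (time, [pid], <severity>, function, message) is inferred from the five lines quoted in this post, the idea that a larger number in angle brackets means a more severe entry is an assumption, and the names LINE_RE, parse_line and entries_at_or_above are made up for illustration.

```python
import re

# Assumed layout, inferred from the excerpt in this post:
#   HH:MM:SS.mmm [pid] <severity> function: message
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)\] "
    r"<(?P<severity>\d+)> "
    r"(?P<function>[^:]+): "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into its fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["pid"] = int(entry["pid"])
    entry["severity"] = int(entry["severity"])
    return entry


def entries_at_or_above(lines, min_severity=16):
    """Keep only parsed entries whose severity is >= min_severity (assumed scale)."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e is not None and e["severity"] >= min_severity]


if __name__ == "__main__":
    excerpt = [
        "10:29:54.739 [5393] <2> bind_on_port_addr: bound to port 646",
        "10:29:54.743 [5295] <16> bpsched: scheduler exiting - NDMP backup failure (99)",
    ]
    for entry in entries_at_or_above(excerpt):
        print(entry["time"], entry["pid"], entry["message"])
```

Run against the excerpt above, the only line at severity 16 is the bpsched exit, which is logged by pid 5295 rather than by the processes on the neighbouring lines.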


Does anybody have any ideas or suggestions on how I can progress this, or pin down exactly what the problem is?

Kind regards

M
 
As per the error message: Check the NetBackup All Log Entries report for more information. A possible cause for this error is that none of the backup paths exist on the NDMP host.
 
It transpires that there is an issue with our NDMP client. The backup paths do exist, but the communication from the NDMP client is halted with an error: NDMP data halt internal error. Data halt interrupt.

The error manifests itself as an error 99 code, which makes sense. Thanks for the help.
 
Have you run bpclntcmd from the client, media server and master server to ensure that DNS is 100% functional and that everything communicates okay?
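
bpclntcmd is the NetBackup tool for this check. As a rough, generic stand-in, the same forward-then-reverse lookup can be sketched with Python's standard socket module. This does not call NetBackup at all; dns_round_trip and short_name are hypothetical helper names, and comparing only the short host name is an assumption about what counts as "consistent".

```python
import socket


def short_name(hostname):
    """Lower-cased first label of a host name, e.g. 'Host.example.com' -> 'host'."""
    return hostname.split(".")[0].lower()


def dns_round_trip(hostname,
                   forward=socket.gethostbyname,
                   reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve the IP back, and compare the names.

    The resolver functions are parameters so the check can be exercised
    without a live DNS server.
    """
    ip = forward(hostname)
    reverse_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, addresses)
    return {
        "hostname": hostname,
        "ip": ip,
        "reverse_name": reverse_name,
        "consistent": short_name(reverse_name) == short_name(hostname),
    }
```

Running it from the master server, the media server and (where Python is available) the client, and comparing the results, gives a quick picture of whether all three see the same name-to-address mapping.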
 
I get the same error on NetApp filers and haven't been able to find a fix. It happens sporadically, a few times per week. Status 99. We're running DOT 6.4.2p6
 
I am still investigating this issue with support vendors; early indications are that it may be related to file systems filling up.
 
Yes, I am pointing the finger at NetApp. "no available buffers" errors in backup logs and "no buffers available" errors in messages logs. Root partition is getting full. I don't know yet if the Status 99 errors are generated when snapshot space is not available but it looks like Veritas is not programmed to interpret these NDMP messages as "out of disk space" errors. Thanks.
 
Resolved!

It seems that somebody enabled an interface on the NDMP client that was not physically connected. The NDMP responses did not go down the default route to the NetBackup server, so the NetBackup server never received an acknowledgement.
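
To illustrate the kind of check that exposes this, here is a minimal Python sketch that asks the operating system which local address it would use to reach a given peer. It only helps on a host that can run Python (the NDMP host itself may not), so treat it as an illustration of the routing decision rather than the procedure we followed. The function name source_address_for is hypothetical; port 10000 is the registered NDMP port, and connecting a UDP socket sends no packets, it only makes the OS pick a route.

```python
import socket


def source_address_for(peer_ip, port=10000):
    """Return the local IPv4 address the OS would use to send to peer_ip.

    connect() on a UDP socket transmits nothing; it just binds the socket
    to the local address chosen by the routing table.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((peer_ip, port))
        return sock.getsockname()[0]
```

If the address returned belongs to an interface that is not actually cabled, replies sent that way will never arrive, which matches the symptom described above.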

Weird, thanks for all the help from everybody to eliminate what it wasn't.
 
I had a ticket open with Veritas for three months for this same issue, NDMP failing with 99. We are running NetBackup DataCenter 4.5 on a Windows 2000 server. The NDMP backups are for a Network Appliance machine. We had also suspected snapshot issues, but when we expanded our capacity and still had failures, we opened a ticket.
Here is what we came up with.

The first thing to check is the file NDMP_PROGRESS_TIMEOUT.
It is located in %installpath%VERITAS\NetBackup\db\config

The default is 8 hours, so if your backups take longer than that they will fail. Change the value in the file to 1440, which is 24 hours expressed in minutes.
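
A quick way to sanity-check the setting against your longest backup is sketched below in Python. It assumes the file holds a single integer number of minutes (inferred from 1440 being 24 hours); the helper names read_timeout_minutes and timeout_covers are hypothetical, and the path is whatever your installation uses.

```python
from pathlib import Path


def read_timeout_minutes(path):
    """Return the integer stored in the timeout file, or None if it is absent.

    Assumes the file contains nothing but a whole number of minutes.
    """
    file_path = Path(path)
    if not file_path.exists():
        return None
    return int(file_path.read_text().strip())


def timeout_covers(backup_hours, timeout_minutes):
    """True if a backup lasting backup_hours fits within timeout_minutes."""
    return backup_hours * 60 <= timeout_minutes
```

For example, a 10-hour backup does not fit in a timeout of 480 minutes (8 hours) but does fit in 1440.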

Our logging had been set to 5 for the troubleshooting, and probably 2 or 3 before that. The resources that logging was using on the master server were slowing down the transfer rate. After turning off logging, the transfer rate went to a reasonable speed.

Our final problem was simply bad tapes. For us, once the above two things were taken care of, the only time we got 99 errors was because of a bad tape. However, NetBackup doesn't know the tape is bad (and won't freeze it), so as long as it thinks that tape is available it will keep trying to use it. The Veritas tech support said there is no setting that would have NetBackup try another tape if one fails; it will continue to grab the bad tape over and over again. Now when a backup fails with 99 I remove the tape it was using when it failed. This has solved almost all of our NDMP backup issues.

Of course this may mean you have to do a lot more monitoring than before. For example, when I have a tape fail on the weekend I now have to move the bad tape to "Standalone" so that it won't get picked again. Obviously I have to check in a lot more now than I used to. I also put mostly new tapes in the drives on Fridays before I leave.
 
Here is something I wrote up in the NetBackup-FAQ-O-Matic

How to FREEZE media after only 1 media error?

If NetBackup detects an error when writing to tape during a backup, is there a way to make it continue the backup on a different tape?

NetBackup will resubmit a job that fails with a media error according to the Global Settings for failed backup attempts. Unfortunately it may try using the same tape that just had a media error. Create a file /usr/openv/netbackup/MEDIA_ERROR_THRESHOLD. Enter the number of allowable errors before NetBackup will freeze a tape. If you enter the number zero then it will FREEZE the tape on the first media error.
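
Since the FAQ entry says to create a file and enter a number in it, the step can be sketched as a small Python helper. The assumptions here are that the file contains only the number, and that a trailing newline is harmless; write_media_error_threshold is a hypothetical name, and the path is passed in so you can try it on a scratch location first.

```python
from pathlib import Path


def write_media_error_threshold(path, allowed_errors):
    """Write allowed_errors as the sole content of the threshold file.

    allowed_errors must be a non-negative whole number; per the FAQ text,
    0 means the tape is frozen on the first media error.
    """
    if isinstance(allowed_errors, bool) or not isinstance(allowed_errors, int):
        raise ValueError("allowed_errors must be an integer")
    if allowed_errors < 0:
        raise ValueError("allowed_errors must not be negative")
    file_path = Path(path)
    file_path.write_text(f"{allowed_errors}\n")
    return file_path
```

On a UNIX master server the target named in the FAQ entry would be /usr/openv/netbackup/MEDIA_ERROR_THRESHOLD, for example write_media_error_threshold("/usr/openv/netbackup/MEDIA_ERROR_THRESHOLD", 0).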

Regarding MEDIA_ERROR_THRESHOLD: is that an entry in bp.conf, or do you create a new file with that name and then put the desired value in it? I am having a similar problem where it takes an hour for the job to error out because of the retries.



Bob Stump
Just because the VERITAS documentation states a certain thing does not make it a fact, and that's the truth
 
Thanks Highlander and Stumpr, this certainly helps. My biggest complaint is that NDMP and NetBackup don't talk much. When it works, it works, but when you have problems it's very time consuming to figure out.
 