WriteFile Failed Message during backup

Status
Not open for further replies.

ValleyDweller

Technical User
Dec 3, 2001
2
US
Here is my environment config:

* 1 NetBackup Master Server, 4.5FP6_S1021
* 1 NetBackup Media Server, 4.5FP6_S1021
* 4 NetBackup Clients, 4.5FP6_S1021

* 1 STK L180 Changer, SCSI attached to NB Master.
* 2 IBM Ultrium LTO Gen2 drives, fibre connected to STK 3800 2Gbps SAN Switch.

Four client servers write through the Media Server. The Master backs up only itself. Both LTO drives are SSO-shared between the Master and the Media Server. The Media Server is a file server holding ~700GB of user data. It has an Emulex LP9002 HBA connecting to the SAN switch, and this HBA is shared for both disk and tape access.

I run the backup of this media server multistreamed as 9 jobs with multiplexing set to 4, so a maximum of 4 streams run to each drive. When I run the backup this way, it always ends up hanging at a different point, and the jobs cannot be killed using standard methods; I have to use the "Kill" command to get rid of them. In "All Log Entries", I see the media server issuing a "WriteFile Failed" message. If I run the same backup to one drive, it works fine. I have tested this against each drive individually and both are fine. When I kill the bpbkar, oprd, bpbrm, and bptm processes on the media server, the jobs die on the master, but the device file that was used for tape access seems to stay tied up, and I can't use the drive again until I reboot the server. If I could at least avoid rebooting the server, that would be a big help. Note that this only happens to the streams running on one particular drive; the streams running on the other drive complete just fine.
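
For readers following the numbers, here is a minimal Python sketch of the stream arithmetic described above. It only restates the figures from the post (9 jobs, 2 drives, multiplex 4). The function name is hypothetical, and it does not model how NetBackup actually schedules streams.

```python
def stream_capacity(jobs: int, drives: int, multiplex: int) -> tuple[int, int]:
    """Return (streams that can run at once, streams left waiting)."""
    # Each drive accepts at most `multiplex` streams at the same time.
    concurrent = min(jobs, drives * multiplex)
    return concurrent, jobs - concurrent


running, queued = stream_capacity(jobs=9, drives=2, multiplex=4)
print(running, queued)  # 8 1
```

So with this setup, 8 streams can write at once (4 per drive) and the ninth waits for a free slot.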

The Master is Windows 2000 Advanced Server, and the Media Server and the clients are Windows Server 2003. Note: I am using VNETD, and the NOSHM touch file exists on the Media Server to allow the VNETD restore to function properly.

Can someone help? This has been going on for some time now and is really baffling me.

Thanks,
Pat.
 
Hi,

I know this isn't advice, but we are having the same problem. The only difference is that we are using NetBackup Enterprise version 5 MP1.

Any help would be appreciated.

Thanks

Rich

Cheers
richs24

 
We are having this exact same issue!

Our Environment:
1 - Master (no devices) 4.5FP6
2 - Media Servers (AIX and Windows 2000), SSO
1 - Library, Scalar i2K with 14 FC LTO2 drives

Jobs seem to be waiting for some kind of response from bptm. The Activity Monitor displays the jobs as if nothing is wrong: the KB/sec figure does not go down, it just stays the same. The only indication that nothing is being written is that the number of files shown in the Activity Monitor for that job does not change for hours. The bptm processes on the Windows server doing the work can't be killed, and the drives remain unavailable until the server is rebooted.
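
The symptom described above (a file count that stays flat for hours while the job still looks active) can be checked mechanically. Below is a minimal Python sketch, assuming you already have periodic readings of a job's file count from somewhere. How you collect them is not covered here, and the function name, the one-hour threshold, and the sample data are all made up for illustration.

```python
from datetime import datetime, timedelta


def is_stalled(samples, threshold=timedelta(hours=1)):
    """samples: list of (datetime, files_written) tuples, oldest first.

    Returns True if the file count has stayed the same for at least
    `threshold`, measured back from the most recent sample.
    """
    if not samples:
        return False
    last_time, last_count = samples[-1]
    unchanged_since = last_time
    # Walk backwards to the earliest consecutive sample with the same count.
    for ts, count in reversed(samples):
        if count != last_count:
            break
        unchanged_since = ts
    return last_time - unchanged_since >= threshold


# Made-up readings: the count stops moving at 250 files.
t0 = datetime(2004, 1, 1, 22, 0)
samples = [
    (t0, 100),
    (t0 + timedelta(minutes=30), 250),
    (t0 + timedelta(minutes=60), 250),
    (t0 + timedelta(minutes=150), 250),
]
print(is_stalled(samples))  # True
```

A flat file count alone can also just mean one very large file is being written, so treat a hit as a prompt to look at the job rather than as proof of a hang.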
 
I was having problems that sound very similar to yours, except that they only affected 3 or 4 of my 9 media servers. Although the error (tape write fail) implied a problem with the tape drive, I updated the gigabit NIC driver (the NIC receiving data from the clients) on a hunch (I don't remember why), and it cleared the errors. I hope this helps.

FYI: We were using 4.5 FP6 at the time (upgraded to 5.0 MP2 today)
Win2k on all servers:
1 Master/Media
9 Media
2 SAN Media
 
In case some of you are still having this issue, you should know that as soon as I disabled VNETD, everything started working perfectly, and as hard as I try, I have been unable to break it. If you are not running VNETD but are communicating with the Master server through a firewall, try opening the default port ranges (1024-5000 and 512-1023). If you have no network security between your master and your client/media servers, I don't know what to tell you.
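
If you want to sanity-check a firewall change like the one suggested above, a small Python sketch along these lines may help. The port ranges are simply the ones quoted in the post and have not been checked against NetBackup documentation. Both function names are hypothetical. A connect probe only tells you whether one destination port on one host accepts TCP connections from where you run it, which is not the same as proving the whole range is open.

```python
import socket

# Port ranges as quoted in the post above (an assumption, not verified).
QUOTED_RANGES = [(512, 1023), (1024, 5000)]


def in_quoted_ranges(port: int) -> bool:
    """True if `port` falls inside one of the ranges quoted in the thread."""
    return any(low <= port <= high for low, high in QUOTED_RANGES)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


print(in_quoted_ranges(1500))  # True
print(in_quoted_ranges(6000))  # False
```

To probe a real machine, call `can_connect` with your own server name and a port that something is actually listening on. If nothing is listening, a closed port and a blocked port look the same.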
 
It looks like this problem may be a thing of the past. We upgraded the HBA drivers and firmware to the latest available on our Emulex 9802 cards (driver 5.2.22a8, firmware 1.90a4). We have not seen the hangs since, although it has only been 2 days. It even seems as though the Activity Monitor updates better and more accurately. I will keep everyone posted.
 
We had the same problem with multiplexed backups when we went to 5.0. We worked with Veritas to track down the problem, and they fixed it in 5.1. I still see it occasionally, though, and will try updating the HBA drivers and firmware. Thanks!
 
Update:
Still no issues. We do about 8 terabytes in 4000 jobs nightly. The drives seem to mount and unmount exactly as they should each and every time. I highly recommend upgrading the firmware and drivers on all HBAs.
 