LTO Throughput SLOW


OHWS (Technical User) - Nov 25, 2003
We are currently performing time studies on our LTO throughput and are having issues. The drives are only achieving throughput of around 2.5 MB/sec. We are running Veritas NetBackup DC 4.5 FP3 on 100 Mbit network lines that are dedicated to backups. Our NET_BUFFER_SZ is currently 65536 and NUMBER_DATA_BUFFERS is 16. Should we change these? If so, will this significantly affect the actual server I/O? The media servers are running important applications that cannot be interrupted. Thanks in advance for any help!
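
For reference, these tunables are plain touch files on a UNIX media server; assuming the default /usr/openv install path, a quick way to see what is currently in effect:

    # Network buffer size used on the socket between client and media server
    cat /usr/openv/netbackup/NET_BUFFER_SZ

    # Number and size of the shared-memory data buffers used by bptm
    cat /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
    cat /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS

If a file does not exist, NetBackup falls back to its built-in default for that setting.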
 
OHWS
What OS are you running? Are you fibre attached or SCSI attached?

There are some recommended performance tuning steps to follow that you can pull down from support.veritas.com.

Also, which LTO drives and firmware on drives?
 
Hardware = HP9000
OS = HPUX 11.11
Connection is fibre
Drives = IBM LTO Ultrium Gen1
Firmware = 36U3
Drivers = IBM atdd

Another question I'd like to ask: when you change NUMBER_DATA_BUFFERS on the media servers, does anything need to be changed on the clients as well, such as the size of the data buffers? I haven't made any changes to the clients, only the five media servers and the master.
Thanks for your help! P.S. I'll be double-checking on the support site this morning just to make sure I've followed their steps.
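
For what it's worth, NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS are shared-memory settings read by bptm on the media server, so nothing needs to be created on the clients for those. NET_BUFFER_SZ can optionally be set on UNIX clients as well so that both ends of the socket use the same value; a sketch, assuming the default install path (match whatever value you settle on for the media servers):

    # On each UNIX client (optional) - same value as the media server's NET_BUFFER_SZ
    echo 65536 > /usr/openv/netbackup/NET_BUFFER_SZ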
 
Check the FAQ at faq776-3124

I have basically the same configuration and our settings are:
NET_BUFFER_SZ = 132096
NUMBER_DATA_BUFFERS = 128
SIZE_DATA_BUFFERS = 262144


Let me know if that changes anything for you. Also, you have to make sure that your NICs are configured manually for 100/full and not auto-detect. This needs to be done on all servers and your switches. My throughput is around 10,000 KB/sec on average.
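
If it helps, those values are just touch files on each media server (standard NetBackup locations; this sketch assumes the default /usr/openv install path):

    # On each media server - create/overwrite the tuning touch files
    echo 132096 > /usr/openv/netbackup/NET_BUFFER_SZ
    echo 128    > /usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS
    echo 262144 > /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS

    # Rough shared-memory cost per active drive:
    # NUMBER_DATA_BUFFERS x SIZE_DATA_BUFFERS = 128 x 256 KB = 32 MB

That last line is worth keeping in mind for the earlier concern about impact on the media servers: the cost is mainly shared memory per drive in use, not extra load on the application side.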
 
I've changed all of the buffer settings and have increased throughput to around 6500 KB/sec. This is over the network (not local), but that seems to be my ceiling right now. I still think there is a network setting that is holding me back. Does anyone who is getting good throughput know what their MTU (Maximum Transmission Unit) is set to for whatever frame type you're running? I know it probably varies depending on the frame type, but I'd just like to get some general numbers. Thanks!
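
For what it's worth, on HP-UX the MTU can be read straight off the interface table; a quick sketch (lan1 / PPA 1 below are only examples - use lanscan to find the PPA of your backup NIC):

    # MTU is the "Mtu" column for the backup interface
    netstat -in

    # Per-interface detail via lanadmin (PPA number comes from lanscan)
    lanscan
    lanadmin -m 1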
 
The data you are backing up - is that coming off the 100 Mbps Ethernet, or is it data local to the media server?

 
I do not think you need to worry about the MTU setting - BTW, Windows defaults to 1500.

Is your switch set to 100/full as well? This is a vital setting that will affect speed. Is anti-virus turned off during backups? Are you using OTM/VSP? Another thing to check is your NICs: flow control should be set to RX/TX, as should checksum offloading.
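
On the HP-UX side, a rough sketch of how to confirm what the card has actually negotiated (PPA 1 is only an example, and the -x output is driver-specific, so the exact fields vary by card):

    # Link speed for PPA 1
    lanadmin -s 1

    # Driver-specific settings - speed, duplex, autonegotiation state
    lanadmin -x 1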
 
It is coming off the 100 Mbps Ethernet dedicated backup network. This is strictly cold backups, with the DB completely down, on the HP-UX platform. The NICs are set at 100/full duplex, as is the switch. No anti-virus is running. I'm not really familiar with the flow control settings, but I'll ask. Thanks for your responses.
 
What disk type/speed sits behind your file systems?
Make sure that you haven't "undone" any array-level striping by using OS striping in your volume groups.
You have to remember that in your sort of setup it's your disk and your network that will be your bottlenecks. LTO and 9840/9940 drive types can easily manage 200+ GB per hour, whereas your network can only push a maximum of about 42 GB per hour, and realistically (with network collisions and inter-communication/handshaking) is only going to push about 30-35 GB per hour tops.
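As a rough sanity check on that network number:

    100 Mbit/s / 8                  = 12.5 MB/s
    12.5 MB/s x 3,600 s/hr          = 45,000 MB/hr (~45 GB decimal, ~42 GB binary)
    less TCP/IP and backup overhead = roughly 30-35 GB per hour in practice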
However, network speed can of course be influenced by how fast the data can be read off the disk (CPU speed, number of CPUs and free memory can also be factors).
Disk can range from anything from 17 GB/hr (although with today's disk types and speeds this would usually indicate an issue) up to 220 GB/hr on a 2 Gbps fibre connection to a high-end disk array running volume groups over multiple spindles and control processors.
See how long it takes to create a file from scratch on your disk subsystem, in the file systems you are backing up. What speed would this equate to? Does it tie up with the backup throughput speeds you are seeing?
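A minimal sketch of that timing test (the mount point and file name are placeholders - use a file system you are actually backing up, and a file big enough to get past the buffer cache):

    # Time a 1 GB sequential write into the file system being backed up
    timex dd if=/dev/zero of=/your_fs/ddtest.out bs=256k count=4096

    # Time reading it back (optimistic if the file is still cached)
    timex dd if=/your_fs/ddtest.out of=/dev/null bs=256k

    # 1,024 MB / elapsed seconds = MB/s; MB/s x ~3.5 gives GB per hour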
Can this test also be done on another server that is backing up at good speeds, with different disks, to give you a reference?
Try FTPing a file over your backup LAN... this can give an idea of your "raw" network speed (make sure the file is of a reasonable size, 1 GB+).
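Something along these lines (hostname and paths are placeholders; the ftp client reports its own throughput at the end of the transfer):

    # From the client, push a ~1 GB file to the media server over the backup LAN
    ftp mediaserver-bkp
    ftp> bin
    ftp> put /your_fs/ddtest.out /tmp/ddtest.out
    ftp> bye

    # The closing line ("... bytes sent in ... seconds (... Kbytes/s)") is your raw LAN rate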
Try doing a NULL backup from the client to your media server (define a disk storage unit on your media server, then touch /usr/openv/netbackup/bpdm_dev_null - this will point ALL disk-based backups to /dev/null, ruling out the tape drive and showing pure disk --> network --> media server performance).
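In practice that step looks roughly like this (the storage unit can be defined in the GUI or with bpstuadd; just remember to remove the touch file afterwards, because it sends every disk-based backup on that media server to /dev/null):

    # On the media server: make bpdm throw disk storage unit writes away
    touch /usr/openv/netbackup/bpdm_dev_null

    # Point a test policy for the client at the disk storage unit, run it,
    # then check the job's KB/sec - pure client disk -> network -> media server

    # Afterwards, put things back to normal
    rm /usr/openv/netbackup/bpdm_dev_null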
See what results you get from these tests.

Simon Goldsmith
Storage Engineer
 
I have narrowed the issue down to the capability of the network. I had a backup run locally last night and all four of my LTO drives were getting around 11,000 KB/sec. Thanks for all your help in this matter!
 
Simon,

>Make sure that you haven't "undone" any array-level striping by using OS striping in your volume groups.

Heh. I'm trying to convince a project manager here that running a striped volume group on top of RAID5 striped SAN disk (in an HDS 9960) is a Very Bad Idea.

But they've insisted we re-lay out the volume group as a stripe set, as they are equally convinced it'll be faster than the present concatenated volume...

Storage can be such fun, don't you think?

Cheers - Tim
 
Yes... but of course he knows best!!!

Let him do it... get him to sign a doc saying that this won't work! Performance will be crap! And you told everyone so!

And then when it's crap and everyone's complaining in 6 months' time when the workload increases etc.... pull out the famed document and say, in a loud and proud voice...

"I TOLD YOU SO!!!!"

It will give you a remarkably good feeling.

Simon Goldsmith
Storage Engineer
 
Well, quite.

I've already re-presented the raw disk from the HDS so as to minimize any disk access contention (all the LUNs are from three discrete parity groups, so no other clients can access them), so he now has 12 spindles for Oracle to play with. (And yes, I know the concept of spindles fails in virtualised disk arrays, but they won't have that one either...!)

I was always taught that you tuned an application from the application - code - OS - hardware down. But as is usual here, when anyone has any sort of access or speed issue they always break my door down first accusing the SAN of being the bottleneck. It hasn't been a bottleneck yet.

Moan moan...<g>

Cheers - Tim

 