
Determine if you are bottlenecking ...


PGPhantom (IS-IT--Management) - Nov 4, 2002
Just a quick FYI

Please keep the following in mind ...
DLT7000 backs up 20/40 GB/hour
DLT8000 22/44 GB/hour
SDLT 40/80 GB/hour
LTO 54/108 GB/hour
These figures are based on the manufacturers' theoretical data transfer rates; the first number is native and the second is compressed.

The industry average compression ratio is about 1.6:1. This is based on hardware compression, not software compression.

A 10Mb/sec link can transfer about 4.4 GB/hour
100Mb/sec is about 44 GB/hour
1000Mb/sec is about 440 GB/hour - theoretically. In the real world, using TCP/IP, you lose about 50% to protocol overhead etc.
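As a rough sketch of the arithmetic behind those figures (illustrative only - it follows the Mbit/8 -> MB, then /1024 -> GB convention used in this thread):

def link_gb_per_hour(link_mbit_per_sec, efficiency=1.0):
    # Convention used above: Mbit/8 -> MB/sec, then MB/sec * 3600 / 1024 -> GB/hour.
    # efficiency is the fraction of the raw line rate you actually get
    # (e.g. ~0.5 once TCP/IP overhead etc. is taken into account).
    mb_per_sec = link_mbit_per_sec / 8.0
    return mb_per_sec * 3600 / 1024 * efficiency

def mbsec_to_gb_per_hour(mb_per_sec):
    return mb_per_sec * 3600 / 1024

for mbit in (10, 100, 1000):
    print(mbit, "Mb/sec ->", round(link_gb_per_hour(mbit), 1), "GB/hour theoretical,",
          round(link_gb_per_hour(mbit, 0.5), 1), "GB/hour after ~50% overhead")

print(mbsec_to_gb_per_hour(80))   # 281.25 GB/hour, the figure quoted later for 80MB/sec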

NEVER, NEVER use auto-negotiate for backups. If you push too much data to the media server you will get flow-control issues where data is resent even though it is not lost - this is an issue with switches etc. buffering data. Data transfer can fall by up to 70%.

Do not use more than 2 DLT drives per SCSI channel, and only one LTO drive per channel.

 
Do you know of any performance limitation on W2K running NBU 4.5? We ask because we found the I/O throughput saturated at ~80MB/sec.
 
80MB/sec translates to 281.25 GB/hour - this is the typical throughput on a Gb Ethernet connection after TCP/IP overhead etc. is taken into consideration. On some systems where the packets have been optimized I have seen it increase to around 350GB/hour.

Also - as a clarification - never use auto-negotiate on 10/100 links. It is recommended on 1000 links.

Also - Most SCSI cards are rated at 80MB/sec.

This past weekend I checked our throughput on a W2K media server just to confirm, and for a two hour window we were getting 110GB/hour on a Gb Ethernet connection - this is almost the maximum of our tape sub-system on that box.

I do not believe that I have seen a case where saturation is at 80MB/sec on W2K unless there are other factors involved like network, SCSI, PCI, CPU, memory etc. Once all have been optimized - Things look good.

Anyone else ever seen this? I would be interested to know what the rest of the hardware architecture is.
 
Sorry, I did not state my configuration clearly before. The NBU server is configured as a media server directly attached to the SAN environment. No network backup is involved. The server & SAN configuration is as follows.

Proliant DL760 with 8 x Xeon 900 CPU
6GB memory
4 x Fibre Channel Adapter (OEM Emulex LP952)
- 2 HBA connected to Enterprise Virtual Array
- 2 HBA connected to a SAN switch dedicated to backup
- Network Storage Router (converts Fibre Channel to SCSI)
- ESL9595 Tape library with 12 SDLT 110/220.
NetBackup Version 4.5

+--------+ +------+
| Server |======| EVA |
+--------+ +------+
| | <-- (~40MB/sec/bus)
| |
+---------+
| SAN SW |
+---------+
| | <-- (~40MB/sec/bus)
| |
+---------+
| NSR |
+---------+
| | | | |
| | | | | <-- 6 x SCSI (each connected to 2 SDLT)
+----------+
| ESL9595 |
+----------+

Test Case
----------
The server is a SQL server housing around 1.0TB of data. We are using SQL dump to export the tables into ten files, each around 100GB. No NBU SQL agent is installed. We configured a single class to back up the ten files to 10 x SDLT tape drives, and we found the overall throughput saturated at ~75MB/sec. The average throughput for each SDLT is 7.5MB/sec.

Then we tried to isolate the problem and identify the bottleneck. We submitted the jobs one by one and turned on perfmon to capture system performance.
The result: performance started degrading after the 6th backup job started running, and it became saturated (flat, really) at 75MB/sec as more jobs came in.

As the jobs completed one by one, we noticed the performance gradually resumed.

We added one more EVA to rule out disk I/O as the bottleneck, but it did not help at all.

We submitted additional jobs from another media server connected to the same SAN environment, and those jobs ran at 15MB/s.
This shows the bottleneck should not be in the SAN switch, NSR or tape library. The bottleneck is somewhere on the server, either in the OS or inside NBU (shared memory stuff).

We looked at the bpbkar and bptm logs; the counters indicated a ~21 minute delay on the server writing data to tape.

The problem can be replicated in our lab environment.

 
SKLee, you are right to determine that the FC Network is not the problem. As far as the OS or NBU, I don't think that is it either.

You need to look at the bus speed of the server and of each PCI slot those cards are in. Remember that a server bus can only push so much.

I can't remember the formula that Veritas uses, but I will check back at work for it.
 
The HBAs are installed on three different PCI buses:
- 2 x HBA running on 33MHz PCI, connected to the storage
- 2 x HBA running on 66MHz PCI, connected to the NSR

Since we found the average throughput for each HBA is only ~40MB/sec, the HBA or PCI bus is unlikely to be the bottleneck.
 
PCI runs at 33MHz, so a 64-bit 33MHz bus is rated at 928GB/hr and a 32-bit 33MHz bus is rated at 464GB/hr.
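Where those numbers come from, as an illustrative sketch using the same MB-to-GB convention as earlier in the thread:

def pci_gb_per_hour(bus_width_bits, clock_mhz):
    # Peak rate: (width in bytes) * clock in MHz -> MB/sec, then * 3600 / 1024 -> GB/hour.
    mb_per_sec = (bus_width_bits / 8) * clock_mhz
    return mb_per_sec * 3600 / 1024

print(pci_gb_per_hour(64, 33))   # ~928 GB/hr for a 64-bit 33MHz bus
print(pci_gb_per_hour(32, 33))   # ~464 GB/hr for a 32-bit 33MHz bus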

A thought that I just had - are all your storage units in one storage unit group? If so, try breaking them up for a test. If memory serves me correctly, each group is assigned a dedicated bpsched process and that process may be saturated - check your log files ... Here is part of a performance doc I wrote ...

To see how your system is doing, check the bptm log files for the following lines:

13:10:26 [1996.2392] <2> write_data: waited for full buffer 2341 times, delayed 3362 times
- BPTM is waiting for data from the source, so data is not arriving fast enough (each delay is 30ms)
- Add multiplexing
00:53:05 [25928] <2> fill_buffer: [48992] socket is closed, waited for empty buffer 23992 times, delayed 24145 times, read 11040576 Kbytes
- BPTM is waiting for an empty buffer, so data is arriving faster than it can be written.
- Increase the NUMBER_DATA_BUFFERS.
- Reduce multiplexing.
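As an illustration only (not part of the doc), a rough script along these lines can pull the wait/delay counters out of bptm log lines like the two above and turn the delay counts into time, assuming ~30ms per delay as noted; the log path in the example call is just a placeholder:

import re

PATTERN = re.compile(
    r"waited for (?P<kind>full|empty) buffer (?P<waits>\d+) times, "
    r"delayed (?P<delays>\d+) times"
)
DELAY_SECONDS = 0.030   # each delay is roughly 30ms, per the note above

def summarize_bptm(log_path):
    with open(log_path) as log:
        for line in log:
            m = PATTERN.search(line)
            if not m:
                continue
            delays = int(m.group("delays"))
            minutes = delays * DELAY_SECONDS / 60
            if m.group("kind") == "full":
                hint = "source too slow - consider adding multiplexing"
            else:
                hint = "tape path too slow - more NUMBER_DATA_BUFFERS or less MPX"
            print(f"{m.group('kind')} buffer: {delays} delays (~{minutes:.1f} min), {hint}")

# summarize_bptm("bptm_log.txt")   # placeholder path - point this at your bptm log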

And ...
For each 1MB/sec of data arriving at the server you need 5MHz of CPU processing power. To calculate:
<Num of clients> * <rate in MB/sec> * 5MHz * 2 (5MHz for reading off the network and 5MHz for writing to tape). E.g. 8 jobs averaging 6000KB/sec need 8 * 6 * 5 * 2 = 480MHz.
Memory used = (buffer_size * num_buffers) * num_drives * MPX factor.
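Spelled out as a sketch (the buffer values in the example call are illustrative only, not recommendations):

def required_cpu_mhz(num_jobs, rate_mb_per_sec):
    # 5MHz per MB/sec to read off the network + 5MHz per MB/sec to write to tape.
    return num_jobs * rate_mb_per_sec * 5 * 2

def buffer_memory_bytes(buffer_size, num_buffers, num_drives, mpx):
    # Shared memory consumed by the data buffers.
    return buffer_size * num_buffers * num_drives * mpx

print(required_cpu_mhz(8, 6))                  # 480 MHz, matching the example above
print(buffer_memory_bytes(65536, 16, 12, 1))   # e.g. 64KB buffers, 16 per drive, 12 drives, no MPX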

Do you have anti-virus software running? Even on media servers, that needs to be turned off when doing a backup.

Let's start with that and see where we can go from there...
 