I wrote this about a year ago, so it is somewhat dated, but it is probably still worth posting, and I will not have time to rewrite it for some time.
How to Troubleshoot Performance Issues
Trust But Verify
Nowhere does the saying "trust but verify" hold more true than in troubleshooting a performance problem. It is necessary to have the logs to substantiate the user's claims and to accurately detail the problem. Review a copy of the arcserve.log to build a table of each backup session: the number of directories, files, and bytes backed up, and its throughput. From that table, identify the extent of the problem in terms of actual throughput versus targets.
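Building that per-session table is easy to script. The sketch below is illustrative only: the log line format shown in SAMPLE_LOG is an assumption, since the real arcserve.log layout varies by ARCserve version, so the regular expression will need to be adapted to your own log.

```python
import re

# Hypothetical arcserve.log excerpt -- the real format differs by
# ARCserve version; adjust LINE_RE to match your own log lines.
SAMPLE_LOG = """\
Session 1: 120 directory(s) 4500 file(s) 812345678 bytes  elapsed 600 sec
Session 2: 80 directory(s) 2100 file(s) 404505600 bytes  elapsed 480 sec
"""

LINE_RE = re.compile(
    r"Session (?P<session>\d+): (?P<dirs>\d+) directory\(s\) "
    r"(?P<files>\d+) file\(s\) (?P<nbytes>\d+) bytes\s+elapsed (?P<secs>\d+) sec"
)

def throughput_table(log_text):
    """Build a per-session table of directories, files, bytes, and MB/min."""
    rows = []
    for match in LINE_RE.finditer(log_text):
        nbytes = int(match.group("nbytes"))
        secs = int(match.group("secs"))
        mb_per_min = (nbytes / (1024 * 1024)) / (secs / 60)
        rows.append({
            "session": int(match.group("session")),
            "dirs": int(match.group("dirs")),
            "files": int(match.group("files")),
            "bytes": nbytes,
            "mb_per_min": round(mb_per_min, 1),
        })
    return rows

for row in throughput_table(SAMPLE_LOG):
    print(row)
```

Once every session is a row in this table, the fastest, slowest, and most inconsistent sessions stand out at a glance.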
At this point you know the fastest and slowest throughputs as well as how consistent they are. The question now is how these throughputs compare to what is normal.
What should the throughput be?
Until you know what a normal throughput should be for a given configuration, it is not possible to know how much of a problem there is, or if in fact there is one at all. Things to look at to help determine the normal throughput are listed below in the Data Collection section.
Armed with that information you should be able to determine what the normal throughput for this backup should be, and from there judge the severity of the throughput problem.
Local or Remote?
Is the throughput of a local backup OK? This question should be asked even for a remote backup problem. The goal is to determine whether the problem is only remote or whether it happens locally as well.
Local Backup is OK - Remote is Slow - Are all remote backups slow, or just those to certain targets? If all are slow, check the NIC on the local system for resource conflicts, configuration, and driver problems. As a test, replace the NIC and the cable, and with an analyzer test the connection from the switch back to the server. If only certain systems are slow, gather detailed information on those systems and proceed from there.
Local Backup is Slow - Remote is Slow - There could be a problem at both ends, but start with the local system first. Once local backups are OK, recheck the remote jobs, and if there is still a problem see the suggestions above.
Local Backup is Slow - Remote is OK - This can happen when there is a resource conflict in the hard disk subsystem that is avoided when the data travels via the NIC. Look for an IRQ shared with the SCSI controller. It can also happen if there is a compatibility problem between one of the SCSI devices and the OS; to that end, verify that all the hard drives are on Microsoft's Hardware Compatibility List. Beyond that, check the configuration of the controller, cabling, and termination.
Local Backups
Start by looking for hardware conflicts and shared resources. As a test, copy the same data that was backed up, both via Windows and via an ARCserve Copy job, for comparison. If the backup is slower than the copy, look to the tape device and its SCSI bus for the bottleneck. Check the controller configuration and that it is properly cabled and terminated. If the tape device is not the only thing on the bus, try isolating it by removing any other attached devices for a test. All hardware should be on Microsoft's Hardware Compatibility List, including the individual drives within the system. If the copy was just as slow as the backup, check for things that slow down the storage subsystem and then expand out to a detailed review of the whole system. Look at drivers, such as OS tape device drivers and drivers from other applications that might deal with the tape device. Check services for anything that includes a file system driver, such as virus scan and disk defragmentation programs. Anything you are not familiar with should be investigated.
Remote Backups
From the table filled in earlier, determine whether the problem is with all clients or just certain systems. If it is with all clients, focus attention on the ARCserve server; otherwise look to the client with the worst throughput. Try the backup with and without the client agent: when the agent is used the backup runs over TCP/IP; without the agent it is done via RDS. As a test, copy the same data that was backed up, via Windows and via an ARCserve Copy job, for comparison. Check for dropped frames and retries as an indication of network problems.
NT Client Agent
First, it should be understood that the NT Client Agent will always reduce network traffic but may not always improve performance. The agent was designed to help overcome network bottlenecks, and in today's networks of 100 Mb/s and more there usually is no bottleneck at the network level.
When a backup is done without the agent, the ARCserve server has to issue a read request for each file and have the data sent back across the network, so there is network communication in both directions for every file. When the backup is done via the agent, the ARCserve server transmits the backup instructions to the agent; the agent then does all file operations locally, so that only the commands at the beginning and end of the session are sent over the network.
There are some circumstances where, even on high-throughput networks, a performance improvement might be realized by using the agent. This can happen when the data is highly compressible: if hardware compression is not available at the tape drive, software compression can be used at the agent, and because it is now compressed data being sent over the network, a performance improvement might be seen. Since most data is not highly compressible, there usually is no overall benefit.
DATA COLLECTION
Things to look at to help determine what the throughput should be
throughput of this type of tape drive under ideal conditions
type of controller and its settings
type, make, model, of devices on that bus
how many backups run at the same time
how much memory is on the ARCserve server
how much of this memory is free during a backup
how much free disk space is on the ARCserve server
how many files are there on the target drive
how many files are in the largest folder on this drive
what is the average size of the files
are the files compressed
is there a virus scan program scanning outgoing files
is the data local or remote
if remote, is the data LAN, SAN, or NAS attached
from end to end what is the speed of the network
how much memory is on the target system, and how much of that is free
how much free disk space is on the target system
how much disk fragmentation is there
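Several of the file-related figures in this checklist can be gathered with a short script. This is only a sketch using the Python standard library; memory, network speed, and fragmentation figures need OS-specific tools (such as those listed below) and are deliberately left out.

```python
import os
import shutil
from collections import Counter

def survey_target(path):
    """Collect a few checklist figures for a target drive or folder:
    total file count, average file size, the folder holding the most
    files, and free disk space."""
    file_count = 0
    total_bytes = 0
    per_folder = Counter()
    for root, _dirs, files in os.walk(path):
        per_folder[root] += len(files)
        for name in files:
            try:
                total_bytes += os.path.getsize(os.path.join(root, name))
                file_count += 1
            except OSError:
                pass  # skip files that vanish or cannot be read mid-walk
    busiest, busiest_count = (per_folder.most_common(1) or [(path, 0)])[0]
    usage = shutil.disk_usage(path)
    return {
        "files": file_count,
        "avg_size": total_bytes // file_count if file_count else 0,
        "busiest_folder": busiest,
        "busiest_folder_files": busiest_count,
        "free_bytes": usage.free,
    }
```

Run it against the target drive before and after troubleshooting; a huge file count with a tiny average size, or one folder holding most of the files, is itself a common cause of poor backup throughput.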
WinMSD - This is good for NT 4.0 systems. It provides a lot of basic system configuration information. Most important are the lists of Services, Drivers, and IRQs.
SysInfo - The replacement for WinMSD in Windows 2000; under NT 4.0 it is available from Help in any of the Office products. It has much more information than WinMSD, and when viewed via the GUI it is also easier to use.
Survey - This is a Compaq utility which provides very detailed information on the hardware as well as the OS. It should be a must for any Compaq server.
Performance Monitor - This is a great tool for troubleshooting performance issues related to memory, disk, or processes.