Replication to local server has high latency

SQLWilts · Nov 20, 2007

Hi,
I know this is difficult to diagnose without sitting down in front of the box, but here goes.

I have a SQL server 2000 SP4 running on Windows 2003 standard with 8GB memory, 4 dual core processors.

I am replicating the live OLTP DB to a local DB (for reporting purposes) and to 2 Oracle subscribers.

The replicated DB is set to simple mode, auto expand and has plenty of free space at the moment.

The problem I am having is that the latency to the local DB is running at around 60000 - 80000 ms (and sometimes higher) and yet to the Oracle DB it is 1500 - 2000. Any ideas as to what could cause this, or where I should look first?

mrdenny · Nov 21, 2007

What sort of network connection is between the machines?

Take a look at perfmon on the subscriber. What does the disk queue look like? Is there any blocking going on? Is SQL out of memory? Is the CPU maxed out?

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog

SQLWilts · Nov 21, 2007

Hi Mr Denny,

The DB is replicating to a LOCAL DB - therefore to the same machine - no network.
CPUs are running at an average of 25%, peaking at 65%
Blocking? Same as you would expect on a busy OLTP DB - nothing drastic - any blocking (which is quite rare) clears quickly
Disk queue could be an issue - although I can't reliably measure it as the DB is on a filer, and I am not 100% sure how many spindles/parts of spindles make up the LUN.
SQL server memory, well, it is 32 bit Enterprise Edn using 6.75GB via AWE - and no, it is not registering as out of memory.
Still confuddled. Thanks for looking - I am considering trying to get a "spare box" to put the replicated DB onto - it would only need to be a smallish one too.

mrdenny · Nov 22, 2007

What do you mean by a filer? The only time I've heard something called a filer it's been a NAS which isn't supported for SQL Server to use for storage.

See what perfmon says about the disk queues. You will also need to have your sysadmin check what the load on the raid group is in the back end. Your LUN(s) may need to be moved somewhere else?

Is the production database and the reporting database on the same drive (LUN)?

I assume that the databases are on the same instance?

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog

SQLWilts · Nov 22, 2007

Sorry, I inherited this system and am trying to get my head around what is going wrong here.

Yup, both DBs on the same LUN

Both DBs on the same instance too

It is a NAS - and I didn't know that wasn't supported - is it because of read issues? Write delays don't really matter that much (as long as their ACID properties are maintained). I will have to look into that. Do you have any further reading to hand? I can google it if not.

As I said about the disk queues, I am not sure how many spindles make up the LUN, so is the disk queue figure still relevant? It is reading high (around 20), but until I can divide that by the number of spindles, it is a pretty irrelevant figure.

thanks loads for taking a look

mrdenny · Nov 24, 2007

Here is the MSKB article on using NAS for file storage for SQL Server. The main reason not to use NAS is because it's running over your normal IP network there is no way to guarantee that IP network will be able to support the bandwidth needed between the SQL Server and the NAS.

Think about it this way. Your database is humming along, working fine. Someone decides to move a massive set of files to the NAS. The NIC on the NAS maxs out. The SQL Server now does not have the bandwidth needed to access it's files.

Or worse yet, what happens if a network switch between the SQL Server and the NAS is rebooted at night and the DBA isn't told to stop SQL Server (or the switch reboots it self). Now SQL can't access it's data files and will mark the database as suspect. If you are lucky you'll be able to bring it back online, but depending on what state the log file and data files are in, you may not be able to because of file corruption.

At the very least i'd move the data files for the reporting database to another LUN. You don't want the disks to have to read and write at the same time. A queue length of 20 may not be bad. That all depends on how many disks are behind the LUN. The more important number will be what the storage says the queue length of the actual drives is. If that number is over about 3 your drives are over commited on IO and you need to either add more drives to the raid group or move some LUNs to another raig group.

Your next course of action should be to work with your storage folks and work the issue from that end.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Replication to local server has high latency

SQLWilts

Programmer

mrdenny

Programmer

SQLWilts

Programmer

mrdenny

Programmer

SQLWilts

Programmer

mrdenny

Programmer

Similar threads

Part and Inventory Search

Sponsor