fuzzyocelot
Programmer
Hey everyone!
We’re still having performance problems with one of our new SQL 2005 servers that uses merge replication. I realize there is a LOT of documentation out there on performance tuning. My head is swimming from trying to digest it all so I would really appreciate some advice.
There are people out in the field who have tablets. At night they plug the tablets in and the merge replication process kicks off for several hours. There are less than 30 tables plugged in at night. The performance problems occur all throughout the day, though. There are about 7 databases on the server that are used. The main one used as the Publisher is 7 GB. The others are much smaller. There is one distribution db on the same server. Other applications connect to the databases throughout the day. I’d say on average there’s less than 20 connections to the server. All connections are remote connections.
This is the only server that uses this kind of replication so we’re still trying to figure it all out. The vendor who upgraded the application also set up the replication settings. So we’ve been trying to get them to trouble-shoot the performance problems too. The last advice we heard from them was to read Microsoft’s technet site about replication. Lovely, huh?
I don’t know the specs of the last server prior to the upgrade as I wasn’t really involved in it much. I do know they had performance problems there as well and had to build the server up by putting more RAM on it. I don’t know if we need to do that here as well.
The server itself is running Windows 2003 Server SP2. We have SQL 2005 SP2 on it as well. It also has 4 GB of RAM DDR2, Dual CPU 3.8 GHz Xeon with 2 MB cache, and 15k rpm disks.
We thought about enabling AWE and re-configuring the max server memory to something like 3.5 GB to see if that helps. Right now it’s using the defaults. We would also like to try to set the databases compatibility levels to SQL Server 2005 (90) if they are compliant. The vendor said the server has to be run with SQL 2000 compatibility, but replication needs to stay at the SQL 2005 compatibility level.
I ran DBCC MEMORYSTATUS to see if anything showed up. Here are some of the results. I’ve been trying to analyze it but am having some trouble figuring it out. This kind of thing is not my forte. Let me know if anyone want to see anything specific:
I haven’t run Performance Monitor yet as I’m not familiar with it. If I did run it, what counters would be useful?
I tried to do a trace but couldn’t see any queries or anything else of interest.
I was in training for the last week so I haven’t been checking it lately. Yesterday I did a check on the Job Activity Monitor and found one job that’s been running since 5/30. It’s called "Replication monitoring refresher for distribution". According to one of the system tables it has been running since 5/30 and appears to be taking up a lot of CPU (6908843) and physical I/O (930565).
According to the sysprocesses table, the status for this job is "suspended". What I found strange is that this job is disabled yet it's running. I know this job runs the sp_replmonitorrefreshjob stored procedure which appears to be a system procedure for replication. So I searched through TechNet for the job name and found out that it refreshes the cache for replication. So I suppose we can try to reset the properties for it or something to see if it helps. Does anyone know much about this process? I’m waiting for the data owner to give the go-ahead to try resetting the properties for it. Does anyone think it’ll help? Should we rebuild the server? The data owner thinks that will help but we'd like to exhaust all our other options first.
I really appreciate any advice! TIA!!!
We’re still having performance problems with one of our new SQL 2005 servers that uses merge replication. I realize there is a LOT of documentation out there on performance tuning. My head is swimming from trying to digest it all so I would really appreciate some advice.
There are people out in the field who have tablets. At night they plug the tablets in and the merge replication process kicks off for several hours. There are less than 30 tables plugged in at night. The performance problems occur all throughout the day, though. There are about 7 databases on the server that are used. The main one used as the Publisher is 7 GB. The others are much smaller. There is one distribution db on the same server. Other applications connect to the databases throughout the day. I’d say on average there’s less than 20 connections to the server. All connections are remote connections.
This is the only server that uses this kind of replication so we’re still trying to figure it all out. The vendor who upgraded the application also set up the replication settings. So we’ve been trying to get them to trouble-shoot the performance problems too. The last advice we heard from them was to read Microsoft’s technet site about replication. Lovely, huh?
I don’t know the specs of the last server prior to the upgrade as I wasn’t really involved in it much. I do know they had performance problems there as well and had to build the server up by putting more RAM on it. I don’t know if we need to do that here as well.
The server itself is running Windows 2003 Server SP2. We have SQL 2005 SP2 on it as well. It also has 4 GB of RAM DDR2, Dual CPU 3.8 GHz Xeon with 2 MB cache, and 15k rpm disks.
We thought about enabling AWE and re-configuring the max server memory to something like 3.5 GB to see if that helps. Right now it’s using the defaults. We would also like to try to set the databases compatibility levels to SQL Server 2005 (90) if they are compliant. The vendor said the server has to be run with SQL 2000 compatibility, but replication needs to stay at the SQL 2005 compatibility level.
I ran DBCC MEMORYSTATUS to see if anything showed up. Here are some of the results. I’ve been trying to analyze it but am having some trouble figuring it out. This kind of thing is not my forte. Let me know if anyone want to see anything specific:
Code:
Buffer Counts Buffers
------------------------------ --------------------
Committed 202776
Target 202776
Hashed 98425
Stolen Potential 88662
External Reservation 0
Min Free 240
Visible 202776
Available Paging File 415793
Memory Manager KB
------------------------------ --------------------
VM Reserved 1682420
VM Committed 1678732
AWE Allocated 0
Reserved Memory 1024
Reserved Memory In Use 0
Memory node Id = 0 KB
------------------------------ --------------------
VM Reserved 1678260
VM Committed 1674724
AWE Allocated 0
MultiPage Allocator 24568
SinglePage Allocator 831800
Buffer Distribution Buffers
------------------------------ -----------
Stolen 2585
Free 376
Cached 101390
Database (clean) 86890
Database (dirty) 11532
I/O 0
Latched 3
(7 row(s) affected)
Buffer Counts Buffers
------------------------------ --------------------
Committed 202776
Target 202776
Hashed 98425
Stolen Potential 88662
External Reservation 0
Min Free 240
Visible 202776
Available Paging File 415793
(8 row(s) affected)
Procedure Cache Value
------------------------------ -----------
TotalProcs 4448
TotalPages 88619
InUsePages 92
Optimization Queue Value
------------------------------ --------------------
Overall Memory 1331306496
Target Memory 451420160
Last Notification 1
Timeout 6
Early Termination Factor 5
I haven’t run Performance Monitor yet as I’m not familiar with it. If I did run it, what counters would be useful?
I tried to do a trace but couldn’t see any queries or anything else of interest.
I was in training for the last week so I haven’t been checking it lately. Yesterday I did a check on the Job Activity Monitor and found one job that’s been running since 5/30. It’s called "Replication monitoring refresher for distribution". According to one of the system tables it has been running since 5/30 and appears to be taking up a lot of CPU (6908843) and physical I/O (930565).
According to the sysprocesses table, the status for this job is "suspended". What I found strange is that this job is disabled yet it's running. I know this job runs the sp_replmonitorrefreshjob stored procedure which appears to be a system procedure for replication. So I searched through TechNet for the job name and found out that it refreshes the cache for replication. So I suppose we can try to reset the properties for it or something to see if it helps. Does anyone know much about this process? I’m waiting for the data owner to give the go-ahead to try resetting the properties for it. Does anyone think it’ll help? Should we rebuild the server? The data owner thinks that will help but we'd like to exhaust all our other options first.
I really appreciate any advice! TIA!!!