Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SQL 2k5 Performance Issues: Replication Fix or Server Re-build? 2

Status
Not open for further replies.

fuzzyocelot

Programmer
Jul 22, 2003
333
US
Hey everyone!

We’re still having performance problems with one of our new SQL 2005 servers that uses merge replication. I realize there is a LOT of documentation out there on performance tuning. My head is swimming from trying to digest it all so I would really appreciate some advice. :)

There are people out in the field who have tablets. At night they plug the tablets in and the merge replication process kicks off for several hours. There are less than 30 tables plugged in at night. The performance problems occur all throughout the day, though. There are about 7 databases on the server that are used. The main one used as the Publisher is 7 GB. The others are much smaller. There is one distribution db on the same server. Other applications connect to the databases throughout the day. I’d say on average there’s less than 20 connections to the server. All connections are remote connections.

This is the only server that uses this kind of replication so we’re still trying to figure it all out. The vendor who upgraded the application also set up the replication settings. So we’ve been trying to get them to trouble-shoot the performance problems too. The last advice we heard from them was to read Microsoft’s technet site about replication. Lovely, huh?

I don’t know the specs of the last server prior to the upgrade as I wasn’t really involved in it much. I do know they had performance problems there as well and had to build the server up by putting more RAM on it. I don’t know if we need to do that here as well.

The server itself is running Windows 2003 Server SP2. We have SQL 2005 SP2 on it as well. It also has 4 GB of RAM DDR2, Dual CPU 3.8 GHz Xeon with 2 MB cache, and 15k rpm disks.

We thought about enabling AWE and re-configuring the max server memory to something like 3.5 GB to see if that helps. Right now it’s using the defaults. We would also like to try to set the databases compatibility levels to SQL Server 2005 (90) if they are compliant. The vendor said the server has to be run with SQL 2000 compatibility, but replication needs to stay at the SQL 2005 compatibility level.

I ran DBCC MEMORYSTATUS to see if anything showed up. Here are some of the results. I’ve been trying to analyze it but am having some trouble figuring it out. This kind of thing is not my forte. Let me know if anyone want to see anything specific:
Code:
Buffer Counts                  Buffers
------------------------------ --------------------
Committed                      202776
Target                         202776
Hashed                         98425
Stolen Potential               88662
External Reservation           0
Min Free                       240
Visible                        202776
Available Paging File          415793

Memory Manager                  KB 
------------------------------ --------------------
VM Reserved                    1682420
VM Committed                   1678732
AWE Allocated                  0
Reserved Memory                1024
Reserved Memory In Use         0

Memory node Id = 0              KB 
------------------------------ --------------------
VM Reserved                    1678260
VM Committed                   1674724
AWE Allocated                  0
MultiPage Allocator            24568
SinglePage Allocator           831800

Buffer Distribution            Buffers
------------------------------ -----------
Stolen                         2585
Free                           376
Cached                         101390
Database (clean)               86890
Database (dirty)               11532
I/O                            0
Latched                        3

(7 row(s) affected)

Buffer Counts                  Buffers
------------------------------ --------------------
Committed                      202776
Target                         202776
Hashed                         98425
Stolen Potential               88662
External Reservation           0
Min Free                       240
Visible                        202776
Available Paging File          415793

(8 row(s) affected)

Procedure Cache                Value
------------------------------ -----------
TotalProcs                     4448
TotalPages                     88619
InUsePages                     92

Optimization Queue             Value
------------------------------ --------------------
Overall Memory                 1331306496
Target Memory                  451420160
Last Notification              1
Timeout                        6
Early Termination Factor       5

I haven’t run Performance Monitor yet as I’m not familiar with it. If I did run it, what counters would be useful?

I tried to do a trace but couldn’t see any queries or anything else of interest.

I was in training for the last week so I haven’t been checking it lately. Yesterday I did a check on the Job Activity Monitor and found one job that’s been running since 5/30. It’s called "Replication monitoring refresher for distribution". According to one of the system tables it has been running since 5/30 and appears to be taking up a lot of CPU (6908843) and physical I/O (930565).

According to the sysprocesses table, the status for this job is "suspended". What I found strange is that this job is disabled yet it's running. I know this job runs the sp_replmonitorrefreshjob stored procedure which appears to be a system procedure for replication. So I searched through TechNet for the job name and found out that it refreshes the cache for replication. So I suppose we can try to reset the properties for it or something to see if it helps. Does anyone know much about this process? I’m waiting for the data owner to give the go-ahead to try resetting the properties for it. Does anyone think it’ll help? Should we rebuild the server? The data owner thinks that will help but we'd like to exhaust all our other options first.

I really appreciate any advice! TIA!!! :)

 
Sounds like things are looking better.

Another good indicator of disk issues is the PhysicalDisk:Avg Disk sec/Read and Avg Disk sec/Write counters. These will tell you if the disks are taking a long time to read and write the IOs from the disk. On shared storage such as a SAN this can indicate that the disks are over subscribed.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
I haven't heard back from the client yet to see if they've noticed any changes in performance. I ran the performance counters all day today and haven't had a chance to review them yet. I've been busy with another server that's having performance problems. A DBA's work is never done! :)

mrdenny, I'll try the counters you mentioned to see what they show.

I just don't know what's up with the servers lately. The other one that's having issues has been running fine for many years. It seems that all of a sudden we're getting CPU bottlenecks. But that's another thread. :)

Thanks guys!! Things are starting to look up! :)

Rebecca
 
Hey guys,

The client turned off the tablets that are part of the replication part. He also rescheduled the replication merge jobs to run after 5 pm so they aren't running every 15 minutes during the day anymore. Just after 5 pm.

There are still CPU spikes every so often. I can't find a regular pattern yet. For the performance counters, I also have Processor(Total)\% Processor Time set. I can't see any big CPU spikes in the log file for the CPU.

So I started a SQL trace using Profiler. Will this be enough for me to determine the cause of the big CPU spikes (90-100%)? The spikes are lasting for a few minutes from what I can see. On average, the CPU utilization (via Task Manager) seems to fluctuate between 4% and 60%+. When the spikes occur, I can see that it's SQL that has the high CPU utilization.

I opened the Activity Monitor (via Mgmt Studio) and can't see anything being blocked. I can see the CPU for certain user processes that are around 20,000+. There are only a couple of those. I'm also going through the list of queries ptheriault listed above.

I ran a query on sys.dm_exec_query_stats that shows the top 5 total_worker_time/execution_count (avg cpu time) and the top 5 are:

[li]556017[/li]
[li]488733[/li]
[li]396933[/li]
[li]380464[/li]
[li]314436[/li]

So I don't know if any of this is something to be concerned about. I'm not sure what these times are measured in either.

Any ideas or advice? Thanks! :)
 
I just realized I can import and correlate the performance counter logs into sql profiler! This should help me figure things out better. Do I need to start a new thread about the CPU problems? Thanks!
 
Go ahead and start a new thread. That will make it easier for people to find it in the future.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
Update: We met with the client last Friday and things went well. During the meeting, he mentioned that no one had recently complained to him about performance problems. So that could be a good sign.

He's going to talk to the vendor about a few issues and have someone look at the communications link between the various remote locations and where the server resides. Some of the performance issues could have something to do with the bandwidth, cables or something along those lines. So for now, we're going to just sit and watch the server to see how it goes. Hopefully things are running better than they were before.

Thank you guys so VERY much for all your help! I GREATLY appreciate it! :) I'll hold off on the separate CPU post regarding this particular server. I may end up posting about another one we're having problems with. :)

Thanks again! :)
 
I found something REALLY cool that I wish I had found months ago. It may have saved us some grief. Anyway, SQL 2005 service pack 2 (SP2) should have come with a bunch of performance reports. I happened to stumble upon them today. One thing about it that's really nice is that is "a very inexpensive means of monitoring your server".

I couldn't find any threads on it on tek-tips. So if someone already mentioned it, I apologize.

To see if you have the reports, open Management Studio and connect to a SQL 2005 server. Then right-click on the server name, select "Reports", and hopefully you'll see "Standard Reports". Next you should see a big list of cool reports you can use.

If you don't see any thing like that, then you can install the reporting feature separately. You don't even need Reporting Services to use these reports! I found the download on Microsoft's website. Here is the link for more info on it:


I am ecstatic right now about these reports! :) Better late than never right?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top