Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Extremely Low Page Life Expectancy 1

Status
Not open for further replies.

fuzzyocelot

Programmer
Jul 22, 2003
333
US
Hello all!

I’ve been Google’ing and reading up on SQL Server and memory since yesterday and am very much confused. Hopefully someone can help clarify a few things for me.

We have a production cluster (active/passive) running SQL Server 2000 Enterprise SP3a (vs 8.00.818) on Windows Server 2003 Enterprise SP2. There are 2 CPUs per node. We have three instances on the cluster. The main instance has over 140 databases. The other two have around 5 databases each.

The page life expectancy lately has been very low (less than 30 seconds) and is being sustained at that value. It rarely gets above 300 seconds. This is not good. It means that the pages are being removed from the buffer cache too frequently which reduces the value of the cache, right? This means that SQL Server has to perform more disk reads which can degrade performance. Did I get that right so far?

The buffer cache size is 2.51 GB and the hit rate is 99%. The procedure cache size is 6.05 MB and the hit rate is 92.6%.

The server has 8 GB of RAM per node. The max size of the virtual memory is 11.7 GB with 6.11 GB free (52% free). The committed virtual memory is around 5770 MB. I read that anything below 7000 is not good. Is that right?

Pages found in RAM is 100%. The read hit ratio is around 0%. We have AWE enabled and the /PAE and /3GB switches in the boot.ini file. We recently added the /3GB switch and now are wondering if that’s causing problems.

The average queue length for one of the disks lately has been around 2 or so. I’m not sure what that indicates.

There appear to be spikes in soft page faults and hardly any in hard page faults. The highest soft page fault spike is 6889 per second. There seem to be a couple of spikes in page reads around 6 per second. The page file is 4092 MB in size - 25 MB of that is used.

The min server memory for the main instance is 0 and the max is 3584 MB.
The min server memory for the second instance is 0 and the max is 1280 MB.
The min server memory for the third instance is 0 and the max is 1024 MB.

We’ve been seeing CPU utilization sustained between 70 and 100% for the last week or so. We’ve been able to track most of the CPU problem to a single report that someone’s been running. This report normally runs after 5 pm when no one is here because it usually takes over an hour to run. However, recently they’ve been running it during the day which apparently hogs resouces on the server. They’ve since stopped doing this at our request. However, we’re still continuing to see performance problems.

The low page life expectancy has me concerned. Could someone please tell me what I need to do or check next?

I really appreciate it!
Thanks in advance!!!
 
Correct, if the page life expectancy is low the disk will be going to the disk more often.

Using virtual memory is bad. Virtual memory is memory saved on disk. This memory is very slow as it runs at disk speed. Sounds to me like more RAM will be a good idea.

Disk queueing at 2 is fine. The basic rule of thumb is if it goes above 2* "the number of disks in the array" then you have a disk speed problem.

Have you updated your statistics recently? How about your index rebuild schedule?

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog
 
Thank you for replying, mrdenny.

Do you think part of the problem we're having is because we added the /3GB switch to the boot.ini file recently? Actually, we may have had this problem before but didn't realize it until recently because we're evaluating a new tool - Spotlight on SQL Server Enterprise. That's how we noticed the page life expectancy has been so low.

We're also migrating this cluster to a new cluster with SQL Server 2005. So adding more RAM to the current one might not be feasible at this point. I'll check the new one to see how much RAM it has. I'm hoping it's got more.

We have 140 databases spread out over 9 maintenance plans on this cluster. I'll review them because I wasn't the one who wrote the main ones. I'm not sure if they reorganize or rebuild the indexes. I'll also see if the statistics are updated on a regular basis.

Thanks!
 
Odds are the /3GB switch isn't effecting you any.

In Spotlight click on the Buffer Cache hit ratio, and then click on the Buffer Cache drill down.

This will tell you what objects are taking how much of the cache, and how much of the objects is in cache. For example on my production machine I've got one PK taking 34.38% of the cache, but only 9.15% of the PK is in cache (it's a clustered index on a 75 Gig table). If you have something like this you may want to consider changing it from a clustered index to a non-clustered index. If your cache is pretty evenly spread over a lot of objects, then it may simply be an issue of needing more RAM.

Some links you may find useful.

How SQL decides how much procedure cache and how much buffer cache to use.

Tech Brief I wrote about Spotlight on SQL Server Enterprise

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog
 
Awesome advice! Thank you, mrdenny! I will definitely look into it.

Thanks!! :)
 
No problem.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog
 
Here's an update for anyone experiencing the same type of problems. What's interesting is that the page life expectancy had been fine the last couple of weeks. The buffer and procedure cache hits have been great too.

However, occasionally the Disk Queue Length spikes for a few minutes at around 4 for one disk in particular. In addition, the Free System Page Table Entries has been around 4700. This is virtual memory and using virtual memory is bad because it's on disk and runs at disk speed, right? Just making sure I understand.

The statistics and index rebuilds are done weekly for everything.

So in summary, is it looking like we may need more RAM and maybe faster disk speeds? We're migrating to a new cluster which is on a faster SAN EVA. So hopefully that will help with the disk speed. However, I'm not sure why it's using so much virtual memory. Is that because we may not have enough RAM?

By the way, mrdenny, the Tech Brief you wrote on Spotlight is awesome and has helped a lot! :)

Thanks!
 
Yes, virtual memory is bad. It does run at disk speed as it's on the disk.

Sounds like more RAM would be a good thing for you. Any idea what's causing the disk to go nuts at 4? Could it be something else using the same physical disks?

Can you post a screenshot of the "Buffer Cache drill down" in Spotlight (or email it to me directly)?

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

My Blog
 
The database for a data warehouse is on the instance whose associated disk is going nuts. The vendor has been here looking into the problem and I believe they may have fixed it. They're going to do an upgrade anyway. They are also considering putting the warehouse on its own separate cluster.

Unfortunately, I had to discontinue using Spotlight on the current cluster because my connection would stop working every other day. I worked with Quest support and found out that because this cluster is in a different domain than the one I'm in, there would be problems with the connection. I can't connect to the server through the performance monitor on my machine either. So it won't work right most of the time.

That's okay because we're in the process of migrating to a new cluster that is on the current domain. That other domain is going away. So once the migration is complete and we're on the new domain, I should be able to use Spotlight again.

Now I'm wondering if we will need more RAM on the new cluster. Right now it has the same amount that the current one has which is 8 GB.

Thanks for your help anyway! I really appreciate it. :)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top