SQL SERVER ISSUE!!!! - CPU USAGE MASSIVE!!! 1

sanjdhiman · Sep 13, 2007

Hi

I hope you can help.

My sql server 2005 (CLustered, active active) running on an MSA disk array has been working fine, all of a sudden the simplest of queries, i.e. SELECT COUNT(1) FROM tableA spikes the CPUs (there are 8 in there). With 8Gig of Ram to use.

I have checked execution plan, and its doing an Index Scan, time taken to run queries is pretty normal but the spike to the CPU is a big concern.

If I have a lot of things running, then the requests that I give the server (i.e. from applications) start to time out...

Its so weird.

We re-organised the indexes, and updates stats with full scan on the tables.

I checked sysindexes and the looked at the modctr.. and it was a resonable number nothing too high.

Can you think why the simplest of queries would SPIKE the cPU like this?

Have you ever seen this before!

We failed over the box to the other node and ran the query, still the same, so I am guess the data is an issue

Please help

Thanks in advance

Sanj

ptheriault · Sep 13, 2007

I had a similar problem, It turned out to be bad memory.

Here are some very helpful queries for 2005 that will help you find the bottleneck.

Code:

--TOP 50 highest CPU queries
select  
    highest_cpu_queries.plan_handle,  
    highest_cpu_queries.total_worker_time, 
    q.dbid, 
    q.objectid, 
    q.number, 
    q.encrypted, 
    q.[text] 
from  
    (select top 50  
        qs.plan_handle,  
        qs.total_worker_time 
    from  
        sys.dm_exec_query_stats qs 
    order by qs.total_worker_time desc) as highest_cpu_queries 
    cross apply sys.dm_exec_sql_text(plan_handle) as q 
order by highest_cpu_queries.total_worker_time desc


--queries running in parallel
select  
    r.session_id, 
    r.request_id, 
    max(isnull(exec_context_id, 0)) as number_of_workers, 
    r.sql_handle, 
    r.statement_start_offset, 
    r.statement_end_offset, 
    r.plan_handle 
from  
    sys.dm_exec_requests r 
    join sys.dm_os_tasks t on r.session_id = t.session_id 
    join sys.dm_exec_sessions s on r.session_id = s.session_id 
where  
    s.is_user_process = 0x1 
group by  
    r.session_id, r.request_id,  
    r.sql_handle, r.plan_handle,  
    r.statement_start_offset, r.statement_end_offset 
having max(isnull(exec_context_id, 0)) > 0

-- 
-- Find query plans that may run in parallel 
-- 
select  
    p.*,  
    q.*, 
    cp.plan_handle 
from  
    sys.dm_exec_cached_plans cp 
    cross apply sys.dm_exec_query_plan(cp.plan_handle) p 
    cross apply sys.dm_exec_sql_text(cp.plan_handle) as q 
where  
    cp.cacheobjtype = 'Compiled Plan' and 
    p.query_plan.value('declare namespace  
p="[URL unfurl="true"]http://schemas.microsoft.com/sqlserver/2004/07/showplan";[/URL] 
        max(//p:RelOp/@Parallel)', 'float') > 0


--Will show the highest wait type queries are waiting for
SELECT *
FROM sys.dm_os_wait_stats
ORDER BY waiting_tasks_count DESC

--Use this to find wait type of susspended spids
SELECT *
FROM master.dbo.sysprocesses
WHERE spid = 148

--will show SQL statement of a SPID
declare @handle binary(20)
select @handle = sql_handle 
from master.dbo.sysprocesses where spid = 148
select * from ::fn_get_sql(@handle)


--If you are seeing lots of SOS_SCHEDULER_YIELD in 
--your Wait States, that is a very stong indicator of CPU pressure.
--You can run the DMV query to confirm that:

-- Check SQL Server Schedulers to see if they are waiting on CPU

SELECT scheduler_id, current_tasks_count, runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255

--If you see the runnable tasks count above zero, 
--that is cause for concern, and if you see it in double digits 
--for any length of time, that is cause for extreme concern!



--This query will give you an idea of 
--how many tasks are waiting in the system. 
--You can use this information to understand 
--blocking characteristics of your load
 
select count(*)
from sys.dm_os_waiting_tasks
where wait_type <> 'THREADPOOL'

sp_helptext 'sys.dm_os_waiting_tasks'

select wait_type, count (*)
from sys.dm_os_waiting_tasks
group by wait_type
order by count (*) desc

--CPU BottleNeck

select  
    scheduler_id, 
    current_tasks_count, 
    runnable_tasks_count 
from  sys.dm_os_schedulers 
where scheduler_id < 255

--The following query gives you a high-level view of which 
--currently cached batches or procedures are using the most CPU
--The query aggregates the CPU consumed by all statements with the same
--plan__handle (meaning that they are part of the same batch or procedure).
--If a given plan_handle has more than one statement, you may have to drill in
--further to find
--the specific query that is the largest contributor to the overall CPU usage 

select top 50  
    sum(qs.total_worker_time) as total_cpu_time,  
    sum(qs.execution_count) as total_execution_count, 
    count(*) as  number_of_statements,  
    qs.plan_handle  
from  
    sys.dm_exec_query_stats qs 
group by qs.plan_handle 
order by sum(qs.total_worker_time) desc

--To find more details about plan run this query
-- takes either spid or plan_handle
SELECT *
FROM sys.dm_exec_query_plan(0x06000700185B93054043ED7F010000000000000000000000)

--   4. Q. Does my load have an active resource bottleneck?
--You can answer this question by looking at the resource address 
--that tasks are blocked on.  
--Keep in mind that not all wait types have resource associated with them. 

select resource_address, count (*)
from sys.dm_os_waiting_tasks
WHERE resource_address <> 0
group by resource_address
order by count (*) desc

--
--sys.dm_os_schedulers
--1. Q. Do I need to by more CPUs?
--In order to answer this question you have to find out if your 
--load is really CPU bounded.  Your load is really CPU bounded if a 
--number of runnable tasks per each scheduler always greater than 1 
--and all of your queries have correct plan.  
--The latter statement is very important, your load can be CPU bounded 
--due to the fact that somehow optimizer generated bad plan – 
--it can happen if your statistics out of date or you tried to 
--perform handcrafted optimization. In this case you don’t want to run to 
--Circuit City to buy more CPUs right a way – you want to fix the plan. 
--Here is the query to find out average length of a runable queue on the system:

select AVG (runnable_tasks_count)
from sys.dm_os_schedulers
where status = 'VISIBLE ONLINE'

--1. Q. How many sockets does my machine have?

select cpu_count/hyperthread_ratio AS sockets
from sys.dm_os_sys_info


--Q. How many either cores or logical CPU share the same socket?

select hyperthread_ratio AS cores_or_logical_cpus_per_socket
from sys.dm_os_sys_info
 
--4. Q. How much physical memory my machine has?

select physical_memory_in_bytes/1024 AS physical_memory_in_kb
from sys.dm_os_sys_info



select os.task_address, os.state, os.last_wait_type, clr.state, 
clr.forced_yield_count 
from sys.dm_os_workers os join sys.dm_clr_tasks clr 
   on (os.task_address = clr.sos_task_address) 
where clr.type = 'E_TYPE_USER'

- Paul
- If at first you don't succeed, find out if the loser gets anything.

sanjdhiman · Sep 13, 2007

hi ptheriault

thanks for the code, I will check it out/

Do you know why it happens?

Its such a weird thing

thanks for the code I will try and see if it helps

ptheriault · Sep 13, 2007

Well when I had the problem we had 64 gb of mem in the server. I had set sql to take 60 gb of that. However, when I looked at mem usage counters it never got above 48. I would get all kinds of waiting tasks because of CPU bottleneck. At first I thought I didn't have enough CPU or that parallelism was hurting performance. (which it can in an OLTP db) So I installed a hot fix from SQL that was supposed to help with the parallelism but when I rebooted the server it didn't come back. That's when we found the bad memory. We took out all 64gb and put in 16 gb of new memory and the server started to perform above expectations.
You may not have a memory problem, but it's a place to start along with the queries I sent you.

- Paul
- If at first you don't succeed, find out if the loser gets anything.

PEOMAF · Oct 24, 2007

I'm having the same problem with massive CPU utilization that seems to be inconsistent with what is currently running on the server but I'm in a SQL Server 2000 environment. I'm also in an active/active cluster, and have 8 processors.

I'm wondering if you have a similar set of queries that can be used in an SQL 2000 environment to track the same things.

Thanks.

-Mark

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

SQL SERVER ISSUE!!!! - CPU USAGE MASSIVE!!! 1

sanjdhiman

Programmer

ptheriault

IS-IT--Management

sanjdhiman

Programmer

ptheriault

IS-IT--Management

PEOMAF

Programmer

Similar threads

Part and Inventory Search

Sponsor