CPU pegged at 100%, looking for reasons...

Status
Not open for further replies.

wahnula

Hello,

This is a re-post with less verbiage and more images. One of the CPUs in my dual-Opteron server has suddenly started showing 100% use in Task Manager:

[image not preserved]

Meanwhile, the System Idle Process sits at 98-99%, which should indicate low system use, yet overall CPU use still reads 50%:

[image not preserved]

I ran a complete malware scan with HouseCall and it found nothing. I will even listen to any wild guesses as to why this is happening.

Rest of the hardware:
Antec 550 watt PSU
Asus K8N-DL mainboard
(2) Opteron 1.6 GHz single-core
Onboard nForce4 RAID 1 for OS (a little wonky lately)
3Ware Escalade for RAID 5 for data
All HDDs WD Raptors
4GB ECC RAM
OS: MS SBS2003SP1

Thanks as always.

Tony

Users helping Users...
 
Not got any bright ideas yet for you Tony, but what happens if you boot in Safe mode?

ROGER - G0AOZ.
 
Thanks Roger,

It's a working server, so I need to keep re-boots to a minimum during work hours. I can try that after work to see if it makes a difference.

Tony

Users helping Users...
 
Tony,

Perhaps adding CPU Time to the Task Manager will reveal something? (View, Select Columns...)

Dell
 
Dell,

Thanks for the reply. I could not find how to add CPU Time to Task Manager; under View, I have:

Refresh now;
Update Speed: slow, fast
CPU History: One Graph per CPU, etc
Show Kernel times.

No "Select Columns"

Process Explorer came up with some new information:

System Idle Process was down to 48-50%
Hardware Interrupts were 35-40%
Deferred Procedure Calls were ~15%

All this while Task Manager still shows SIP at 98-99%

What does this all mean?

Tony

Users helping Users...
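
One way to square those two sets of numbers, sketched in Python. The assumption here (not something stated in the thread) is that Task Manager's process list folds interrupt and DPC time into the System Idle Process, while Process Explorer splits it out into its own rows. The function names are made up for illustration, and the figures are picked from within the ranges quoted above.

```python
def task_manager_idle(true_idle: float, interrupts: float, dpcs: float) -> float:
    """Idle share as it would appear if interrupt and DPC time were
    charged to the System Idle Process (assumed behaviour)."""
    return true_idle + interrupts + dpcs


def overall_cpu_use(per_cpu_busy: list) -> float:
    """Average busy percentage across all CPUs."""
    return sum(per_cpu_busy) / len(per_cpu_busy)


# Values taken from within the Process Explorer ranges quoted above (percent).
idle, interrupts, dpcs = 48.0, 35.0, 15.0
print(task_manager_idle(idle, interrupts, dpcs))  # 98.0

# One CPU pegged and one idle on a dual-CPU box.
print(overall_cpu_use([100.0, 0.0]))  # 50.0
```

If that assumption holds, the 98-99% idle figure and the 50% CPU graph describe the same situation: no ordinary process is busy, but roughly half the machine is spent servicing hardware.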
 
Hi Tony,

You need to be on the processes tab in Task Manager to be able to "Select Columns".

Cheers.
 
In the processes tab, click twice on the CPU column.

That will show you which process is using the cycles.



Just my 2¢

"What the captain doesn't realize is that we've secretly replaced his Dilithium Crystals with new Folger's Crystals."

--Greg
 
SBS 2003 SP1 is presenting a different Task Manager view than I see under XP, 2003 and 2000 servers. I also don't see Hardware Interrupts in Process Explorer on my XP box, but that simply could be my unfamiliarity with the application.

35-40% hardware interrupt handling time (at least, that is my interpretation) seems excessively high. Almost half the time spent servicing a piece of hardware seems wrong.

Process Explorer's Help seems to agree:

Interrupts and DPCs
On Windows NT-based systems Process Explorer shows two artificial processes: Interrupts and DPCs. These processes reflect the amount of time the system spends servicing hardware interrupts and Deferred Procedure Calls (DPCs), respectively. High CPU consumption by these activities can indicate a hardware problem or device driver bug. To see the total number of interrupts and DPCs executed since the system booted add the Context Switch column. Another sometimes useful metric is the number of interrupts and DPCs generated per refresh interval, which you see when you add the CSwitches Delta column.

I found the Context Switch and Context Switch Delta column selections under View, Select Columns, Process Performance tab.
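
For anyone who would rather log the numbers than watch them, the delta idea from that Help excerpt can be sketched in a few lines of Python. This is only an illustration: the `counter_deltas` and `looks_excessive` names, the sample readings, and the 100,000-per-interval threshold are assumptions, not anything Process Explorer exposes or recommends.

```python
from typing import List


def counter_deltas(cumulative: List[int]) -> List[int]:
    """Turn cumulative counter readings into per-interval deltas."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def looks_excessive(deltas: List[int], threshold: int = 100_000) -> bool:
    """True if every interval exceeds the (assumed) threshold."""
    return bool(deltas) and all(d > threshold for d in deltas)


# Hypothetical readings of a cumulative counter, one per refresh interval.
readings = [2_000_000_000, 2_000_350_000, 2_000_720_000, 2_001_050_000]
deltas = counter_deltas(readings)
print(deltas)                   # [350000, 370000, 330000]
print(looks_excessive(deltas))  # True
```

A cumulative total only says how long the box has been up; a delta that stays high on every refresh is what suggests a device or driver is interrupting constantly.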
 
Thanks, cmeagan656, for pointing out that you have to be on the Processes tab; that explains why Tony was seeing different options than I was.

And Greg is correct. Clicking on a column header sorts by that column, ascending/descending.
 
Thanks everyone...CPU time did not show anything out of the ordinary, print spool service was #2 after SIP. I needed to quit my RDP session lest I interfere with backup so I'll be back in the morning to play with Process Explorer a bit more.

I've been thinking hardware myself as we had a power outage last Wednesday and, even though I have a surge/UPS and should have had a soft shutdown, there was the nasty "Why did Windows suddenly shut down?" dialog upon reboot Thursday morning. The RAID 1 array was also degraded. I rebuilt it and rebooted, that's when I got my first "low SIP" warning.

I wonder, should I swap the CPUs and see if the problem follows the CPU?

I might think it was the RAID 1 driver except I reloaded SBS and supplied a fresh driver before recovering from backup last Thursday night.

Monday night (yesterday) the server locked up (no response to CTRL+ALT+DEL) but still worked as a file & Exchange server, then shut itself off Tuesday morning. It started up with a healthy RAID 1 array and the "Why did Windows...?" message, so something's up. Thanks again folks.


Tony

Users helping Users...
 
Well, at least I got a good backup last night. Playing with Process Explorer today and further research on Hardware Interrupts only led to more depression. Hardware Interrupt context switches are in the billions with hundreds of thousands in the delta column. Research on this points to many things, but no solutions that I have encountered unless your IDE has somehow shifted to PIO instead of DMA mode. Mine has not.

I'm going to let it run for a while (no choice, I need to get back to my REAL job) and hope for some Tek-Tips pearls of wisdom.

Tony

Users helping Users...
 
UPDATE

Well...my IDE had not shifted to PIO mode...but my SATA controller had. I was systematically going through all the choices in Device Manager and there it was, PIO on my second SATA controller. It had an underlined sentence like "Your device is in a degraded state" that, when clicked on, presented a message that the SATA controller was switched from SATA 1.5 to PIO by "my computer" (BIOS, Windows???) because a cabling inconsistency or problem had been detected. That probably happened when I booted the PC to one RAID 1 drive, then the other, so I could identify them before replacement of "drive 2".

I changed it from PIO to SATA 1.5, and received a message that Windows needed to reboot to make the change. In my systematic manner, I put that off until after all the backups run tonight and I have a fresh head in the morning. Don't feel like spending another night in the office in case a wheel flies off after reboot...[smile]

I will update after the deed is done.

Tony

Users helping Users...
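
The walk through Device Manager described above can be treated as a simple checklist. Here is a rough Python sketch of that idea, assuming the transfer modes have been noted down by hand; the controller names and mode strings below are hypothetical placeholders, not the exact labels Windows shows.

```python
from typing import Dict, List


def degraded_channels(modes: Dict[str, str]) -> List[str]:
    """Return the channels whose current transfer mode mentions PIO."""
    return [name for name, mode in modes.items() if "pio" in mode.lower()]


# Hypothetical notes copied by hand from each controller's properties page.
observed = {
    "Primary IDE Channel": "Ultra DMA Mode 5",
    "SATA Controller 1": "SATA 1.5",
    "SATA Controller 2": "PIO Mode",
}
print(degraded_channels(observed))  # ['SATA Controller 2']
```

The point is to check every controller rather than only the IDE channels, since a single channel in PIO is enough to produce this kind of load.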
 
Tony, see if there is a BIOS update for the BOARD...

You could also try to disable the ACPI in the BIOS to see if that changes anything...

Ben

"If it works don't fix it! If it doesn't use a sledgehammer..."
 
Ben,

Thanks but I'm on the fence as far as BIOS upgrades go...I'm not running the original, I'm running 1009 or similar. It was during this upgrade that the power went out, the fear of PC modders everywhere...and it really happened to me. Luckily the update was far enough along that the battery held while flashing finished, then the UPS automatically shut the PC down safely...true story, so, especially when I'm playing with my server, I forgo BIOS updates unless they offer some real, usable benefit. The only benefit to 1009 (or whatever it was) was 64-bit support, which I have yet to use. The same is true of SP2, I have SP1 on CD so I can recover to that point if needed (part of SBS restore procedure). SP2 was more a roll-up of little updates that I can totally do without, that's also why I have Automatic Updates switched OFF. I check periodically for significant updates and I don't consider IE7 "critical" so I sit where I am at SP1 level.

Sort of the old-school "If it works don't fix it! "[smile] attitude. I did try disabling ACPI (earlier) in the BIOS, no change.

Anyway I think this one is solved, pending reboot this morning. I just checked the Backup Report via OWA and all the backups ran successfully last night. As soon as I get in in a few hours I'll do the deed, now that I have a full complement of backups at my disposal in the event of the worst, which I've grown to expect.

Tony

Users helping Users...
 
UPDATE UPDATE UPDATE

I changed the choice from PIO to SATA 1-1.5 and rebooted. Same situation, back to PIO. I changed it again and this time also ticked the check-box "Let BIOS select transfer mode"...and that did it. The pegged CPU reading in Task Manager was gone, I had some Exchange issues that were resolved by playing with the logs a bit, and we are back in business.

Special thanks to Freestone for showing me the way to use Process Explorer; that excerpt from Help got me to look at the Hardware Interrupts, and Google helped me find out about the PIO problem from there.

Thanks also to cmeagan656 for helping me around Task Manager, and thanks Greg too, although I did already know about sorting columns by clicking at the top[smile]

All in all a team effort, thanks again to everyone.


Tony

Users helping Users...
 
Tony, glad you found the problem, and fixed it...

Ben

"If it works don't fix it! If it doesn't use a sledgehammer..."
 
I'm very glad to see you got the problem resolved, Tony! Now you can rest easier. This thread is one more proof of your signature tag line :)

Dell
 