DL360 G3, CPU 1 Error LED

Status
Not open for further replies.

TMiller1972

IS-IT--Management
Jan 25, 2006
US
I've had this server in production for nearly 4 years.
It recently started completely locking up. No blue screen, just locked. When it happens, the "internal health LED" on the front turns amber, and the "CPU 1 Error LED" lights.

The problem is intermittent, and various diagnostics all show everything as fine. The only error that is logged is a basic "ASR Detected by System ROM" (ASR being Automatic Server Recovery).

My question is, would you think that this is nearly certain to be a CPU failure such that replacing the CPU would solve the problem? Or is that CPU error light not such a conclusive indicator? I can't see any other path, but I don't want to start dumping money chasing an unknown problem if I can help it.

Thanks for any insight!
 
Talking of insight, what does Insight Manager say? Does it log any issues? If you have the ProLiant Support Pack installed on the server, you can go into the System Management page and check the status of the server to see if it is logging any events.

If you don't have the ProLiant Support Pack, it may be worth installing it. You can get it from:


--------------------------------------
"Insert funny comment in here!"
--------------------------------------
 
I did apply a PSP, which definitely upgraded a number of items that I had missed in my manual upgrading of drivers and software. It didn't solve anything, however.

The System Management page that you referred to says all hardware is good. That's the problem... everything is fine, until everything locks up.

When things lock up, I get a single related entry in the Integrated Management Log. It appears to be written after we shut the power off and restart (the only option after it locks). It says: "ASR Detected by System ROM". That Automatic Server Recovery message doesn't tell me anything about what caused it, and I can't see anything else, other than the fact that the CPU 1 Error LED also lights when this occurs.

I don't know of any path to pursue with this other than to replace the CPU and cross my fingers. Wondering if anyone has any suggestions before I take that step.

Thanks!
 
Dual CPUs? If so, I'd swap the PPMs first, then the CPUs. If the CPU were bad, then the health light would be red...not sure about intermittent issues...you sure a fan's not bad? That could cause an amber light, and so would a bad power supply (with dual PSUs). I have had a few that would light the amber light, and one actually locked up a few times like yours...turns out it was the power inverter board. The 7 and 6 in the spare part number are transposed, so it's wrong as printed.

Burt
 
You could try reseating first. Sometimes reseating the components can dislodge a piece of dust or dirt and resolve issues. It's a long shot... but short of that, you could run with a single CPU and PPM at a time to see if you can identify the rogue one.

--------------------------------------
"Insert funny comment in here!"
--------------------------------------
 
True---seen it a million times with RAM...good call, Lad...

Burt
 
I've seen a similar problem with a DL380 G4 (dual processor) running 2003 (Standard and Enterprise). We didn't realise it was a hardware fault at first (which is why we tried Enterprise), because even though the Blue Screen said 'Hardware Malfunction - contact vendor', HP refused to accept it was a hardware fault.
I know you're not getting Blue Screens, but that could simply be an OS issue - you don't say if you're running 2000 or 2003.

Anyway, in the end we had a knackered processor. Backups killed it (maybe they push processor utilisation higher than normal use?), as did running the HP Offline Diagnostics. It was a few years ago, so my memories are a little hazy, but I think we booted from the Diags CD, it checked everything and reported everything as 'fine', then we clicked the Advanced tab or ticked the Advanced check box (I can't remember exactly how you display the Advanced Diags), and scrolling up and down through the hardware would always kill it. It froze just like you describe.

However, hardware faults manifest themselves in so many different ways - you might have a shagged processor, but diags may run through fine, even displaying the Advanced Diags/Hardware might work for you...

Thanks, Mark
 
I have Windows 2003 Standard. So far I've not been able to run good offline diagnostics. But the server is currently locked up, and it authenticates VPN connections, so I've got to go to work on a Sunday :(

If I can't get a targeted diagnostic to tell me something or initiate the problem, then I'm going to start replacing things.

This does seem to occur overnight most often. Could be when it's being backed up.

I'll let you know what happens.
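
One rough way to test the backup theory above is to note the time of each lockup and count how many fall inside the backup window. The sketch below is only an illustration: the window (23:00 to 04:00) and the lockup timestamps are made-up placeholders, not values from this server, and the function names are hypothetical.

```python
from datetime import datetime, time


def in_window(t, start, end):
    """Return True if time-of-day t is in [start, end).

    Handles windows that cross midnight (start later than end).
    """
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def lockups_in_window(lockups, start, end):
    """Return the lockup timestamps whose time-of-day falls inside the window."""
    return [ts for ts in lockups if in_window(ts.time(), start, end)]


# Placeholder values: substitute the real backup schedule and the times
# you noted for each lockup.
BACKUP_START = time(23, 0)
BACKUP_END = time(4, 0)
lockups = [
    datetime(2006, 1, 20, 1, 15),
    datetime(2006, 1, 22, 23, 40),
    datetime(2006, 1, 24, 14, 5),
]

hits = lockups_in_window(lockups, BACKUP_START, BACKUP_END)
print(f"{len(hits)} of {len(lockups)} lockups fell inside the backup window")
```

If most lockups land inside the window, that points toward load from the backup job as the trigger, though it would not by itself say which component is failing.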
 
Have you installed anything new on the server recently, besides the PSP? If you have automatic updates enabled for the OS or the antivirus software, I've seen both of those cause conflicts. Just a thought.
 