DL380: DIMM failure and server rebooting

Status
Not open for further replies.

demin33

ISP
Feb 19, 2004
1
UA
Hello,
I'm facing a problem that seems to be related to memory module failure on an HP ProLiant DL380 G3 server. The problem causes the server to reboot, usually in the evening. I tested the server with the SmartStart CD and found that it cannot pass the total memory test. There are two tests, the Noise test and the Cache test, and the status for both is Failed.

This is a quotation from the DiagTicket I got after testing:
"This test failed as a result of the ECC (error correcting code) reporting an error while the test was operating. While not a problem with the test itself, this indicates that there was an ECC error incident while the test was running. Check the IML for any ECC Threshold Passed events. The appropriate DIMM will be noted in the error message itself."
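
As a rough illustration of the DiagTicket's advice to check the IML for ECC events, here is a minimal Python sketch that filters log text for ECC mentions. It assumes the IML has been exported to plain text with one entry per line; the function name `find_ecc_events` and the sample entries are hypothetical, and real IML entries will look different.

```python
import re

# Assumption: each exported IML entry is one line of text, and ECC-related
# entries contain the word "ECC". The real export format may differ.
ECC_PATTERN = re.compile(r"\bECC\b", re.IGNORECASE)


def find_ecc_events(lines):
    """Return the log lines that mention ECC, stripped of surrounding whitespace."""
    return [line.strip() for line in lines if ECC_PATTERN.search(line)]


# Made-up sample entries, for illustration only; not real IML output.
sample_log = [
    "entry 1: example power event",
    "entry 2: example ECC threshold event for a DIMM",
    "entry 3: example ASR reboot event",
]

ecc_events = find_ecc_events(sample_log)
for event in ecc_events:
    print(event)
```

If the matching entries keep naming a DIMM after the modules have been swapped, that would suggest the fault is not in the modules themselves.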

The test results point to correctable errors in the memory modules. However, I replaced the old DIMMs with new ones and nothing changed. I did this twice, with two different pairs of new DIMMs, and the result was the same. Finally, I replaced the server's system board with a new one and ran the SmartStart CD test again. This time the result was quite different: the server passed both the Noise test and the Cache test. But a few hours later the server rebooted on its own again. I ran the SmartStart CD test once more, and it again failed the Noise and Cache memory tests. The situation has not changed since then. This looks like a hardware problem, because it does not depend on the installed software.

What could be the root cause of the problem? (I have two DL380 servers with the same problem.)
 
Hi,

First of all, you could disable ASR (Automatic Server Recovery) so that you can see the cause of the reboot.
