Restart SNMP Service on DL320 G1 gives Memory Error?

Status
Not open for further replies.

ADB100

Technical User
Mar 25, 2003
2,399
0
36
GB
I have a Windows 2000 Server installation on a DL320 G1 server with an 800MHz P3, 2GB RAM (2 * 1024MB DIMMs), a 36.4GB SCSI disk and a 9.1GB SCSI disk. Every time I boot the server I get an error logged in Windows Event Viewer saying there was a memory error:

Event ID:1031
System Information Agent: Health: Correctable or Uncorrectable memory error detected. The memory module should be replaced.
Board or cartridge: '0' Module: '2' Spare part number: '' Module size: '1048576' System id: 'CPQ0826'
[SNMP TRAP: 6056 in CPQHLTH.MIB]

The same error is reported when the SNMP service is restarted (along with the HP Insight Agents that are dependent on it). I have replaced the memory and also tested it in another server, where it is fine. I have also run the bootable diagnostics disk and can find no errors.
Does anyone have any idea what this could be?
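
For anyone comparing these events across reboots, here is a minimal Python sketch that pulls the quoted fields out of the event text above. The function name `parse_health_event` is hypothetical, and reading the module size as kilobytes is an assumption (1048576 KB would match the 1024MB DIMMs described); the agent's documentation would need to confirm the unit.

```python
import re

# Field line copied from the Event ID 1031 text in the post above.
EVENT_TEXT = (
    "Board or cartridge: '0' Module: '2' Spare part number: '' "
    "Module size: '1048576' System id: 'CPQ0826'"
)

def parse_health_event(text):
    """Return a dict mapping each field name to its quoted value."""
    # Each field looks like:  Name with spaces: 'value'
    pairs = re.findall(r"([A-Za-z][A-Za-z ]*):\s*'([^']*)'", text)
    return {name.strip(): value for name, value in pairs}

fields = parse_health_event(EVENT_TEXT)

# ASSUMPTION: the module size is reported in KB, so dividing by 1024 gives MB.
size_mb = int(fields["Module size"]) // 1024

print(fields["Module"], size_mb)
```

Logging the board and module numbers from each occurrence makes it easier to see whether the reported module follows a DIMM, a slot, or neither.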
 
Have you run diags to eliminate the motherboard as the fault? It could be a faulty DIMM slot causing the issue, with the error only occurring when a particular memory address is accessed.

-----------------------------------------------------
"It's true, it's damn true!"
-----------------------------------------------------
 
I have booted the server with only 1 DIMM installed, in every possible slot, and the error moves, so it isn't tied to one slot. I have run the boot-disk diagnostics and they don't report any errors; I even let them run for a whole day and still no errors....

Andy
 
Apart from that particular DIMM being faulty, have you installed the latest Compaq Health drivers?

-----------------------------------------------------
"It's true, it's damn true!"
-----------------------------------------------------
 
The DIMMs are all OK; as I said, I have tried them in another server and they are fine. The latest Compaq SSD 7.30 is installed. The really weird thing is that the server is running fine and no other error messages are seen. I know it looks like hardware, but none of the diagnostics I have run show anything.
One thing I haven't tried is doing a system erase and re-installing Windows 2000. I am trying to avoid this, as the work involved in rebuilding everything would be a pain.

Andy
 
Ever find a solution to this problem? I'm seeing the same thing on my DL320. I've bought new memory and installed the latest ProLiant Support Pack v 7.4, with no luck.

I'm going to try the new (v 7.4) firmware update CD, but haven't gotten that far yet.
 
My server still has this issue. I have tried replacing the memory, CPU and even the motherboard, but the problem persists. I have even had the server bluescreen a few times on boot-up.
I have run the Compaq/HP bootable diagnostic disk for a day with no problems. I have also booted from a memtestx86 disk and no problems are reported....

Everything seems to point to memory but I am positive it is OK. I now think I may have 2 faulty motherboards as I can't think it can be anything else. Does anyone know if there were any known manufacturing faults with the original DL320?

I am reluctant to wipe the disk clean and re-install Windows as this is a Cisco CallManager test server and will be a pain to reinstall. However I do have a spare SCSI disk so I may try a fresh build of Windows 2000 in the next few days.

Andy
 
Wow, sounds like you've been through the wringer on this one. I ran the firmware upgrade and, as it happens, everything was already up to date. I too have seen the server bluescreen at startup. If you've swapped CPUs and motherboards, I can't think of anything else to look at. If it were a problem with Windows 2000, you'd think lots of other people would be having it.
 
The weirdness continues.....

I swapped out the hard disk and rebuilt the server using SmartStart 5.50 and Windows 2000 Server (with a SP4 slip-streamed CD). I didn't select anything other than defaults, let the server install itself, installed IE6SP1, let Windows update itself and then installed the HP PSP 7.40. So the server should be running Windows 2000 Server SP4, IE6SP1 and have all the MS & HP updates.....

It now boots without any errors, no longer blue-screens (as it occasionally did) and doesn't generate the 'Correctable or Uncorrectable memory error detected' errors after the SNMP service starts....

I am going to compare all the system devices and drivers on this working setup with those on the problematic Cisco CallManager setup......

Does anyone know of any system drivers etc that may cause problems like I am experiencing?

Thanks

Andy
 
Guess what.....

I put the original 36.4GB SCSI drive back into the server in place of the 9.1GB drive I had just rebuilt it on, and guess what? It booted without any errors...........

I checked through the drivers and there is a discrepancy in the 'ACPI Uniprocessor PC' drivers:

C:\WINNT\system32\ntkrnlpa.exe 5.00.2195.7035
C:\WINNT\system32\ntoskrnl.exe 5.00.2195.7035

The rebuilt server had 5.00.2195.7045 file versions for both files.
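
A comparison like this can be scripted rather than done by eye. The following is a minimal Python sketch, assuming the file versions have already been collected by hand into two dictionaries; the helper names `parse_version` and `version_differences` are hypothetical, and the values are the ones quoted above (7035 on the original install, 7045 on the rebuild).

```python
def parse_version(text):
    """Turn a dotted version such as '5.00.2195.7035' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

# Versions as reported in this post: original install vs. rebuilt install.
original = {
    "ntkrnlpa.exe": "5.00.2195.7035",
    "ntoskrnl.exe": "5.00.2195.7035",
}
rebuilt = {
    "ntkrnlpa.exe": "5.00.2195.7045",
    "ntoskrnl.exe": "5.00.2195.7045",
}

def version_differences(a, b):
    """Return {filename: (version_in_a, version_in_b)} for files that differ."""
    diffs = {}
    for name in sorted(set(a) & set(b)):
        if parse_version(a[name]) != parse_version(b[name]):
            diffs[name] = (a[name], b[name])
    return diffs

for name, (old, new) in version_differences(original, rebuilt).items():
    print(f"{name}: {old} -> {new}")
```

Comparing as integer tuples rather than raw strings avoids mis-ordering versions whose fields have different digit counts.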

The video driver was also different; the original install has the standard Windows ATI drivers, whilst the rebuilt one had the newer ATI drivers from the HP PSP 7.40.

Apart from that there are some other differences, but none that I would consider real problems.

Any ideas? I am sure if I shut the server down and leave it overnight it will report the errors again in the morning. I'll let you know :eek:(

Andy
 
I left the server powered off overnight and booted it up this morning. Halfway through Windows 2000 booting it blue-screened....... Didn't quite catch the error. Anyway, I restarted it and it started OK; there were no 'Correctable or Uncorrectable memory error detected' errors after the SNMP service started....

It's just weird, this server will never make it past the test environment it is in. I think I will just have to live with the weirdness.................


Andy
 
Got the same message on a ProLiant ML570. We replaced the system board last night and got this message when we booted up. Are you experiencing hang-ups or blue screens? I wonder if this is just a false reading. I replaced almost everything inside, including memory, and still get this error. I even replaced the memory module board, so it can't be hardware anymore. Any suggestions are appreciated. Running out of options on this except to reinstall the OS.
 
Hi

I get blue-screens and occasional hangs at POST where the Compaq screen says something like 'internal failure call your hardware supplier'. That isn't the exact message but it's something similar.

I too have replaced everything (except maybe the PSU & the CD/Floppy). Currently though the error messages after the SNMP service starts have gone (see previous posts). This is weird as there is nothing different. I swapped out the HD, rebuilt the server from scratch and then replaced the HD with the previous O/S install......

I am just wondering if there are any settings in the Compaq/HP management software that are likely to cause hardware issues. What I mean is, do the agents have access to low-level hardware resources that Windows doesn't?

Andy
 
OK, I booted the server up again this morning and was presented with the following screen after the Compaq POST splash screen:
The server feature board is not supported in this system
-system halted
In the bottom right-hand corner, flashing in red, it said:
COMPAQ INTERNAL USE ONLY
I powered it off and then back on and it started normally. Windows booted and, after the Ctrl+Alt+Delete log-on screen appeared, it blue-screened and said:
*** Hardware malfunction

Call your hardware vendor for support

*** System halted

It rebooted itself after a minute or so and during POST said:

A critical error occurred prior to this power up

The server then booted as normal and Windows is running. The Event Viewer memory errors have now returned, though.....

To me and everyone else this looks like hardware, but I have replaced everything. I can only assume I have had 2 faulty motherboards, but that still doesn't explain why it worked after the rebuild.

Anyone?

Andy
 