Intermittent System Crash Problem

Status
Not open for further replies.
May 8, 2002
12
US
I have a server that crashes intermittently over a period of about 24 hours, and then the crashes stop. After that, it works fine for a variable period of time (sometimes as short as a day; another time it ran for several months) and then the cycle begins again.

I am running Windows 2000 Server SP3 fully patched. I have TrendMicro ServerProtect installed on this machine, so I'm fairly sure it's not a virus.

Concerning hardware:

Case: Antec 4U Rackmount (4U26ATX450)

PSU: Antec ATX 450W SmartPower (SL450)
Motherboard: Tyan Tiger MP (S2460)

Processor: One AMD Athlon MP 2000+ - Tyan Approved

RAM: 1GB DDR266 Registered ECC RAM (2 Kingston KVR266X72RC25/512 Sticks - Tyan Approved)

RAID: Promise SuperTrak SX6000

NIC: Intel PRO/1000 MT Dual Gigabit NIC

Hard Drives: 3xWD800JB (RAID 5); 2xWD400JB (RAID 1); 1xWD400JB (Hot Spare)

Video: ATI XPERT 128

CD-ROM: HP DVD-ROM Drive

Floppy: Sony Floppy Drive

There are no indications of the problem until it happens. When the lockup occurs, it is a complete lockup - the LEDs on the back of the RAID controller freeze and the computer doesn't respond at all.

There are no BSODs, DrWatson errors, Event Log errors, nothing. The only message I get is that "The system shutdown at xx:xx:xx was unexpected. The data is the error code." Useful.

The machine just crashed again as I was writing this. It was up for a total of maybe 10 minutes. It's not an overheating problem; the machine is cooled very well (the case has four 101mm fans in it, and I have a massive PC Power & Cooling TurboCool on the PSU).

Please help! :)
 
I should also mention that I've tried switching processor and RAM slots, no luck.
 
I'd check the raid controller, particularly the onboard battery.

When the machine freezes, what do you do to bring it out of the freeze? Is there any one action or set of actions that *looks* helpful?
 
Yeah.. I have to reboot about 50 thousand times to get the machine to POST.

Last night, I ran Memtest for a while and it came up with nothing, but when I rebooted into Windows the machine started working fine again.
 
Hehe..

I know these boards (S2460) have some problems in the boot logic that cause you to have to reboot it a lot, but I'll be damned if I can figure out why it keeps crashing like this.... the frequency is really odd too...

It seems almost cyclical, like it happens again every 2-4 weeks.
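One way to check whether it really is cyclical would be to copy the times from the "unexpected shutdown" entries in the event log and look at the gaps between them. Below is a minimal Python sketch, assuming the timestamps have been typed in by hand. The function names and the sample timestamps are hypothetical placeholders, not data from this server. Crashes that fall within 24 hours of the previous one are grouped into a single episode, and the gaps are measured between episode starts.

```python
from datetime import datetime, timedelta


def group_into_episodes(timestamps, window=timedelta(hours=24)):
    """Group crash times into episodes.

    A crash that occurs within `window` of the previous crash is treated
    as part of the same episode; otherwise it starts a new one.
    """
    episodes = []
    for ts in sorted(timestamps):
        if episodes and ts - episodes[-1][-1] <= window:
            episodes[-1].append(ts)
        else:
            episodes.append([ts])
    return episodes


def gaps_between(times):
    """Return the time differences between consecutive entries, oldest first."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical placeholder timestamps (NOT taken from the thread).
sample = [
    datetime(2003, 1, 1, 9, 0),
    datetime(2003, 1, 1, 9, 10),
    datetime(2003, 1, 1, 20, 0),
    datetime(2003, 1, 20, 8, 0),
    datetime(2003, 1, 20, 8, 30),
]

episodes = group_into_episodes(sample)
episode_starts = [episode[0] for episode in episodes]

for gap in gaps_between(episode_starts):
    print(f"{gap.days} days between crash episodes")
```

If the gaps come out roughly equal, that would point toward something periodic (a scheduled job, for example) rather than a purely random hardware fault. Widely scattered gaps would fit marginal hardware better.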
 
Have you tried process of elimination on the RAM yet? Most of the time I find that restarts are caused by RAM gone bad. If it isn't the RAM, then it will probably be a power supply issue. I had an S2460 and I think one of my DIMM slots was bad, because I had to RMA RAM (haha) once every 6 months or so. Not sure if other boards have that problem. Also, I had to replace the PSU on that system. Those would be my thoughts on it. Let me know if this helps.

Burke
 
Yeah, I ran MemTest86, which is an in-depth memory tester, and it turned up no problems. I was thinking it might be a PSU problem, but it's an Antec 450W... Also, PSUs generally aren't on the fence; in my experience, one either works or it doesn't. How would this make it cycle so?
 
What kind of PSU did you change to?

Also, can anybody make a recommendation for a replacement board?
 
Just a thought, but random freeze-ups like you describe can be caused by the board shorting out to the case. One system drove me crazy with freeze-ups; I did everything I could think of, including installing a different OS, but it still froze. Then I decided to take everything out of the case and rebuild it. While doing this, I noticed that a couple of screws were loose on the motherboard. I made sure that everything was tight upon reassembly and the freeze-ups stopped completely.
Just a small thing to look at before buying a new board.

If you're going through Hell...keep going... (Winston Churchill)
RocKeRFelLerZ
 
Well, I took all the components out of the case, and it seems that while I ordered Kingston RAM, somehow I ended up with Powmem *boggle*.

I am 99.99999% sure that this is my problem.

Guess this teaches me to double-check serial and part numbers when I get equipment... although the RAM did arrive in Kingston anti-static casing... *boggle again*
 
I would suspect the power supply, bad capacitors on the motherboard, or both.
 
I agree with the PSU diagnosis. And yes, they can give you issues like you are experiencing. I've experienced it.
 