
Dell PowerEdge 2650 REBOOTS AT WILL

Status
Not open for further replies.

oakuniv

Technical User
Sep 21, 2002
US
Hi - I recently purchased 2 Dell PowerEdge 2650 servers.
They have 2 dual 2.4 GHz processors, 2 GB of RAM, and five 36 GB 15K RPM hard drives in a RAID 5 configuration.

It is running as an Apache web server - running an application called WebCT. We also use a software product called DoubleTake to mirror our install of WebCT to another identical server. I use the Gigabit network card with a crossover cable to send the DoubleTake data to the other server.

Anyway - here is the problem. The server reboots itself at random times. 2 to 3 times per week. The last software I installed before the reboot problem appeared was Windows Service Pack 3. I have since uninstalled service pack 3 but the reboot problem continues.

Using Dell OpenManage software I have run diagnostics and found no problems. I have also taken the server down and reseated all the components. But it still reboots.

The only error that I get once the server has rebooted is that the server restarted unexpectedly. I haven't used the Failover capabilities of the DoubleTake software yet - because I believe the other server may have the same problem.

The first time these servers rebooted - one of our other server admins tried to access each of the machines with terminal services. When he did that each of the machines rebooted as he tried to connect to them. This was several days after we had already gone into production with these machines.

At one point I was rebooting the server myself and I did get a blue screen, but I wasn't able to read it before the system rebooted. I know there is an option to stop the system from rebooting automatically so that it just stays on the blue screen... and that might be the last thing we try... I can't afford to have the systems down for long and it is so hard to know when they might reboot.
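
For reference, the option mentioned above is the "Automatically reboot" checkbox under Startup and Recovery in System Properties, which corresponds to the AutoReboot value under the CrashControl registry key. A minimal sketch, assuming you would rather prepare a .reg file than click through the dialog on each server; the function name `crash_control_reg` is made up for illustration, and nothing changes on a machine until the generated file is imported there:

```python
def crash_control_reg(auto_reboot: bool) -> str:
    """Build .reg file text that sets the AutoReboot crash-control value.

    auto_reboot=False makes Windows stay on the blue screen after a stop
    error instead of restarting, so the stop code can be read.
    """
    value = 1 if auto_reboot else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]",
        f'"AutoReboot"=dword:{value:08x}',
        "",
    ]
    # .reg files conventionally use Windows (CRLF) line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Print the text; save it as e.g. disable_autoreboot.reg and import it
    # on the server to apply the setting.
    print(crash_control_reg(auto_reboot=False))
```

The trade-off is the one already noted: with automatic reboot off, a server that blue-screens stays down until someone resets it by hand.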

Any help would be appreciated. Thanks in advance.
 
Hmm, that's interesting, that means it's not the system having a blue screen that is causing the reboot... I would definitely look into the possibility of power related issues...

Maybe bring an alarm clock to work that doesn't have any kind of battery backup. Maybe it's some kind of power bump and your UPS isn't compensating for it... Plug in the clock and see when it resets to 12:00...

As technome suggested, I have also had a few older APC 2200's that have exhibited the same problem: system reports full charge and batteries fine, but won't hold a charge to save its life...

Best suggestion....

-Mike
 
As a note to MJewell's post, the APC units I have are the APC 2200 model.
Sure wish I had a low cost (dreaming) software-interfaced line power monitor, or better still, a monitoring system for the low voltage outputs of power supplies.
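
Short of a real line monitor, a rough software stand-in for the alarm clock trick is a heartbeat: have something on the server record a timestamp at a fixed interval, then look for gaps afterwards. This is only a sketch under stated assumptions. The function name `find_outages`, the 60 second interval, and the 30 second slack are made up for illustration, and a gap only tells you when the machine stopped running, not whether power or a crash caused it.

```python
from datetime import datetime, timedelta


def find_outages(heartbeats, interval=timedelta(seconds=60),
                 slack=timedelta(seconds=30)):
    """Return (last_seen, next_seen) pairs where consecutive heartbeats
    are further apart than interval + slack."""
    ordered = sorted(heartbeats)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > interval + slack:
            gaps.append((earlier, later))
    return gaps


if __name__ == "__main__":
    # Made-up heartbeat times, one per minute with a hole in the middle.
    beats = [
        datetime(2004, 1, 5, 8, 30),
        datetime(2004, 1, 5, 8, 31),
        datetime(2004, 1, 5, 8, 32),
        datetime(2004, 1, 5, 8, 36),
        datetime(2004, 1, 5, 8, 37),
    ]
    for last_seen, next_seen in find_outages(beats):
        print("down between", last_seen, "and", next_seen)
```

The first timestamp of each pair brackets the moment the server went down to within one heartbeat interval, which is tighter than "previous shutdown was unexpected".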
 
technome, are yours the rack or upright models? Mine are the upright and are several years old...

I inherited them when I started here and didn't discover there was a problem till the power went out one day...

-Mike
 
Uprights, which are a few years old..
My batteries are holding the charge but I was wondering if the shutdown circuitry was OK. Probably nothing wrong. During the summer power outages the batteries were drained and we had multiple voltage cutouts in a few hours. The nasty part was I had two servers with offline drives in their arrays, at the same time at one site, twice; never had that happen before. Since then no problems. As a note, I have seen a few power supplies/motherboards be very sensitive to slight power fluctuations.
Was just reading there has been a rash of motherboards manufactured with poor (cheap) electrolytic capacitors showing up which expand/leak. Seems to have hit many of the manufacturers.
 
We may finally have a fix for Dell 2550 random server reboots that may also help on the 2650 if you are using the Broadcom Gigabit 57XX Ethernet adapter. We have 4 Dell 2550 servers and only one has been rebooting - maybe 2-3 times per week, and this has been going on for almost a year. Have swapped power supplies/power source between the working one and the one not working. Gave up on the problem at one point. Have applied all the updates we could find (on W2K SP4) and the server still reboots at will, but usually between 8-10 a.m. and 1-3 p.m. Just recently figured out the one that reboots is using the Broadcom gigabit adapter and the other 3 are using the Intel 10/100. Our driver version for the Broadcom was version 2.6 dated 5-02, and when doing updates from Dell or Windows it has never said this needed updating. I searched Broadcom.com and found the current version for that adapter is 7.35 dated 12-03. HP/Compaq uses this adapter and they have posted the driver updates - I think Dell fell asleep here. I have downloaded version 7.35 from Broadcom and I am only on day 1, but this seems like a more logical fix than everything else. I'll post an update if the problem comes back - it has rebooted 3 of the last 4 days so I should know soon.
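
Since the reboots seem to cluster in certain hours, it can help to tally them by hour of day rather than go from memory. A minimal sketch, assuming the reboot times have already been collected by hand or from the event log; the function name `reboots_by_hour` and the sample times are made up for illustration and are not from our logs:

```python
from collections import Counter
from datetime import datetime


def reboots_by_hour(reboot_times):
    """Count reboots per hour of day (0-23), sorted by hour."""
    counts = Counter(t.hour for t in reboot_times)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Illustrative timestamps only.
    sample = [
        datetime(2004, 1, 5, 8, 34),
        datetime(2004, 1, 6, 9, 10),
        datetime(2004, 1, 7, 13, 45),
        datetime(2004, 1, 8, 8, 5),
    ]
    for hour, count in reboots_by_hour(sample).items():
        print(f"{hour:02d}:00  {'#' * count}")
```

A tight cluster around particular hours points toward something scheduled or external (building power, a timed job) rather than a purely random hardware fault.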
 
So our Dell 2550 server rebooted with the updated Broadcom Gigabit driver. Now I have switched over to the Intel 10/100. I actually was standing in front of the server when it rebooted this morning at 8:34 a.m. and it looked like an Internet connection properties screen or wizard flashed for a second and then rebooted. Nothing in event viewer other than "previous shutdown unexpected" - as always. Nothing in temp folders out of the ordinary. Made sure virus scan current and ran full scan. Server has 24 HP printers configured with standard tcp/ip port. Print spool location is on a drive by itself. Provides DHCP and DNS for network also. No backup software runs "from" this server. Actually seems like latest round of Windows updates has increased frequency of reboots.
 
This is a long shot...
We had a similar problem, but we were crashing. We were running NAV 7.5 on the server. I was able to track down the crash to NAVAP, this is NAV Auto-Protect. Turns out there is a known (but obscure) bug with NAV 7.5 and Terminal Services, even if it's only "Admin" terminal services. I upgraded the server to NAV 7.6, and it went away.

Strange thing is that the server had been running fine with this same environment for a long time...I have no idea what suddenly caused it to become unhappy.
 
Our Dell 2550 server rebooted again (at 10:45 a.m. yesterday) after switching to the Intel 10/100 card. More info - I previously removed Open Manage completely to eliminate that, and yesterday installed version 3.6 hoping it will provide better detail on what causes the system to reboot - hasn't rebooted yet today. I also had completely removed McAfee AV and disabled Terminal Services. BIOS is A08. System Defrag states the disk doesn't need defragmenting - 30GB available on system partition. One thing I am researching that both versions of Open Manage do on this server but not the others: under Main Chassis, where it should list all system components (fans, power supply...), it doesn't display any components. I don't use Open Manage a lot and can't find any hits on this issue, so I don't know if not listing any components there is a big problem.
 
Personally, when I receive a server from a manufacturer I wipe the setup out with fdisk and create a very basic setup without the extra manufacturer-created partition and software. If RAID equipped I will load the RAID management software. I find the servers are more stable, with less software to cause problems.
 
I had this EXACT same issue with two clustered 2550's, tried everything in the book. Dell came in and replaced the entire contents of each machine besides disks ... problem solved.

01110000
 

We have 3 Dell PowerEdge 2500 servers running Win2K Server. Hardware and software were virtually identical on all three. 2 of these servers were spontaneously rebooting - typically on weekdays between 5 am and 8 am. The 2 servers would reboot in tandem - sometimes the file server would reboot first and sometimes the mail server would reboot first - but if one rebooted, then they both did - always - and typically within 1 or 2 seconds of each other. The only event logged was "The previous system shutdown at <time> was unexpected." We scoured for viruses, turned off antivirus, shut down various services, tracked when users were logging on (thinking it was a command, virus or malicious code coming from one particular machine), etc. Nothing would resolve the spontaneous reboots - and they would occur randomly - sometimes once a week, sometimes twice a week, sometimes weeks without a spontaneous reboot.

After months of diagnosing and hair pulling, no answers were forthcoming and the spontaneous reboots began corrupting our Exchange database. The cleanup would wipe out the Blackberry entries in each Blackberry user's mailbox - and we'd have to remove the Blackberry user, then run a utility to clean up their mailbox, then recreate the user and then do enterprise activation again. What a nightmare. And of course the worst of it would occur when I was out of the office or was taking a rare vacation day. Arrrrrgggghh.

Monday, I was standing in front of the three servers - with all of the displays turned on. Wham - the file server and mail server just went dead - flat black - as if they weren't plugged into UPSs and someone just yanked the power cord out. The third server was wholly unaffected. The light did not flicker. Just wham - servers down and restarting.

It finally occurred to us that the 2 servers that were spontaneously rebooting were on the same electrical circuit. The third server (that wasn't rebooting) was on a different circuit - though in the same room. Two days ago, I purchased a new APC 1000 UPS and physically moved the Exchange server onto a different floor on the other side of the building. This morning, the spontaneous reboot struck again - BUT ONLY on the file server that was not moved. For the first time, the spontaneous reboot affected only one server.
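
In hindsight, the tandem reboots were the clue: two machines going down within a second or two of each other suggests something they share (here, the circuit) rather than a fault inside either box. A minimal sketch of checking for that pattern, assuming you have a list of reboot times per server; the function name `paired_reboots`, the 5 second window, and the sample times are made up for illustration:

```python
from datetime import datetime, timedelta


def paired_reboots(server_a, server_b, window=timedelta(seconds=5)):
    """Return (a, b) pairs of reboot times from two servers that fall
    within `window` of each other."""
    pairs = []
    b_sorted = sorted(server_b)
    for a in sorted(server_a):
        for b in b_sorted:
            if abs(a - b) <= window:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Illustrative timestamps only.
    file_server = [datetime(2005, 1, 3, 6, 10, 0),
                   datetime(2005, 1, 5, 7, 15, 30)]
    mail_server = [datetime(2005, 1, 3, 6, 10, 1),
                   datetime(2005, 1, 5, 7, 15, 29),
                   datetime(2005, 1, 6, 12, 0, 0)]
    for a, b in paired_reboots(file_server, mail_server):
        print("both down around", min(a, b))
```

If nearly every reboot on one server has a partner on the other, it is worth looking at what the two have in common (circuit, UPS, switch) before swapping parts in either one.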

This morning I installed line conditioners (power conditioners) between the outlets and the UPSs. We have had problems with fluctuations in the building's power in the past. I suspect that there is some large electrical draw (such as the AC turning on early each morning) and that sometimes the resulting dip is not enough to kick on the UPS, but IS large enough to starve the servers of electricity to the point of shutting down.

Anyway - thought I'd share my experience with the forum in the hopes that someone might save their sanity.


"When everyone decides, then no one decides.
 
Poor power supplies are prone to this: either the capacitors are undersized, too small to supply momentary power during a power dip, or the filtering is not sufficient to smooth out very small power anomalies. Motherboards have circuitry which will reboot a machine should power fluctuations go above or below certain limits... it does not take much to trigger the circuits.

Cplbaum2
Odds are those circuits are on two different phase legs of the incoming power. Power companies do power switching primarily after 2 am; looks like they continue later in your area.
On some APC units, if the battery gets old, the transfer does not take place properly.

 