Dell Servers, Lose a Drive..Lose your Raid?

Status
Not open for further replies.

Shux

MIS
Aug 13, 2002
5
KR
We've had this happen on 3 different servers so far, and Dell is saying this is the first they've heard of it. I tend not to believe that, based on searching this forum, for one. We have over 200 Dell servers in house and are very close to going back to HP/Compaq. The only reason we switched in the first place was price, and that doesn't seem to be that big of a gap anymore.

What I'm asking is that if you've had a similar problem with Dell PowerEdge servers, post and let me know about it. What we see is that when a RAID 5 setup loses 1 drive in the array, the drive doesn't fail out properly and then seems to crash the array, making the server unbootable. The first time this happened we put in a new drive and nothing would rebuild, so we called Dell for their onsite support. The next two times, having been down this road before, we called them right from the start. All paths led to the same result: 1 drive failed and it took the entire array out. Like I said, if you've experienced anything like this I'd love to hear about it. Thx much
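
For reference, here is a minimal sketch of why a RAID 5 set is expected to survive the loss of one drive: each stripe carries an XOR parity block, so any single missing block can be recomputed from the others. The block contents and the `xor_blocks` helper are made up for illustration. This is not how a PERC controller is implemented, and real arrays rotate the parity block across the drives.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the drive that held data[1]; rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

The same arithmetic shows the limit: with two blocks of a stripe missing there is nothing left to XOR against, which is why a second failure during a degraded period takes the whole array down.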
 
We've had a drive fail on about 10 occasions in Dell servers with a RAID 5 array (with standard PERC controllers) and it's recovered properly each time. The only issue I had was one faulty replacement drive, which caused me a headache trying to build it into the array (I was assuming something else had happened and it couldn't possibly be the new drive causing the problem). On the phone with a Dell engineer, though, we diagnosed that it was the drive, and the replacement they sent built in fine.
 
I've had drives in an array fail properly - get the alert, stick a new one in and the rebuild was OK. Faulty batch of controller cards at yours? Are they all the same model?
 
We have had the exact same problem, especially with PE 4400s and PowerVault 220s.
Check the firmware on your PERC 3Di/DC and on the 220s to ensure that they are the latest (1.8 for 3DC and E14 for the 220s); the previous versions have known "cascading disk failure" problems.
Dell also was _surprised_ at our RAID failures, which caused 24-48hr server down situations on several occasions.
If you look farther back in the Dell server forum, you'll see our posting.
We too are looking very closely at HPaq as our experience with their service has been miles ahead of Dell (who ever heard of paying more for "real" support, instead of paying by support availability?)
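
With a couple of hundred servers, a check like this is easier to script than to do by hand. A rough sketch, assuming you already have an inventory of installed firmware levels: the host names, the inventory rows, and the component labels are hypothetical, and only the two minimum levels (1.8 and E14) come from the post above. It also assumes that version strings of the same component sort the way they read, which may not hold for every vendor scheme.

```python
import re

# Minimum firmware levels quoted in the post above; the dictionary keys are just labels.
MINIMUM = {"PERC 3DC": "1.8", "PowerVault 220": "E14"}

def version_key(version):
    """Split a version string into parts so that numeric parts compare as numbers."""
    parts = re.findall(r"\d+|[A-Za-z]+", version)
    return [(0, int(p)) if p.isdigit() else (1, p.upper()) for p in parts]

def outdated(inventory):
    """Return (host, component, installed, minimum) for each component below its minimum."""
    return [
        (host, component, installed, MINIMUM[component])
        for host, component, installed in inventory
        if component in MINIMUM
        and version_key(installed) < version_key(MINIMUM[component])
    ]

# Hypothetical inventory; real values would have to be read from each controller or enclosure.
inventory = [
    ("srv01", "PERC 3DC", "1.6"),
    ("srv02", "PERC 3DC", "1.8"),
    ("pv01", "PowerVault 220", "E12"),
]

for row in outdated(inventory):
    print(row)
```

This only flags what is behind the quoted levels. Whether to flash a given box is still a judgement call.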
 
PowerEdge 6300, RAID 5, and two of the three Cheetah drives are supposedly bad, making the server unbootable.

Rebuilding the drives failed.

I've got 6 hard drives from two other Power Edge servers. Can I reformat and install them?

What other steps did you take before calling Microsoft?
 
Updated the firmware on my Dell's PERC 3 to the newest version about a month ago and had drives go offline 3 times in two weeks. Downgraded the firmware to the original (I think it was 1.6) and have had no problems since. I never had a problem with the older version, but Dell emailed a warning about a problem with older firmware and said to flash to the newer version. If I do update, I will use LSI Logic's version, which is more work. With LSI's you have to update the Windows OS drivers for the card or else it blue screens after the flash.
 
Also upgrade the firmware on the drives themselves. Dell lists the various firmware versions for the hard drives on their website. We had to do that on a server and now everything is fine, but it took a while. The SNAP tool that Dell provided made this much easier. Otherwise it is a very manual process.
 
Glad you had luck with drive firmware upgrades...

Have a non-Dell with a U320 RAID which caused me a 38-hour day. I used the Seagate Enterprise util. To be safe I chose to update the firmware one drive at a time. The first drive was the "hot spare" drive; the firmware updated, and all drives were OK according to my backplane diag LEDs. Upon restart two different drives failed. Spent about two hours trying to resuscitate the array, gave up, updated the firmware on the other drives and rebuilt the array from scratch.
 
We had just 1 server with this problem, and after replacing the drive 3 times, we (Dell) also replaced the backplane. That seemed to solve the problem, since it hasn't happened since then, and this was 2 months ago.

/D
 
We've had only 2 types of issues. One was a bad PowerVault. We once lost 5 drives at one time; we could force them back online and usually recover. Had lots of problems. Dell would replace this part or upgrade that firmware, but it was never 'fixed' until they replaced the entire unit. The only other problem I've had is that we lost an entire RAID 1 array by losing 1 drive. Lost drive 2 and the server crashed and wouldn't reboot. After a full rebuild, both drives checked out okay and, knock on wood, no problems since. Thank goodness it was a server on a farm so our users didn't have any problems and I could take my sweet time.

All in all, I'm happy with Dell. Had trouble with IBM and HP before; no one's perfect. Just gotta pick the one you're most comfortable with.

Debi
 
Going down this path as we speak.
PowerEdge 2500, 1 RAID container, lost a disk (the disk went offline), and now every time I put a new disk in, the container rebuild fails after 25 minutes and the drive goes to READY but flashes amber. Dell says to rebuild the container... what's the point of RAID?

Kanen7
 
I managed to fix mine without doing disaster recovery.
For once in my life, all the Dell hoop jumping worked.
The tech had me run disk media checks on each disk in the RAID set. One of the disks (not the failed drive) had a bad block. The verify media pass repaired the bad block, and once I removed and reinserted the original failed disk, the repair began... and finished!
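
One plausible reading of why this worked: a rebuild has to read every stripe from every surviving disk, so a single unreadable block on a "good" disk can be enough to abort it. The toy simulation below illustrates that idea only. The `rebuild` and `media_check` functions, the disk contents, and the use of `None` to mark a bad block are all invented for the sketch and say nothing about how the controller's verify media option actually repairs a block.

```python
from functools import reduce

class UnreadableBlock(Exception):
    """Raised when a surviving disk cannot return a block."""

def read_block(disk, index):
    block = disk[index]
    if block is None:  # None stands in for an unreadable (bad) block
        raise UnreadableBlock(f"bad block at stripe {index}")
    return block

def rebuild(survivors, stripes):
    """Recompute the missing disk, stripe by stripe, by XOR-ing the surviving disks."""
    rebuilt = []
    for i in range(stripes):
        blocks = [read_block(disk, i) for disk in survivors]
        rebuilt.append(bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks)))
    return rebuilt

def media_check(disk):
    """Return the stripe indexes on this disk that cannot be read."""
    return [i for i, block in enumerate(disk) if block is None]

disk_a = [b"\x0f", b"\xf0"]
disk_b = [b"\x01", None]  # a surviving disk with one bad block

try:
    rebuild([disk_a, disk_b], stripes=2)
    outcome = "rebuilt"
except UnreadableBlock as err:
    outcome = f"rebuild aborted: {err}"
print(outcome)

print(media_check(disk_b))  # locates the bad stripe on the surviving disk

disk_b[1] = b"\x02"  # stand-in for the repair; what data ends up here is outside this sketch
print(rebuild([disk_a, disk_b], stripes=2))
```

If that reading is right, checking the surviving disks before swapping in yet another replacement drive is the cheaper first step.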
 
Kanen7, I'm having the same problem on an old PowerEdge 6100. Can you please tell me where do you go to run the disk media check on each disk? Thanks in advance.
 
We've been running a PE 4400 for 3+ years, RAID 5, and it has failed properly two times. The second time it rebuilt, but at reboot it seemed to sense an error. Got a good Dell tech, who identified it as a container problem. Robocopied all the data to a standby array, deleted and rebuilt the container, and we have been good to go. At NO time were we down 100%. I have had nothing but good luck with Dell, and actually moved away from HP/Compaq after numerous problems. I think we all occasionally get a problem that is hard to detect, regardless of the vendor. Backplane problems and controller firmware/driver issues can cause a lot of problems that are 'disguised' as something else.
 
I'm going through this right now and each Dell tech has a different method of trying to get the replacement drive back into the array.
I have a Dell PE2650 with 5 drives. 1 failed and we received the replacement, but it will not become part of the array. I think I have initialized it 20 times by now through the PERC BIOS and through Dell's Array Manager.
I am going to try Kanen7's trick tomorrow morning.

Anyone know what the proper steps should be to get this drive back in the Array or have any recommendations as to what I should try?
 
For those interested: I finally got this resolved after 5 days of trying. We have a PERC 3 controller and ended up uninstalling the Dell Array Manager utility and installing an Adaptec/Dell 'Fast Utility'. This utility worked well and I was finally able to reconfigure the array and get the 5th drive back in, something I couldn't do with the PERC BIOS utility or Dell's Array Manager. What a pain.
 
I am currently having a similar problem with our RAID 5. We lost one drive and nothing worked anymore. The tech just left after installing the new drive, and he did not seem to know anything about activating it. I myself am not completely sure how to do it (all I know is that it is critical that I recover the data). The new drive has a steady light, with the other two flashing. He stated Dell support told him that it is rebuilding and will take another 1/2 hour. However, there is no other indication. How do I mount the drive into the RAID stack, and how do I initiate and track the rebuild? We are using the PERC 3/Di on a Win 2K operating system (which does not boot...)
 