
Compaq 3200 and failing drives

Status
Not open for further replies.

Vos (MIS, CA), Jun 13, 2001

We've got a weird one. We're running an ML530 with a Smart Array 3200 RAID controller running RAID 5 with 6 HDs. 5 HDs are part of the RAID 5 array, and we've got an online spare (that's saved our a$$e$ a few times throughout this ordeal).

Last week, we started having hard drive failures. The weird thing is that an HD will 'fail' (it shows red in CIM and the red X light on the front of the drive is lit). The RAID array then rebuilds itself using the online spare. But if we reboot the server, the failed drive is fine again and the RAID array rebuilds back onto that drive.

So far, we've replaced the entire disk subsystem (5 HDs, the backplane (drive cage), the 3200 controller, the 3200's cache daughterboard, and the SCSI cable). We've upgraded the system BIOS and the controller firmware, and installed a patch that Compaq recommended. A few hours later, we got the same thing all over again.

I'm at my wits' end! Any help would be immensely appreciated!

Vos [peace]
 
Hi Vos

Check in CIM exactly what sort of errors are occurring; this may help in diagnosing the problem.
Also consider changing the power supply, or at least checking it for excess ripple on the 5 V rail. I have seen intermittent read/write errors caused by this in the past. Are there any other environmental factors that may be the cause, such as dirty power or magnetic interference? Is the unit on a UPS, and is the UPS working OK? Check all the fans in the server for correct operation; heat can do funny things.

Good Luck,

David
 
Thanks very much SPI200.

CIM doesn't yield us much info, just that the drive has failed. We've got dual power supplies in this server, and it's on UPS. I appreciate the tip about the power spikes, I'll investigate that possibility.

None of the other servers in the same rack, which draw power from the same source, are having any problems, so I'm not sure how dirty power or magnetic interference could affect this server and not the others directly above it in the rack.

No errors in CIM about heat or faulty fans either. This one's just not giving us much to work with, is it?

Thanks again,
Vos.
 
Did you just replace the cage, or did you also replace the card that goes with it? I had a similar problem on one of our 530s, and it turned out to be the card behind the drive cage. (I'm not describing it well, but there is the physical box that the drives live in, and there is a board that interfaces with it. It's a separate part.)

Keep calling Compaq. Eventually you'll get someone who has seen this before, or someone who will send you a new server. Good luck, Steve
 
Beaghler:

Thanks for the reply. When we replaced the drive cage/backplane, it was all one component. The back of the cage looked like it already had a card attached flat against it, and all that was needed was to plug in the SCSI cable and the power. The back of the drive cage was covered in a kind of black paper or cardboard; I assume that was actually protecting the back of the card attached to the drive cage.

If I'm wrong, please let me know, and thanks again for the input.

Vos [peace]
 
O.K. It sounds like you got the whole thing with the drive cage. Here's a real long shot: try updating the firmware on the drives. The latest is either on the SmartStart 5.5 CD or on the web site. It's a 5-disk set, but you'll only need one disk. The problem is you don't really know which one until you actually go to flash it; it'll tell you which disk it wants. You do it like a ROM flash. If you didn't do the 5.5 update on the controller, do that too.

Just for the sake of covering all bases, apply all the other patches: management agents, NIC agents, etc. for whatever OS you are using.

Keep on Compaq about this. If necessary, get hold of the local SE (systems engineer) and get him involved in helping you. If nothing else, he should be able to streamline a replacement for you. Good luck. Steve.
 
You replaced all the drives? Did you reload the OS? Do the array members match (i.e., all Seagate Cheetah ST3-whatevers)? You wrote that "a drive will 'fail'"; is it always the same SCSI ID or the same physical disk?
 
Thanks for the help, guys. The local Compaq systems engineer seems to have resolved it...sorta...we hope...

Compaq has a patch (SP16373) that resolves some issues with drives showing predictive failure, but we were seeing drives fail, not predictive failures.

The firmware on the drives was already up to date, but we upgraded all of the management agents and the array driver from the SmartStart 5.5 download. Then we re-ran the patch. A few hours later, another drive failed. Over this period, new drives in 4 of the 6 slots have failed. Anyway, once we replaced that drive again, we re-ran the patch, and we haven't had any problems in a few days.

The Compaq sys eng, our hardware installer and I have never quite seen anything like it. But it looks to be fixed... I hope...

Thanks for all your help,
Vos [peace]
 