Tek-Tips is the largest IT community on the Internet today!

How to diagnose a bad Smart Array 5i?

Status
Not open for further replies.

johnzbesko

Technical User
Aug 15, 2007
15
US
I recently acquired a surplus ProLiant DL380 G2. It has a Lights-Out board and six hard drives. I installed Ubuntu. Then I added a gigabit PCI Ethernet card. On the next boot, the Smart Array controller did not seem to be recognized, i.e., it did not initialize and the system would not boot.

I read manuals, searched online, tried the HP Diagnostic boot CD, and tried different motherboard switch settings. At some point the Smart Array started working again and I was able to boot. I installed Fedora 6 and the server worked fine.

Then I tried to change the X Window resolution, but the screen hung (the server itself was still working, e.g., firewall, file server, web server). So I did a cold boot: I shut off the server using the power button.

Now the server will not boot again because it doesn't initialize the Smart Array. What is going on?!

Any suggestions for this frustrating problem are greatly appreciated.
 
First off, did it boot up by chance, or was it something you did? Did you use SmartStart with all the HP drivers and tools that go with it, such as the HP Array Configuration Utility? Does the Smart Array 5i controller fail to initialize or show any messages during POST? Have you tried F10 for diagnostics, or does it not get that far during POST? Do you have a cache battery for the 5i controller?

Burt
 
Umm, I believe it booted up by chance; I don't know what I might have done to fix it. I have updated the firmware and also booted the diagnostic CD, which also did not detect the Smart Array. However, the BIOS does show an IRQ for a RAID controller. So I am confused and frustrated.

I partitioned the first "bank" of drives (18GB) for Fedora 6- /boot, /, /var, and /tmp. The second (36GB) is /home; the third (18GB, but will be replaced with 36GB eventually) is /ext. The first bank shows the little green lights; the other two banks are dark.

I'm concerned that some part of the two Smart Array boards is broken, but buying part after part on eBay seems like a very tedious and expensive way to fix the problem.

Again, any suggestions are greatly appreciated.
 
Also, no, I do not have a cache battery, and the Smart Array does not initialize during POST. When it did, the server was able to boot and I was able to detect the array and partition it with the HP diagnostic CD.

I also tried removing the Lights-Out PCI card, but that didn't help.
 
Have you tried reseating it? I would suspect a bad 5i DIMM.

Burt
 
I've tried reseating everything short of taking the unit apart. I reseated the cables from the array backplane to the motherboard; I reseated the array controller card onto the motherboard (the piece underneath the PCI riser); and I have reseated the PCI riser numerous times while trying different slots for the Lights-Out board, the gigabit Ethernet card, etc.

I'll have to consult the manual PDF for the location of the 5i DIMM- it sounds like that would be the easiest thing to replace.
 
Sorry, not a DIMM; I was thinking of later-generation DL380s.
I mean the controller card itself.

Burt
 
Well, I suppose I could bite the bullet and get a replacement. I hope it's not the backplane. I wish I could further test this before spending big money.
 
Can anyone suggest other options, such as a PCI RAID card or some other SCSI controller, so I could at least use the hard drives and attempt software RAID?
 
We had several of the 5i controllers in our servers go bad. It would not surprise me at all if you are experiencing the same problem. However, I believe you still need to have the controller installed (whether good or bad) for the server to boot. If I remember correctly, we replaced ours with Smart Array 431 controllers, but I'm not positive.
 
Thanks for your suggestion. A quick Google search suggests they are cheaper than the 5i. How does one connect an alternate SCSI controller? Is there a single cable that connects the drive backplane to the alternate controller, or is it more complicated than that?
 
Yeah, the hardware setup is pretty straightforward. I think the 431 is a single-channel controller, so if you're looking for something a little more robust you may want to consider a different controller.

The main thing with any multi-controller system is to go into setup and specify which controller you want as the boot controller, i.e., the one with the drives your OS is installed on.
 
So, on an impulse, I started up the machine, and it worked! The Smart Array initialized and the server booted. All is well. Which raises the question: how reliable will this be? Do I have a problem with heat? I have installed the HP Linux drivers and software, but I did have a problem with ACPI.

Any suggestions on monitoring the "health" of this server?
 
I've recently had very similar issues with a 5i on a DL380 G2. I'd be very concerned about reliability.

The machine was brought down gracefully, powered off and physically relocated (just a few feet), and then powered back on. It failed to come back up.

The diagnostics CD (SmartStart v7.70) stated that the controller existed but had a 0-byte array. ACU didn't acknowledge that the controller existed at all. I've reseated everything I could find (drives, cables, 5i controller), with no luck.

Please post if you get more information.
 
In my case, I simply waited a few days and the controller magically started working again. The problem seems to be related only to cold boots; a simple "init 6" reboot does not bring down my server.

Outside of this restart issue, the machine has been running 24/7 for a couple of weeks now. Weird problem, and not one that inspires confidence...
 
So when you do an "init 0", and then cold start it right after the thing's been running, you see problems?

Burt
 
In my case, the machine had been running for years. It was physically relocated, and both power cords removed.

Previously, the machine was rebooted (e.g., "init 6") many times without incident, but I don't think anyone ever did a cold boot.

I thought perhaps the CMOS battery died and the machine lost non-volatile memory when power was removed, but I've already replaced it with no luck. (The 5i doesn't have its own battery like the 5i+.)
 
I have had three 5i controllers go bad in the past month. Apparently one of the DL380s is eating them: the replacement I put in failed a few hours later during a reboot, for no apparent reason. System power is clean, with no spikes, and the disks were not pulled during operation. Now another one has failed, so I moved the disks to a different DL380 rather than waste another controller on what may be a bad motherboard in that server.

Has anyone found the cause of these boards failing? They do not show up in POST and are not recognized when running the diagnostics. Is there some other RAID card that can fit in the card cage instead of these? At between 50 and 150 a pop it is getting expensive; the 5i's cost as much as a complete server on eBay these days. Any insight would be appreciated.

Thanks,

Phil
 
Are the serial numbers close together? Look and see if HP put a recall for a certain lot of them...

Burt
 
I gave up, bought a handful of 5300 controllers on eBay for 15 bucks each, pulled out the 5i, flipped the interlock switches, and that was that. I had to reinstall the system because the disk formatting differs between the two controllers, but I feel much more secure about the reliability of the server now.

Phil
 