HP ProLiant ML350, RAID corrupt, need help badly.

Status
Not open for further replies.

1DMF

Programmer
Jan 18, 2005
GB
Hello,

To cut a long story short, we have had 3 engineers, 2 SCSI cards, 2 cables, 2 drive bays and back planes and 3 hard drives, no joy.

Our array is somehow saying disk ID 6 has failed and needs replacing, but the IDs only go up to 5 (0-5), and the failed disk is in drive bay 2.

HP has now said that it is a known issue with the Smart Array 641 card we have and its current firmware.

They now suggest we flash the controller's firmware and say this will solve the problem, only they seem reluctant to confirm whether this will lose the array and therefore our disks and data.

Does anyone know if flashing the firmware of the SCSI Smart Array 641 controller will, or possibly could, cause the loss of the array and disks/data?

Obviously I am very concerned about flashing the firmware before I know the full potential risk to the array and disk data.

We have backups but that is not the issue, it would take 2-3 days to rebuild the array, load the OS and other software, apply patches and service packs, before then loading the backup software and restoring the data.

I'm suggesting we get a DR company in to mirror the drives (which are running as a degraded array plus spare drive at present), so if the worst happens when flashing the firmware, we can mirror our systems back across and be up and running in hours rather than days.

All input, advice and recommendations really are appreciated. I'm in a huge hole and need someone to provide a ladder.

Many thanks, 1DMF



"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you."

"If a shortcut was meant to be easy, it wouldn't be a shortcut, it would be the way!"
 
Hi


It sounds like a bitch of a problem you have. Here are a few of my thoughts.
HP always say you should flash the BIOS; it seems to be their standard line these days. If it's a known issue, then the new BIOS version will have a list of the issues it addresses, i.e. your issue will be documented.
I am not familiar with that controller so cannot comment on the risks of flashing it. I have found flashing is normally OK, but you must ensure you have a rollback plan, and your backups will be your failsafe.
On the ID front, depending on your system you could have an ID # 6 in a six-drive array. This is because the ID is not necessarily tied to any particular slot; this is system specific, of course. I recall years ago our techos used to set the first drive to zero and use any ID for any other disk or tape drive in the system. These could be mirrored or a RAID set - whatever you required. So technically you could have a drive in slot 2 that has an ID of # 6 - often this is set by the backplane, but again these can be changed.
Check what the ACU says has failed and go by that and that alone.
I had a DL370 server once that I replaced three drives in and it still reported the array as degraded, but the OS was fine - not degraded - so I left it, and it has been fine for over a year.
One other thought: ID # 6 is often used by tape drives, so if you have a SCSI device using this ID, I would be tempted to remove it temporarily to help isolate the fault.

Good luck

 
Thanks for the input, appreciate it.

It can't be ID 6 in our system; as you say, the backplane assigns IDs and our server only goes from 0-5.

Also, it was ID 2 and had been for nearly 3 years, then on the 10th of May, after a reboot, it just magically changed to 6.

I couldn't care less what ID it is, but if I put a new drive in slot 2, the controller doesn't see it as part of the array; it shows up in the ADU as ID 2 and the array still wants you to replace drive ID 6.

Talk about screwy!

We've also had the backplane replaced, so it isn't that causing the problem.

What's weird is that if I put the old drive back in, it accepts it as drive ID 6, rebuilds the array and the ADU says all is fine, yet the drive is still lit up red as though it has failed.

I've been advised that a BBWC (battery-backed write cache) might help and could even have stopped this from happening in the first place. It would at least give us more RAID array control, but even adding it won't guarantee a fix.

We have a good backup and do remote site disaster recovery, but if it's a rebuild of the array and a complete install again, that's our server down for about 3 days!

At least you have indicated that flashing the firmware BIOS is 'normally' OK, and so have others. Unfortunately this comes with a 'but', and no assurances.

HP suggest the firmware upgrade will fix it, yet our server support people (HP certified partners) do not believe it will and recommend getting the BBWC either way.

The alternative is getting another SCSI card, 3 disks and a drive cage, putting the card in another PCI-X slot, creating another RAID array, mirroring the data across, changing the boot sequence to the 2nd array, and then decommissioning the old corrupt array.

But that will be expensive, and the bosses don't like to spend money on a new tape when the old ones are knackered, so no guesses what they'll say to that idea!

As for the tape drive, yes, we have a SCSI LTO Ultrium drive, but it is not on the RAID array card, which only has one port for starters. It is plugged into the onboard SCSI slot, so it should in no way affect the RAID controller and array.

I guess there isn't much else I can do but flash the firmware and pray!
 
Hi 1DMF

If you fixed your problem, let us know.

It sounds like the drive ID is overriding the backplane settings - like you say, it's all very weird.
You mentioned your HP authorised service guys did not think flashing would fix the problem - while they have experience, they do not have access to the worldwide HP database.
I am not familiar with that particular HP controller, but I have seen problems resolved by flashing SAN controllers and disk trays like MSA20s.

Anyway, hope you got it sorted.


Dave.
 
Hi,

Unfortunately nothing worked. I also lost my rag with HP support, because after asking them a few questions it became apparent that they didn't actually understand what was going on. It is possible that flashing the firmware WITHOUT putting the drive that thought it was ID 6 back in was the reason the flash didn't work. (Done on HP Support's advice!!!!)

So I argued that the guy who was dealing with my support call may have been the reason it didn't work, and got them to concede this was possible, so they said 'What do you want?'

Great, I thought. Right: send an engineer, a spare drive bay with backplane, a new SCSI 641 controller (with the latest firmware), a SCSI cable and 3 new SCSI drives.

They can build a new array with the new bits, I'll get my server support guys to migrate the data with a special array migration tool, and then the engineer can put the new array back into the server and take all the old stuff away.

But that was all wishful thinking!!!! OK, HP sent everything I asked for, including their 'top' HP engineer, not someone from one of these outsourced companies, a real HP engineer.

He built up the new components with the spare SCSI card etc... only when he went to plug power into the drive bay, akk!

ML350s only have one drive bay power connector. No-one at HP, nor the engineer, mentioned this prior to sending everything, and he didn't even have a standard 3 pin IDE device power cable converter with him, d'oh.

So he had to take it all apart and swap around the 3 disks currently running in a degraded state. Oddly, when we moved the copy-back spare (ID 3, which was in slot 4) into slot 3 (the one that should be ID 2 but thinks it is ID 6), it was recognised as being in slot 3 (ID 2) - great.

So we had a disk in slot 3, seen as ID 2, which is the copy-back spare, and the array still saw ID 6 as missing. That left the top 3 slots free for the 3 new SCSI drives, phew!

OK, so we created a new array on IDs 3, 4 and 5 (slots 4, 5 and 6).
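
For anyone trying to follow the slot and ID numbering, here is a minimal sketch of the mapping as it appeared on our box. It assumes a 6-bay cage with slots numbered from 1 and the backplane assigning ID = slot - 1, which matches slots 4-6 giving IDs 3-5. The function names are made up for illustration and are not part of any HP tool.

```python
# Assumption: 6-bay cage, slots numbered 1-6, backplane assigns SCSI ID = slot - 1.
SLOTS = range(1, 7)


def expected_id(slot: int) -> int:
    """Return the SCSI ID the backplane should assign to a given slot."""
    if slot not in SLOTS:
        raise ValueError(f"slot {slot} is outside the 6-bay cage")
    return slot - 1


def is_ghost_id(scsi_id: int) -> bool:
    """True if the reported ID cannot come from any slot in the cage."""
    return scsi_id not in {expected_id(s) for s in SLOTS}


print([expected_id(s) for s in (4, 5, 6)])  # [3, 4, 5]
print(is_ghost_id(6))  # True
```

On this mapping ID 6 has no slot behind it at all, which is why the controller asking for it made no sense.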

Then the server guys dialled in to migrate the array, only when they went to run the copy program, it wouldn't run, because the drives in Windows Disk Manager were DYNAMIC disks and it only runs on BASIC disks (yeah, well, you'd have thought they'd have checked that as well, wouldn't you!).

Oh well, now starting to run out of options (other than a dreaded full system restore), we created an exact replica of the old array's disk arrangement on the new array (it was partitioned as C & D).

We then set the new volumes up as MIRRORS of the old ones and left that overnight to complete.

The next morning I switched the machine off, removed the OLD array disks (0-2), booted the machine off the new disks (luckily it worked!), went into Windows Disk Manager and removed the mirrored drives.

I then called the HP engineer to come back to take all the old bits away and help remove the old array.


We then switched the machine off, swapped the new drives into slots 1-3 (ID 0-2), went into the SCSI (F5) boot configuration and deleted the old array and......

BINGO!!!! All systems go, all present and correct, all as normal. Finally, and about time too!

Now if HP had just sent 3 new drives and 1 engineer who knew what they were doing, it would have been fixed a lot sooner and a lot cheaper.

But hey ho, it was under warranty so it wasn't us who picked up the bill!

So if anyone else ends up with this happening to them, that's the way we fixed it: simple, no loss of data and only 1.5 days of downtime! No restore, no fuss!

So don't let HP tell you otherwise!

N.B.
The 4th SCSI card that came also didn't have the latest firmware, and if I hadn't been vigilant and got the engineer to check this and flash it, we could have ended up back at square one, as HP claim the firmware was the cause of this ghost ID 6 in the first place. So make sure you check and flash any parts HP send you!
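
If you want to script that check, here is a rough sketch. It assumes you can read the firmware version off the controller as a dotted numeric string and that you know the version HP says contains the fix. The version numbers below are placeholders, not real 641 firmware releases, and the function names are my own.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted version string such as '2.34' into integer parts."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_flash(installed: str, required: str) -> bool:
    """True if the installed firmware is older than the required version."""
    return parse_version(installed) < parse_version(required)


# Placeholder versions for illustration only.
print(needs_flash("1.10", "2.00"))  # True
print(needs_flash("2.00", "2.00"))  # False
```

Note that this compares each dotted part as a whole number, so check that this matches how the vendor actually numbers its releases before relying on it.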


 
Thanks for the update.

Here in Australia HP have made some cutbacks to the engineering dept and a lot of good staff have left. The service is not what it used to be. We are buying Dells now.


Dave
 