Solutions to repair a degraded RAID 1 array

Status
Not open for further replies.

wahnula

Technical User
Jun 26, 2005
4,158
0
0
US
Hi all,

SBS 2003 SP2
Asus K8N-DL w/ 2 Opterons
Onboard RAID 1, NVidia 4 series
3ware RAID 5
All WD Raptors

I wanted to get some opinions on my situation. After recovering from a power failure today, my RAID 1 array is degraded and split: I see a C: and a D: drive in SBS. My plan is to replace both drives; the replacements are on hand.

BUT I'm not sure which drive is good and which is bad. There is no Windows utility, just the BIOS RAID setup. I was planning on formatting the D: drive in Windows so, the next time I shut down, I can unplug one drive and see if it boots. If not, I will swap cables. Once it boots into Windows, I will shut down, replace the orphaned drive, and rebuild. Thoughts? Thanks.

Tony

Users helping Users...
 
Be careful there, bud. Are you sure there's no function in the RAID utility to help identify the drive (like showing the serial number of the bad one or turning on a flashing light on the drive)? Are you running RAID 5 or 1? It's a little confusing, but since you say "both drives," I'm assuming it's the RAID 1 that is having a problem.

Here's what I would do. I would NOT replace both drives right away. Shut down the server. Unplug ONE drive - pick a drive. Then try to boot. If NO BOOT, then you know the other drive is good. Try to boot with the good drive plugged in.

>>>>>>>>>

If it boots, shut down and replace the BAD drive and get your mirroring going. Don't replace them both. The important thing is get mirroring restored.

Somewhere before my last step (indicated by >>>>>>) I SHOULD have asked you if you have a recent data backup. If NOT, get one on the system running with the degraded situation. Data safety is always first.
 
Backup occurs regularly to several destinations. I am running (2) arrays, a RAID 1 for the OS and RAID 5 for the data. RAID 5 is fine.

The problem is that I feel that both RAID 1 disks would be bootable, since it's a recently-broken mirror. I want to make sure I am using the current OS disk (C:) for the rebuild, not the other one (D:) that I can browse in Windows, with the typical OS folders still there. That's why I want to format D: (the other RAID 1 drive), to make it un-bootable.

My intent is NOT to replace them both at the same time, it will be replace one, rebuild, replace the other. I just want to be sure that the current D: drive is un-bootable, thus the format-in-Windows approach.

Tony

Users helping Users...
 
If you have some critical data on the drive I would do a backup before anything else.

Based on your hardware description I am guessing you have a RAID 1 using the onboard RAID and a RAID 5 using the 3Ware RAID card.

I checked the Asus site for that MB and it shows it has a Silicon Image 3114R SATA RAID. I googled that part and found this page that seems to have some tools for RAID.


I would install this and it should let you manage the RAID from windows. It is very possible that your hard drive is ok and the RAID just needs to be rebuilt. There are typically tools built in to the RAID BIOS that let you do most of this. The windows version is just a lot easier to deal with, and you can monitor the status of the RAID from Windows.

3Ware also has some nice utilities to manage their RAID drives if you don't have that. If the RAID array is on the 3ware controller, then look for their Windows management software.
 
Thanks Jim,

I am using the other onboard RAID controller, not the Sil 3114. It is an NVidia CK8-04 Pro controller. I know I can possibly rebuild with the existing drive, but I do not want to, as it has a known bad block, that's what threw it loose from the array.

I know how to rebuild in nVidia RAID BIOS: first you delete the drive, then add it to the array & rebuild. There is no Windows utility that I would want to load on a server. I simply want to be able to identify the drive.

Tony

Users helping Users...
 
Moot point. Windows sees it as a "System Drive" and will not allow me to rename or format it, so I will delete it & rebuild the RAID 1 array the supported way, in RAID BIOS, then swap out the drives one at a time on subsequent boots (allowing time for rebuild of course). Probably the best move anyway. Thanks to you both for replying.

Tony

Users helping Users...
 
Update:

Rebuild (or should I say reassembly) completed using the same drive, next comes swapping out one drive at a time. I say "reassembly" because this RAID controller gives no status of the rebuilding process, it simply goes from "degraded" to "healthy" in RAID BIOS. So, I need to allow ample time for rebuilding before swapping out. It's a c. 2005 server and a Windows RAID utility would be really nice, but I can't find one that I trust that supports SBS 2003.
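Since the controller shows no progress, one way to pick a waiting period is a back-of-the-envelope estimate. The sketch below assumes a mirror rebuild copies the whole drive at a roughly constant rate and then pads the result. The capacity, the copy rate, and the safety factor are all made-up illustration values, not measurements from this server.

```python
def rebuild_hours(capacity_gb, rate_mb_per_s, safety_factor=2.0):
    """Rough wait time for a full-drive mirror copy, padded by a safety factor.

    Assumes the whole drive is copied at a constant rate; real controllers
    may throttle the rebuild while the OS is busy.
    """
    seconds = (capacity_gb * 1000) / rate_mb_per_s
    return seconds * safety_factor / 3600


# Hypothetical numbers: a 150 GB drive copied at 50 MB/s.
print(f"Wait at least {rebuild_hours(150, 50):.1f} hours before swapping")
```

Checking the array state in the RAID BIOS afterwards is still the only confirmation on a controller like this; the estimate only says when it is worth rebooting to look.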

Let this be a lesson: never build a server with onboard RAID controllers. At first I used the Sil 3114 as well for the RAID 5 array but had nothing but problems with it, thus the 3Ware card. I just wish I had bought a card that could host both arrays; the 3Ware has an excellent Windows utility and email reporting/logging.

In case anyone cares, I think I've figured out what happened. Apparently the array was degraded on 1/19/10, that corresponds to an EventLog entry on that date stating that "The drive,, has a bad block". No hardware description, just the source "nvraid". At that time I think the drive with the bad block was dropped by the array. When the power failure on the 28th led to a reboot (yes I have a smart UPS for graceful shutdown) apparently the system booted to the orphaned drive, leaving a gap in the EventLog from 1/19 through 1/28. Luckily all data is stored on the RAID 5 array, so work could continue, but I had never seen anything like this in the past.
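A gap like the one from 1/19 through 1/28 can be spotted mechanically once the event timestamps are exported. This is a minimal sketch, assuming the timestamps are already available as Python `datetime` values; the sample entries and the one-day threshold are hypothetical and only loosely modelled on the dates above.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(days=1)):
    """Return (start, end) pairs where consecutive entries are more than threshold apart."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > threshold:
            gaps.append((earlier, later))
    return gaps


# Hypothetical timestamps, not taken from the real log.
entries = [
    datetime(2010, 1, 18, 9, 0),
    datetime(2010, 1, 19, 8, 30),
    datetime(2010, 1, 28, 14, 0),
    datetime(2010, 1, 28, 14, 5),
]

for start, end in find_gaps(entries):
    print(f"No entries between {start} and {end} ({end - start})")
```

A long silent stretch on a server that was running the whole time is a hint that the system is now booted from a stale copy of the OS disk.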

Today, running on a single hard drive, I saw an EventLog entry for "The drive, Drive 1, has a bad block". At last, a clue! I hoped that "Drive 1" was the orphan, but going into the RAID BIOS told me that nVidia does not use the industry standard of starting from (0) as in "Drive 0"; my two drives were numbered (1) and (2). And "Drive 1" was the boot drive. I recorded the serial numbers for future reference and rebuilt the array. Not having a Windows utility is a huge PITA, as I will have to shut down and reboot into RAID BIOS to check the status of the array before swapping drives, in the event that Drive 1 was thrown off without warning.
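Once the serial numbers are written down, the two message shapes quoted above can be tied back to a physical drive. The sketch below is illustrative only: the serial strings are placeholders, the table uses the 1-based numbering seen in the RAID BIOS, and the pattern is built from just the two messages quoted in this thread, so other nvraid wording would not match.

```python
import re

# Placeholder serials; record the real ones from the RAID BIOS.
# Keys follow the RAID BIOS numbering, which starts at 1 rather than 0.
SERIALS = {1: "SERIAL-OF-DRIVE-1", 2: "SERIAL-OF-DRIVE-2"}

# Matches both "The drive, Drive 1, has a bad block" and "The drive,, has a bad block".
PATTERN = re.compile(r"The drive,\s*(?:Drive (\d+))?\s*,\s*has a bad block")


def drive_from_message(message):
    """Return the drive number named in a bad-block message, or None if it is blank."""
    match = PATTERN.search(message)
    if match is None or match.group(1) is None:
        return None
    return int(match.group(1))


def describe(message):
    """Turn a bad-block message into a line naming the drive and its recorded serial."""
    number = drive_from_message(message)
    if number is None:
        return "drive not identified in message"
    return f"Drive {number} (serial {SERIALS.get(number, 'unknown')})"


print(describe("The drive, Drive 1, has a bad block"))
print(describe("The drive,, has a bad block"))
```

Labelling each physical drive with its serial number and BIOS drive number makes the lookup useful when a drive has to be pulled.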

Sorry for the lengthy post, but this was a unique and unexpected event, and in case someone ends up here searching for "nvraid degraded" in the future, this info might be helpful. I'll follow up with the results of the drive replacements. They're all scheduled to be replaced this year anyway; this is just an impetus to get it done sooner rather than later. Any day you can avoid recovering a system from backup is a good day. Thanks for reading.

Tony

Users helping Users...
 
The good RAID controllers allow you to identify a drive by flashing its light, though usually those are the hot-swap type of drives with the little LEDs on the front of the drive carrier.

Internal drives are tough to identify unless you have added the drives one at a time and checked the RAID BIOS/setup utility as they are added and then label the drives physically.

It's like Russian roulette pulling drives out if you don't KNOW which one is the problem child.
 
Tony, thanks for the update. A lot of useful information.

Jim
 
Another update...Nvraid NForce4 stinks! Normally, all you need to do to rebuild a RAID 1 array is insert a fresh, formatted drive, assign it to the array, and let it rebuild. That does not seem to be the case here. When I reboot with the new drive attached, I see an ERROR in RAID BIOS and cannot rebuild the array. Tried another drive, same, it does not even see the new drive.

So...I got to thinking about Jim's reply, and decided to create a RAID 1 array on the Sil 3114 controller, clone the Nvraid 1 array to it, set it first in boot order, and eliminate the Nvraid controller (and its 5-year-old drives) entirely. Then I will have a nice Windows utility to monitor my array + 2 new drives.

This will probably happen in a couple of weeks as Exchange issues have ruled the week, I want to get about 1 full week of backups before I mess with the hardware again.

Tony

Users helping Users...
 
Whose fault is it for running "junky hardware" for important systems? You (they) need to spend good money on items like RAID/redundancy and backup to make sure you're covered (or whoever this equipment belongs to).
 
Update...lest everyone think I have abandoned this project (or this post)....

I spent half of a beautiful Saturday trying to clone the OS to a RAID 1 array built with new HDDs with the other onboard controller, a Sil3114. The first cloning failed, the second one worked! I tested out the system, ran a few benchmarks to put it under load, checked Exchange, file transfers etc., all good. I removed the original drives, mounted the new ones internally, cleaned up the case & rebooted. Blue screen! Tried again, same. So...that's the reason I abandoned the Sil controller in the first place, it was regularly dropping RAID 5 drives. So, I replaced the originals and went back to square one.

BTW, when I try and boot with a single drive on the NVRAID (either one), I get an ***ERROR*** in NVRAID BIOS, so it can't be rebuilt for some reason.

Next plan is to ping-pong clone the OS to a PATA drive, build a new array on the NVRAID controller (if possible), and clone it back to the new array. That's another project for another day.

Tony

Users helping Users...
 
Sounds like a lot of fun! Sounds like the way 50% to 80% of my builds/changes/upgrades/projects go. Well, let us know how the world, or maybe spindle, turns. [wink]

--

"If to err is human, then I must be some kind of human!" -Me
 
Update:

Clone to IDE: FAIL
Clone to USB: FAIL

I finally just flattened the C: drive and rebuilt from last night's backup. Whoever developed SBS' backup system gets my hearty thanks. After building a new RAID 1 array on the NVRAID controller with the new drives, I did the standard install-from-CD that XP users are used to, being careful to F6 the correct drivers. Then I updated to SP1 & SP2, imported the backup file, and we're finally good to go.

I should have done this from the start, but I had problems getting a good backup when I needed it due to the bad block. Considering all the time I wasted trying to get clones to work (my guess is that the bad block caused the failures), rebuilding from backup is my path going forward.

Now I'm set for another five years...except for the RAID 5 drives, they're due to get swapped out this year. It's always something. [smile]

Tony

Users helping Users...
 
How come you didn't do an ASR type of restore then just restore data?
 
That's basically what SBS backup is, it's an automated NTBackup script that restores System State, AD, and data. Overall time is about the same, entire process took only a little more than one hour.

Tony

Users helping Users...
 