
Recovering Software RAID Mirror

Status
Not open for further replies.

baldhead

Technical User
Apr 27, 2004
111
US
Currently I have the setup below for both of my IDE drives.

SEAGATE 160GB HDA
148.4GB HDA1 mounted as /
600MB SWAP

SEAGATE 160GB HDC
148.4GB HDC1 mounted as /
600MB SWAP

I want to be able to boot from either of these drives, since they are mirrored. Right now I can boot fine with the setup shown above, but when I take HDC and place it on the primary controller so it becomes HDA, it doesn't boot. I want the RAID to behave like mirroring in Windows: if your primary drive fails, you can shut down the system, place the secondary drive on the primary controller, and boot. It's that easy with Windows. How can I achieve similar results with a software mirror in Linux?

thanks
baldhead
 
I read the article and found the following line:

"Newer LILO distributions can handle RAID-1 devices, and thus the kernel can be loaded at boot-time from a RAID device. LILO will correctly write boot-records on all disks in the array, to allow booting even if the primary disk fails."

If this is true, I should be able to boot off my other drive (HDC) by itself. I want to be able to boot off HDC if HDA goes down. The only solution I see for this in the article is to boot off a CD and go into rescue mode, which could perhaps mount my other disk. This still didn't give me a lot of direction as to what I should do. I'm guessing that the MBR hasn't been written correctly to HDC and therefore it can't boot alone. What other advice can you provide me?

thanks
baldhead

 
What bootloader are you using and what raidkit:
raidtools or mdadm?
You can make your config work as you want but it will
take some work.
 
My view is that the RAID is a "smart" raid process. If a drive is bad in the mirror, the other drive is used automatically. If you then replace the bad drive with a good one, the raid mirror is rebuilt.

I WOULD NOT expect you to be able to move drives around on the IDE chain since they are "stamped" with IDs and other volume management stuff. But, again, the RAID has the smarts built in to manage the failure events WITHOUT MOVING a good remaining drive.

Perhaps even smarter than Windows?

 
I found some sites online which talk about what I have in mind, but I would need some help integrating their lilo.conf files into mine. I'm using mdadm with LILO as my bootloader. Here is my config:

Code:
menu-scheme = Wb:kw:Wb:Wb
default = Linux
timeout = 80
lba32
change-rules
    reset
read-only
prompt
disk=/dev/hda
    bios=0x80
disk=/dev/hdc
    bios=0x81
boot = /dev/hda

image = /boot/vmlinuz
    ###Don't change this comment - YaST2 identifier: Original name: linux###
    label = LinuxHDA
    initrd = /boot/initrd
    root = /dev/md0
    append = "resume=/dev/hda2 splash=silent acpi=off desktop"
    vga = 0x311

image = /boot/vmlinuz
    ###Don't change this comment - YaST2 identifier: Original name: failsafe###
    label = Failsafe
    initrd = /boot/initrd
    root = /dev/md0
    append = "ide=nodma apm=off acpi=off vga=normal noresume nosmp noapic maxcpus=0  3"

As you can see, it's what SuSE 9.1 has given me. Most places I read talk about using two lilo.conf files, a lilo.conf.hda and a lilo.conf.hdc, each with the boot parameter set to hda or hdc respectively. This would allow booting from either one in case of a failure. Does it look like my lilo.conf will allow for booting either drive if one goes down?
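
The two-file approach mentioned above could be sketched roughly like this. It is only an illustration: the function name make_boot_variant and the file names are made up for the example, and it assumes the only difference between the two files is the global boot = line.

```shell
#!/bin/sh
# Sketch: derive one LILO config per disk from a single template by
# rewriting only the global "boot = ..." line. Nothing here touches a disk.

# make_boot_variant SRC DEVICE DEST   (hypothetical helper name)
# Copies SRC to DEST with the unindented "boot = ..." line pointing at DEVICE.
make_boot_variant() {
    src=$1 device=$2 dest=$3
    sed "s|^boot[[:space:]]*=.*|boot = $device|" "$src" > "$dest"
}

# Possible use, as root (file names are assumptions):
#   make_boot_variant /etc/lilo.conf /dev/hda /etc/lilo.conf.hda
#   make_boot_variant /etc/lilo.conf /dev/hdc /etc/lilo.conf.hdc
#   lilo -C /etc/lilo.conf.hda
#   lilo -C /etc/lilo.conf.hdc
```

Some write-ups also map the second disk to bios=0x80 in its own file, since it becomes the first BIOS drive once the primary is gone. The sketch does not handle that.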

thanks
baldhead
 
Why do you need to boot from another drive if they are a mirrored pair?! RAID 1 should leave both of them with a bootable MBR and function as a single, virtualized drive.

I don't understand why you're doing this?!!!

 
So do you think that if I take out my main drive on HDA and reboot the system will boot to the drive on HDC?

thanks
baldhead
 
NO, the RAID expects the drives to be in place. Either they are 1) working properly in a mirror, 2) working in a failure mode where one drive has failed, or 3) working to rebuild the mirror because you put a new drive in to replace the broken drive in #2.
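
To see which of these states a mirror is in, one option is to read /proc/mdstat. A minimal sketch, assuming the usual layout where the line after the array name carries a status field such as [UU] (both members up) or [U_] (one missing); mirror_state is a made-up helper name:

```shell
#!/bin/sh
# Sketch: classify an md array as healthy or degraded from mdstat text.

# mirror_state ARRAY [FILE]   (hypothetical helper; FILE defaults to /proc/mdstat)
# Prints "healthy", "degraded", or "unknown" if the array is not listed.
mirror_state() {
    array=$1 file=${2:-/proc/mdstat}
    awk -v name="$array" '
        # The array header line looks like: md0 : active raid1 hdc1[1] hda1[0]
        $1 == name && $2 == ":" { found = 1; next }
        # The following line ends in a member map such as [UU] or [U_]
        found && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            print (index(status, "_") ? "degraded" : "healthy")
            done = 1
            exit
        }
        END { if (!done) print "unknown" }
    ' "$file"
}

# Possible use:  mirror_state md0
```

A mirror that is still rebuilding also shows an underscore in that field, so it is reported as degraded until the resync finishes.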

You don't ordinarily deprive a RAID configuration of the drives it expects to manage; I would expect this to be particularly true for a software RAID.

Tell us again what you're trying to accomplish? This is starting to sound more theoretical versus you trying to solve an actual problem.

 
The site ref I gave you before had very good docs on how
to simulate and test a failure scenario.
According to the RAID 1 spec a drive removed from the
array is just that, a failure, it is not a special case
and the linux soft raid implementation no doubt adheres
to this. The problem is that the root FS is mounted on
soft raid and there are some special considerations
involved in this. Once again the docs specify what these
are.
Personally I would not choose to mount the root FS
in soft raid. There was an excellent article on linux
soft raid at slashdot the other day, with a lot of good
advice and feedback. Maybe you could look it up?
 
I ended up doing some testing of my own. In the process of making sure my RAID 1 was redundant and that I could boot off either drive if one failed, I ended up losing the ability to boot from both of them. I'm currently running SuSE 9.1 and can get into rescue mode, stop the RAID, and then mount /dev/hda1. This shows all my data, and everything is intact. I then try to get LILO to work using this command:

lilo -C /mnt/etc/lilo.conf

The error I get is:

Fatal: can't put the boot sector on logical partition 0x102

Here is my lilo.conf file:

menu-scheme = Wb:kw:Wb:Wb
default = Linux
timeout = 80
lba32
change-rules
    reset
read-only
prompt
disk=/dev/hda
    bios=0x80
disk=/dev/hdc
    bios=0x81
boot = /dev/hda

image = /boot/vmlinuz
    ###Don't change this comment - YaST2 identifier: Original name: linux###
    label = LinuxHDA
    initrd = /boot/initrd
    root = /dev/md0
    append = "resume=/dev/hda2 splash=silent acpi=off desktop"
    vga = 0x311

image = /boot/vmlinuz
    ###Don't change this comment - YaST2 identifier: Original name: failsafe###
    label = Failsafe
    initrd = /boot/initrd
    root = /dev/md0
    append = "ide=nodma apm=off acpi=off vga=normal noresume nosmp noapic maxcpus=0 3"

My partition is 145GB, mounted on /, with type FD (Linux RAID autodetect).

What actions should I take in order to get LILO installed correctly and this drive up and booting? By the way I also get the L 99 99 99 codes when I try and boot off this drive. Something isn't being written correctly to the MBR. What advice can you guys give?
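
One thing that may be worth trying from the rescue shell: lilo -C /mnt/etc/lilo.conf reads the config from /mnt, but /boot/vmlinuz and /boot/initrd are still looked up under the rescue system's own root. LILO's -r option does a chroot into the given directory first. The sketch below only prints the commands rather than running them; rescue_lilo_plan is a made-up name, and it assumes the installed system's /dev contains the device nodes LILO needs.

```shell
#!/bin/sh
# Sketch: print (not run) a command sequence for reinstalling LILO from a
# rescue shell, with the installed system mounted at a mount point.

# rescue_lilo_plan ROOT_DEVICE [MOUNT_POINT]   (hypothetical helper)
rescue_lilo_plan() {
    root_dev=$1 mnt=${2:-/mnt}
    echo "mount $root_dev $mnt"
    # -r makes lilo chroot into $mnt before doing anything else, so
    # /etc/lilo.conf, /boot/vmlinuz and /boot/initrd resolve inside it.
    echo "lilo -r $mnt -v"
    echo "umount $mnt"
}

rescue_lilo_plan /dev/hda1
```

Mounting one half of the mirror read-write changes it behind the array's back, so the array will probably need a resync afterwards.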

thanks for all your help
baldhead
 
What would be your recommendation, marsd, for setting up the RAID? I've seen some people who make the /boot partition separate, but I don't see the logic in this if you're already mirroring /boot by mirroring your / partition. This covers the entire drive, except swap. Anyway, I still need help with my post above and feel foolish about attempting what I did. I just need to know for sure that my other drive, HDC, will boot and take over in case HDA kicks the bucket.

baldhead
 
/boot should be separate.
If you think about the possibilities for error, the
recovery process is only aided by a separate boot
partition.
On strategy: I would set up software raid to benefit
those processes that are most i/o dependent: dbs, logging
facilities, home directories... you get the gist.
root fs would go on a reliable, journalling fs
(I like reiserfs, ymmv) and would not be raid enabled.

Aside from strategy and about your current problem:
I am not totally sure, to be honest, that hdc would
take over in your situation. If you are booting from
soft raid there seems to be a whole set of complexities
beyond the standard ones. Unfortunately I do not have a
test box here to replicate your scenario, so the best I
can do is advise you to look at the details of the soft
raid implementation and maybe post/research on the mdadm
lists and at SuSE, which usually has some very good docs.
 
What would these complexities be? I'm just raiding my entire / partition and then adding a swap at the end. That's all. The system was booting fine earlier, and I guess my plan now is to back up the data and re-install. Then I will test the mirror once the re-installation is complete.
 
Unless things have changed monumentally everything here must be factored in:
Basically your partitioning scheme needs to be identical
and your kernel and bootloader need to be provisioned correctly.
If you cannot boot from your mirrored drive after a failure
something is definitely out of whack.
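
A rough way to check the "identical partitioning" part is to compare sfdisk -d dumps of both disks with the disk name stripped out. This is only a sketch: layout_of and same_layout are made-up names, and it assumes the dump format where each partition row starts with the device name followed by a colon.

```shell
#!/bin/sh
# Sketch: compare two `sfdisk -d` dumps while ignoring the disk name.

# layout_of DUMP_FILE   (hypothetical helper)
# Keeps only partition rows and turns "/dev/hda1 : start=..." into
# "1:start=..." so that hda1 and hdc1 compare equal.
layout_of() {
    sed -n 's|^/dev/[a-z]*\([0-9]*\) *:|\1:|p' "$1" | tr -d ' '
}

# same_layout DUMP_A DUMP_B   (hypothetical helper)
# Succeeds when both dumps describe the same starts, sizes and types.
same_layout() {
    [ "$(layout_of "$1")" = "$(layout_of "$2")" ]
}

# Possible use, as root (dump file names are assumptions):
#   sfdisk -d /dev/hda > /tmp/hda.dump
#   sfdisk -d /dev/hdc > /tmp/hdc.dump
#   same_layout /tmp/hda.dump /tmp/hdc.dump && echo "layouts match"
```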
 
Now mind you, I'm doing all this in SuSE 9.1 using Disk Druid. Should I make a /boot partition and a / partition on each of the drives and mirror them all? What is your advice? What kind of partition scheme do you recommend?
 
Boot should be standalone.
If you are not planning on further dividing your root
partition, then go ahead and raid it.
My last .02 cents!
 
OK, I finally figured this out. It was quite simple. I just went into YaST, ran the boot loader configuration, and changed the boot= line to boot=/dev/hdc, which wrote the boot record to the second hard drive's MBR. This allows me to boot from either drive now.
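
For anyone wanting a quick sanity check that a boot record landed on each disk, here is a heuristic sketch. It assumes LILO's first-stage loader embeds the ASCII marker LILO somewhere in the first sector, which is an assumption on my part, and has_lilo_marker is a made-up name. A match does not prove the disk will boot. Only an actual boot test with the other drive disconnected does that.

```shell
#!/bin/sh
# Sketch: look for the ASCII string "LILO" in the first 512-byte sector
# of a disk or disk image. Read-only; nothing is written.

# has_lilo_marker DEVICE_OR_IMAGE   (hypothetical helper)
has_lilo_marker() {
    # Turn every byte that is not an upper-case letter into a newline,
    # then search the remaining letter runs for LILO.
    dd if="$1" bs=512 count=1 2>/dev/null \
        | LC_ALL=C tr -c 'A-Z' '\n' \
        | grep -q 'LILO'
}

# Possible use, as root:
#   for d in /dev/hda /dev/hdc; do
#       has_lilo_marker "$d" && echo "$d: marker found" || echo "$d: no marker"
#   done
```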

thanks for all the help
baldhead
 