Remove mirrored rootvg drive while system is on

Status
Not open for further replies.

ksas025

Technical User
Jun 3, 2004
92
US
I would like to simulate a disk failure of my rootvg, which is mirrored across two disks. Can I do this by simply pulling one of the disks while the system is on? I am a little reluctant to do this and would like some advice/suggestions on how to test my mirrored VG capabilities.

Thanks.
 
Thanks for the response.

I have tested that already, but I am more concerned about actual, real-world hardware failure. If one of the drives goes physically bad, isn't that the same as simply pulling the drive from the bay? To the computer, I would theorize anyway, these events look exactly the same.

What do you think?

 
You could do that. What I would do is shut down the system and pull the disk out. That's about the same as a 'real-world' failure.


 
Mag0007, I did that already and it works fine. Great, actually. But what I want to see is how the system handles hardware failure under load. (By the way, this is our development/test system.)

I think what I am going to do is take a mksysb of rootvg and then pull one of the mirrored members while the system is running and mounted. I'll post back with the results for the interested.

 
Great. Before you do that: if your paging space is in your rootvg, make sure it's mirrored. The same goes for your dump device: make sure it is mirrored.

hth

 
A dumpdevice cannot be mirrored, but you can have a primary dump device on one disk and a secondary dump device on another disk. All other LVs should be mirrored and synced before you go and pull out one of your rootvg disks.

If the disks are not hot-plug, I'd try to pull the power connector off one of the disks instead of the data connector.


HTH,

p5wizard
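
One way to check the "mirrored and synced" condition before pulling a disk is to scan the logical volume listing. The sketch below assumes the usual `lsvg -l` column layout (LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT); `unmirrored_lvs` is a hypothetical helper name, not an AIX command.

```shell
#!/bin/sh
# Sketch: list logical volumes that look unmirrored or stale.
# Assumes input in the usual `lsvg -l <vg>` layout:
#   line 1: "<vg>:", line 2: column headers, then one LV per line with
#   columns LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT.
# unmirrored_lvs is a hypothetical helper name, not an AIX command.
unmirrored_lvs() {
    awk 'NR > 2 && NF >= 6 {
        # Two copies means PPs is at least twice LPs; "stale" in the
        # state column means the copies are out of sync.
        if ($4 < 2 * $3 || $6 ~ /stale/) print $1
    }'
}
```

On an AIX host this could be fed with `lsvg -l rootvg | unmirrored_lvs`. A dump LV showing up in the output is expected if it is deliberately left unmirrored; `sysdumpdev -l` shows which primary and secondary dump devices are configured.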
 
Well, the test is complete and I must say my rootvg mirror performed famously.

First, I simply brought up the system, opened up 3 remote xterm sessions and pulled hdisk0 (the disk it booted from). I then went back to my xterms and attempted to ls, view, and execute files on rootvg. Other than a 5-10 second lockup, everything worked fine. I was able to access and execute data on rootvg after pulling hdisk0 during normal operation.

I re-inserted hdisk0...

Second, I wanted to simulate some sort of disk access while I pulled the disk. To do this I opened a file for editing on one xterm and began cat'ing a large text file in the other. Again I pulled hdisk0. The cat'ing xterm paused for about 5 seconds then resumed its normal output. The vi session seemed to be unaffected.

I have to believe that some sort of file corruption occurred during this test, but I am uncertain how to check. Anyone know a good way to check the filesystem?

Another issue I have is how to monitor these types of events. I noticed that the pulling of disks was not recorded anywhere in the /var/adm/log files. How is an administrator to know if disks are failing without actual interaction? It would be nice if I could monitor a log file and notify the appropriate people in the event of a disk failure or other hardware event.

 
"I have to believe that some sort of file corruption occurred during this test but I am uncertain how to check. Anyone know a good way to check the filesystem?"

Remove the disk while you create a 100 MB file:
dd if=/dev/zero of=zero bs=1024k count=100

Then see if the 100M file is created on the filesystem :)
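
To make the "see if the file is created" step a bit more concrete, here is a minimal sketch that compares the file's size against the expected byte count. `check_size` is a hypothetical helper name; 100 blocks of 1024k come to 104857600 bytes.

```shell
#!/bin/sh
# Sketch: confirm a test file reached its full expected size.
# check_size is a hypothetical helper; pass the file and the expected byte count.
check_size() {
    # wc -c may pad its output with spaces on some systems, so strip them.
    actual=$(wc -c < "$1" | tr -d ' ')
    if [ "$actual" -eq "$2" ]; then
        echo "ok: $1 is $actual bytes"
    else
        echo "short: $1 is $actual bytes, expected $2"
        return 1
    fi
}
```

After the dd run above, `check_size zero 104857600` would report whether the write completed. A size match only shows that the write finished; comparing a `cksum` of a known file taken before and after the pull would be a stronger check of the contents.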

 
errpt should show some disk related errors for the time the disk went missing
errpt -a|more
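
If the full report is too noisy, the summary listing can be narrowed to disk hardware entries. This is a rough sketch that assumes the usual `errpt` summary columns (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION); `hdisk_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: keep only hardware-class entries logged against hdisk resources.
# Assumes errpt summary columns: IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION.
# hdisk_errors is a hypothetical helper name, not an AIX command.
hdisk_errors() {
    # Skip the header line; column 4 is the error class (H = hardware),
    # column 5 is the resource name.
    awk 'NR > 1 && $4 == "H" && $5 ~ /^hdisk/'
}
```

On AIX this would be used as `errpt | hdisk_errors`. errpt can also filter on its own: `errpt -d H` restricts the report to the hardware class, and `errpt -a -N hdisk0` gives the detailed entries for one resource.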
 
If you want to be notified when errors occur on your system, you can configure error notification methods.

You can do that through scripting (monitoring errpt, as my friend DukeSSD mentioned), or you can use the errnotify object class in the ODM to be notified by email or some other method.
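
A minimal sketch of the errnotify route, assuming the goal is to hand every new error log entry to syslog through `logger`. The object name `syslog1` and the helper name `errnotify_stanza` are arbitrary choices for illustration; with no selector fields set, the method is expected to run for every logged error.

```shell
#!/bin/sh
# Sketch: print an errnotify stanza that passes each new error log entry to logger(1).
# "syslog1" is an arbitrary object name chosen for illustration.
# $1 inside en_method is the error log sequence number handed to the notify method.
errnotify_stanza() {
    # The quoted heredoc keeps the backticks and $1 literal in the output.
    cat <<'EOF'
errnotify:
        en_name = "syslog1"
        en_persistenceflg = 1
        en_method = "logger Msg from Error Log: `errpt -l $1 | tail -1`"
EOF
}
```

On AIX, as root, the stanza could be loaded with `errnotify_stanza > /tmp/syslog.add && odmadd /tmp/syslog.add`, inspected with `odmget -q "en_name=syslog1" errnotify`, and removed again with `odmdelete -o errnotify -q "en_name=syslog1"`. Whether the messages end up in a file or on a remote host still depends on how syslog.conf routes what logger sends.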
 
Thanks DukeSD and Khalidaaa, you read my mind! I did the test and did not notice any logs indicating that I pulled the disk. I ran errpt and it dumped a bunch of errors regarding the disk I pulled, but I did not know how to monitor it since its source is not plain text.

I will look up the errnotify object class immediately! If a disk fails, I want to know about it.

Alex
 
Did it!

I decided to add an errnotify object to forward all errdaemon events to syslog since I am remotely monitoring syslog already. This will give me sufficient heads up before (or during) a hardware failure.

However, I would like to test the object. Is there any way to non-destructively force an error? I don't want to pull a disk or anything like that to test whether my odmadd worked.

Thanks again everyone for the help.

Alex
 
Run advanced diagnostics in system verification mode against the operator panel.

Fail one of the tests, for example by lying and answering 'no' when it asks if the display is showing all zeros.

This will cause a permanent hardware error to be logged in errpt against the operator panel.
 