
Change of Physical Volume under HACMP


samirm (Technical User), May 12, 2003
Hi,

Sorry for this long question.

We have two servers (database and application) running AIX 5.1.
The disk enclosures are SSA. HACMP was configured initially so that, on cluster start, it varied on the database VG on the database server and the application VG on the application server, and then switched the IP address from the boot IP to the service IP.

At present there is a disk failure in one SSA enclosure. The disk was mirrored, so the data is still available. But when we bring the cluster up (smitty clstart) on the database server, it skips mounting the filesystems and does not change the IP either (probably because of this disk error; in hacmp.out we can see that during cluster start it checks the status of all configured hard disks before doing the mounts). We suspect HACMP is not coming up because of the disk failure.

For now we have varied on the volume groups manually on the two servers (database and application) and changed the IP address on the Ethernet interface to the service IP with smitty chinet. The HACMP version is 4.5.
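
For reference, the state we are seeing can be confirmed with commands like these (a rough sketch, assuming db_vg is the affected shared volume group; adjust names to your environment):

errpt -d H | more    # hardware errors should show the failing disk
lsvg -p db_vg        # the failed PV shows a "missing" state
lsvg -l db_vg        # LVs that lost a copy show "stale" in the LV STATE column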

My question is: what is the safest way to get the system back to the way it was before, without taking any more risk?

I need the steps to follow in this situation. Keep in mind that I want the new disk in the SSA enclosure configured into the volume group and HACMP brought back up.

We have two enclosures, each with 16 SSA disks of 36 GB.


Volume groups:
rootvg
db_vg
appl_vg

These volume groups are common to both servers.

On the database server, hdisk11 is missing:

db_vg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk11 missing 1086 1 00..00..00..00..01

If you need any more information, let me know.
Thank you.

Sam
 
I replaced an SSA disk on a running HACMP cluster in the following way:

This assumes your mirrored shared cluster VGs have quorum turned off, so the VG with the missing disk stays online.
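
(To check that, and to turn quorum off if it is still on, something like this should do; it is plain chvg, though on a shared VG you would normally make the change through CSPOC so both nodes stay in sync:)

lsvg db_vg | grep -i quorum    # shows whether quorum is enabled for the VG
chvg -Qn db_vg                 # disable quorum so the VG can stay online with a disk missing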

For safety, before starting the disk replacement I made a third copy (from CSPOC) of all LVs that resided on the failed disk, temporarily, on another good disk (let's call it the "interim disk"):

smitty cl_lvsc
"Add a Copy to a Shared Logical Volume"
(check that the new copies are synced)
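
(For reference, the plain LVM commands behind that CSPOC screen are roughly the ones below; lv_name and hdiskI are placeholders for one affected LV and the interim disk. CSPOC also keeps the other node's ODM in sync, which the plain commands do not:)

mklvcopy lv_name 3 hdiskI         # add a 3rd copy of the LV on the interim disk
syncvg -l lv_name                 # synchronize the new copy
lslv lv_name | grep "LV STATE"    # should report .../syncd when the copy is good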

Next, I removed all these LV copies from the failed disk (using CSPOC):

smitty cl_lvsc
"Remove a Copy from a Shared Logical Volume"

Next, I removed this failed disk from the VG (using CSPOC):

smitty cl_vgsc
"Remove a Physical Volume from a Shared Volume Group"

You should see that the disk is removed from the VG on all cluster nodes (lspv).
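
(The underlying command is roughly the one below, taking db_vg as the example VG; -d or -f may be needed if the disk is marked missing or still holds partitions:)

reducevg db_vg hdiskX    # remove the failed PV from the shared VG
lspv                     # hdiskX should now show "None" instead of db_vg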

Next, I removed the failed disk (logical and physical device) from all cluster nodes:

rmdev -dl hdiskX
rmdev -dl pdiskY

You can obtain pdiskY by running:

ssaxlate -l hdiskX

Next, I physically replaced the disk in the SSA enclosure.

Next, I discovered this disk on one cluster node:

cfgmgr -l ssar

If the disk was brand new (no PVID assigned), I ran this on that node:

chdev -l hdiskZ -a pv=yes

Next, I ran "cfgmgr -l ssar" on the other nodes.
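
(Worth checking at this point: the hdisk number assigned to the new drive can differ between nodes, so compare the PVID rather than the name:)

lspv    # run on every node; the new disk's PVID must be identical everywhere, even if the hdisk number is not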

Next, I added the new disk to the shared VG (from CSPOC):

smitty cl_vgsc
"Add a Physical Volume to a Shared Volume Group"
(after that, all cluster nodes showed the disk assigned to the VG)
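
(The plain LVM equivalent for a single node would be roughly the following; CSPOC does the same plus propagating the change to the other nodes:)

extendvg db_vg hdiskZ    # add the new PV to the shared VG (db_vg and hdiskZ as placeholders)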

Next, I created the third copies of the affected LVs on the new disk:

smitty cl_lvsc
"Add a Copy to a Shared Logical Volume"
(check that the new copies are synced)

As the last step, I removed the LV copies from the "interim disk".


 
Thanks ogniemi,

I will go through this. If I need any more help, I will get back to you.

Thanks again.

Sam
 
Hi,
Most likely we are going to do this this weekend. We have identified one disk in the enclosure that is not being used.

Node A ( database server)
=========================
lsvg =
rootvg
db_vg
appl_vg

lsvg -o =
rootvg
db_vg


Node B (Appl. Server )
======================
lsvg =
rootvg
db_vg
appl_vg

lsvg -o =
rootvg
appl_vg

At present pdisk9/hdisk11 has gone bad; it is throwing errors in errpt.

Date/Time: Mon Sep 18 09:00:01 EDT
Sequence Number: 3696
Machine Id: 0002F82A4C00
Node Id: nodeA
Class: H
Type: PERM
Resource Name: pdisk9
Resource Class: pdisk
Resource Type: scsd
Location: 1j-08-P
VPD:
Manufacturer................IBM
Machine Type and Model......UCPRL03
Part Number.................18P2474
ROS Level and ID............5909
Serial Number...............PVZ08842
EC Level....................4200328078
Device Specific.(Z2)........25L2820093
Device Specific.(Z3)........18P2474
Device Specific.(Z4)........02102

Description
DISK OPERATION ERROR

Probable Causes
DASD DEVICE

Failure Causes
DISK DRIVE

ssaxlate -l pdisk9
= hdisk11

lslv -l wmsora2a_lv
wmsora2a_lv:/WMS/ora2
PV COPIES IN BAND DISTRIBUTION
hdisk11 1085:000:000 20% 218:217:217:217:216
hdisk34 1085:000:000 20% 218:217:217:217:216

hdisk11 and hdisk34 were mirrored.



lslv -m wmsora2a_lv

wmsora2a_lv:/WMS/ora2
LP PP1 PV1 PP2 PV2 PP3 PV3
0001 0001 hdisk11 0001 hdisk34
0002 0002 hdisk11 0002 hdisk34
0003 0003 hdisk11 0003 hdisk34
0004 0004 hdisk11 0004 hdisk34


ssaxlate -l hdisk34
= pdisk31

lscfg | grep pdisk31
+ pdisk31 14-08-EA57-01-P SSA160 Physical Disk Drive (36400

So, the good disk is in the 1st slot of the EA57 SSA enclosure.
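
We will also list every LV that has copies on the failed disk, so nothing is missed when the copies are rebuilt (a quick sketch):

lspv -l hdisk11    # lists every LV that has physical partitions on hdisk11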




Now we are planning to include an unused pdisk from the same enclosure in the volume group db_vg, break the mirror (between hdisk11 and hdisk34), and remake the mirror (hdisk34 and hdisk18). When the new disk arrives, we will take out the faulty disk pdisk9 and put in the new one.

Correct me if I am wrong in how I identified the unused disk in the enclosure, one that is not being used by either server (NodeA or NodeB).

Node B
======
lspv

hdisk0 0002f83a46d53733 rootvg
hdisk1 0002f83aefc7e864 rootvg
hdisk3 0002f82aaf9044a0 db_vg
hdisk4 0002f82ab53aab26 db_vg
hdisk15 0002f83a7f1e154f None
hdisk2 0002f82ab7fa33ea db_vg
hdisk5 0002f82ab53fe9bd db_vg
hdisk6 none None
hdisk7 0002f82aaf904b5c db_vg
hdisk8 0002f82ab4cc7a41 db_vg
hdisk9 none None
hdisk30 0002f83ab6050afb appl_vg
hdisk31 none None
hdisk28 0002f82ab0ae411d db_vg
hdisk32 0002f83ab605116c appl_vg
hdisk10 0002f82a2f5eca73 db_vg
hdisk18 0002f83a26f9b017 None
hdisk21 0002f82ab7fa3925 db_vg
hdisk12 0002f82aade6ddee db_vg
hdisk13 0002f82aadfff39b db_vg
hdisk14 0002f82ab53ab083 db_vg
hdisk16 0002f82aadfff903 None
hdisk17 0002f82aae06c2bf db_vg
hdisk26 none None
hdisk20 0002f83ab604f531 appl_vg
hdisk22 0002f83ab604faa8 appl_vg
hdisk19 0002f82ab0ae3146 None
hdisk23 0002f83ab6050015 appl_vg
hdisk24 0002f82ab0ae3698 db_vg
hdisk25 none None
hdisk33 0002f83a26f9aaab None
hdisk34 0002f83a26f9a4ab appl_vg
hdisk35 0002f82a2ba063ef db_vg

Please look at hdisk18:

ssaxlate -l hdisk18
= pdisk16

lscfg | grep pdisk16
+ pdisk16 14-08-7517-14-P SSA160 Physical Disk Drive (36400

So, we have confirmed that the disk in the 14th slot of the 7517 enclosure (hdisk18/pdisk16) is not being used by any volume group on Node B.

Node A
======
lspv

hdisk0 0002f82a7d4f23a1 rootvg
hdisk1 0002f82a267b7231 rootvg
hdisk3 0002f82aaf9044a0 db_vg
hdisk2 0002f82ab7fa33ea db_vg
hdisk12 0002f82aade6ddee db_vg
hdisk10 0002f82a2f5eca73 db_vg
hdisk18 0002f83a26f9b017 None
hdisk13 0002f82aadfff39b db_vg
hdisk21 0002f82ab7fa3925 db_vg
hdisk16 0002f82aadfff903 None
hdisk17 0002f82aae06c2bf db_vg
hdisk14 0002f82ab53ab083 db_vg
hdisk19 0002f82ab0ae3146 None
hdisk24 0002f82ab0ae3698 db_vg
hdisk26 none None
hdisk25 none None
hdisk20 0002f83ab604f531 appl_vg
hdisk33 0002f83a26f9aaab None
hdisk22 0002f83ab604faa8 appl_vg
hdisk23 0002f83ab6050015 appl_vg
hdisk7 0002f82aaf904b5c db_vg
hdisk8 0002f82ab4cc7a41 db_vg
hdisk9 none None
hdisk28 0002f82ab0ae411d db_vg
hdisk4 0002f82ab53aab26 db_vg
hdisk5 0002f82ab53fe9bd db_vg
hdisk6 none None
hdisk30 0002f83ab6050afb appl_vg
hdisk31 none None
hdisk32 0002f83ab605116c appl_vg
hdisk15 0002f83a26f9a4ab appl_vg
hdisk34 0002f82a2ba063ef db_vg
hdisk27 0002f83a7f1e154f None


ssaxlate -l hdisk18
= pdisk16

lscfg | grep pdisk16

+ pdisk16 1j-08-7517-14-P SSA160 Physical Disk Drive (36400


So, we have confirmed that the disk in the 14th slot of the 7517 SSA enclosure (hdisk18/pdisk16) is not being used by any volume group on Node A either.
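
As a cross-check that hdisk18 really is the same physical drive on both nodes and is unused (the hdisk numbers happen to match here, but it is the PVID that matters), on each node:

lspv | grep 0002f83a26f9b017    # same PVID on both nodes, VG column shows None
ssaxlate -l hdisk18             # translates to pdisk16 on both nodes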

Steps I am going to follow (a command-level sketch comes after this list):
=====================================

In NodeA

Include hdisk18 in the volume group
Break the mirror on the failed disk (hdisk11)
Extend the mirror copy to hdisk18
Unmount all file systems and vary off the VG.

In NodeB

Identify at least one hdisk belonging to db_vg; this will be required when importing.

Note the major number of db_vg and ensure that it matches the one on the snswmspd server.

Export db_vg

Import db_vg with the same major number, specifying at least one hdisk belonging to that VG.

Change the auto-varyon to NO for db_vg

Test the VG by mounting the file systems
Unmount and vary off the VG

Start HACMP
The assigned service IP has to be changed back to the boot IP (chinet) on both servers.
Start HACMP on NodeA first and then on NodeB.
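
Here is a command-level sketch of those steps, assuming wmsora2a_lv stands for each LV mirrored on hdisk11 (repeat per affected LV), /WMS/ora2 for its filesystem, and plain LVM commands since cluster services are down; adjust names and the major number before running anything:

On NodeA:

ls -l /dev/db_vg                  # note the major number for the import on NodeB
extendvg db_vg hdisk18            # include hdisk18 in db_vg
rmlvcopy wmsora2a_lv 1 hdisk11    # break the mirror: keep 1 copy, drop the one on the failed disk
mklvcopy wmsora2a_lv 2 hdisk18    # re-mirror the LV onto hdisk18
syncvg -l wmsora2a_lv             # synchronize the new copies (or syncvg -v db_vg when all LVs are done)
umount /WMS/ora2                  # unmount all db_vg filesystems
varyoffvg db_vg

On NodeB:

exportvg db_vg                          # remove the old VG definition from the ODM
importvg -V <major#> -y db_vg hdisk3    # re-import with the same major number, using a disk known to be in db_vg
chvg -a n db_vg                         # keep auto-varyon off
varyonvg db_vg                          # quick test: vary on, mount, then clean up
mount /WMS/ora2
umount /WMS/ora2
varyoffvg db_vg

Then set the service IP back to the boot IP on both servers (smitty chinet) and start cluster services (smitty clstart) on NodeA first, then on NodeB.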



Please correct me if I am wrong anywhere.
We are no longer getting software support for AIX 5.1 from IBM, and at present oslevel -r shows 5100-04.

Thanks again ..

Sam












 