
hacmp ghost disks and lost vg

Status
Not open for further replies.

pgilli (Technical User)
Jun 26, 2007
Hi,
because of a power loss, node1 and node2 of our HACMP cluster have problems with their disks and volume groups.

NODE1
there is a ghost disk:
hdisk6 (Defined) has the same address as hdisk2 (Available)
pero1 [/]: lsdev -Ccdisk
hdisk0 Available 04-C0-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk1 Available 04-C0-00-1,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk2 Available 04-04-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk3 Available 04-04-00-1,0 Bull 4.2 GB 16 Bit SCSI Disk Drive.
hdisk4 Available 04-07-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk5 Available 04-07-00-1,0 Bull 4.2 GB 16 Bit SCSI Disk Drive.
hdisk6 Defined 04-04-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
pero1 [/]: lspv
hdisk0 00202554b8e63ed5 rootvg
hdisk1 00202554d4e6bafd rootvg
hdisk2 00202554d56ad111 hvxvg
hdisk3 0020255438e45087 o2evg
hdisk4 0020255421cb275f hvxvg
hdisk5 00202554f5f162bf o2evg
pero1 [/]: lsvg
rootvg
hvxvg
o2evg
pero1 [/]:

In this case HACMP works fine on this node and the shared VGs (hvxvg, o2evg) are OK.

NODE2
there are several ghost disks, and the VG hvxvg is missing:

hdisk6 (Defined) and hdisk10 (Available) have the same address as hdisk2 (Defined)

hdisk8 (Available) has the same address as hdisk4 (Defined)


pero2 [/]: # lsdev -Ccdisk
hdisk0 Available 04-C0-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk1 Available 04-C0-00-1,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk2 Defined 04-04-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk3 Available 04-04-00-1,0 Bull 4.2 GB 16 Bit SCSI Disk Drive.
hdisk4 Defined 04-07-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk5 Available 04-07-00-1,0 Bull 4.2 GB 16 Bit SCSI Disk Drive.
hdisk6 Defined 04-04-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk7 Defined 04-04-00-1,0 Bull 4.2 GB 16 Bit SCSI Disk Drive.
hdisk8 Available 04-07-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk9 Defined 04-07-00-1,0 Bull 4.2 GB 16 Bit SCSI Disk Drive.
hdisk10 Available 04-04-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
pero2 [/]: # lspv
hdisk0 00202547b844a5a8 rootvg
hdisk1 00202547a27ffd50 rootvg
hdisk3 00202554f5f162bf o2evg
hdisk5 0020255438e45087 o2evg
hdisk8 none None
hdisk10 none None
pero2 [/]: # lsvg
rootvg
o2evg
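
Incidentally, the ghost entries can be spotted mechanically by looking for duplicated location codes in the lsdev output. A quick sketch, using the node2 listing above as canned input (on the live box you would pipe `lsdev -Ccdisk` straight into the awk):

```shell
# Flag disks that share a SCSI location code (column 3 of lsdev -Ccdisk).
# Sample input is the node2 listing from above, trimmed to the first 3 columns.
dups=$(awk '{ if ($3 in seen) print $1, "duplicates", seen[$3], "at", $3; else seen[$3] = $1 }' <<'EOF'
hdisk0 Available 04-C0-00-0,0
hdisk1 Available 04-C0-00-1,0
hdisk2 Defined 04-04-00-0,0
hdisk3 Available 04-04-00-1,0
hdisk4 Defined 04-07-00-0,0
hdisk5 Available 04-07-00-1,0
hdisk6 Defined 04-04-00-0,0
hdisk7 Defined 04-04-00-1,0
hdisk8 Available 04-07-00-0,0
hdisk9 Defined 04-07-00-1,0
hdisk10 Available 04-04-00-0,0
EOF
)
echo "$dups"
```

Each printed line names a later hdisk that sits at the same address as an earlier one, which matches the pairs described above.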

I stopped HACMP on both nodes and tried to import hvxvg, but I get the following error:

pero2 [/]: # importvg -y hvxvg -V 55 -n -F hdisk10
PV Status: hdisk4 00202554d56ad111 PVNOTFND
hdisk6 0020255421cb275f PVNOTFND
0516-013 varyonvg: The volume group cannot be varied on because
there are no good copies of the descriptor area.
0516-780 importvg: Unable to import volume group from hdisk10.

How can I resolve this situation without risking damage to the HACMP configuration or the shared VG hvxvg?

Thanks
Paolo
 
additional info:
CLINFO Version 4.2.
oslevel 4.3.3.0
 
I don't have experience with ghost disks, to be honest, but if I were you I would try exporting the VGs and removing the ghost devices, then running cfgmgr and importvg.
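
Roughly like this, written as a dry run that only prints the commands. The disk names are assumptions taken from your node2 listing (hdisk2/hdisk8/hdisk10 as the stale entries, hdisk4 as the import source) and need to be verified before anything is actually executed:

```shell
# Dry run: print the cleanup steps instead of executing them.
# ASSUMED disk names -- confirm PVIDs and HACMP is down before running for real!
plan=$(
  for d in hdisk2 hdisk8 hdisk10; do
    echo "rmdev -dl $d    # delete the stale device and its ODM entry"
  done
  echo "cfgmgr    # rediscover the disks at their real addresses"
  echo "importvg -y hvxvg -V 55 -n hdisk4    # re-import from a surviving PV"
)
echo "$plan"
```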

Regards,
Khalid
 
Hi

When your servers lost power, did both of them come up at the same time? If so, that could have caused the problem.

One thing you can try on node2 (I assume this is the standby server): run cfgmgr and check the state of the disks.

Try that first, and let us know the results.
 
Hi Khalid,
I thought about the solution you proposed, but I have some doubts, because the HACMP shared VG is mirrored using one disk on node1 and one disk on node2.
On node1, hvxvg on local disk hdisk2 is mirrored to hdisk4 on node2 (or vice versa; I don't know how to tell the local disk from the remote one):

hdisk2 00202554d56ad111 hvxvg
hdisk4 0020255421cb275f hvxvg

I don't want to remove devices from node2 in case that causes problems for HACMP or damages hvxvg on node1, which is currently working fine.

In theory, on node2, I should remove hdisk10, hdisk8 and hdisk2, leaving hdisk4 and hdisk6 (the PVs that the failed importvg of hvxvg asked for:
importvg -y hvxvg -V 55 -n -F hdisk10
PV Status: hdisk4 00202554d56ad111 PVNOTFND
hdisk6 0020255421cb275f PVNOTFND
0516-013 varyonvg: The volume group cannot be varied on because
there are no good copies of the descriptor area.
0516-780 importvg: Unable to import volume group from hdisk10. )

Running cfgmgr or rebooting node2 should then make hdisk4 and hdisk6 available so I can run the importvg and solve my problem, but I don't know if this simple solution is the right thing to do.
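
Before removing anything, one check I can do is whether the two PVIDs that importvg complains about are visible anywhere in node2's lspv at all. A small sketch using the lspv output from above as canned input (on the real box you would pipe `lspv` itself into the grep):

```shell
# Count how many PVs on node2 carry the PVIDs importvg says are missing.
# lspv output copied from the node2 listing above.
lspv_out='hdisk0 00202547b844a5a8 rootvg
hdisk1 00202547a27ffd50 rootvg
hdisk3 00202554f5f162bf o2evg
hdisk5 0020255438e45087 o2evg
hdisk8 none None
hdisk10 none None'
found=$(echo "$lspv_out" | grep -cE '00202554d56ad111|0020255421cb275f')
echo "matching PVs on node2: $found"
```

A count of 0 confirms that neither hvxvg PVID is currently visible on node2, which matches the PVNOTFND errors.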
Regards
Paolo
 
Hi.
DSMARWAY, node2 is the standby server. The cluster is located at a remote site, and I suppose the two nodes started at the same time when power came back up.

I ran cfgmgr on node2 and also rebooted it.
In both cases, the situation is unchanged:
pero2 [/]: # lsdev -Ccdisk
hdisk0 Available 04-C0-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk1 Available 04-C0-00-1,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk2 Defined 04-04-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk3 Available 04-04-00-1,0 Bull 4.2 GB 16 Bit SCSI Disk Drive.
hdisk4 Defined 04-07-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk5 Available 04-07-00-1,0 Bull 4.2 GB 16 Bit SCSI Disk Drive.
hdisk6 Defined 04-04-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk7 Defined 04-04-00-1,0 Bull 4.2 GB 16 Bit SCSI Disk Drive.
hdisk8 Available 04-07-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.
hdisk9 Defined 04-07-00-1,0 Bull 4.2 GB 16 Bit SCSI Disk Drive.
hdisk10 Available 04-04-00-0,0 Bull 2.1 GB 16 Bit SCSI Disk Drive.

pero2 [/]: # lspv
hdisk0 00202547b844a5a8 rootvg
hdisk1 00202547a27ffd50 rootvg
hdisk3 00202554f5f162bf o2evg
hdisk5 0020255438e45087 o2evg
hdisk8 none None
hdisk10 none None
 
Hi Paolo,

If you do an 'lsvg -p hvxvg' on node2, which disks does AIX think are associated with that volume group?

Regards,
Mike

 
Hi Paolo,

Never mind, please disregard my last post; that was a silly question. You will not be able to run the lsvg command, since the volume group is not varied on.

You should still be able to see which disks the ODM thinks are in that volume group by executing this command:

odmget CuAt | grep -p hvxvg

It will give you the PVIDs that AIX thinks should be available (attribute = "pv", value = PVID followed by a run of zeros).
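
If anything comes back, the bare 16-character PVID can be cut out of that value field with sed. A sketch using one such stanza as canned input (on AIX you could feed it `odmget -q "name=hvxvg and attribute=pv" CuAt` instead; the sample value here is taken from the thread):

```shell
# Extract the bare PVID from a CuAt "pv" stanza (value = PVID + padding zeros).
stanza='name = "hvxvg"
attribute = "pv"
value = "00202554d56ad1110000000000000000"'
pvid=$(echo "$stanza" | sed -n 's/.*value = "\([0-9a-f]\{16\}\).*/\1/p')
echo "$pvid"
```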

Regards,
Mike
 
Hi Mike
The odmget command returns no rows:
pero2 [/]: # odmget CuAt | grep -p hvxvg
pero2 [/]: #

On active node this is the output:
pero1 [/]: odmget CuAt | grep -p hvxvg
CuAt:
name = "hvxvg"
attribute = "vgserial_id"
value = "00202554d56ad3d4"
type = "R"
generic = "D"
rep = "n"
nls_index = 637

CuAt:
name = "hvxvg"
attribute = "pv"
value = "00202554d56ad1110000000000000000"
type = "R"
generic = ""
rep = "sl"
nls_index = 0

CuAt:
name = "hvxvg"
attribute = "pv"
value = "0020255421cb275f0000000000000000"
type = "R"
generic = ""
rep = "sl"
nls_index = 0

CuAt:
name = "hvxvg"
attribute = "quorum"
value = "n"
type = "R"
generic = ""
rep = "sl"
nls_index = 0

CuAt:
name = "hvxvglog"
attribute = "lvserial_id"
value = "00202554d56ad3d4.1"
type = "R"
generic = "D"
rep = "n"
nls_index = 648

CuAt:
name = "hvxvglog"
attribute = "copies"
value = "2"
type = "R"
generic = "DU"
rep = "r"
nls_index = 642

CuAt:
name = "hvxvglog"
attribute = "type"
value = "jfslog"
type = "R"
generic = "DU"
rep = "s"
nls_index = 639

CuAt:
name = "hvxvg"
attribute = "timestamp"
value = "469f57390d67a641"
type = "R"
generic = "DU"
rep = "s"
nls_index = 0

CuAt:
name = "hvxvg"
attribute = "auto_on"
value = "n"
type = "R"
generic = "DU"
rep = "l"
nls_index = 638

pero1 [/]:

regards
Paolo
 
Hi everybody,
are there any ideas or suggestions for how to handle and solve my problem?
Regards
Paolo
 