one client disk 2 vios scenario problem


UsRb (IS-IT--Management), Mar 19, 2007
Hi to all!

I have a strange problem with a dual-VIOS setup and one client partition. On the client I see two disks with the same PVID on two different paths, and I cannot get it to fail over when VIOS 1 is rebooted or turned off.

Client:
root@hostname #lspv
hdisk0 00c5d3402e990f5c rootvg active
hdisk1 00c5d3402e990f5c rootvg active

root@hostname #lspath
Enabled hdisk0 vscsi0
Enabled hdisk1 vscsi1

root@hostname #lscfg -vl vscsi0
vscsi0 U9117.MMA.655D340-V4-C143-T1 Virtual SCSI Client Adapter
root@hostname #lscfg -vl vscsi1
vscsi1 U9117.MMA.655D340-V4-C144-T1 Virtual SCSI Client Adapter

On both VIOSs:

# lsattr -El hdisk5

PCM PCM/friend/MSYMM_RAID5 Path Control Module True
PR_key_value none Persistant Reserve Key Value True
algorithm fail_over Algorithm True
clr_q yes Device CLEARS its Queue on error True
hcheck_interval 10 Health Check Interval True
hcheck_mode nonactive Health Check Mode True
location Location Label True
lun_id 0x7000000000000 Logical Unit Number ID False
lun_reset_spt yes FC Forced Open LUN True
max_transfer 0x40000 Maximum TRANSFER Size True
node_name 0x5006048452a88768 FC Node Name False
pvid 00c5d3402e990f5c0000000000000000 Physical volume identifier False
q_err no Use QERR bit True
q_type simple Queue TYPE True
queue_depth 16 Queue DEPTH True
reserve_policy no_reserve Reserve Policy True
rw_timeout 40 READ/WRITE time out value True
scsi_id 0x10000 SCSI ID False
start_timeout 180 START UNIT time out value True
ww_name 0x5006048452a88768 FC World Wide Name False
 
On the client you should see 2 paths for 1 hdisk, not 2 different disk names...

lspath output?

What disk/LUN do the VIOSs see? Is that the same disk or 2 different copies?


HTH,

p5wizard
 
Yes I know, this is the output:

root@hostname #lspath
Enabled hdisk0 vscsi0
Enabled hdisk1 vscsi1
 
That is one disk presented to 2 VIOS
 
UsRb said:
That is one disk presented to 2 VIOS

Not in my book!
If that were the case, you'd see:

root@hostname #lspath
Enabled hdisk0 vscsi0
Enabled hdisk0 vscsi1    <-- same hdisk0, just a second path

HTH,

p5wizard
 
It's in my book too, and that's how it works on the rest of the servers, but this one is f**ed :)

VIOS 1:
# ./emc
Inquiry utility, Version V7.3-771 (Rev 0.0) (SIL Version V6.3.0.0 (Edit Level 771)
Copyright (C) by EMC Corporation, all rights reserved.
For help type inq -h.

............
/dev/rhdisk5 :EMC :SYMMETRIX :5773 :37008f9000 : 89136000

$ lsmap -all
SVSA Physloc Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0 U9117.MMA.655D340-V2-C143 0x00000004

VTD bj-repo_rtvg
Status Available
LUN 0x8100000000000000
Backing device hdisk5
Physloc U789D.001.DQDWVBV-P1-C5-T1-W5006048452A88768-L7000000000000

VIOS 2:
# ./emc
Inquiry utility, Version V7.3-771 (Rev 0.0) (SIL Version V6.3.0.0 (Edit Level 771)
Copyright (C) by EMC Corporation, all rights reserved.
For help type inq -h.

............
/dev/rhdisk5 :EMC :SYMMETRIX :5773 :37008f9000 : 89136000

$ lsmap -all
SVSA Physloc Client Partition ID
--------------- -------------------------------------------- ------------------
vhost0 U9117.MMA.655D340-V3-C144 0x00000004

VTD bj-repo_rtvg
Status Available
LUN 0x8100000000000000
Backing device hdisk5
Physloc U789D.001.DQDWVVB-P1-C5-T1-W5006048452A88777-L7000000000000
 
On the client, what do lspv and lsvg -p rootvg show?

I'd consider removing one of the disks (e.g. rmdev -dl hdisk1) and then rediscovering all devices (cfgmgr) to configure the 2nd vtscsi device as 2nd path to hdisk0.

But only if you're 100% sure you won't break anything else...

You might want to try and configure another disk via both VIOSs and play around with a testvg before you accidentally start sawing off the branch you're sitting on... ;-)
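Roughly like this (untested sketch - double-check which hdisk rootvg really lives on before removing anything):
Code:
# see where rootvg lives and what paths exist before touching anything
lsvg -p rootvg
lspath

# delete the duplicate disk definition (-d removes it from the ODM too)
rmdev -dl hdisk1

# rediscover devices - with luck the 2nd vtscsi device now comes back
# as a 2nd path to hdisk0 instead of a new hdisk
cfgmgr

# verify
lspv
lspath -l hdisk0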

HTH,

p5wizard
 
I deleted hdisk1 successfully, then ran cfgmgr; hdisk1 didn't come back, but there is still only one path.

lspath
Enabled hdisk0 vscsi0

lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 680 217 43..00..00..38..136
 
Have you checked the reserve_policy attribute for this hdisk on the 2 VIOSs? Take a look with "lsattr -El hdiskX" on both of them.
It should be set to no_reserve on both VIOSs before mapping the hdisk to the LPAR. If that isn't done before the mapping, this can happen.
If it's not set you must delete the mapping, change the reserve_policy with "chdev -l hdiskX -a reserve_policy=no_reserve", and then do the mapping again.

PS: depending on the SAN disks (EMC, IBM, HP) the attribute can be reserve_policy or reserve_lock.
If it's reserve_lock, it must be set to no.
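On each VIOS that would look something like this (sketch from the padmin shell, using the hdisk/vhost/VTD names from your lsmap output - adjust if yours differ):
Code:
$ lsdev -dev hdisk5 -attr reserve_policy          # check the current setting
$ rmvdev -vtd bj-repo_rtvg                        # drop the existing mapping
$ chdev -dev hdisk5 -attr reserve_policy=no_reserve
$ mkvdev -vdev hdisk5 -vadapter vhost0 -dev bj-repo_rtvg
$ lsmap -vadapter vhost0                          # verify the VTD is back and Available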
 
Hi MoreFeo, thanks for your suggestion, but I already did that before posting on the forum. I checked the static and dynamic profiles of the client and the VIO servers, and the config looks OK to me. Can I run some kind of deeper inspection on the machine? Diag isn't telling me anything useful.
 
Is the mapping between the vhost devices on the VIOSs and the vscsi devices on the VIO client OK?

HTH,

p5wizard
 
Make sure both VIOSs can open the device hdisk5 (though you already did with the inq command...)

oem_setup_env
/usr/sbin/bootinfo -s hdisk5

Any other discrepancies between the VIOSs? EMC driver fix level? VIOS ioslevel?

VIO client oslevel compared to other (working) LPARs?
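A quick way to compare them (sketch):
Code:
# on each VIOS (padmin shell):
ioslevel
oem_setup_env
lslpp -l | grep -i emc          # EMC ODM/driver filesets and levels
exit

# on the broken VIO client and on a working LPAR:
oslevel -s                      # or oslevel -r on older TLs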

After that, I'm out of ideas.


HTH,

p5wizard
 
Maybe the reserve_policy parameter was changed after doing the mapping. I've had the same problem, and it was caused by mapping first and changing the reserve_policy afterwards.

If this is the case you'll need to remove both mappings and remap; that's how I solved it.
 
OK, I turned off the client LPAR, removed the vhost, vtscsi and hdisk devices from the VIO servers, rebooted them, set the reserve_policy to no_reserve, mapped everything back and turned the client LPAR on again - same thing. So I suppose the client ODM is corrupted. Do you guys have the commands I can use to check the ODM?
 
I also know of a similar problem, where the SAN box still had a reserve lock on LUNs but none of the VIO servers knew about those reservations - or of a way to force them free.
The only resolution then was to create and present new LUNs to the VIOSs, map those LUNs to the VIO client and migrate the OS and data VGs. Then for the old LUNs: delete VSCSI disks/paths (client), delete mappings (VIOSs), delete presentation/zoning (EMC/SAN), and delete LUNs themselves (EMC).

So experimenting with a new non-rootvg LUN (as suggested earlier) may shed some light on the situation.
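For the rootvg part, the migration would look roughly like this on the client (sketch - hdisk2 stands in for the newly mapped LUN, adjust the names to whatever cfgmgr gives you):
Code:
cfgmgr                              # discover the new LUN, say it comes in as hdisk2
extendvg rootvg hdisk2
migratepv hdisk0 hdisk2             # move all PPs off the old disk
bosboot -ad /dev/hdisk2             # rootvg only: rebuild the boot image
bootlist -m normal hdisk2
reducevg rootvg hdisk0
rmdev -dl hdisk0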


HTH,

p5wizard
 
When I have this kind of problem and want to take a look at the ODM, I use some scripts.

The first script queries all ODM classes and greps for the hdisks and the PVID:
Code:
cd /etc/objrepos
# treat every file in the ODM directory as a class and dump any stanza
# line mentioning the hdisks or the PVID into per-class files under /tmp
ls | while read param
do
  odmget $param | grep hdisk0 >> /tmp/odm.hdisk0.$param
  odmget $param | grep hdisk1 >> /tmp/odm.hdisk1.$param
  odmget $param | grep 00c5d3402e990f5c >> /tmp/odm.PVID.$param
done

After running this script (it will show a lot of errors because not every file in /etc/objrepos is an ODM class, but that's normal) you'll have some files in /tmp that look like:
Code:
# ls odm.*
odm.hdisk0.CuAt
odm.hdisk0.CuDv
odm.hdisk0.CuDvDr
...
If you take a look at these files, they show which parameters you should use to query the ODM classes.
Code:
# more odm.*
        name = "hdisk0"
        name = "hdisk0"
odm.hdisk0.CuAt: END (next file: odm.hdisk0.CuAt.vc)
        name = "hdisk0"
odm.hdisk0.CuDv: END (next file: odm.hdisk0.CuDvDr)
        value3 = "hdisk0"
odm.hdisk0.CuDvDr: END (next file: odm.hdisk0.CuPath)
        name = "hdisk0"
odm.hdisk0.CuPath: END (next file: odm.hdisk0.CuPath.vc)
...

So I know I need to query CuAt with parameter name=hdisk0, CuPath with the same parameter, CuDvDr with parameter value3=hdisk0, etc...

Code:
odmget -q name=hdisk0 CuAt
odmget -q value3=hdisk0 CuDvDr
odmget -q name=hdisk0 CuPath
...

It's a bit messy, but this way I've been able to detect and solve mismatches between ODM and physical configuration.
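If you do find a stale stanza (say a leftover hdisk1 entry that keeps coming back), the usual cleanup is odmdelete - but treat it as a last resort, back up the ODM first, and note that the class/query below are only examples:
Code:
cp -rp /etc/objrepos /etc/objrepos.bak      # back up the ODM before touching anything
odmdelete -o CuAt -q "name=hdisk1"          # example: drop a stale attribute stanza
cfgmgr                                      # let AIX rebuild the device cleanly
savebase                                    # sync the boot-image copy of the ODM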
 
I did the last thing I wanted to try: added a new disk, ran migratepv, and removed the old disk. Now I have 2 new vscsi adapters and failover works OK.
It's a pity I won't find out why this problem happened. Luckily this wasn't a production machine. Thanks guys for your help.
 