
cascading cluster with shared NFS filesystems


ogniemi

Technical User
Nov 7, 2003
Hello,

I ran into the following situation on a cascading cluster.

nodeA has local filesystems (the disks are shared; the SSA loop goes through both cluster nodes, nodeA and nodeB).
When nodeA crashed, the takeover process started but failed with a config_too_long event because of problems unmounting the NFS filesystems that had been mounted from the crashed nodeA. Is it normal that in HACMP nodeB has problems unmounting NFS filesystems after the crash of the node that exported them? Is there any solution to avoid such unmounting problems in the future?
I have never seen this before; there were problems unmounting NFS filesystems in the past too, but the cluster always managed it in the end.



thx in advance,m.
 
I suggest you stop using the NFS (exported from nodeA) on nodeB, if you can.
I have met your problem once;
it felt like a dog trying to bite its own tail.
You can use NFS in HACMP, but only export it; the other nodes in this cluster should never mount it.
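If you want to check which node currently has the exports cross-mounted, plain AIX commands are enough (the filesystem names here are just the ones from this thread):

# on nodeB: list the NFS mounts coming from nodeA
mount | grep -w nfs
df -k /datab1 /datab2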
 
Hello petertp6,

But this is an automatic feature of HACMP. When I configure the filesystems under "Filesystems/Directories to be exported" in the Resource Group, nodeB automatically mounts them from nodeA after HACMP is started on nodeB.
So it is normal that nodeB mounts NFS filesystems from nodeA.


regards,m.
 
For NFS to be switched between the 2 nodes,
it must be:
1. On shared disk; otherwise it cannot be taken over.
2. Only ONE of the nodes can be the NFS server; the others must not mount the NFS (be NFS clients).
These are the basic requirements.
Your situation looks like it conflicts with the second point.

 
1. As I said, those are shared disks/VGs/LVs between the cluster nodes; the SSA loop goes through both nodes. Additionally, the SSA adapters are used for TMSSA and it works (cldiag shows heartbeats sent and acknowledgements returned).
2. There is no such conflict. The cluster was up and running perfectly for about 6 months. When it was started 6 months ago, all required filesystems were mounted locally on one node and via NFS on the second node. I tested all the possibilities during the HACMP tests, and after I halted ("halt -q") the node holding the local filesystems, the node holding the NFS filesystems unmounted them and mounted them locally (as I said, they are shared; additionally, the shared VG has the same major number on both nodes to keep mounts active on external NFS clients across a cluster takeover)
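For reference, the major-number trick mentioned above is set when importing the VG on the second node; a minimal sketch, where the number 60 and the hdisk name are only examples:

# list major numbers in use / still free on each node
lvlstmajor
# import the shared VG on nodeB with the same major number nodeA uses
importvg -V 60 -y shared_vg hdisk2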

Of course, all these FS/NFS tasks are performed by HACMP; there are not even entries in /etc/exports, because these are not classical exports made by an admin.
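Since HACMP does the exporting itself when it acquires the resource group, the exports are only visible at runtime. A quick way to confirm that (plain AIX commands, nothing HACMP-specific):

# on nodeA: list what is currently exported
exportfs
# from nodeB or any other client: ask nodeA what it exports
showmount -e nodeA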

,m.
 
First, I have to apologize for my bad English; I know only a few words.
If any reply makes you unhappy, please stay calm.
Back to our topic. Let me make clear what I have in mind:
NodeA mounts all the FSs and exports some of them as an NFS server.
NodeB mounts them via NFS as an NFS client.
Is that right?
One of my cases was the same. It worked fine in normal operation.
But I met the same problem when I wanted to take over to NodeB,
because NodeA tried to unmount the NFS while NodeB was still using it.
So the NFS could not be unmounted, which caused a config_too_long.
I think it cannot be fixed,
unless you can make NodeB unmount the NFS first.
In the end, I asked the user to give up NFS in this cluster.
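For a planned takeover, that pre-unmount on NodeB could look roughly like this; just a sketch, assuming the mount points from this thread and accepting that fuser -k kills indiscriminately:

# on nodeB, before moving the resource group:
fuser -kc /datab1 /datab2   # kill anything holding the NFS mounts open
umount /datab1 /datab2      # now the unmounts should go through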
 
Hello, Petertp6

If it really cannot be fixed, then High Availability is lost in my configuration. I don't know how to tell nodeA: "Please inform nodeB to stop its services and unmount the NFS filesystems before you crash." In a cascading cluster, when nodeA crashes unexpectedly (power, hardware error), nodeB has to take over nodeA's services; that is what I am expecting.

Besides, I found that the problem was with unmounting just 2 of the NFS filesystems.

cl_deactivate_nfs[50] sleep 2
cl_deactivate_nfs[51] cl_nfskill -k -u /datab1
/datab1:

cl_deactivate_nfs[38] true
cl_deactivate_nfs[39] umount -f /datab1
umount: 16 error while unmounting nodeA:/datab1 - Device busy
cl_deactivate_nfs[40] [ 1 -ne 0 ]
cl_deactivate_nfs[41] [ 1 = 0 ]
cl_deactivate_nfs[50] sleep 2
cl_deactivate_nfs[51] cl_nfskill -k -u /datab2
/datab2:

cl_deactivate_nfs[38] true
cl_deactivate_nfs[39] umount -f /datab2
umount: 16 error while unmounting nodeA:/datab2 - Device busy
cl_deactivate_nfs[40] [ 1 -ne 0 ]
cl_deactivate_nfs[41] [ 1 = 0 ]
cl_deactivate_nfs[50] sleep 2
cl_deactivate_nfs[51] cl_nfskill -k -u /datab1
/datab1:


The other eight, also mounted from nodeA, were unmounted successfully.


BTW: do you know what should be done after a "config_too_long" happens? I just restarted the node to get a clean node status, but maybe that was not necessary.

regards,m.
 
Hi,

Do you want NFS to run on one node at a time, so that if node A fails, node B takes over the resources?
OR
Do you want node A and node B accessing the exported filesystem in parallel?

If node A has crashed, then why is the exported filesystem being unmounted? Could it be that node B has it mounted? If so, you have to set up HACMP and NFS as cross-mounting.

I would envisage this setup as follows:
node A has the exported filesystem on shared disk.

A client on their PC or elsewhere accesses the NFS filesystem, so it is important that you select the correct IP network for the NFS mount. If node A crashes,
then the NFS filesystem will be mounted on node B, and the client should automatically reconnect.


If you get a config_too_long, it means something is taking too long; in your case, from what you are saying, the umount.
By default you have 6 minutes to stop everything on node A; you can increase/decrease this by issuing:

chssys -s clstrmgr -a "-u 600000"

That value is in milliseconds, i.e. 600,000 ms = 600 seconds = 10 minutes.
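You can verify that the new argument was stored; chssys writes it into the SRC subsystem definition in the ODM, so (assuming a standard SRC setup):

# cmdargs should now contain "-u 600000"
odmget -q "subsysname=clstrmgr" SRCsubsys | grep cmdargs
# the change takes effect the next time clstrmgr is started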


Sorry if I have repeated anything, but I would look at
why it is unmounting the filesystem if the server has crashed.

I would expect node B to start the filesystems and the application if node A crashed.
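On the earlier question of what to do after a config_too_long: if the event script has actually failed, my understanding (hedged; check your HACMP version's documentation) is that you can resume event processing from SMIT instead of rebooting:

# after fixing whatever blocked the event:
smitty hacmp
# then, from memory: Problem Determination Tools -> Recover From HACMP Script Failure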

 
Hello,
1.
the cluster is a cascading cluster built of 2 nodes: nodeA and nodeB.

2.
when both cluster nodes are up and running (HACMP services started on both nodes), nodeA holds resourcesA and nodeB holds resourcesB.

resourcesA:
Resource Group Name resourcesA
Node Relationship cascading
Participating Node Name(s) nodeA nodeB
Service IP Label nodeAserv
Filesystems ALL
Filesystems Consistency Check logredo
Filesystems Recovery Method sequential
Filesystems/Directories to be exported /filesystem1 /filesystem2 /datab1 /datab2
Filesystems to be NFS mounted /filesystem1 /filesystem2 /datab1 /datab2
Network For NFS Mount local_net
Volume Groups shared_vg
Concurrent Volume Groups
Disks
Shared Tape Resources
Connections Services
Fast Connect Services
Application Servers servA
Highly Available Communication Links
Primary Workload Manager Class
Secondary Workload Manager Class
Miscellaneous Data
Automatically Import Volume Groups false
Inactive Takeover true
Cascading Without Fallback false
SSA Disk Fencing false
Filesystems mounted before IP configured false
Run Time Parameters:

Node Name nodeA
Debug Level high

Node Name nodeB
Debug Level high



resourcesB:
Resource Group Name resourcesB
Node Relationship cascading
Participating Node Name(s) nodeB nodeA
Service IP Label nodeAserv
Filesystems
Filesystems Consistency Check logredo
Filesystems Recovery Method sequential
Filesystems/Directories to be exported
Filesystems to be NFS mounted
Network For NFS Mount
Volume Groups
Concurrent Volume Groups
Disks
Shared Tape Resources
Connections Services
Fast Connect Services
Application Servers servB
Highly Available Communication Links
Primary Workload Manager Class
Secondary Workload Manager Class
Miscellaneous Data
Automatically Import Volume Groups false
Inactive Takeover true
Cascading Without Fallback false
SSA Disk Fencing false
Filesystems mounted before IP configured false
Run Time Parameters:

Node Name nodeB
Debug Level high

Node Name nodeA
Debug Level high




3.
nodeA has /datab1, /datab2 and the other filesystems, built on shared SSA disks/VGs, mounted locally.

nodeB then has them mounted via NFS (mounts carried out by HACMP).


What I expect when nodeA crashes is that nodeB unmounts the broken NFS mount points, varies on the shared VG, mounts all the filesystems locally and, of course, takes over the service addresses and all the other services of nodeA (the crashed node).


As I said, there was a problem with unmounting only 2 of the 10 NFS filesystems on nodeB after nodeA crashed. All were mounted from nodeA.

For as long as I can remember, with the above Resource Group config ("Filesystems/Directories to be exported", "Filesystems to be NFS mounted"), it has worked as follows:

nodeA: mounts ALL filesystems locally
nodeB: NFS-mounts the "Filesystems to be NFS mounted" (which are, of course, the "Filesystems/Directories to be exported")


There is no need to mount them manually on nodeB; it is done automatically by HACMP.


regards,m.
 
I have read your hacmp.out and your configuration.
Did you have NodeA take over to NodeB manually, or by a power reset?
I think you should use SMIT or the command line, so that NodeA can log the messages.
In your config, this line looks strange:
"Filesystems to be NFS mounted /filesystem1 /filesystem2 /datab1 /datab2"
Why does NodeA mount local FSs via NFS? I suggest leaving it blank
and moving it to resourcesB. That would match your idea.
With that config, I am sure a real crash will be taken over successfully,
but a manual takeover may hit a problem like the one you have now,
unless you unmount the NFS on NodeB first.
Did you try just pressing the reset button of NodeA before?
 
1. The HACMP tests were performed successfully before the cluster went into production (all possible takeovers, performed from smitty or with a crash simulated by running "halt -q" on nodeA or nodeB, completed successfully).

2. The crash of nodeA happened unexpectedly; nodeA just rebooted after some error occurred (BTW: IBM is investigating the cause of the crash).

3. The line:

""Filesystems to be NFS mounted /filesystem1 /filesystem2 /datab1 /datab2"
"

concerns the mounting of the NFS filesystems on nodeB, not nodeA (it is a configuration feature of HACMP).

From documentation: "Identify the filesystems or directories to NFS mount. All nodes in the resource chain will attempt to NFS mount these filesystems or directories while the owner node is active in the cluster." (owner node is nodeA)

So moving it to resourcesB makes no sense and would not work completely.

4. I have just configured a test HACMP cluster: 2 LPARs with shared SSA disks, a configuration very similar to production.

Initially, I had a status:

# clfindres
GroupName Type State Location Sticky Loc
--------- ---------- ------ -------- ----------
resourcesB cascading UP nodeB
resourcesA cascading UP nodeA


and,

======
nodeA
======
# lsvg -o
shared_vg
rootvg

Filesystem 512-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 262144 210472 20% 2239 4% /
/dev/hd2 3014656 851256 72% 32775 9% /usr
/dev/hd9var 262144 113704 57% 1922 6% /var
/dev/hd3 655360 617936 6% 231 1% /tmp
/dev/hd1 262144 239456 9% 528 2% /home
/proc - - - - - /proc
/dev/lv01 131072 90512 31% 1007 7% /filesystem1
/dev/lv02 131072 125752 5% 35 1% /filesystem2
/dev/lv03 131072 90512 31% 1007 7% /datab1
/dev/lv04 131072 125752 5% 35 1% /datab2



======
nodeB
======
# lsvg -o
rootvg
# lsvg
rootvg
shared_vg


Filesystem 512-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 262144 209856 20% 2257 4% /
/dev/hd2 3014656 852056 72% 32782 9% /usr
/dev/hd9var 262144 190776 28% 2688 9% /var
/dev/hd3 655360 486776 26% 4868 6% /tmp
/dev/hd1 262144 250784 5% 522 2% /home
/proc - - - - - /proc
/dev/hd10opt 131072 87256 34% 1023 7% /opt
nodeA:/filesystem1 5242880 2726424 48% 6495 1% /filesystem1
nodeA:/filesystem2 2621440 2407184 9% 4180 2% /filesystem2
nodeA:/datab1 6422528 3153008 51% 5428 1% /datab1
nodeA:/datab2 786432 761376 4% 54 1% /datab2



After I ran "halt -q" on nodeA, the takeover completed successfully and the cluster status was as follows (checked on nodeB, of course, because nodeA is halted):

# clfindres
GroupName Type State Location Sticky Loc
--------- ---------- ------ -------- ----------
resourcesB cascading UP nodeB
resourcesA cascading UP nodeB


The NFS filesystems were unmounted on nodeB during the takeover, the shared VG was varied on, and the filesystems were mounted locally, so nodeB had the following filesystem status:


Filesystem 512-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 262144 209856 20% 2257 4% /
/dev/hd2 3014656 852056 72% 32782 9% /usr
/dev/hd9var 262144 190776 28% 2688 9% /var
/dev/hd3 655360 486776 26% 4868 6% /tmp
/dev/hd1 262144 250784 5% 522 2% /home
/proc - - - - - /proc
/dev/hd10opt 131072 87256 34% 1023 7% /opt
/dev/lv01 131072 90512 31% 1007 7% /filesystem1
/dev/lv02 131072 125752 5% 35 1% /filesystem2
/dev/lv03 131072 90512 31% 1007 7% /datab1
/dev/lv04 131072 125752 5% 35 1% /datab2


Please do believe me, the problem I met is not related to the HACMP configuration. The takeover failed because HACMP hit some strange problems unmounting 2 of the 10 NFS filesystems, although it was trying to kill all the processes keeping those 2 NFS filesystems open on nodeB.


regards,m.
 
Looks like some process is still running on the system that remotely mounts the exported filesystems and is using them,
so that system can't unmount the remote filesystems and can't proceed to mount them locally.
Probably you should insert some "intelligence" into the takeover script, to kill every program using the remote filesystems before the local mount.
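Something along these lines could be hooked in before the unmount; a rough sketch only, where the mount-point list is assumed from this thread and fuser -k kills everything indiscriminately:

#!/bin/ksh
# kill whatever still holds the NFS mounts open, then retry the unmount
for fs in /datab1 /datab2
do
    fuser -kc $fs    # SIGKILL every process using the filesystem
    sleep 2
    umount -f $fs
done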
 
HACMP is intelligent and already does exactly that.
It kills all processes writing to the filesystems about to be unmounted:

cl_deactivate_nfs[50] sleep 2
cl_deactivate_nfs[51] cl_nfskill -k -u /datab1
/datab1:

cl_deactivate_nfs[38] true
cl_deactivate_nfs[39] umount -f /datab1
umount: 16 error while unmounting nodeA:/datab1 - Device busy
cl_deactivate_nfs[40] [ 1 -ne 0 ]
cl_deactivate_nfs[41] [ 1 = 0 ]
cl_deactivate_nfs[50] sleep 2
cl_deactivate_nfs[51] cl_nfskill -k -u /datab2
/datab2:

cl_deactivate_nfs[38] true
cl_deactivate_nfs[39] umount -f /datab2
umount: 16 error while unmounting nodeA:/datab2 - Device busy
cl_deactivate_nfs[40] [ 1 -ne 0 ]
cl_deactivate_nfs[41] [ 1 = 0 ]
cl_deactivate_nfs[50] sleep 2
cl_deactivate_nfs[51] cl_nfskill -k -u /datab1
/datab1:


regards, m.
 
Dear ogniemi,
I found something that looks useful:
cl_nfskill Command Fails
Problem
The /tmp/hacmp.out file shows that the cl_nfskill command fails when attempting to perform a forced unmount of an NFS-mounted file system. NFS provides certain levels of locking a file system that resists forced unmounting by the cl_nfskill command.

Solution
Make a copy of the /etc/locks file in a separate directory before executing the cl_nfskill command. Then delete the original /etc/locks file and run the cl_nfskill command. After the command succeeds, re-create a copy of the /etc/locks file.
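Spelled out as commands, that workaround would look roughly like this (a sketch of the documented steps, not tested here; /datab1 is just the example from this thread):

mkdir /tmp/locks.bak
cp -p /etc/locks/* /tmp/locks.bak   # save a copy of the lock files
rm /etc/locks/*                     # delete the originals
cl_nfskill -k -u /datab1            # retry the forced unmount
cp -p /tmp/locks.bak/* /etc/locks   # re-create the lock files afterwards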

 
Thanks for the info, but it is not a solution.

/etc/locks is a link to /var/locks, which contains just one file holding cron's stored PID:

# ls -la locks
lrwxrwxrwx 1 root system 10 Jul 11 12:31 locks -> /var/locks
# ls -la /var/locks
total 24
drwxrwxrwx 2 root system 512 Oct 20 08:58 .
drwxr-xr-x 25 root system 512 Oct 20 08:55 ..
-r--r--r-- 1 root cron 11 Oct 20 08:58 LCK..cron



"cl_nfskill" in not executed by me manually - it is executed during HACMP takeover especially caused by unexpected system crash :))

The log comes from /etc/hacmp.out.

rgrds,m.
 
Result of "man locks":
etc/locks Directory

Purpose

Contains lock files that prevent multiple uses of communications devices and
multiple calls to remote systems.

Description

The /etc/locks directory contains files that lock communications devices and
remote systems so that another user cannot access them when they are already in
use. Other programs check the /etc/locks directory for lock files before
attempting to use a particular device or call a specific system.

A lock file is a file placed in the /etc/locks directory when a program uses a
communications device or contacts a remote system. The file contains the process
ID number (PID) of the process that creates it.

The Basic Networking Utilities (BNU) program and other communications programs
create a device lock file whenever a connection to a remote system, established
over the specified device, is actually in use. The full path name of a device
lock file is:

/etc/locks/DeviceName
... BLAH BLAH BLAH ...
I know you are not going to live next to your system just to be on hand when something happens ... but ... you can simulate a system crash with "halt -q" and then check on the surviving system whether the solution posted above helps ... if so, you can modify something in the HACMP scripts.
:)))
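The simulation itself is simple (on a test cluster only; "halt -q" skips all shutdown processing):

# on the node that owns the resource group:
halt -q
# on the survivor, watch the takeover events:
tail -f /tmp/hacmp.out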
 