Tek-Tips is the largest IT community on the Internet today!

HACMP Clstrmgr

Status
Not open for further replies.

smith364

IS-IT--Management
Jun 26, 2001
34
US
I mistakenly removed the groups hacmp and haemrm, which HACMP uses. I re-entered the groups and checked the permissions on the files. The clstrmgr will not start, and the machine crashes when attempting to start clstrmgr. Does anyone have any suggestions as to what I should be looking for?
 
I would check whether clcomdES is running on the cluster nodes, and even restart it on both nodes.

lssrc -s clcomdES
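A sketch of that check-and-restart, assuming the standard AIX SRC commands lssrc, stopsrc and startsrc and the usual lssrc column layout (Subsystem, Group, PID, Status). The function names subsys_status and restart_clcomd are hypothetical helpers, not HACMP tools.

```shell
#!/bin/sh
# Sketch only: subsys_status and restart_clcomd are hypothetical helpers.

# Read "lssrc -s NAME" output on stdin and print the Status column for NAME.
# An inoperative subsystem has an empty PID column, so the status is taken
# from the last field instead of a fixed column number.
subsys_status() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# If clcomdES is active, stop it and wait until SRC reports it inoperative,
# then start it again. Run this on each cluster node.
restart_clcomd() {
    if [ "$(lssrc -s clcomdES | subsys_status clcomdES)" = "active" ]; then
        stopsrc -s clcomdES
        # Poll for up to about 30 seconds.
        tries=0
        while [ "$(lssrc -s clcomdES | subsys_status clcomdES)" != "inoperative" ] \
              && [ "$tries" -lt 30 ]; do
            sleep 1
            tries=$((tries + 1))
        done
    fi
    startsrc -s clcomdES
}
```

On each node, `lssrc -s clcomdES | subsys_status clcomdES` prints just the status word, and `restart_clcomd` bounces the daemon.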

If you removed the groups on all cluster nodes and then recreated them there (with the same GIDs as before), look for files and directories that no longer have a valid group on your cluster nodes:

find / -nogroup -ls

or

find / -nogroup -ls | grep -i hacmp
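Before chasing individual files, it may help to confirm that the recreated groups really have the same GID on every node. A minimal sketch, assuming `lsgroup -a id NAME` prints `NAME id=GID` and that cl_rsh takes a node name followed by a command, as in the commands later in this thread. gid_of, gids_consistent and check_group_gids are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: gid_of, gids_consistent and check_group_gids are hypothetical.

CL_UTIL=/usr/es/sbin/cluster/utilities

# Extract the numeric GID from "lsgroup -a id NAME" output ("NAME id=GID").
gid_of() {
    sed -n 's/.* id=\([0-9][0-9]*\).*/\1/p'
}

# Read "NODE GID" lines on stdin. Succeed only if every node reported a GID
# and all GIDs are identical; a node with no GID counts as a failure.
gids_consistent() {
    awk '$2 == "" { missing = 1 }
         { seen[$2] = 1 }
         END { n = 0; for (g in seen) n++; exit (missing || n != 1) }'
}

# Fetch the GID of GROUP from every cluster node, print the list, then check it.
check_group_gids() {
    group=$1
    list=$(for node in $("$CL_UTIL"/clnodename); do
        printf '%s %s\n' "$node" \
            "$("$CL_UTIL"/cl_rsh "$node" /usr/sbin/lsgroup -a id "$group" | gid_of)"
    done)
    printf '%s\n' "$list"
    printf '%s\n' "$list" | gids_consistent
}
```

For example, `check_group_gids hacmp && check_group_gids haemrm && echo "GIDs match"`.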

If file ownerships are OK, then the commands below, run on all cluster nodes, should exit with 0, and the cluster should start as it did before the groups were removed.


# /usr/sbin/cluster/utilities/clnodename
CLNODE02
CLNODE03
# /usr/es/sbin/cluster/utilities/cl_rsh CLNODE03 /bin/odmget HACMPtopsvcs

HACMPtopsvcs:
hbInterval = 1
fibrillateCount = 4
runFixedPri = 1
fixedPriLevel = 38
tsLogLength = 5000
gsLogLength = 5000
instanceNum = 8
# /usr/es/sbin/cluster/utilities/cl_rsh CLNODE02 /bin/odmget HACMPtopsvcs

HACMPtopsvcs:
hbInterval = 1
fibrillateCount = 4
runFixedPri = 1
fixedPriLevel = 38
tsLogLength = 5000
gsLogLength = 5000
instanceNum = 8

(I assume the cluster topology and resource groups are synchronized.)
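Rather than comparing the stanzas by eye, the check in the transcript above can be scripted. A sketch, assuming cl_rsh and clnodename behave as shown there; compare_nodes and fetch_topsvcs are hypothetical helper names. compare_nodes works for any command whose output should be identical on every node.

```shell
#!/bin/sh
# Sketch only: compare_nodes and fetch_topsvcs are hypothetical helpers.

CL_UTIL=/usr/es/sbin/cluster/utilities

# Usage: compare_nodes FETCH REF_NODE [NODE...]
# Runs "FETCH NODE" for each node and reports any node whose output differs
# from REF_NODE's. Returns 0 if all match, 1 if any differ, 2 if FETCH failed.
compare_nodes() {
    fetch=$1
    ref_node=$2
    shift 2
    ref=$("$fetch" "$ref_node") || return 2
    rc=0
    for node in "$@"; do
        out=$("$fetch" "$node") || return 2
        if [ "$out" != "$ref" ]; then
            echo "$node: output differs from $ref_node" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Fetch the HACMPtopsvcs ODM stanza from one node, as in the transcript above.
fetch_topsvcs() {
    "$CL_UTIL"/cl_rsh "$1" /bin/odmget HACMPtopsvcs
}
```

For example, `compare_nodes fetch_topsvcs $(/usr/es/sbin/cluster/utilities/clnodename) && echo "HACMPtopsvcs identical"`.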

By the way, although the cluster is not running at the moment, I would recommend stopping cluster services on both nodes (smitty clstop) to clean up the state; the stop procedure performs some cleanup.
 
Thank you for your help. I used your find command to see which files needed to be changed. Clstrmgr still crashes the system. When the system reboots, the hostid and hostname are not set; the system comes up with localhost as the hostname. I have never had a server go this far south before. Any suggestions?
 
After the reboot, check:

netstat -i
netstat -in

and compare it with:

/usr/es/sbin/cluster/utilities/cllsif
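To make that comparison easier, the IPv4 addresses actually configured can be pulled out of `netstat -in` and checked one by one against the addresses cllsif lists. A sketch, assuming the AIX `netstat -in` layout in which the fourth column is the address (MAC addresses on `link#` rows are skipped); list_ipv4 and has_addr are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: list_ipv4 and has_addr are hypothetical helpers.

# Read "netstat -in" output on stdin and print "INTERFACE ADDRESS" for every
# row whose Address column (4th field) is a dotted-quad IPv4 address.
list_ipv4() {
    awk '$4 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print $1, $4 }'
}

# Succeed if ADDR is configured on some interface ("netstat -in" on stdin).
has_addr() {
    list_ipv4 | awk -v a="$1" '$2 == a { found = 1 } END { exit !found }'
}
```

For example, `netstat -in | list_ipv4` shows what is up, and `netstat -in | has_addr ADDR || echo "ADDR missing"` checks one address, where ADDR is a placeholder for an address taken from the cllsif output.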

Check the hostname and default gateway configured on inet0:

lsattr -El inet0

If it is not set, some configuration has been lost, and I would recommend entering it again:

smitty hostname
smitty chinet (if the network interface configuration is lost as well).
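The hostname part of that check can also be scripted. A sketch, assuming `lsattr -El inet0 -a NAME -F value` prints just the stored value of an inet0 attribute (empty when unset); hostname_state and check_inet0_hostname are hypothetical helper names. As a command-line alternative to smitty hostname, `chdev -l inet0 -a hostname=NAME` should store the name on inet0.

```shell
#!/bin/sh
# Sketch only: hostname_state and check_inet0_hostname are hypothetical helpers.

# Classify the running hostname (arg 1) against the one stored on inet0
# (arg 2): prints "unset", "mismatch" or "ok".
hostname_state() {
    running=$1
    stored=$2
    if [ -z "$stored" ]; then
        echo unset
    elif [ "$running" != "$stored" ]; then
        echo mismatch
    else
        echo ok
    fi
}

# Compare the live hostname with inet0, show the stored routes (the default
# gateway normally appears in the route attribute), and succeed only if the
# hostname state is "ok".
check_inet0_hostname() {
    stored=$(lsattr -El inet0 -a hostname -F value)
    running=$(hostname)
    state=$(hostname_state "$running" "$stored")
    echo "running=$running inet0=$stored state=$state"
    lsattr -El inet0 -a route -F value || true
    [ "$state" = ok ]
}
```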
 
I have followed the above procedures, and everything checks out OK, but the cluster manager still crashes the system. I tried running everything manually, following the steps the working node executes when running smitty clstart. I noticed that the gsclvmd daemon will not start. Does this mean anything?
 
By the way, smith, what error messages do you see in /etc/hacmp.out?
 
What is your HACMP version?

When you removed those groups, was your cluster running with resource groups online?
 
First of all, thank you everyone for your responses. The HACMP version is 5.3.4. I have noticed that hacmp.out is not being created. My error log (errpt) reports:

Group services started.
Software program error, clstrmgrES.
clexit.rc - unexpected termination of clstrmgrES.
Topology Services stopped by signal SIGTERM.
topsvcs subsystem died with error = 16, grpsvcs will also exit.
haemd 2521-032 Cannot dispatch group services.
emsvcs fails.
grpsvcs fails.

These are logged as software errors. Then the system shuts down with a 1=halt.
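To see these entries together with their timestamps, the errpt summary can be filtered down to the cluster-related subsystems. A sketch, assuming the default `errpt` summary layout (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION); the resource-name patterns are guesses based on the subsystem names quoted above, and hacmp_errpt is a hypothetical helper name. `errpt -a` shows the full detail of each entry.

```shell
#!/bin/sh
# Sketch only: hacmp_errpt is a hypothetical helper, and the patterns below
# are guesses based on the subsystem names quoted in this thread.

# Read "errpt" summary output on stdin and keep the header line plus any
# entry whose resource name (5th field) mentions an HACMP/RSCT subsystem.
hacmp_errpt() {
    awk 'NR == 1 { print; next }
         tolower($5) ~ /clstrmgr|topsvcs|grpsvcs|emsvcs|haem/ { print }'
}
```

For example, `errpt | hacmp_errpt`.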
 