aix+hacmp: clstrmgr crashes on start

Status
Not open for further replies.

cettolox

IS-IT--Management
Aug 29, 2008
4
IT
Hi,

the clstrmgrES daemon crashes as soon as it is started.

I see it from the log:

lpar6# cat /tmp/clstrmgr.debug
Fri Aug 29 12:09:50 HACMP/ES Cluster Manager Version 5.3
Using ODMDIR=/etc/es/objrepos
Fri Aug 29 12:09:50 HA_DOMAIN_TYPE=HACMP
Fri Aug 29 12:09:50 ReadTopsvcs: called.
Fri Aug 29 12:09:50 GetObjects: Called with criteria:
Fri Aug 29 12:09:50 ReadTopsvcs: hbInterval = 1, fibrillateCount = 4, fixedPriLevel = 38, runFixedPri = 1 instanceNum = 20
Fri Aug 29 12:09:50 ReadTopsvcs: Calculated fixed priority is 39
Fri Aug 29 12:09:50 /usr/es/sbin/cluster/clstrmgr: Unrecognized argument '?'.
Fri Aug 29 12:09:50 die: clstrmgr on node 0 is exiting with code 2

The problem is the "Unrecognized argument" message: I cannot tell which argument it is talking about.

I read it could be caused by a name resolution issue, but I cannot investigate any deeper.

Any hint?

Thanks in advance,

/Stefano
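
For anyone reading along with the same symptom, here is a minimal sketch for pulling the exit-related lines out of the debug log quoted above. It assumes the log lives at /tmp/clstrmgr.debug as in the post and only matches the two message shapes shown there; show_exit_reason is a hypothetical helper name, not an HACMP command.

```shell
#!/bin/sh
# Print the lines in a clstrmgr debug log that explain why the daemon exited.
# show_exit_reason is a hypothetical helper, not part of HACMP.
show_exit_reason() {
    # Default to the log path used in the post; another path may be passed.
    log="${1:-/tmp/clstrmgr.debug}"
    # Match the two message shapes seen in the log above:
    #   "... Unrecognized argument ..." and "... die: ... exiting with code N"
    grep -E 'Unrecognized argument|die: .* exiting with code' "$log"
}
```

Called without arguments it reads /tmp/clstrmgr.debug; it does not explain the failure by itself, it only narrows the log down to the lines worth quoting.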
 
What's in /tmp/hacmp.out? Anything useful?

You should be able to narrow down the problem by searching backward in /tmp/hacmp.out for the event that failed.

Is that a new cluster? Was it working before? What changes have you made lately?

Regards,
Khalid
 
There is nothing useful in hacmp.out. The cluster is new: it was installed on the 1st of August and it worked. I do not know what happened since then. What I see now is that the clstrmgrES daemon (which should always be running) cannot start: it crashes one second after being started, with the "Unrecognized argument" error that I described in the first post.

I found a post that seems related to my problem, but I cannot understand the answer. Here is the post; maybe someone can get a hint from it.

Thanks,
/Stefano


Question
When I analyse the different messages, I think it is a DNS problem
Recorded using libct_ffdc.a
/usr/es/sbin/cluster/clstrmgr: Unrecognized argument '?'.
Unexpected termination of clstrmgrES
Halting system immediately!!!
Any idea?

Answer
I advise you to update your /etc/services first:
node1 @IP
node1_boot @IP
node1_stdby @IP
This way you will bypass DNS.
 
Hi

Have you created your cluster, and does it sync without any errors?

What does lssrc -g cluster report?

Does errpt -a report any errors for clstrmgrES?


Are you using DNS? If so, and you think it is a DNS problem, can you temporarily stop using it and then start your cluster?

Have you got the latest fixes for HACMP 5.3?
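
A rough sketch of how the first two checks could be scripted. It assumes the usual lssrc column layout (a header row, subsystem name in the first column, status in the last); inactive_subsystems is a hypothetical helper name, and the AIX commands appear only as comments because they need a cluster node to run.

```shell
#!/bin/sh
# List subsystems that lssrc does not report as "active".
# inactive_subsystems is a hypothetical helper; it reads lssrc output on stdin.
inactive_subsystems() {
    # Skip the header row and blank lines; the status is the last field
    # and the subsystem name is the first field.
    awk 'NR > 1 && NF > 0 && $NF != "active" { print $1 }'
}

# On the AIX node (not run here):
#   lssrc -g cluster | inactive_subsystems
#   errpt -a | grep -i clstrmgr
```

If clstrmgrES shows up in the output, the daemon is not running and the error report is the next place to look.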


 
It seems I have solved the problem... in a few minutes I'll describe what I did.

/Stefano
 
The problem was that somehow the clstrmgrES subsystem was configured to start with an incorrect parameter: the startup arguments contained the "-d" switch, but it was not followed by a number (the debug level).

I performed the following steps.

1 - check the parameters of the modules

lpar6# odmget -q "subsysname like clstrmgrES" SRCsubsys
SRCsubsys:
subsysname = "clstrmgrES"
synonym = ""
cmdargs = "-d"
path = "/usr/es/sbin/cluster/clstrmgr"
uid = 0
auditid = 0
standin = "/dev/null"
standout = "/dev/null"
standerr = "/dev/null"
action = 2
multi = 0
contact = 3
svrkey = 0
svrmtype = 0
priority = 20
signorm = 0
sigforce = 0
display = 0
waittime = 15
grpname = "cluster"



2 - redirect the output to files to see what happens

chssys -s clstrmgrES -o /tmp/output.log -e /tmp/error.log



3 - read the logs to understand (and find that the problem was related to the "debug" switch)

lpar6# more /tmp/error.log
/usr/es/sbin/cluster/clstrmgr: A flag requires a parameter: d

lpar6# more /tmp/output.log
/usr/es/sbin/cluster/clstrmgr: Unrecognized argument '?'.
Usage: clstrmgrES [-d debug_level]
-d debug_level Set the debugging level
-f log length Set the max log length
-p priority Set the process priority
-v version Set the cluster version
-w wait Set the stabilization wait time


4 - delete the "-d" from the argument list

chssys -s clstrmgrES -a ""


5 - restart all: it works now.
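
The check behind steps 1 and 4 can be sketched as a small filter over the odmget output. This is only an illustration under the assumption that the stanza looks like the one in step 1; check_debug_flag is a hypothetical helper name, and it only looks for a "-d" that is not followed by a number.

```shell
#!/bin/sh
# Read an SRCsubsys stanza (as printed by odmget) on stdin and report whether
# cmdargs contains a -d switch with no numeric debug level after it.
# check_debug_flag is a hypothetical helper, not an AIX or HACMP command.
check_debug_flag() {
    # Extract the text between the quotes on the cmdargs line.
    args=$(sed -n 's/^[[:space:]]*cmdargs = "\(.*\)"[[:space:]]*$/\1/p')
    # Split the arguments into positional parameters (local to the function).
    set -- $args
    while [ $# -gt 0 ]; do
        if [ "$1" = "-d" ]; then
            # The next word must be non-empty and consist of digits only.
            case "${2:-}" in
                ''|*[!0-9]*)
                    echo "bad: -d has no numeric debug level"
                    return 1
                    ;;
            esac
        fi
        shift
    done
    printf 'ok: cmdargs = "%s"\n' "$args"
}

# On the AIX node (not run here):
#   odmget -q "subsysname like clstrmgrES" SRCsubsys | check_debug_flag
#   chssys -s clstrmgrES -a ""     # clear the arguments, as in step 4
```

With the stanza from step 1 the function reports the dangling "-d" and returns 1; after step 4 it reports an empty argument list and returns 0.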
 
Have a star for updating the problem!

Regards,
Khalid
 