No RMC connection between the HMC and the RPA

Status
Not open for further replies.

khalidaaa

Technical User
Jan 19, 2006
2,323
BH
Hi all,

I had a problem yesterday with one of my HACMP clusters: when I increased the size of one of the filesystems, it failed over to the standby node. I just stopped and started the cluster and everything was back to normal!

Today I came in to do a DLPAR operation on the primary node and discovered this error:

Code:
HSCL2957 There is currently no RMC connection between the HMC and the RPA partition {0}. This could be because the partition is inactive, a firewall, or an improper network setup causing RMC not to recognize the partition. Please check the system network setup

I'm sure that the network connectivity is OK, as I did a failover and a fallback of the resource group yesterday with no problems. I know it has something to do with the rsct group! But I can't run

/usr/sbin/rsct/bin/rmcctrl -z
/usr/sbin/rsct/bin/rmcctrl -A

to refresh RSCT, as this will upset the HACMP cluster!?!?

Any idea which specific service from the rsct group I should stop and start to be able to do the DLPAR with no problem?

Here is the list of the rsct group subsystems on my primary node:

# lssrc -a | grep rsct
 ctrmc            rsct             249882       active
 IBM.ServiceRM    rsct_rm          151594       active
 IBM.HostRM       rsct_rm          229612       active
 IBM.ERRM         rsct_rm          262396       active
 IBM.DRM          rsct_rm          135394       active
 IBM.CSMAgentRM   rsct_rm          225498       active
 IBM.AuditRM      rsct_rm          221264       active
 IBM.LPRM         rsct_rm          163842       active
 ctcas            rsct                          inoperative

Any help is appreciated.

Thanks

Regards,
Khalid
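
For anyone reading the `lssrc -a | grep rsct` listing in the post above: a quick way to spot subsystems that are not running is to filter on the last column. The helper below is only a sketch; the function name `rsct_not_active` is made up for illustration, and it assumes the status is always the last field of each line (the PID column is blank for inoperative subsystems, as in the ctcas line).

```shell
# Hypothetical helper (not part of RSCT): reads `lssrc -a` style output
# on stdin and prints "name status" for every subsystem in a group whose
# name starts with "rsct" and whose status is anything other than "active".
rsct_not_active() {
    awk '$2 ~ /^rsct/ && $NF != "active" { print $1, $NF }'
}

# Usage on the LPAR (assumed):
#   lssrc -a | rsct_not_active
```

Fed the listing above, this would print only the ctcas line. It reports state and changes nothing, so it should be safe to run on a live HACMP node.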
 
It looks like a problem with your HMC internal network.

Do you have access to your HMC? It would be strange if it affected the cluster, though.

What does
/usr/sbin/rsct/bin/lsrsrc IBM.ManagementServer

return?
 
Thanks for the reply ogniemi.

I managed to do a DLPAR on another LPAR with no problem, so I don't think that access to the HMC is the problem!

I know it might not affect the cluster, but I thought I'd share it with you guys as this is a production system!

Code:
# /usr/sbin/rsct/bin/lsrsrc IBM.ManagementServer
Resource Persistent Attributes for IBM.ManagementServer
resource 1:
        Name             = "128.10.1.110"
        Hostname         = "128.10.1.110"
        ManagerType      = "HMC"
        LocalHostname    = "128.10.1.150"
        ClusterTM        = "9078-160"
        ClusterSNum      = ""
        ActivePeerDomain = ""
        NodeNameList     = {"proddb"}

Regards,
Khalid
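
The `lsrsrc IBM.ManagementServer` output in the post above shows which HMC address the partition has registered (Hostname) and which local address it uses to talk to it (LocalHostname). As a rough sketch, those two values can be pulled out for a connectivity check. The function name `mgmt_server_addrs` is hypothetical, and the parsing assumes the `Attribute = "value"` layout shown in that output.

```shell
# Hypothetical helper: reads `lsrsrc IBM.ManagementServer` output on stdin
# and prints one line per resource: "<HMC address> <local address>".
# Fields are split on double quotes, so $2 is the quoted value.
mgmt_server_addrs() {
    awk -F'"' '
        $1 ~ /^[ \t]*Hostname[ \t]*=/      { hmc = $2 }
        $1 ~ /^[ \t]*LocalHostname[ \t]*=/ { print hmc, $2 }
    '
}

# Usage on the LPAR (assumed):
#   /usr/sbin/rsct/bin/lsrsrc IBM.ManagementServer | mgmt_server_addrs
```

With the output above this would give 128.10.1.110 and 128.10.1.150. A ping from the LPAR to the first address is a reasonable first test, though a successful ping does not prove that RMC traffic (port 657) is getting through a firewall.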
 
I just called IBM and they advised me to reboot the machine!

I don't think that this is a good solution! I expected a better one!

I've arranged for downtime today :(

Regards,
Khalid
 
hmmm..

The recovery action for your HSCL2957 says:
1. Ensure that the network connection is working between HMC and the partition. For details, see Task 4. Ensure that your physical networking is set up correctly.
2. If the problem persists, contact your next level of support or your service provider.

got from:

Maybe your problem is already gone?

Log in to your HMC and check the latest events for your managed system:

lssvcevents -t hardware -m the_managed_system_name -d 1 -F problem_num:status:last_time:text


You can find the managed system name by running:

lssyscfg -r sys -F name
 
I'd try rebooting the HMC first and see where that gets you; it's not as drastic as rebooting the server or LPAR...

Also, try restarting the ctcas subsystem:

startsrc -s ctcas

(I know, I know -- one shouldn't use start/stopsrc commands on rsct subsystems...)


HTH,

p5wizard
 
ogniemi: Yeah, I saw the description of the error.

I ran this command:

lssvcevents -t hardware -m the_managed_system_name -d 1 -F problem_num:status:last_time:text

(with the_managed_system_name replaced by my server name)

and it returned (No results were found)

p5wizard (you made me laugh :)

I've already arranged for system downtime (an hour from now, at 3:30 my time), but I will keep that in mind in case the problem is not fixed.

Code:
ctcas 
Is for security verification. It is a lazy started resource manager and does not have to run in order for DLPAR to work.

That's the description I found for this service, so I don't think this will do the biz. Would it?

Thanks for your comments guys.

Regards,
Khalid
 
I still can't really tell why, but after rebooting the machine everything is OK now!

Thanks for the help guys

Regards,
Khalid
 