
All IPMP interfaces are FAILED


neuralnode (Technical User)
Hi All,

What does it mean that all interfaces from a given IPMP group are in state FAILED?
How can I enable them?


e1000g0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
inet 10.42.69.20 netmask ffffff00 broadcast 10.42.69.255
groupname ipmp1
ether 0:14:4f:86:c1:a8
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 10.42.69.19 netmask ffffff00 broadcast 10.42.69.255
e1000g1: flags=19040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 3
inet 10.42.69.21 netmask ffffff00 broadcast 10.42.69.255
groupname ipmp1
ether 0:14:4f:86:c1:a9
e1000g2: flags=19040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 4
inet 10.42.69.26 netmask ffffff00 broadcast 10.42.69.255
groupname ipmp2
ether 0:14:4f:86:c1:aa
e1000g2:1: flags=11000803<UP,BROADCAST,MULTICAST,IPv4,FAILED> mtu 1500 index 4
inet 10.42.69.25 netmask ffffff00 broadcast 10.42.69.255
e1000g3: flags=19040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED> mtu 1500 index 5
inet 10.42.69.27 netmask ffffff00 broadcast 10.42.69.255
groupname ipmp2
ether 0:14:4f:86:c1:ab
 
BTW, the IPs from the failed group are not usable, i.e. I cannot ssh via those interfaces.

 
OK, the way IPMP works is that in.mpathd sends ICMP echo probes from the two test interfaces to either the gateway IP (or up to five reachable nodes on the same VLAN, or hosts in the local routing table). If the gateway is not reachable, or ICMP traffic to it is blocked, then the interfaces will fail and the floating IP will not be reachable.


            GATEWAY
            ^     ^
            |     |
           Ping  Ping
            |     X (no response)
      Test-Intf1  Test-Intf2
           ||
        FLOAT-IP

The Float-IP will be attached to one or the other of the test interfaces, as long as that test interface can get a ping response from the gateway.

As soon as the gateway is reachable again, your interfaces should change from FAILED to online.

It's not really that simple, though: if the default gateway is reachable when the server boots and then becomes blocked (by a network rule), you will not be able to bring the interfaces up unless you reboot and let IPMP fall back to another probe method, such as pinging the best five local IPs on the same VLAN.
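(As an aside: on later Solaris releases, 11 and up, you can ask the daemon directly which targets it is probing; a quick check, assuming ipmpstat(1M) exists on your release:)

Code:
Show the probe targets in.mpathd has selected (Solaris 11 and later only; not available on Solaris 10):

# ipmpstat -t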

You can see what your IPMP daemon (in.mpathd) is trying to ping if you snoop the interfaces for ICMP traffic; it will show you where it is trying to reach.
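For example, a minimal sketch (substitute whichever test interface you want to watch):

Code:
Watch the ICMP probes in.mpathd sends out of a test interface; the destination addresses are the probe targets it is trying to reach:

# snoop -d e1000g0 icmp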

You can disable and enable the IPMP interfaces with specific commands (if_mpadm, shown further down), but again it's not that simple, depending on what settings you have in the /etc/default/mpathd file.

If your networks were working before, then your problem is most likely that the gateway is refusing to answer the ICMP probes; check that first.
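A quick way to check that, assuming a standard setup:

Code:
Find the default gateway, then see whether it answers ICMP at all:

# netstat -rn | grep default
# ping <gateway-ip>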

If you need more detail I'll do my best....

Laurie.

 
You said all that, when checking /var/adm/messages would likely show the problem?
 
@blarneyme ... yes, correct: /var/adm/messages will probably provide an answer that many of us would understand, but I was trying to give "neuralnode" a high-level process of analysis. I find so many people don't understand IPMP, and it's quite simple once you know the basics ;)

Laurie.
 
Thank you all, especially Laurie.

I do understand the basic concepts of IPMP, but I couldn't figure out why an entire IPMP group would fail while the other group can connect to the router with no problems, given that both groups are on the same VLAN.

However, my innate laziness prevented me from looking in the messages, as some of you suggested. Here is what I found:

Dec 27 15:54:08 lotus4 e1000g: [ID 801725 kern.info] NOTICE: pciex8086,105e - e1000g[2] : Adapter copper link is down.
Dec 27 15:54:08 lotus4 in.mpathd[658]: [ID 215189 daemon.error] The link has gone down on e1000g2
Dec 27 15:54:08 lotus4 in.mpathd[658]: [ID 168056 daemon.error] All Interfaces in group ipmp2 have failed


This, IMHO, indicates a configuration or hardware problem on the switch side, as I don't expect anyone to have physically removed the cable.
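(One way to cross-check that from the host, a sketch assuming Solaris 10's dladm is available:)

Code:
Ask the driver what it sees on the wire; a down link here points at the cable, the switch port, or the switch config:

# dladm show-dev e1000g2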

Would you agree with my diagnosis?

--
 
Indeed, your analysis sounds correct. Another thing is to see whether the interface speed on your interfaces matches that of the switch port.

Maybe the switch port is up but someone changed the speed/duplex; that could prevent your interface from negotiating.

We nail our interfaces to 100 Mbit full duplex unless it's a cluster interconnect (directly connected between nodes), in which case we leave it at Gigabit.
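For the record, the usual way to nail speed/duplex on Solaris 10 is ndd; a sketch only, since parameter names vary by driver and ndd settings do not survive a reboot (put them in the driver's .conf file for that):

Code:
List the parameters this driver instance supports first:

# ndd /dev/e1000g0 \?

Force 100 Mbit full duplex: disable autonegotiation and advertise only 100fdx (match the switch port!):

# ndd -set /dev/e1000g0 adv_autoneg_cap 0
# ndd -set /dev/e1000g0 adv_1000fdx_cap 0
# ndd -set /dev/e1000g0 adv_100fdx_cap 1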

Anyway, the process (for reference) to up and down an IPMP interface is as follows, depending on how you have your /etc/default/mpathd set up:

############################
Code:
Subject: OFF-LINE / ON-LINE an IPMP Interface

# if_mpadm -d e1000g2

This brings interface e1000g2 OFFLINE, which has the effect of pushing the Float IP across to any other working interface that is a member of the same IPMP group.

BUT (depending on your config), when you then try to bring that interface back on-line with

# if_mpadm -r e1000g2

It will throw the error:

"Offline cannot be undone as failback has been disabled."
 
(our config sets FAILBACK=no to avoid flip-flopping). So to handle this you do the following:

1) vi /etc/default/mpathd and change FAILBACK=no to FAILBACK=yes

2) ps -ef | grep mpath to find the in.mpathd process:

 root 28267  1  0 17:48:38 ?   0:00  /usr/lib/inet/in.mpathd -a

3) kill -HUP 28267 (use the in.mpathd PID from step 2)

4) # if_mpadm -r e1000g2   (note: -r to re-attach this time, not -d)

Your interface should now be back on-line.
 
Then restore /etc/default/mpathd to FAILBACK=no and HUP the process again.
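(Steps 2 and 3 can be collapsed into one command if you prefer; pkill signals by process name:)

Code:
# pkill -HUP in.mpathd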

I assume you have a simple script to test your interface speed?
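If not, something as simple as this does it on Solaris 10 (a sketch; dladm show-dev prints link state, speed and duplex per interface):

Code:
#!/bin/sh
# report link state, speed and duplex for each e1000g interface
for i in 0 1 2 3; do
    dladm show-dev e1000g$i
done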

Laurie





 