Multicast Problem (Cisco Cat6500's)

JJ1 · May 25, 2003

Hi All,

I'm having some problems with a particular application (called Isis) which sends multicast data between two machines. Every 5 minutes, the application sends a multicast 'heartbeat' between the two machines. This is a business-critical application and it works well most of the time, but for the last week it has been losing its heartbeat every night, normally at 11pm and 3am, which makes it all the more pressing to get fixed!

We have three IOS-based Cat6500 routers in our core and two Layer 2 CatOS switches near the servers, running PIM sparse-dense mode and IGMP snooping.

Packet traces near the source and receiver show that although the receiver is issuing IGMP membership reports, and the sender is 'publishing' data to the correct group, packets just aren't arriving at the receiver, and hence the heartbeat is lost. This is my evidence for some kind of network problem.

I have written a script to ask the routers for the following every 2 minutes:
>show ip mroute <src> <grp>
>show ip igmp group <grp>
>show multicast group 01-00-5e-xx-xx-xx
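The third command takes the Ethernet MAC address the group maps to, not the group IP. James's actual script isn't shown, so the following is only a hypothetical Python helper that builds the three command strings, assuming the standard IPv4-to-Ethernet mapping (01-00-5e followed by the low 23 bits of the group address). The names `group_to_mac` and `poll_commands` and the example addresses are made up for illustration.

```python
import ipaddress
from typing import List


def group_to_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC (01-00-5e + low 23 bits)."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(addr) & 0x7FFFFF          # only 23 bits of the group survive
    mac = 0x01005E000000 | low23          # fixed 01-00-5e prefix
    # Emit the six octets, most significant first, in CatOS dash notation.
    return "-".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


def poll_commands(source: str, group: str) -> List[str]:
    """Build the three show commands for one (source, group) pair."""
    return [
        f"show ip mroute {source} {group}",
        f"show ip igmp group {group}",
        f"show multicast group {group_to_mac(group)}",
    ]


print(poll_commands("10.0.0.5", "239.1.2.3"))
# ['show ip mroute 10.0.0.5 239.1.2.3', 'show ip igmp group 239.1.2.3',
#  'show multicast group 01-00-5e-01-02-03']
```

Because only 23 of the group bits are carried into the MAC, 32 groups share each MAC address (for example, 239.1.2.3 and 224.129.2.3 both map to 01-00-5e-01-02-03), which is worth keeping in mind when reading the IGMP snooping table.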

What I have observed is that, very often, the leaf router near the listener goes into a (*,G) state instead of (S,G). Could this cause a problem? Why does it move away from the (S,G) entry every now and again?

When the listener finally keeled over at 3am ('great way to spend a Friday night!'), I observed that the sender carried on trying to send messages to the multicast group.

However, the sender's router did not have an (S,G) entry. I thought that the router nearest the sender should create an (S,G) entry and send the messages to the rendezvous point, but this was not happening: the sender continued to send, but packets were not being forwarded to the rendezvous point. Are there any PIM gurus out there who could tell me if this is consistent with the PIM sparse-mode protocol? Am I correct in saying that when a sender sends a packet to a multicast group, the first-hop router should immediately create an (S,G) state?

Any help would be much appreciated.
Thanks!

James.

dnels · May 26, 2003

James,

With heartbeats every 5 minutes, you will see the (S,G) entries disappear because of the 3-minute timeout. Heartbeats every 2 minutes would be a lot more efficient on the network side. You should see the (S,G) entry disappear on the RP as well. But on the RP, the path to your destination should still appear in the OIL (outgoing interface list) of the (*,G). If it doesn't, that is a problem, as the shared tree has been broken and the RP cannot forward the multicast data to the receiver.
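A simplified timing model shows why the entry comes and goes. It assumes (for illustration only) that an (S,G) entry exists from each heartbeat until 180 seconds of silence, matching the 3-minute timeout above, and that the first heartbeat arrives at t=0; the function names are hypothetical.

```python
from typing import List, Tuple


def sg_timeline(interval_s: int, expiry_s: int = 180,
                duration_s: int = 900, step_s: int = 60) -> List[Tuple[int, bool]]:
    """Sample whether a hypothetical (S,G) entry exists at each time step.

    Simplified model: heartbeats arrive at t = 0, interval_s, 2*interval_s, ...
    and each one (re)starts an expiry timer of expiry_s seconds.
    """
    samples = []
    for t in range(0, duration_s, step_s):
        last_heartbeat = (t // interval_s) * interval_s
        samples.append((t, t - last_heartbeat < expiry_s))
    return samples


def sg_uptime_fraction(interval_s: int, expiry_s: int = 180) -> float:
    """Fraction of each heartbeat interval during which the (S,G) entry exists."""
    return min(expiry_s, interval_s) / interval_s


print([has for _, has in sg_timeline(300)][:5])          # [True, True, True, False, False]
print(sg_uptime_fraction(300), sg_uptime_fraction(120))  # 0.6 1.0
```

In this model, a 5-minute heartbeat keeps the entry alive for about 3 minutes out of every 5, so a poll every 2 minutes will see it flip between (S,G) and (*,G); with a 2-minute heartbeat it never lapses.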

Do a 'sho ip pim rp' on the leaf router and check the timestamps. Are you losing contact with the RP? I have run into bugs with Auto-RP, if you are using that feature of sparse mode.

You need to do a 'sho ip pim int' on the sender's network to see which router is the DR (designated router). The DR for the sender's network is the router with the highest IP address. This router is responsible for initiating the multicast stream to the RP. The stream is encapsulated (in PIM Register messages) if the RP has pruned the path back to the source.
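The DR choice can be sketched as below. Dave's rule (highest IP address wins) is what happens when every router on the segment has the same DR priority; PIMv2 compares DR priority first and uses the highest IP address only as a tie-breaker. The function name and addresses are hypothetical, and this is a simplified model rather than what the router runs internally.

```python
import ipaddress
from typing import Iterable, Tuple


def elect_dr(routers: Iterable[Tuple[str, int]]) -> str:
    """Return the address of the DR among (ip_address, dr_priority) pairs.

    Highest DR priority wins; ties are broken by the highest IP address,
    compared numerically rather than as strings.
    """
    winner = max(routers, key=lambda r: (r[1], ipaddress.IPv4Address(r[0])))
    return winner[0]


# Equal (default) priorities: the highest address wins, as Dave describes.
print(elect_dr([("10.1.1.9", 1), ("10.1.1.10", 1)]))  # 10.1.1.10
```

Comparing the addresses as plain strings would wrongly pick 10.1.1.9 here, which is why the sketch parses them first.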

So what is happening is:

The source sends a heartbeat, and the designated router on the source's network encapsulates it and sends it to the RP.

If the receiver has issued a join for the multicast group, there should be a shared tree back to the RP, as seen in the OIL of the (*,G).

The RP adds the source as an (S,G) entry under the (*,G) and forwards the heartbeat down the shared tree to the leaf router.

The leaf router sends the heartbeat to the receiver and builds a shortest-path tree to the source of the heartbeat. Since the heartbeat only comes every 5 minutes, this path times out and is torn down before the next one arrives, so we have to start again at the beginning.

If the leaf router loses connectivity to the RP, you will not get the heartbeats via the shared tree.
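The steps above can be condensed into a small decision sketch. It is a simplified model under stated assumptions: a live (S,G) entry on the leaf stands in for an intact shortest-path tree, and a "broken shared tree" means the RP's (*,G) OIL no longer contains the path to the leaf. The function name and step labels are illustrative, not IOS output.

```python
from typing import List


def heartbeat_path(sg_on_leaf: bool, rp_oil_reaches_leaf: bool) -> List[str]:
    """Simplified trace of one heartbeat through the PIM sparse-mode steps above.

    sg_on_leaf: the leaf router still holds a live (S,G) entry (SPT built).
    rp_oil_reaches_leaf: the RP's (*,G) OIL still contains the path to the leaf.
    """
    if sg_on_leaf:
        # Shortest-path tree still in place: traffic flows natively to the leaf.
        return ["source", "SPT", "leaf router", "receiver"]
    # In this model the SPT has timed out, so the DR encapsulates to the RP.
    steps = ["source", "DR encapsulates to RP"]
    if not rp_oil_reaches_leaf:
        # Shared tree broken (e.g. the leaf lost contact with the RP).
        steps.append("dropped at RP")
        return steps
    steps += ["RP forwards down shared tree", "leaf router", "receiver",
              "leaf builds SPT to source"]
    return steps


print(heartbeat_path(sg_on_leaf=False, rp_oil_reaches_leaf=False)[-1])  # dropped at RP
```

With a 5-minute heartbeat and a 3-minute timeout, the leaf's (S,G) entry has always expired by the time the next heartbeat arrives, so in this model every heartbeat takes the shared-tree branch, and any break between the leaf and the RP drops it.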

Hope this helps,
Dave

JJ1 · Jun 1, 2003

Hi Dave,

Thank you for the very useful info. You're right, 5 minutes is a stupid interval to send keepalives, but the company who has written the application refuses to alter their code, so I think we'll have to accommodate them this time.

You mentioned that the (S,G) state would disappear after 3 minutes of no traffic. This is definitely what is happening here. For a few minutes we see the (S,G) state, then it flips back to (*,G) for another few minutes.

I thought that the SPT threshold was set to 0 kbps by default?

Since we use this default, I would guess that traffic would need to fall below 0 kbps before doing an 'SPT switchback' (and using the shared tree once again). Since it's impossible to get -1 kbps, how does the router know when to 'switch back' to the shared tree?
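James's reasoning can be written out as a simplified model (an assumption for illustration, not the IOS implementation): a rate check that switches back only when the measured rate is strictly below the threshold, alongside the 3-minute inactivity timeout Dave described. With a threshold of 0 kbps, the rate check can never fire for any real (non-negative) rate, so in this model the (S,G) entry only goes away through the timeout. The function name and parameters are hypothetical.

```python
def sg_decision(idle_s: float, measured_kbps: float,
                threshold_kbps: float = 0.0, expiry_s: float = 180.0) -> str:
    """Simplified model of what happens to a leaf router's (S,G) entry."""
    if idle_s >= expiry_s:
        # No traffic for the whole expiry period: the entry is removed and
        # the router is left with only its (*,G) state.
        return "expired"
    if measured_kbps < threshold_kbps:
        # Rate-based switchback: never true when threshold_kbps is 0.
        return "switch back to shared tree"
    return "stay on SPT"


print(sg_decision(idle_s=60, measured_kbps=0.0))   # stay on SPT
print(sg_decision(idle_s=200, measured_kbps=0.0))  # expired
```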

Thanks!

James.

Tek-Tips is the largest IT community on the Internet today!
