IST Issues

Status
Not open for further replies.

marshyrob

Technical User
Jan 20, 2004
137
GB
Hello

We have an IST between two Passport 8600s running 3.7.3 code.

We experienced an outage in which one of the 4 links that make up the IST was at 31% utilisation and the other 3 were at 1%. We started to lose servers on only one particular VLAN (6). After we stopped the overnight backups, the utilisation fell to less than 5% on link 1 and all the servers started to respond again.

This is strange:

1. Why would 31% utilisation on one of the links start to impact servers?

2. Why didn't it load balance the traffic across all 4 links?

3. Why was only VLAN 6 affected and no other? (There are many VLANs running across the IST.)

Has anyone seen this behaviour before, and if so what was the course of action to rectify the problem? Or does anyone have any suggestions?

Many thanks

Rob
 
Is the IST trunk on its own VLAN? If not, Nortel recommends that the IST trunk be on its own VLAN with CP-Limit disabled.
 
Depends on the servers. Can you describe how the servers were affected? Could you ping them at all, were they up but so slow as to be unusable, etc.?

MLT doesn't load balance in a round-robin fashion; it does it based on source and destination addresses, so a given conversation will stay on its particular link. If you're unlucky enough that several high-bandwidth conversations hash to the same link, they will compete directly with each other. It's not a perfect scheme, but given enough source-destination combinations it works well.
Customer Support Bulletin CSB-0109002 shows the math used in some detail.
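
To make the point about conversations sticking to one link concrete, here is a minimal Python sketch. The hash (XOR of the two IP addresses, modulo the number of links) is an assumption chosen for illustration only; it is not the formula from the CSB or from any Nortel document. The addresses and traffic rates are made up.

```python
import ipaddress
from collections import Counter

def pick_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Illustrative hash only (an assumption, not Nortel's formula):
    XOR the two addresses and reduce modulo the number of links."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % n_links

# Hypothetical flows: (source, destination, offered load in Mbit/s)
flows = [
    ("10.0.6.10", "10.0.6.50", 90),  # backup stream
    ("10.0.6.14", "10.0.6.54", 90),  # second backup stream, same hash result
    ("10.0.6.11", "10.0.6.20", 2),   # light interactive traffic
    ("10.0.6.12", "10.0.6.21", 2),   # light interactive traffic
]

# Sum the offered load per link of a 4-link trunk.
load = Counter()
for src, dst, mbps in flows:
    load[pick_link(src, dst, 4)] += mbps

for link in range(4):
    print(f"link {link}: {load[link]} Mbit/s")
```

With these made-up flows both backup streams land on the same link while another link carries nothing, which is the kind of uneven split described in the original question. A real switch will use its own hash, so only the general behaviour carries over.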

As ecuamer mentioned, your IST peer addresses should be in their own VLAN, with CP-Limit, STP, and anything else that could shut down that link turned off.
 
Thanks for your reply guys.

We couldn't ping the servers at all, but on further investigation it looks like the servers themselves were having the issue. They were doing backups, and the 8600 ports that they were plugged into were nailed at 100% utilisation (100 Mb/s full duplex).

The only real funny (after understanding the load sharing across the IST) is why only servers on VLAN 6 were affected. The IST trunk is on a VLAN shared by other devices, but it's not VLAN 6, which was the one with the funnies. Should we not have experienced issues with other devices?

Thanks for your help, it's appreciated.
 
The VLAN 6 bit doesn't make a whole lot of sense, could you show us the important parts of your config to ensure we're all on the same page?
 
Multi-link trunks do not load share within a conversation. A server being backed up by another server will use the same trunk link for the entire transaction.

I believe that the trunk selection on an 8600 is based on source and destination IP address.

This would likely explain issues brought up in 1, 2 & 3.

The IST configured across a MLT primarily allows for L2 FDB synchronization and enables the IST'd core to SMLT. The MLT still does MLT stuff.
 
It makes the decision based on the source/destination MAC address.

 
Nortel says the source/destination hash is determined by a MAC algorithm, but in my experience it's based on odd and even IP addresses. I had a dirty piece of fiber, and it caused intermittent communications on half of my servers: the odd half. That was a fun one to find; luckily it was on a new install, so we could play with it.
 
Our SE once told me that in a purely routed connection (like a backbone link in a layer 3 mesh) the 8600s will use source and destination IP addresses to avoid having all of the traffic go on one link, since there are only two MAC addresses on such a subnet. I haven't seen it documented, but based on our experience it seems to do the same kind of hash based on the number of links in the MLT.
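
The reasoning about routed links can be sketched in a few lines of Python. Both hash functions below are assumptions for illustration (XOR of the two addresses, modulo the link count), and the MAC and IP addresses are made up; the point is only that a MAC-based choice has a single input pair on a router-to-router subnet, while an IP-based choice still sees many.

```python
import ipaddress

def link_by_mac(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Illustrative MAC-based hash (an assumption, not a documented algorithm)."""
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % n_links

def link_by_ip(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Illustrative IP-based hash (an assumption, not a documented algorithm)."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % n_links

# On a routed backbone link every frame carries the same two router MACs.
ROUTER_A = "00:00:5e:00:53:01"  # made-up address
ROUTER_B = "00:00:5e:00:53:02"  # made-up address

# Eight hypothetical end-to-end IP conversations crossing that link.
ip_flows = [(f"10.1.0.{i}", "10.2.0.7") for i in range(1, 9)]

mac_links = {link_by_mac(ROUTER_A, ROUTER_B, 4) for _ in ip_flows}
ip_links = {link_by_ip(src, dst, 4) for src, dst in ip_flows}

print("links used with MAC hash:", sorted(mac_links))
print("links used with IP hash: ", sorted(ip_links))
```

Under these assumptions the MAC hash puts every conversation on one link, while the IP hash spreads the same conversations over all four.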

We'd had a similar problem to djpeterson, although our issue turned out to be a hardware problem on one of the 8600 modules. djpeterson: did you get any error counters? We didn't, just lost packets from a random 1/8th of our users on the other side of the link. :-(
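
The "odd half" and "1/8th of our users" patterns are what a per-flow hash would produce when a single member link is faulty. A minimal sketch, assuming a hash that XORs the last octet of the client and server addresses and takes it modulo the number of links (an illustrative choice, not the switch's real algorithm), with made-up host numbers:

```python
def pick_link(src: int, dst: int, n_links: int) -> int:
    """Illustrative hash on the last address octet (an assumption)."""
    return (src ^ dst) % n_links

def clients_on_link(server: int, clients, n_links: int, bad_link: int):
    """Return the clients whose traffic to `server` hashes onto `bad_link`."""
    return [c for c in clients if pick_link(c, server, n_links) == bad_link]

clients = range(1, 255)  # last octets of hypothetical client addresses
server = 10              # last octet of a hypothetical server address

# Two-link trunk with link 1 faulty: one parity class of clients is hit.
two_link = clients_on_link(server, clients, n_links=2, bad_link=1)

# Eight-link trunk with link 3 faulty: roughly one client in eight is hit.
eight_link = clients_on_link(server, clients, n_links=8, bad_link=3)

print(f"2 links: {len(two_link)} of {len(clients)} clients affected")
print(f"8 links: {len(eight_link)} of {len(clients)} clients affected")
```

In this sketch the two-link case affects exactly the odd-numbered clients, and the eight-link case affects about an eighth of them, with no error counter needed to produce the pattern.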

 
Different devices use different MLT algorithms. You can check the Nortel docs on MLT for each device; for example, Layer 2 switched traffic and routed traffic use different MLT algorithms.

peace
 