BPS/Passport 8600 Partial Network Disconnect Problem

Googer

Technical User
Apr 30, 2004
60
US
I am having a problem with users on a BPS stack being partially disconnected from the network. By partially disconnected I mean that they can connect to one server on our server subnet but not another (for example, I can ping 192.168.1.23 but cannot ping 192.168.1.44). I have checked all the client workstation settings and they look good. This is happening to all the clients on this stack. I performed a packet capture at the destination of the failed ping and I can see the Echo Request arrive and the Echo Reply leave, but the client gets no response. This stack is uplinked through a Passport 8600 with a 2-port MLT. Both the 8600 and the BPSs are on reasonably recent code. I have tried the following without success.

1) Unplugged one MLT member and then the other (no indications of a loop).

2) Reset the switch stack in software.

3) Powered off the stack and then back on.

4) Defaulted the BPS stack and completely reconfigured it.

5) Completely rebuilt the VLAN and MLT on the 8600.

I am back working now after completely replacing the stack with 450s and rebuilding the VLAN and MLT on the 8600 again. This is the second time I have had this problem.

Any help would be appreciated.

Googer
 
What version of code are you running on the BPS stack and the PP8600? I can check release notes for you.
 
Googer,

A few questions:

Did it work before?
Does it work when only one or the other MLT member is active?
Are there other MLTs active in the path from client to server?



InDenial

 
The software is 3.5.8 on the Passport 8600 and the BPS code is 3.0.3. I didn’t see anything in the release notes that might explain this but my version also says no known limitations for the 8600 release notes.

There are several other MLTs between the stack and the server segment. However, I have 25 other switch stacks on this 8600 that did not see the issue at the same time this stack was having a problem. If it were one of the other MLTs, all clients fed from all the stacks on the 8600 should have seen the problem.

Googer
 
The thing is that Passports choose which member of the MLT carries the data based on the source and destination MAC or IP address.
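To make that concrete, here is a minimal sketch of per-flow link selection. The XOR-and-modulo hash on the last octet is an assumption chosen for illustration, not the actual Passport algorithm, and the client address is hypothetical. It only shows how two servers on the same subnet can end up on different MLT members for the same client.

```python
def mlt_member(src: str, dst: str, n_links: int) -> int:
    """Pick an MLT member index from the last octets of two IPv4 addresses.

    Illustrative XOR-and-modulo hash only; the real switch algorithm
    may use different fields and a different function.
    """
    src_last = int(src.rsplit(".", 1)[1])
    dst_last = int(dst.rsplit(".", 1)[1])
    return (src_last ^ dst_last) % n_links


client = "192.168.5.10"  # hypothetical client address
for server in ("192.168.1.23", "192.168.1.44"):
    # With this toy hash the two servers land on different members.
    print(server, "-> link", mlt_member(client, server, 2))
```

If one member silently drops traffic, only the flows hashed onto it fail, which would look like the "partial disconnect" described in the first post.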

It might be that one link is not working very well due to a bad fibre, which goes unnoticed because all the other data goes over the working link. Maybe you could check whether there is a fibre in the path that does not carry any data at all.

To eliminate the network and fibre lying between the client and server you could try attaching the client to another stack and see if it works there. If it works fine on the other stack then I think it is a client problem. If it still does not work you should compare a good working client and this one. Check over which fibres the data travels when it is sent from a good client and when it is sent from the non-working client.

InDenial

 
I checked all the fibers and see no errors. If it were anything other than a local fiber, I would expect all stacks connected to the 8600 to see the same problem. If it were one of the fibers to the switch stack, it should have cleared up when I disconnected it and the stack was running over the other fiber.

The clients all work fine on the new stack. I don't think it is a client problem as all the clients on the stack had exactly the same problem. Moving the clients to another stack did work before I replaced the stack. The stack they moved to was also connected to the same 8600.

Thanks for your suggestions. Keep them coming.

Googer
 

This appears to be some sort of subnet mask overlap, incorrect gateway setting, or database corruption. The fact that you can fix it by replacing it with a stack of BS450's is what really points to a code issue. Or perhaps something you keep missing in that particular stack's config.

Here are some things to try and a few questions:

Check your ARP cache for the originating and destination IP addresses. Check the MAC associated with each IP and make sure the PP8600 knows that the MACs go to where you've physically put them (i.e. if you know server X is attached to 2/4, make sure the 8600 knows the MAC for that device is associated with port 2/4).
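That comparison can be sketched as a small script. The MAC addresses (taken from the documentation range), the port numbers, and the function name `find_mismatches` are all hypothetical; the learned table is assumed to be copied by hand from the switch's forwarding database output.

```python
from typing import Dict, Optional, Tuple


def find_mismatches(expected: Dict[str, str],
                    learned: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each MAC to (expected port, learned port) where the two differ."""
    learned_norm = {mac.lower(): port for mac, port in learned.items()}
    mismatches = {}
    for mac, port in expected.items():
        seen = learned_norm.get(mac.lower())  # None if the MAC was never learned
        if seen != port:
            mismatches[mac.lower()] = (port, seen)
    return mismatches


# Hypothetical data: where each device is physically cabled...
expected = {"00:00:5E:00:53:01": "2/4", "00:00:5E:00:53:02": "2/5"}
# ...and what the switch's forwarding table reports (typed in by hand).
learned = {"00:00:5e:00:53:01": "2/4", "00:00:5e:00:53:02": "1/1"}

print(find_mismatches(expected, learned))
```

Any entry that comes back deserves a closer look, since the switch would be forwarding frames for that device out of a port it is not attached to.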

Can you ping the end devices from the PP8600? Can you ping end to end from the console of the BPS or from one of the servers that can't be reached? You mentioned destination IPs, but nothing about the originating IP. Is any routing involved or is the originating IP in the same subnet? Have you double-checked the VLAN and other settings on this stack against another stack of BPSs that's working?

When you say you did a capture...what do you mean? Pcap on the Passport or a Sniffer trace near the server?

This is an interesting problem. I can't wait to find out what it is. Wish I had access to your network or a copy of the PP8600 config.

Anyway...one last thought. Do you have access to a more recent version of the code for the BPS? Not saying that's the answer, just curious (at least you're not in the 2.x series, that stuff was terrible).

Good luck...Dane
 
The arp entries on the 8600 all looked good. I did not check the arp entry on the BPS stack. This was a routed destination which I could ping from the 8600 but not the BPS stack. The capture was a Sniffer trace on the destination server’s port using port mirroring.

The gateway settings for the clients all came from DHCP and were correct. I checked the gateway on the BPS and it was correct as well. I didn't see a subnet mask problem on the clients or the stack, but I did not really check on the 8600 side.

I do have access to more recent code but upgrading is not nearly that simple. We have a significant number of BPSs in our environment and it is no small matter to get that large an upgrade approved. I could try that on a targeted basis though should the problem come back.

Thanks,

Googer
 
Any chance the GBIC(s) for that link are flaky?

Rick Harris
SC Dept of Motor Vehicles
Network Operations
 
Oops... did not mean to post so soon.
I had a GBIC port switch from access to trunk on the 8600 side.
Have not figured out how that happened. I also had a problem with a non-Nortel GBIC.

Rick Harris
SC Dept of Motor Vehicles
Network Operations
 
If it were a GBIC type of issue, it should have cleared when I was running on just one of the fibers and not the other. I tried both of the uplink fibers alone and neither running by itself fixed the problem. If it were a GBIC problem, it should have gone away when I pulled that fiber. The strange thing is that I can't even replicate the problem in our lab using the exact same switch stack.

Googer
 
I have now seen this for a third time. This time a server on one 8300 was not able to talk to random servers on the other 8300. Playing a hunch, I downed one of the MLT ports that uplinked the 8300 to the core and everything started working immediately. This also worked the second time, when a remote office could not connect to the other 8300. I downed one link there as well and it came back up. I had our cabling vendor out to test the fibers and they look good. No errors are being reported by the 8300s or the 8600s on those ports. Even stranger, if I bring the ports back up a few hours later the problem does not come back. The IPs of the affected servers on the 8300s are similar, but not every server with a similar IP is affected. I also have full connectivity to other IPs that are just one number off in the last octet and plugged into the port next to the troubled server.
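One way to test whether the failures line up with a single MLT member is to bucket ping results by the link a hash would pick. This is only a sketch: the XOR-and-modulo hash is a stand-in for whatever the switches really use, and the addresses and results are made up for illustration.

```python
from collections import defaultdict


def failures_by_member(results, src, n_links):
    """Count reachable and failed destinations per MLT member under a toy hash."""
    src_last = int(src.rsplit(".", 1)[1])
    stats = defaultdict(lambda: {"ok": 0, "failed": 0})
    for dst, reachable in results.items():
        # Toy hash: XOR of the last octets, modulo the number of links.
        member = (src_last ^ int(dst.rsplit(".", 1)[1])) % n_links
        stats[member]["ok" if reachable else "failed"] += 1
    return dict(stats)


# Hypothetical ping results from one server: address -> reply received?
results = {
    "10.1.1.20": True,
    "10.1.1.21": False,
    "10.1.1.22": True,
    "10.1.1.23": False,
}
print(failures_by_member(results, "10.1.2.8", 2))
```

If every failure falls into one bucket, that member (or the port behind it) is the first suspect, and it would also explain why neighbouring addresses one number apart behave differently.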

Googer
 
Could it be octopid related?

Rick Harris
SC Dept of Motor Vehicles
Network Operations
 
Have you also checked the layer 2 forwarding?

There is a problem where BayStacks, all of them, start spoofing MAC addresses when IGMP is configured.
Then you have one-way communication, because the Passport sends the traffic to the wrong switch.
So if you have no multicast running, disable IGMP snooping and proxy on each VLAN, and do this on every stack.

We spent quite some time looking into this problem.
 
Octipid related how? I'm not familiar with that.

Unfortunately, we do multicasting on our network so I can't turn off igmp without making things worse not better.

Googer
 
Hi,

I have a similar problem on my customer's network:

1 Passport 8010
with
2 CPU
2 cards 32 10/100 Mb/S RJ45

1 stack
1 BPS2000
2 BS 450
3 BS 410
4 BS 410

DMLT 1/25 and 2/25
MLT Passport 1/3 and 10/3

PC A in the stack
PC B not in the stack
PC C not in the stack

When the problem occurred, with all connections in use:

B can ping A
C can't ping A

When I unplug 1/25:

B can't ping A
C can ping A

When I plug 1/25 back in:

B can ping A
C can't ping A

When I unplug 2/25:
B and C can ping A

When I plug 2/25 back in:
B and C can ping A, but other PCs can't be pinged

I had this problem before I used the Passport, and I think it's a problem with the hybrid stack.

I can't find what causes this problem because it does not happen all the time. Do you have any idea what could cause it?

Thanks in advance.


 
The octopid is related to the backplane. There are 8 data paths on the backplane. If you had a 48-port TX blade, you would have 6 ports on each octopid. So for a trunk you would put 1 port per octopid (example: port 1, port 7...).
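As a quick sketch of that spacing, assuming ports are grouped consecutively exactly as described above (the real mapping for a given module should be checked against its documentation), the first port of each group can be listed like this. The function name is made up for the example.

```python
from typing import List


def one_port_per_group(total_ports: int, ports_per_group: int) -> List[int]:
    """Return the first port number of each consecutive group of ports."""
    return list(range(1, total_ports + 1, ports_per_group))


# 48-port blade, 6 ports per group, as in the description above.
print(one_port_per_group(48, 6))
```

Picking trunk members from that list puts each one in a different group, which is the point of the advice.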
Hope that helps.

Rick Harris
SC Dept of Motor Vehicles
Network Operations
 
I am facing a similar problem.

My small configuration has 2 x Passport 1200 (MLT over 2G fiber) and a stack of 5 x BS450 (MLT over 2G fiber to the Passport 1200).

I have already tried many of the steps Googer did.

Today, after the disconnection problem, I found that the statistics for the BS450's GE fiber port show a PAUSE frame count of more than 30,000.

I am sure I reset all the statistics last week.

Maybe that is the culprit.
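A simple way to keep an eye on this is to compare two snapshots of the port counters and list the ones that grew. The counter names and values below are hypothetical and would have to be copied from the switch's port statistics screen by hand.

```python
def grown_counters(before, after, threshold=0):
    """Return counters whose value increased by more than `threshold`."""
    grown = {}
    for name, value in after.items():
        delta = value - before.get(name, 0)  # counters missing earlier count from 0
        if delta > threshold:
            grown[name] = delta
    return grown


# Hypothetical snapshots: right after clearing statistics, and after an outage.
before = {"pause_frames": 0, "fcs_errors": 0}
after = {"pause_frames": 30412, "fcs_errors": 0}

print(grown_counters(before, after))
```

Taking a snapshot right after each incident makes it easier to tell whether the PAUSE frames pile up during the outage or accumulate slowly in between.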
 
I have had this problem for the fourth time today. Once again I just dropped a port out of the MLT and it started working. It is worth noting it was a different MLT port that I dropped than last time.

Am I understanding you correctly that you want only one trunk port member per octopid? My trunk on the 8600 is ports 5/1 5/2 6/1 and 6/2. I'm not sure how the octopids are assigned on these 8300 cards.


Thanks,

Googer
 