Telnet issue throughout LAN estate

Status
Not open for further replies.

Malcybood

Technical User
Nov 11, 2003
24
GB
Very strange issue here that's got me scratching my head big time. Not service affecting, more of an annoyance!

Background: 2 years in this job. I have always been able to access the 2 Passport core switches via telnet and Device Manager, and the 35 switch stacks (a mix of 460s and 470s) reachable over the MPLS WAN via telnet, Device Manager and the web interface.

Issue: I cannot telnet to any of the Passports or BayStacks in the estate (it times out; I ran packet traces and saw no dropped packets, just a timeout), but I can connect via Device Manager to the BayStacks and Passports, and via HTTP to the BayStacks, no problem.

Recent changes: We started managing the LAN estate in house. The only changes that have been made are password resets on all BayStacks but not the Passports. We set the local password, then set the telnet and HTTP passwords to use the local password.

No firewall between PC and any of the stacks.

If I disable the telnet server on the Passports (CPU properties, boot menu), apply the change, then re-enable it, I can telnet to the device for about 15 minutes before it starts timing out again.

If I reboot a 460/470 switch stack or reset the adminstate in Device Manager, the same thing happens: straight away I can telnet to the device, and 10-15 minutes later it starts timing out again.

The only other thing that has changed is one of our Passport blades (8616GTE) blew a couple of weeks ago and was hot swapped and the device has not been rebooted since. This and the password changes are the only changes that have been made and it's becoming a bit of a pain.

For the record, we have installed 2 x stacks of 4526T-pwr at new MPLS sites and have no issues at all "telnetting" to them.

I know the software can be buggy on these stacks from past experience but really am a bit stuck here and don't fancy upgrading 35 stacks if I can avoid it. Anyone have any ideas?

Let me know if any more info is required.

Cheers!
 
Hmm... that's an interesting problem. You mention "password" changes but it sounds like you were previously using RADIUS authentication?

What version of software are you running on the 470s?

What version of software are you running on the 8600s?

When you serial up, does the log reveal anything? How long have the switches been running (uptime)? There are some known problems that can appear after 497 days of uptime on the Ethernet Switch (formerly BayStack) products.
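For anyone wondering where a figure like 497 days comes from: it matches the point at which a 32-bit counter of 10 ms ticks (the unit SNMP sysUpTime uses) wraps around. Whether that is the root cause of the known issues on these switches is an assumption on my part, but the arithmetic is easy to check:

```python
# A 32-bit counter holds 2**32 distinct values.
TICKS = 2 ** 32
# Assumption: the counter advances 100 times per second (10 ms ticks),
# as SNMP TimeTicks does.
TICKS_PER_SECOND = 100
SECONDS_PER_DAY = 86400

wrap_days = TICKS / TICKS_PER_SECOND / SECONDS_PER_DAY
print(f"counter wraps after about {wrap_days:.1f} days")
```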

Cheers!
 
Thanks for the response.

RADIUS was not used when we were on a managed services contract either. I spoke to the 3rd line guy I used to deal with, and he used to log in to each switch remotely via telnet and change the password in the console/configuration menu, which is how I did it also.

The software versions are mixed across the BayStack estate, but mainly 3.6.06 or 08.

I am getting one of these units returned in the next couple of weeks so will see where I can get to in the lab with it and post back.

Although I have rebooted the switch through device manager and reset adminstate which temporarily resolves, I have not fully powered down any of the stacks.

The Passports have not had their passwords reset either. I usually use DM to administer them, and I only wanted to telnet in because some routing info is easier to get at the CLI than in DM, so I'm not 100% sure when it stopped working. All I know is that the 8616GTE module blew somewhere between it working and not working, although that may be a red herring!

I'm reluctant to power down the Passport following the 8616GTE failure, even in a downtime window, as it hosts some business critical services and some servers which are a bit "flakey", let's say, and are not dual homed.

The Passport software is way out of date at 3.5.3, but getting a handle on software versions for the Passports and BayStacks is something I'm working on!

I will post again in the next week or so with my findings from the returned switch!
 
Any chance there is something unrelated blocking telnet - like a firewall? I guess that doesn't explain why it works sometimes, but it might explain the widespread nature of the issue.
 
99.999% sure. I mentioned in my first post that there is no firewall between the PC and the stack, and that includes host-based firewalls. We don't use them, only McAfee, which is not relevant here.

Also I can always telnet to the 4526T-pwr stacks we've recently installed over the MPLS and I built a new stack for another new site yesterday in the lab and had no issues with telnet to it.

Really really strange. I am going to play with the unit that is getting returned next week in the lab.

Have a feeling I will need to upgrade the code on all stacks as I've looked at everything in the config, but one step at a time.

It isn't a major issue, but the CLI is good for checking QoS stats etc.
 
It might be a code issue with the BayStack products but definitely not the Passport (Ethernet Routing Switch).

If it's possible I would setup a sniffer on the edge switches to confirm that your telnet (TCP/23) packets are actually making it to the switches.

Do you have a single VLAN for your management interfaces? Perhaps there's something on the network that is causing a DoS and exhausting the TCP connections on the switches? And Device Manager works because it uses SNMP over UDP/161.
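If a sniffer isn't handy, a quick script run from the management PC can at least show whether TCP/23 is answering, refusing or silently timing out on each switch. This is only a rough sketch: the function name `probe_tcp` is my own, and the result says nothing about who else is holding sessions on the switch.

```python
import socket

def probe_tcp(host, port=23, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused' or 'timeout'."""
    try:
        # create_connection performs the full TCP handshake, then we close.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return f"error: {exc}"
```

Calling `probe_tcp("192.0.2.10")` (a placeholder documentation address, not one from this thread) against each management IP, and comparing with the same call on port 80, would show whether only telnet is affected.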

If you had a modem attached to the console port of the Passport I could give you some commands to run while the problem is occurring.

There's definitely something not right there...

Good Luck!
 
OK folks, I found the issue!

Apologies to Nortel for accusing their code of being dodgy :p

DaddyOfThree,

The penny dropped when you mentioned that it sounded like a DoS attack on port 23. I remembered that last month we had some evaluation network monitoring software in a virtualised environment, which we had set up to do a network discovery on the management subnet (172.106.17.0 - 248 /29). The discovery also requires you to associate a service with it, and in this case that was port 23, which is open on all LAN switches as they're on a private WAN.

It does appear this discovery was still running or maintaining the TCP 23 connections even though we had stopped the discovery in the software.

This explains why the new stacks on the network did not have an issue: they were not being "discovered", so port 23 on them was not being blocked.

Just to verify, when I got back into the BayStack I ran the show telnet-access command, which showed the 15 minute timeout. This, combined with the discovery in the network management software running every 10 minutes, resulted in the issue!

So, just to confirm, this is not an issue with passwords or Nortel code. It was a network discovery hogging all the TCP 23 connections.
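To show why those two timers interact badly, here is a tiny simulation. It is only my reading of what happened and rests on assumptions: each discovery sweep opens telnet sessions and leaves them idle, and the switch frees a session only when the idle timeout expires. The 10 and 15 minute values are the ones above; `per_sweep` is a made-up parameter.

```python
def sessions_held(t, sweep_interval=10, idle_timeout=15, per_sweep=1):
    """Count sessions still held at minute t.

    Sweeps open `per_sweep` sessions at minutes 0, sweep_interval,
    2 * sweep_interval, ... A session opened at minute s is assumed to
    stay held while s <= t < s + idle_timeout.
    """
    held = 0
    s = 0
    while s <= t:
        if t < s + idle_timeout:
            held += per_sweep
        s += sweep_interval
    return held

# Sessions held at each minute of the first hour.
timeline = [sessions_held(t) for t in range(60)]
print("min held:", min(timeline), "max held:", max(timeline))
```

Because the sweep interval is shorter than the idle timeout, the count never falls back to zero: a new sweep always arrives before the previous sessions have timed out. If each sweep opens more than one session, a small session table could stay full permanently.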

I am going to have to reboot every stack remotely out of hours. It's now an hour or so since I killed the management software. I reset the adminstate at one site which is empty today, and reset the telnet server again on the Passports, and those are all fine, but I still can't telnet to the rest of the estate. I think they just need a gentle "kick" to clear the connections, and I'm not too worried about doing this. I prefer it to having to upgrade the code on every stack.

Thanks for your help and hope it helps someone else if they have something similar!

Cheers
 