
Need to reboot your Nortel switches every 497 days


curtismo

This bulletin just came out this week.

The title is "Ethernet Routing Switches: SysUpTime approaching 497 days can cause the switch or stack to behave in some unexpected way." Supposedly this affects BPS, ES 470 and ERS 55xx switches.

Can't say much more. It seems to be a little problem with Nortel switches. I've experienced this with the 1424T switches if you didn't reboot them before 500 days (if you looked at the SysUpTime close to the day, it would be in the 49x-day range). I believe the 1600 switches had a similar issue.
 
BULLETIN ID: 2008009145, Rev 1, for those with access to the Nortel website.
 
Nortel has known about this issue for a very long time. It's really disappointing to see how they are handling it.

They put out a bulletin and a video to "help" folks. The video is a joke: it tells folks to "reset" their switch to avoid the problem, without mentioning that resetting the switch causes a network outage while it restarts.


Cheers!
 
It would be nice if they would fix the root cause of the problem, more than likely memory corruption when the SysUpTime rolls over. It can be done; they fixed the 1424T code years ago, and I think the Passport 1600 routing switches had the same issue before that.
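
Nobody outside Nortel knows what the actual bug is, but the classic way a 32-bit tick wrap bites timer code is worth sketching. A purely illustrative Python snippet (my guess at the failure class, not Nortel's code):

```python
# Classic 32-bit tick-wrap pitfall (illustrative only - not Nortel's code).
# Naive absolute comparisons break when the counter wraps back to 0;
# unsigned subtraction modulo 2**32 stays correct across the wrap.
MASK = 0xFFFFFFFF  # 32-bit counter

def elapsed(now, then):
    """Ticks elapsed from `then` to `now`, correct across a single wrap."""
    return (now - then) & MASK

then = MASK - 50   # a timestamp taken just before the wrap
now = 100          # a timestamp taken just after the wrap

print(now > then)          # False - a naive "deadline passed?" test misfires
print(elapsed(now, then))  # 151  - modulo arithmetic still gets it right
```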

I know the BPS is EOL, and the 470s basically are too, but I agree - come on, haven't you learned that lesson with the 5500 switches yet?
 
Lots of systems have a similar issue when their 32-bit tick counters roll over: at 100 ticks per second (hundredths of a second, the unit SNMP sysUpTime counts in), 2^32 ticks yields 497.1 days. My Linux systems did the same before the kernel got 64-bit counters... of course they didn't blow up, they just rolled back to zero.
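
If anyone wants to sanity-check that figure, the arithmetic is trivial (Python used purely as a calculator here):

```python
# A 32-bit counter ticking in hundredths of a second (100 Hz),
# the way SNMP sysUpTime ticks, wraps after 2**32 ticks.
ticks = 2**32            # 4,294,967,296 ticks until wrap
seconds = ticks / 100    # 42,949,672.96 seconds
days = seconds / 86400   # 86,400 seconds per day
print(f"{days:.1f} days until rollover")  # -> 497.1 days until rollover
```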

Daddy^3 - you mentioned 460s in your blog, but I don't see them listed in the bulletin; were your 460s part of affected 470 stacks?

I'm assuming products in the 425 family are not affected; mine started showing negative numbers in the CLI when the counters got that high, but the switches continued to run fine. ::)
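
That negative-number display is a signed/unsigned tell, by the way: once the counter passes 2^31 ticks (about 248.5 days), a CLI that prints it as a signed 32-bit int goes negative. A quick Python illustration, nothing Nortel-specific:

```python
# Why a big unsigned tick count can print as a negative uptime:
# reinterpret the same 32 bits as a signed integer.
import struct

TICKS_PER_DAY = 100 * 86400  # sysUpTime ticks are hundredths of a second

for days in (100, 300, 490):
    ticks = days * TICKS_PER_DAY
    signed = struct.unpack("<i", struct.pack("<I", ticks))[0]
    print(f"{days} days = {ticks} ticks -> prints as {signed}")
```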

This is a major thing for some of us and rebooting is a horrible 'fix'. I'm in high-volume manufacturing and haven't had downtime for some of my edge switches in several years.
 
An update: a Nortel source has said that 5500s will continue to forward traffic, but that switch management will be unreachable for about 5 minutes as the counters wrap in the OS. I'm not sure about other switches - what have you guys seen?
 
I believe we've seen it hit the 460 switches as well. In all cases the management stops responding, and it appears as if the switch restarts itself before management recovers. In actuality the switch continues to forward L2 traffic, but certain L3 functions/features are unavailable until the "management stack" restarts itself.

Cheers!
 
Between code updates on my 55xx switches and the long power outages, UPS replacements, etc., in my network, my switches get rebooted long before the 497-day mark. I have had a 5520's switch management "drop off" the network well before the 497-day threshold - I was able to have someone hook a laptop up to the serial port so I could reboot it remotely (a 3-hour drive to reboot a switch was not my idea of a great day).
 
As an update - I had a whole site of 425s roll over the 497.1-day mark on Friday; their management all blinked offline for 10 minutes and then came back. No user traffic seemed to be affected, but it was a scary time for my site people as their email filled with scary-sounding errors. Fortunately I remembered all of you fine folks and calmly told them that everything would be alright (while crossing my fingers). :)
 
I have around 20 470/460 stacks in production. Mine do the same thing - just the management IP seems to fall out. In this day of tight NOC monitoring and 24x7 staffing of our NOC, I hate having to tell our overnight guys not to call me when a Nortel switch alarms, because it is so common.

I cannot believe Nortel does not consider how this whittles away at their brand equity in larger operations, where competing teams and egos impact decisions (i.e. the Cisco/Nortel battles). You can't say much when a Cisco supporter pokes at the NOC alarms indicating Nortel switches going down - when they are not really down - followed by a comment about expecting this out of Netgear or Linksys. Small companies use Netgear, don't have NOCs, and can endure quirky issues and occasional outages. Enterprise operations are coached to use enterprise network gear, have NOCs staffed 24x7, and cannot endure any downtime. Which market does your Nortel sales and engineering team say they play in? And where does this simple but brand-eroding side effect, which Nortel has chosen not to take seriously, fit in?

Nortel's engineers and management are not the same group that brought us the SL-1 PBX we have at our original old corporate HQ location - a system that once supported 800 stations with just one outage in its first 20 years (power-related, not the system), and that now, with 75 remaining stations, has not been down or had one quirky issue in over 6 years.
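
For what it's worth, you can get ahead of those NOC alarms by watching sysUpTime yourself and scheduling the reboot on your own terms. A rough sketch of the kind of check you could cron against each stack (Python shelling out to net-snmp's snmpget; the host and community string below are placeholders):

```python
#!/usr/bin/env python3
# Rough sketch: warn when a switch's sysUpTime nears the 497.1-day wrap,
# so the reboot happens in a maintenance window instead of at 3 a.m.
# Assumes net-snmp's snmpget is installed; host/community are placeholders.
import subprocess

HOST = "192.0.2.10"               # placeholder switch address
COMMUNITY = "public"              # placeholder community string
WRAP_DAYS = 2**32 / 100 / 86400   # ~497.1 days
WARN_DAYS = 480                   # nag well before the wrap

# -Oqvt prints just the raw tick count (hundredths of a second)
out = subprocess.check_output(
    ["snmpget", "-v2c", "-c", COMMUNITY, "-Oqvt",
     HOST, "SNMPv2-MIB::sysUpTime.0"],
    text=True,
)
up_days = int(out.strip()) / 100 / 86400
if up_days >= WARN_DAYS:
    print(f"{HOST}: up {up_days:.1f} days - reboot before {WRAP_DAYS:.1f}")
```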
 
I have observed something probably related to that. Once, I telnetted to a BPS 2000 and the welcome message did not appear (I am sure the TCP connection was established), so I was unable to proceed to the main menu. I tried pressing all possible key combinations; nothing helped. None of the end users experienced network problems. Then I tried the Nortel-proposed solution :), and it worked.
 