Passport 8600 CPU @ 100%

Status
Not open for further replies.
Feb 1, 2001
116
US
Sometime between 7:00 and 7:45 this morning, our Passport 8600 CPU went belly up. The unit has four 48-port 10/100 routing blades, four 8-port SX routing blades, and two routing CPU blades.

The CPU blade that was in production at the time showed no signs of life, but most ports were still up (I said "most"). Many were somehow disabled when the CPU stopped responding. We could not get in by Telnet or by console, and the CPU meter on the blade was locked at 100%.

When I monitor CPU usage with Optivity, it shows around 10% to 20% on average, but we get a spike to 100% about three times an hour. We are running v3202 code.

Any thoughts on what may have caused this?
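
For anyone who wants to quantify a pattern like the one described above (a low average with occasional spikes to 100%), here is a minimal sketch. It assumes you can export the polled CPU readings as a plain list of percentages; the sample values, the one-minute polling interval, and the 90% threshold are all made up for illustration and do not come from Optivity or the switch.

```python
from statistics import mean

def find_spikes(samples, threshold=90.0):
    """Return (index, value) pairs for polls at or above the threshold."""
    return [(i, v) for i, v in enumerate(samples) if v >= threshold]

def summarize(samples, threshold=90.0):
    """Split the polls into baseline readings and spikes, then summarize both."""
    spikes = find_spikes(samples, threshold)
    baseline = [v for v in samples if v < threshold]
    return {
        "baseline_avg": mean(baseline) if baseline else None,
        "spike_count": len(spikes),
        "spike_indexes": [i for i, _ in spikes],
    }

# Hypothetical one-hour window polled once a minute (values are invented).
hour = [15.0] * 60
for minute in (7, 28, 51):
    hour[minute] = 100.0

report = summarize(hour)
print(report)
```

Keeping the spikes out of the baseline average makes it easier to tell whether the box is busy all the time or only during short bursts, and the spike indexes can be lined up against other logs to see what else was happening at those moments.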
 
We have a dark fiber ring that runs between seven 8600s. I was going to each 8600 and saving the config to both the master and the slave. After I accidentally failed over 8600 #2, a routing problem started. By turning up the trace log level on the CPU, we found that it appeared to be a RIP issue. The RIP issue kept getting worse, and nothing we tried seemed to fix it. At its worst the CPU was spiking badly, almost as if spanning tree was going nuts, and we could not always telnet to it. At the height of the problem I was plugged into the default VLAN and still could not get to the box, but our support could. They removed and reapplied the IP address on the default VLAN and poof, everything started working.
All that to say: try removing and re-adding the IP address on your default VLAN.

My suggestions are what I would try myself. If incorrect, I welcome corrections to my knowledge.
Scott
stomlin@baptistfirst.org
 
Interesting... did anyone ever say "why" they felt this would fix the problem? How do you turn up the trace logs? What makes you think this won't come back again?

BTW - Looking at your email address, do you work for the Baptist Home Office? 7x8600's? What in the world are you guys doing that needs that much power?

Hope you don't mind the questions... I'm always looking for information that may help us provide a better network for our customers.

Thanks

--
Dean Sullinger
Arizona Department of Transportation
Information Technology Group
Network Architecture Team
Email : dean@dot.state.az.us

 
I work for the Baptist hospital in Montgomery, AL. We connected 2 hospitals and 4 primeds with the Admin office building. We have dark fiber between all sites. At the time the 8600 was the only device that did GBICs, and we needed the XD GBIC in order to reach our sites.

We are still waiting for an answer from Nortel as to why this happened and why what we did fixed it. In reality they probably don't know why this happens.

We increased the trace level on the CPU only. Since our network is not huge, we use RIP. The only thing that increasing the trace level did was show that many of our routes were in hold-down. The sniffer captures showed that the box was receiving the updates, but the CPU must have been busy with something else, since it seemed they were not being processed. We haven't figured out what it was doing, and we will probably never know.
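
To illustrate how updates that arrive on the wire but never get processed can still age a route out, here is a minimal sketch using the default RIP timers from RFC 2453 (updates every 30 seconds, a 180-second timeout, then a 120-second garbage-collection period). Treating the 8600's "hold-down" as roughly the period after the timeout expires is an assumption on my part, and the switch's actual timers and state names may differ.

```python
RIP_UPDATE_INTERVAL = 30   # seconds between periodic updates (RFC 2453 default)
RIP_TIMEOUT = 180          # seconds without a processed update before the route times out
RIP_GARBAGE = 120          # further seconds before the timed-out route is deleted

def route_state(seconds_since_last_processed_update):
    """Classify a route by how long ago an update for it was actually processed."""
    age = seconds_since_last_processed_update
    if age < RIP_TIMEOUT:
        return "valid"
    if age < RIP_TIMEOUT + RIP_GARBAGE:
        # Assumption: this is the phase the poster's trace reported as hold-down.
        return "timed out (metric 16, awaiting deletion)"
    return "deleted"

def missed_updates_until_timeout():
    """How many consecutive updates must go unprocessed before the timeout fires."""
    return RIP_TIMEOUT // RIP_UPDATE_INTERVAL

for age in (30, 200, 320):
    print(age, "->", route_state(age))
print("updates missed before timeout:", missed_updates_until_timeout())
```

With these defaults a route survives as long as at least one update in any three-minute window is processed, so a CPU that is too busy to handle updates for that long would push routes out of the valid state even though a sniffer still sees the updates arriving.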

To turn up the trace level, type trace at the prompt and follow the help screens. Here is the one we ran: trace 9 3

I have been told not to leave it on for long, because it will fill up the file system, and I am sure that is a bad thing.

To view the output, just type: trace info tail


I can't say for sure that it won't come back, but I do not think it will. I can only tell you that, after a month of stress, that is what fixed it. The problem affected network performance, and it made one of our apps constantly drop and pick up its network connection to the clients. It grew to the point where some VLANs would work and others would not. Failing over the CPU made it better but did not cure it. It was going downhill slowly. The peak came when I plugged a notebook into the default VLAN and could not ping the 8600, but our remote support could. That is when they re-IP'd the default VLAN.

Hope this helps!

My suggestions are what I would try myself. If incorrect, I welcome corrections to my knowledge.
Scott
stomlin@baptistfirst.org
 
Hi

I have had this problem, and Nortel flew engineers to the site from Valbonne, France.

Results: NIMDA virus! 30,000 ARP requests per second

DVMRP
IGMP excessive membership queries

Hope this helps, it did for us
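
A rough way to see where an ARP rate like the one reported above can come from: a routing switch has to send an ARP request for any destination that falls inside one of its directly connected subnets and is not already in its ARP cache, so a worm scanning random addresses generates ARP work for every such miss. The sketch below is only a model of that idea. The subnets, the cache contents, and the choice to scan randomly inside 10.0.0.0/8 are all hypothetical and are not taken from any real network or from the worm's actual algorithm.

```python
import ipaddress
import random

def needs_arp(dst, connected_subnets, arp_cache):
    """True if the switch would have to ARP for dst before it could forward."""
    addr = ipaddress.ip_address(dst)
    on_link = any(addr in net for net in connected_subnets)
    return on_link and addr not in arp_cache

# Hypothetical directly connected subnets and known hosts.
connected = [
    ipaddress.ip_network("10.1.0.0/16"),
    ipaddress.ip_network("10.2.0.0/16"),
]
arp_cache = {
    ipaddress.ip_address("10.1.0.10"),
    ipaddress.ip_address("10.2.0.20"),
}

# Simulate one infected host probing 10,000 random addresses in 10.0.0.0/8.
rng = random.Random(2001)
scan = [
    ipaddress.ip_address(rng.randrange(0x0A000000, 0x0B000000))
    for _ in range(10_000)
]
misses = sum(needs_arp(dst, connected, arp_cache) for dst in scan)
print(f"{misses} of {len(scan)} scanned addresses would trigger an ARP request")
```

Multiply the per-host miss rate by the number of infected machines and their scan rate, and the ARP load on the CPU grows quickly, even though most of the probed addresses do not exist.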
 
Nortel Networks™ Technical Support / Customer Support

Bulletin Number: CSB-0109005    Released: 9/28/2001
Subject: Effects of the Code Red and NIMDA viruses on Nortel Networks Switch Products

Product:

Product Name / Designation(s)    Revision(s) Potentially Affected    Revision(s) Corrected
Passport 1200                    All                                 v2.0.7.7 and above
Passport 1100                    All                                 v2.0.7.7 and above
Passport 8100                    All
Passport 8600                    All
BayStack 303 / BayStack 304      All
BayStack 310                     v1.6.0 and below                    v1.6.1
BayStack 350                     None
BayStack 410                     None
BayStack 420                     None
BayStack 450                     None
Business Policy Switch 2000      None
Centillion 50 / 100 / 5000BH     None

Description: The purpose of this bulletin is to assist in minimizing the effects of the Code Red and NIMDA viruses on customer networks utilizing Nortel Networks switching products. These viruses have been reported to cause web server and management lockups in only a small number of Nortel Networks switch products, under specific conditions. In most reported cases of the Code Red virus, traffic flow through the switch was not disrupted. However, there have been numerous reports of the NIMDA virus causing traffic congestion on customer networks.

Discussion: The Code Red (and its variations) and NIMDA viruses exploit Microsoft Internet Information Server (IIS) software running on Windows 2000 and Windows NT machines and Microsoft Internet Explorer. The Code Red virus is spread when an infected server makes an HTTP connection on port 80 to a vulnerable server. The virus infects the vulnerable server and continues to seek out vulnerable servers to infect. The NIMDA virus has the capability of being spread through email, web pages, IIS attacks (similar to Code Red), and file shares. The random destination addresses for both viruses are determined by an algorithm that the worm runs.

For more information about the Code Red worm, the Code Red successors (Code Red II, etc.), and a breakdown of the worm's functions, please visit:

For more information about the NIMDA virus and a breakdown of the virus's functions, please visit:

Nortel Networks switch products will function normally when operating on a network infected with the Code Red or NIMDA virus. Problems arise when a high number of Microsoft IIS servers (or other infected machines) are present on the network, spreading the virus and generating packets destined for random IP addresses. This can cause excessive ARPs (the destination IP addresses may or may not exist on the network) and wasted bandwidth. If a network contains a large number of infected machines, this can cause Passport 1000-series (running v2.0.3 and earlier) and Passport 8600-series routing switches to experience high CPU utilization.

There have also been reports of telnet, SNMP, and web services on the Passport 1000-series routing switches not responding to valid client requests. This issue is caused by a problem in code releases prior to v2.0.7.7, where TCP and UDP connections established to the CPU are not properly released.

Problems also arise when infected servers attempt to establish connections with Nortel Networks switching products on TCP port 80 (HTTP). This is seen in the BayStack 303, BayStack 304, and BayStack 310 switches. In some instances, the BayStack 310 will reset when the switch receives virus packets generated by infected servers. The virus will also cause the web server on the BayStack 303 and 304 to lock up, but will not affect data traffic passing through the switch. Devices which do not run web-related services (such as the BayStack 350, BayStack 410, BayStack 450 or Centillion products) are not affected.

The NIMDA virus, in addition to attacking TCP port 80, attempts to spread itself over UDP port 69 (TFTP) and TCP port 25 (SMTP). While this may cause congestion and bandwidth problems, there have been no confirmed reports of Nortel Networks switches being adversely affected by TFTP or SMTP traffic from the NIMDA virus. The NIMDA virus is also capable of spreading when a user browses web pages on an infected web server, using Microsoft's Internet Explorer with JavaScript enabled.

In addition, normal user traffic passing through Nortel Networks switching products was not affected except in the case of the BayStack 310 (due to the switch resetting), the Passport 1000-series (due to the high CPU utilization), and the Passport 8000-series (due to the high CPU utilization).

Resolutions: The only permanent solution to these problems is to remove the offending viruses from the network. Customers with Microsoft IIS or Internet Explorer running on their network should apply the appropriate patches, available from Microsoft's web site at:

If that is not immediately possible, the following steps may be taken depending on what type of Nortel Networks switching products are being utilized in the network:

For Passport 1000- and 8000-series routing switches:

For customers running v2.0.7.6 code or earlier on a Passport 1200 or 1100, it is recommended that the code be upgraded to at least the v2.0.7.7 agent. This is recommended due to a problem with prior versions not properly releasing TCP connections made to the CPU. As more TCP connections were established on port 80, available memory and performance of the box would degrade.

Customers using the Passport 1000-series or 8000-series products may disable the built-in web server by issuing the following command from the CLI:

    config web-server disable

Disabling the web server will prevent the box from listening to TCP port 80. Doing this will prevent the CPU utilization from rising when the Code Red or NIMDA viruses are present on the network. This fix is meant only as an interim solution, and steps should be taken to remove and repair any infected servers. Disabling the internal web-server will not shut down port 80 on the box, but merely prevent HTTP connections from being made.

It is also recommended to disable TFTP on the Passport 8000-series, as FTP may be used to transfer files to and from the box. This will help minimize the spread of the NIMDA virus.

It is also recommended to set up filters, which will assist in blocking the spread of the virus. Filters should be implemented in a way that will prevent the virus from spreading (i.e., ports between segments, floors, etc.) to various parts of the network.
The following example for the Passport 8000-series switch shows how to block all incoming UDP port 69 (TFTP) requests, and incoming TCP port 80 (HTTP) requests directed to a server with an IP address of 10.38.6.17:

    config ip traffic-filter create global src-ip 0.0.0.0/0 dst-ip 0.0.0.0/0 id 1
    config ip traffic-filter filter 1 action mode drop
    config ip traffic-filter filter 1 match dst-port 69 dst-option equal
    config ip traffic-filter filter 1 match protocol udp
    config ip traffic-filter create global src-ip 0.0.0.0/0 dst-ip 10.38.6.17/255.255.255.255 id 2
    config ip traffic-filter filter 2 action mode drop
    config ip traffic-filter filter 2 match dst-port 80 dst-option equal
    config ip traffic-filter filter 2 match protocol tcp
    config ip traffic-filter global-set 1 create name "Block HTTP/TFTP"
    config ip traffic-filter global-set 1 add-filter 1
    config ip traffic-filter global-set 1 add-filter 2

The filter is then applied on ports 1/1 through 1/16:

    config ethernet 1/1-1/16 ip traffic-filter create
    config ethernet 1/1-1/16 ip traffic-filter add set 1
    config ethernet 1/1-1/16 ip traffic-filter default-action forward
    config ethernet 1/1-1/16 ip traffic-filter enable

Notes regarding packet filtering:
* Filters are applied to packets upon ingress to the switch.
* Global filters are preferred, as they are more efficient than a source or destination filter.
* Global filters are not supported on the Passport 1000-series routing switch, so a source or destination filter must be used.
* Only routed packets or packets directed to the CPU are filtered on the Passport 1000-series routing switch. Bridged (layer 2) traffic through the switch will not be affected by a filter.
* The process of enabling filters on multiple ports at the same time can be very CPU intensive. Customers using OSPF, MLT, VRRP, or with busy networks (average CPU utilization ~25% or higher) are advised to enable filters on no more than 16 ports at a time. Enabling filters on more than 16 ports at a time may cause your telnet or device manager session to lock up or disconnect.

Since the NIMDA virus also spreads using its own SMTP server, customers may wish to filter TCP port 25 (SMTP). However, network administrators should examine the possible downsides to filtering SMTP traffic on the network, as programs such as Netscape Messenger, Eudora, and Outlook Express have their own SMTP agent that will generate packets on TCP port 25.

Customers may also wish to block UDP and TCP ports 137, 138, 139, and 445 (NetBIOS). NetBIOS packets should not need to leave the local area network (LAN) and therefore should only be blocked on ports leading off of the LAN. This action is recommended, as NIMDA is capable of spreading across file shares and mapped drives.

For BPS 2000 and BayStack switches:

Customers using the Business Policy Switch 2000 with v1.2.0 code or higher may disable the web server portion of the box while the virus is infecting the network. However, there have been no confirmed reports of the BPS 2000 being affected by either virus.

Customers using the BayStack 310 are advised to upgrade to at least v1.6.1 code immediately. Software versions prior to v1.6.1 will cause the switch to reset when it receives large packets destined for the CPU (such as those packets generated by the Code Red worm).

It is recommended that customers with a BayStack 303 or 304 in their network disable web services on these switches until the virus can be removed from the network. Doing so will prevent the web server on the switch from locking up, but will not prevent the switch from listening on TCP port 80.

There have been no confirmed reports of the Code Red or NIMDA viruses directly affecting the Centillion 50, Centillion 100, System 5000BH, BayStack 350, BayStack 410, BayStack 420, or the BayStack 450 switches.

In extreme cases, the packets generated by the infected servers flooded the network with unnecessary traffic, causing legitimate management traffic (such as telnet and SNMP) not to get through. The only remedy in this instance is to remove the virus from the infected servers and/or remove infected devices from the network.

References: The following websites may provide additional information about the Code Red, Code Red II, and NIMDA viruses, as well as any patches that may be available for vulnerable software programs.

* CERT® Advisory CA-2001-19 "Code Red" Worm Exploiting Buffer Overflow In IIS Indexing Service DLL
* Information about the Code Red Worm
* Information about the Code Red II Worm
* CA-2001-26 Nimda Worm
* Information on the "Nimda" Worm

Support Contact Information: Nortel Networks is committed to bettering the customer experience through its Customer TouchPoint Program (CTP), where in most countries one number can be used to contact Nortel Networks. To obtain regional telephone contact information, please visit the following website:
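
The bulletin advises enabling filters on no more than 16 ports at a time. On a 48-port blade like the ones in the original post, that means three passes. Here is a minimal sketch that splits a blade into batches of 16 and prints the four per-port commands from the bulletin for each batch. The command strings are copied from the bulletin's 1/1-1/16 example; that the same range syntax is accepted for other ranges such as 1/17-1/32 is an assumption, and the helper names are made up for this sketch.

```python
def port_batches(slot, first_port, last_port, batch_size=16):
    """Yield port-range strings such as '1/1-1/16', at most batch_size ports each."""
    start = first_port
    while start <= last_port:
        end = min(start + batch_size - 1, last_port)
        yield f"{slot}/{start}-{slot}/{end}"
        start = end + 1

def filter_commands(port_range, filter_set=1):
    """Return the bulletin's four per-port commands for one port range."""
    prefix = f"config ethernet {port_range} ip traffic-filter"
    return [
        f"{prefix} create",
        f"{prefix} add set {filter_set}",
        f"{prefix} default-action forward",
        f"{prefix} enable",
    ]

# Hypothetical 48-port blade in slot 1, handled 16 ports at a time.
batches = list(port_batches(slot=1, first_port=1, last_port=48))
for port_range in batches:
    for line in filter_commands(port_range):
        print(line)
    print()  # blank line between batches: pause here and check CPU before continuing
```

Generating the commands one batch at a time makes it easy to paste a batch, watch the CPU utilization settle, and only then move on to the next one.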
 