
PP8600 - lost pings and high CPU Util

Status
Not open for further replies.

dane775

Technical User
Oct 28, 2004
151
CA
Question...

I've noticed lost pings to VRRP interfaces lately, and I don't remember ever having that problem. I've seen some issues pinging management interfaces, and I know the switch places ping in a low-priority queue, but this seems different. These aren't just slow responses (I extended the response timeout to 8 seconds) but actual lost pings. The losses occur regardless of how I come at the Passport (i.e. via multiple paths), so the pings appear to get lost in the switch itself. The losses are not severe, but they are consistent, and we've had users complain about it. I'm wondering if we're stressing the switch these days.

Does anyone have any experience with lost pings to VRRP interfaces, and/or can anyone offer an opinion on a threshold level where they would start to be concerned about CPU utilization?

I'm getting a case opened with Nortel now, but I'm curious about the experiences you people may have had, e.g. "everything ran fine until we hit X% CPU utilization, after which we had problems" or "Nortel tech support once told me anything above X% is bad."

Does anyone have any info along those lines?
 
Check your interface stats for errors, as well, just in case it's the oldest problem in the book...
 
Sorry for the slow reply; I was out for a couple of days. I was seeing spikes that hit 85%. I don't remember ever seeing it that high before, but it's not something I remember checking a lot.

I've seen it maxed out when it was having problems on an older version of code, but once that was fixed I seem to remember it was usually in the 20-30% range.

Thanks for the response,

Dane
 
You might want to use MRTG, Cacti, or some other graphing tool to track your CPU use. The SNMP OID is:
.1.3.6.1.4.1.2272.1.1.20.0
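
If you want a quick look before setting up a full graphing tool, here is a rough Python sketch that polls that OID once by shelling out to Net-SNMP's `snmpget`. Only the OID comes from this thread. The function names are made up for illustration, and the sketch assumes that `snmpget` is installed, that the switch answers SNMP v2c with a read community, and that the value comes back as a plain integer percentage.

```python
import subprocess

# OID quoted in the post above; everything else here is illustrative.
CPU_UTIL_OID = ".1.3.6.1.4.1.2272.1.1.20.0"


def build_snmpget_cmd(host, community, oid=CPU_UTIL_OID):
    """Build a Net-SNMP snmpget command line that prints only the value."""
    # -v2c: SNMP version 2c (assumed), -c: community string,
    # -Oqv: print just the value, without the OID or type prefix.
    return ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid]


def parse_cpu_value(text):
    """Convert value-only snmpget output to an int (assumed to be a percent).

    Raises ValueError if the output is not a bare integer.
    """
    return int(text.strip())


def poll_cpu(host, community, timeout=5):
    """Run snmpget once and return the CPU utilization reading."""
    result = subprocess.run(
        build_snmpget_cmd(host, community),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if snmpget reports a failure
    )
    return parse_cpu_value(result.stdout)
```

Calling `poll_cpu("192.0.2.1", "public")` (a placeholder address and community string) in a loop with a sleep between calls would give you a crude series of readings to eyeball until proper graphing is in place.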
 
Anthony,

Good suggestion....thanks for the response.

Any comment on whether 85% is critical, scary, getting scary, etc.??

Dane
 
You tend to get those high CPU utilisations when there's some kind of storm going on.

I know I've had some ARP issues with these Nortels - how about you clear the ARP caches (core & edge) and see what happens?
 
Vince,

I might give that a shot but I'll have to wait for a maintenance window...(Sat night/Sun morning).

Thanks for the suggestion!

Dane
 
If you are consistently running at 85% I would say you should be concerned. What switch fabric are we talking about, 8690SF, 8691SF or 8692SF?

How big is your network? How many closets? How many edge devices? How many VLANs? Are you doing a lot of multicast traffic? How many OSPF interfaces, BGP interfaces, etc?

You might have a loop somewhere in your network, and rate-limiting (if enabled) might be preventing a complete collapse.

I have a two switch ERS8600 cluster with dual 8692SFs which averages around 7-8% with spikes to 19-23%. Around 8,000 end devices with 42+ closet switches/stacks totaling ~ 13,000 ports, 55 VLANs with OSPF, VRRP, SMLT, etc.
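
To compare your own readings with figures like these, a small Python sketch along the following lines could summarize a list of CPU samples. This is not Nortel tooling. The function name is hypothetical, and the default threshold of 85 is simply the spike value mentioned earlier in this thread, not a vendor-recommended limit.

```python
def summarize_cpu(samples, threshold=85):
    """Summarize CPU utilization samples given as percentages.

    Returns the mean, the peak, and how many samples were at or above
    the threshold. The threshold default is an assumption taken from
    the thread, not an official figure.
    """
    if not samples:
        raise ValueError("need at least one sample")
    over = sum(1 for s in samples if s >= threshold)
    return {
        "mean": sum(samples) / len(samples),
        "peak": max(samples),
        "over": over,
        "over_fraction": over / len(samples),
    }
```

A switch that idles in single digits with occasional spikes would show a low mean and an `over` count of zero, while sustained readings near the threshold would show up in both `mean` and `over_fraction`.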

Cheers!
 
"You might have a loop somewhere in your network, and rate-limiting (if enabled) might be preventing a complete collapse."

This is definitely what it looks like.

An easy way to test the theory is to pull the plug on one of your core switches. (Assuming you have 2 VRRP core switches with all downlinks attached to both cores).
 
Check your uplink ports in the core and on the edge switches for any changes. A minor change may cause the uplink MLT to be disabled, which creates a serious loop. I have faced that: the problem was in the MLT group on one of the edge switches, which was disabled because of asymmetrical VLAN membership.
 
High CPU can cause high latency when the CPU itself has to do something, like answering a ping or an SNMP query. Typically it doesn't affect unicast traffic, but if it gets high enough it does.

Is CPU utilization high on both the VRRP routers?

We had a similar issue that was difficult to find and resolve but Nortel finally found the issue.

Normally our VLANs are configured in a V configuration. New engineers had added an inverse U connection and had also added additional L2 distribution switches on new VLANs. They had not disabled spanning tree on the ports to the distribution switches.

I disabled spanning tree on the infrastructure ports after verifying no VLAN loops had been created and utilization returned to normal.
 
Try running a CPU trace when the CPU is high. This will show the traffic that is hitting the CPU.

#trace level 9 3    (leave it running for about 5 seconds)
#trace level 9 0    (to turn it off)

#show trace file    (to view the trace), or
#save trace file    (if you want to copy it off)
 