PP8600 - lost pings and high CPU Util

dane775 · Dec 11, 2008

Question...

I've noticed lost pings to VRRP interfaces lately and I don't ever remember having that problem. I've seen some issues pinging management interfaces and I know the switch places ping in a low priority queue..but this seems different. They're not just long responses (I extended response time to 8 seconds)...but actual lost pings. The losses occur regardless of how I come at the Passport (i.e. via multiple paths) and appear to get lost in the switch. They're not severe but they are consistent and we've had users complain about it. I'm wondering if we're stressing the switch these days.

Does anyone have any experience with lost pings to vrrp interfaces....and/or....can anyone offer an opinion on a threshold level where they would start to be concerned about CPU utilization?

I'm getting a case opened with Nortel now but I just wonder about the experiences you people may have had. i.e. everything ran fine until we hit X% cpu utilization after which we had problems. Or....nortel tech support once told me anything above X% is bad.

Anyone have any info along those lines??

DaddyOfThree · Dec 11, 2008

What utilization are you hitting on your CPU?

"show sys perf"

Cheers!

VinceWhirlwind · Dec 11, 2008

Check your interface stats for errors, as well, just in case it's the oldest problem in the book...

dane775 · Dec 15, 2008

Sorry so slow...out for a couple of days. I was seeing spikes that hit 85%. I don't ever remember seeing it that high before but it's not something I remember checking a lot.

I've seen it maxed out when it was having problems on an older version of code but once that was fixed I seem to remember it was usually in the 20-30% range.

Thanks for the response,

Dane

anthonyanderberg · Dec 15, 2008

You might want to use MRTG, Cacti, or some other graphing tool to track your CPU use. The SNMP OID is:
.1.3.6.1.4.1.2272.1.1.20.0
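
If you just want a quick look before setting up a graphing tool, here is a minimal sketch of polling that OID from a script. It assumes the net-snmp `snmpget` command is installed, that the switch answers SNMP v2c, and that the OID returns a plain integer percentage; none of that is confirmed in this thread. The function names are made up for illustration.

```python
import subprocess

# OID quoted above for CPU utilization
CPU_OID = ".1.3.6.1.4.1.2272.1.1.20.0"


def build_snmpget(host, community):
    """Build the snmpget command line (assumes SNMP v2c is enabled)."""
    # -Oqv asks net-snmp to print only the value, without OID or type
    return ["snmpget", "-v2c", "-c", community, "-Oqv", host, CPU_OID]


def parse_cpu(output):
    """Turn the value-only snmpget output into an integer.

    Assumes the agent returns a bare integer percentage.
    """
    return int(output.strip().strip('"'))


def poll_cpu(host, community, timeout=5):
    """Run snmpget once and return the CPU utilization it reports."""
    result = subprocess.run(
        build_snmpget(host, community),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if snmpget exits with an error
    )
    return parse_cpu(result.stdout)
```

Calling `poll_cpu()` with your switch address and read community once a minute and logging the result with a timestamp should be enough to see whether the 85% readings are brief spikes or a sustained level.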

dane775 · Dec 15, 2008

Anthony,

Good suggestion....thanks for the response.

Any comment on whether 85% is critical, scary, getting scary, etc.??

Dane

VinceWhirlwind · Dec 15, 2008

You tend to get those high CPU utilisations when there's some kind of storm going on.

I know I've had some ARP issues with these Nortels - how about you clear the ARP caches (core & edge) and see what happens?

dane775 · Dec 15, 2008

Vince,

I might give that a shot but I'll have to wait for a maintenance window...(Sat night/Sun morning).

Thanks for the suggestion!

Dane

DaddyOfThree · Dec 15, 2008

If you are consistently running at 85% I would say you should be concerned. What switch fabric are we talking about, 8690SF, 8691SF or 8692SF?

How big is your network? How many closets? How many edge devices? How many VLANs? Are you doing a lot of multicast traffic? How many OSPF interfaces, BGP interfaces, etc?

You might have a loop somewhere in your network and rate-limiting (if enabled) might be preventing a complete collapse.

I have a two switch ERS8600 cluster with dual 8692SFs which averages around 7-8% with spikes to 19-23%. Around 8,000 end devices with 42+ closet switches/stacks totaling ~ 13,000 ports, 55 VLANs with OSPF, VRRP, SMLT, etc.

Cheers!

VinceWhirlwind · Dec 15, 2008

"You might have a loop somewhere in your network and rate-limiting (if enabled) might be preventing a complete collapse."

This is definitely what it looks like.

An easy way to test the theory is to pull the plug on one of your core switches. (Assuming you have 2 VRRP core switches with all downlinks attached to both cores).

TDN2007 · Dec 16, 2008

Check your uplink ports in the core and on the edge switches for any changes. A minor change may cause the uplink MLT to be disabled, which creates a serious loop. I have faced that: the problem was in one of the edge switch MLT groups, which was disabled because of asymmetrical VLAN membership.
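
A rough sketch of how such a membership check could be scripted, assuming you have already collected the VLAN list for each port of an MLT (or for both ends of an uplink) by whatever means you prefer. The port labels, VLAN IDs, and function name are hypothetical and only illustrate the comparison.

```python
def vlan_mismatches(port_vlans):
    """Report VLANs that some ports carry and others lack.

    port_vlans maps a port label to the VLAN IDs configured on it.
    Returns {port: [missing VLAN IDs]} for every port that is missing
    at least one VLAN found on another port in the same group.
    """
    if not port_vlans:
        return {}
    # Union of every VLAN seen on any port in the group
    all_vlans = set().union(*port_vlans.values())
    mismatches = {}
    for port, vlans in port_vlans.items():
        missing = all_vlans - set(vlans)
        if missing:
            mismatches[port] = sorted(missing)
    return mismatches


# Hypothetical MLT group with two member ports
example = {"1/1": {10, 20, 30}, "1/2": {10, 20}}
print(vlan_mismatches(example))
```

An empty result means every port in the group carries the same VLANs; anything else points at the port to look at first.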

msatch · May 3, 2009

High CPU can cause high latency when the CPU has to do something like answer a ping or an SNMP query. Typically it doesn't affect unicast traffic, but if it gets high enough it does.

Is CPU utilization high on both the VRRP routers?

We had a similar issue that was difficult to find and resolve but Nortel finally found the issue.

Normally our vlans are configured in a V configuration. New engineers had added an inverse U connection and also had added additional L2 distribution switches on new vlans. They had not disabled spanning tree on the ports to the distribution switches.

I disabled spanning tree on the infrastructure ports after verifying no VLAN loops had been created and utilization returned to normal.

andy88 · May 5, 2009

Try running a cpu trace when the cpu is high. This will show traffic that is hitting the cpu.

#trace level 9 3 (leave it running for about 5 seconds)
#trace level 9 0 (to turn it off)

#show trace file, or
#save trace file if you want to copy it off
