CentOS ARP Cache Problem

Status
Not open for further replies.

RobBentley

IS-IT--Management
May 10, 2003
199
Hi All.

We use a CentOS 4.4 box as a BGP router, and it seems to be suffering from something strange with regard to the ARP cache.

The cache appears to get filled with entries for public subnets we don't own, and which are therefore not assigned to any local adapter - all entries appear as 'incomplete' (see below for a very small selection):

194.48.242.107 (incomplete) eth1.636
194.49.219.13 (incomplete) eth1.636
194.49.218.173 (incomplete) eth1.636
194.49.218.73 (incomplete) eth1.636
194.49.218.87 (incomplete) eth1.636
194.49.218.93 (incomplete) eth1.636

It only happens on this one VLAN too, which is strange.

Any pointers as to how the hell they'd even get there when they're not even local to us would be appreciated.


Regards,




 
As a side note, the problem goes away for small periods of time (10-30 minutes) and then comes back with completely different subnets.
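
Since the affected subnets change between episodes, it may help to tally the incomplete entries by interface and prefix and compare snapshots over time. The following is only a rough sketch: it assumes the three-column listing format shown in the post above (real `arp` output has more columns), and the `tally_incomplete` helper and the /24 grouping are illustrative choices, not anything the system provides.

```python
from collections import Counter
import ipaddress


def tally_incomplete(lines):
    """Count incomplete ARP entries per (interface, /24 prefix).

    Assumes each line looks like: <ip> (incomplete) <interface>
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Skip anything that is not a three-column incomplete entry.
        if len(parts) != 3 or parts[1] != "(incomplete)":
            continue
        try:
            addr = ipaddress.ip_address(parts[0])
        except ValueError:
            continue  # first column was not an IP address
        # Group by /24 purely for readability; the real prefix is unknown.
        prefix = ipaddress.ip_network(f"{addr}/24", strict=False)
        counts[(parts[2], str(prefix))] += 1
    return counts


# Sample taken from the listing in the post above.
sample = """\
194.48.242.107 (incomplete) eth1.636
194.49.219.13 (incomplete) eth1.636
194.49.218.173 (incomplete) eth1.636
194.49.218.73 (incomplete) eth1.636
194.49.218.87 (incomplete) eth1.636
194.49.218.93 (incomplete) eth1.636
""".splitlines()

for (iface, prefix), n in sorted(tally_incomplete(sample).items()):
    print(iface, prefix, n)
```

Saving the tallies from two episodes and diffing them would show whether the same remote prefixes ever recur or whether they really are different each time.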

 
What are you using for routing software? Zebra, xorp, quagga?
May need to see your config. Are you using proxy arp by any chance?
 
Quagga.

It's a very simple config - a couple of BGP feeds in on eth0 (each VLAN'd from the switch) and a couple of subnets VLAN'd out on eth1.

Very basic - we don't seem to have the trouble on either of the other two installs, which made me wonder if a machine on the network could be doing something odd.

 
Have you tried manually flushing the ARP cache, then running a packet trace against traffic originating from the VLAN ID in question while simultaneously snarfing the BGP and ARP traffic?
Something like (untested):
tcpdump -n -i eth1 'vlan xxx and (tcp dst port 179 or (arp and src net 194.0.0.0/8))'
 
Also check your netmask to see if you are being too inclusive with your addressing on the BGP interface.
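
As a rough illustration of the netmask check: the kernel should only ARP for addresses that fall inside a directly connected network, so one way to sanity-check the masks is to test the stray addresses against each configured prefix. This is a minimal sketch; the `covering_networks` helper and both example prefixes are hypothetical (192.0.2.0/24 stands in for a correctly masked interface, 194.0.0.0/8 for an overly inclusive one), and only the stray addresses come from the listing in this thread.

```python
import ipaddress


def covering_networks(address, connected):
    """Return the connected prefixes (as strings) that contain address."""
    addr = ipaddress.ip_address(address)
    return [
        str(net)
        for net in map(ipaddress.ip_network, connected)
        if addr in net
    ]


# Addresses seen as incomplete ARP entries in the original post.
stray = ["194.48.242.107", "194.49.219.13", "194.49.218.173"]

# Hypothetical interface prefixes, for illustration only.
correct_mask = ["192.0.2.0/24"]
too_inclusive = ["194.0.0.0/8"]

for ip in stray:
    print(ip, covering_networks(ip, correct_mask), covering_networks(ip, too_inclusive))
```

If any stray address lands inside a configured prefix, that interface's mask is a likely culprit; if none do, the mask is probably not the cause and something else is making the kernel treat those addresses as on-link.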
 
I'll look at the other stuff in the morning, cheers. The netmasks I know are correct, which raises the question: how does it even get incomplete ARP entries for addresses which are non-local? That's the bit that worries me.

 
Agreed: the kernel and/or Quagga is obviously confused about which networks are directly attached if it is attempting ARP resolution for those addresses.
Just a matter of chasing it down now.
 
I've taken a few steps:

1. Removed all VLAN interfaces that no longer exist from Zebra (of which there were a good few)... (conf term, no interface eth0.xxx etc etc)

2. Removed GATEWAY=X.X.X.X (basically itself) from /etc/sysconfig/network, as it doesn't appear to be set on either of our 'OK' boxes.

3. Restarted the network service (service network restart).

The problem, for now, appears to have gone away. Which step fixed it, and whether it will be back, we shall see.

It's very obvious when it starts happening, as every server on the VLAN in question sees 200-300Kbps worth of traffic hitting its NIC 24/7 (shown on our graphs), which I assume is the ARPing.
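
To put that graph figure in perspective, here is a back-of-the-envelope conversion from bandwidth to ARP request rate. It is only an estimate under stated assumptions: that all of the extra traffic is ARP broadcasts, that each one occupies a minimum-size 64-byte Ethernet frame, and that the graphs count a kilobit as 1000 bits. The function name is made up for this sketch.

```python
def arp_frames_per_second(kbps, frame_bytes=64):
    """Estimate frames per second for a given rate in kilobits per second.

    Assumes every frame is a minimum-size Ethernet frame (64 bytes by
    default) and that 1 Kbps means 1000 bits per second.
    """
    return kbps * 1000 / (frame_bytes * 8)


for rate in (200, 300):
    print(f"{rate} Kbps is roughly {round(arp_frames_per_second(rate))} frames/s")
```

Under those assumptions the router would be sending several hundred ARP requests every second, which fits a cache full of unresolved entries being retried continuously.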

 