Network issues with Network Load Balancing, NLB, on Windows Server 2003, 2008

stevegravley · Feb 21, 2013

I recently solved an issue that we had been working on here for a while, and I wanted to share the solution.

We have been running a two-node NLB cluster on Windows Server 2003, and another on Windows Server 2008 R2, both in unicast mode. Because of how NLB works, the MAC addresses of the cluster IPs were not being kept in the MAC address table on the Catalyst switch we use. As a result, all of the traffic headed back to the NLB servers was flooded to every port on the subnet. Our phone system, an Avaya IP Office, was on that same subnet, and the phones started to see network delays. Users noticed delays on their calls and some choppy audio.

One solution was to move the NLB to another VLAN. That worked well for the 2003 server.

Another solution, because I didn't want to move my 2008 server, was to set up a static MAC address table entry on my Cisco switch. I found the MAC address of the NLB cluster IP (not the LAN address, the cluster IP), then logged into my Catalyst, a 3560G in my case, and entered this command.

Code:

conf t
mac-address-table static 043f.0a01.06ed vlan 2 interface GigabitEthernet0/18 GigabitEthernet0/19

Note that the MAC is the same for both NLB servers. The ports are the two that the servers are plugged into, and the VLAN is the one the servers are on.

This command tells the switch which ports that MAC address lives on, which stops it from flooding every other port on that VLAN with the NLB traffic. Our phone system could breathe again.
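Windows tools usually show a MAC address with dashes, while the switch expects the dotted form used in the command above. Here is a minimal Python sketch that converts the notation and builds the same command line. The helper names `to_cisco_mac` and `static_mac_command` are made up for illustration, and the keyword spelling simply mirrors the command in this post; some IOS releases spell it `mac address-table` instead, so check your own switch.

```python
import re
from typing import Sequence


def to_cisco_mac(mac: str) -> str:
    """Convert a MAC in dashed, colon, or dotted notation to Cisco dotted form."""
    # Keep only the hex digits, whatever separators were used.
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # Group as three blocks of four hex digits: xxxx.xxxx.xxxx
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def static_mac_command(mac: str, vlan: int, interfaces: Sequence[str]) -> str:
    """Build the static entry line in the same form as the command in the post."""
    # Assumption: keyword spelled as in the post; newer IOS may want "mac address-table".
    ports = " ".join(interfaces)
    return f"mac-address-table static {to_cisco_mac(mac)} vlan {vlan} interface {ports}"


if __name__ == "__main__":
    print(static_mac_command(
        "04-3F-0A-01-06-ED",  # cluster MAC from the post, in Windows-style notation
        2,
        ["GigabitEthernet0/18", "GigabitEthernet0/19"],
    ))
```

Run with the values from this post, it prints the same line that was entered on the 3560G. The only things that change per site are the cluster MAC, the VLAN, and the list of ports the NLB hosts are plugged into.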

ADB100 · Feb 27, 2013

This behaviour in unicast mode is by design. If you capture an ARP reply from the servers sharing the IP address, you will see that the MAC address is a phantom MAC that you will never actually see in the switch CAM tables (in your case 043f.0a01.06ed). When a packet is forwarded at layer 2 to the host, the layer-2 switches don't have a CAM entry to explicitly forward the packet to, and therefore flood the packet to all ports in the broadcast domain (VLAN). If you have the same VLAN spanning multiple switches, it will be flooded to them too.
You can add static CAM entries to the switch(es), as you have, to work around this. However, physically moving the hosts or migrating the service to other servers (or VMotion'ing the servers) then requires some manual configuration on the switches. If you have the VLAN spanning several switches, each of these switches will require the static CAM entries on their inter-switch links as well.
I would investigate NLB using multicast and IGMP, as this eliminates the flooding behaviour (assuming you have IGMP snooping running on the VLAN). It's really about network design, and you have hit a common issue with MS's NLB.
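To show what changing the NLB mode does at layer 2, here is a rough Python sketch of the default cluster MAC patterns as I understand them from Microsoft's NLB documentation: `02-BF` plus the four IP octets for unicast, `03-BF` plus the four octets for multicast, and `01-00-5E-7F` plus the last two octets for IGMP multicast. The function names and the example address 192.0.2.10 are hypothetical. The MAC quoted earlier in this thread does not follow the `02-BF` pattern, so treat the derived values as a starting point only and confirm the real address in NLB Manager.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, mode: str) -> str:
    """Derive the default NLB cluster MAC for a cluster IP (assumed patterns)."""
    octets = ipaddress.IPv4Address(cluster_ip).packed  # the four IP bytes
    if mode == "unicast":
        raw = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        raw = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        # IGMP multicast mode uses only the last two octets of the cluster IP.
        raw = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode!r}")
    return "-".join(f"{b:02X}" for b in raw)


def is_group_address(mac: str) -> bool:
    """True when the group bit (lowest bit of the first octet) is set."""
    return int(mac[:2], 16) & 1 == 1


if __name__ == "__main__":
    for mode in ("unicast", "multicast", "igmp"):
        mac = nlb_cluster_mac("192.0.2.10", mode)  # hypothetical cluster IP
        print(f"{mode:9} {mac}  group address: {is_group_address(mac)}")
```

Only the two multicast modes produce an address with the group bit set. With IGMP snooping running on the VLAN, the switch can then limit that group's traffic to the ports that joined it, instead of flooding it like an unknown unicast destination.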

I have seen this many times and, to be honest, it comes down to a lack of understanding and interaction between server guys and network guys about the network design/topology. I once had to troubleshoot an issue where the backup service was clustered, and each night what was effectively a server-to-server copy (a backup), between servers connected with 1Gbps NICs, was flooded to the entire network. Anything connected at less than 1Gbps had no connectivity, and devices with 1Gbps NICs were intermittent.

Having your voice servers on the same VLAN as other servers is a bad idea - create a voice server VLAN, or better still several VLANs, don't span VLANs between switches and don't have users on the same VLANs as servers.

Andy
