
NetScaler appliance networking and VLAN best practices

A NetScaler appliance uses VLANs to determine which interface to use for which traffic. In addition, the NetScaler appliance does not participate in Spanning Tree. Without the proper VLAN configuration, the NetScaler appliance cannot determine which interface to use, and it can behave more like a hub than a switch or a router. In other words, the NetScaler appliance can use all interfaces for each conversation.
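The following toy model illustrates the idea. It is only a conceptual sketch, not the appliance's actual forwarding logic: the subnets, VLAN IDs, interface names, and the function `candidate_interfaces` are all hypothetical. When a destination falls inside a subnet that is bound to a VLAN, one interface is selected. When no binding exists, nothing narrows the choice and every interface remains a candidate.

```python
import ipaddress

# Hypothetical bindings for illustration only: subnet -> VLAN ID -> interface.
SUBNET_TO_VLAN = {
    ipaddress.ip_network("10.1.10.0/24"): 10,
    ipaddress.ip_network("10.1.20.0/24"): 20,
}
VLAN_TO_INTERFACE = {10: "1/1", 20: "1/2"}
ALL_INTERFACES = ["0/1", "1/1", "1/2"]


def candidate_interfaces(destination: str) -> list[str]:
    """Return the interfaces this toy model would consider for a destination."""
    address = ipaddress.ip_address(destination)
    for subnet, vlan_id in SUBNET_TO_VLAN.items():
        if address in subnet and vlan_id in VLAN_TO_INTERFACE:
            # A subnet-to-VLAN-to-interface binding exists: one interface.
            return [VLAN_TO_INTERFACE[vlan_id]]
    # No binding: nothing narrows the choice, so every interface is a candidate.
    return list(ALL_INTERFACES)
```

In this model, a destination in 10.1.10.0/24 resolves to the single interface 1/1, while a destination in an unbound subnet returns all three interfaces, including the management interface 0/1.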

Symptoms of VLAN misconfiguration

VLAN misconfiguration can manifest in many forms, including performance issues, inability to establish connections, randomly disconnected sessions, and, in severe situations, network disruptions seemingly unrelated to the NetScaler appliance itself. The NetScaler appliance may also report MAC moves, muted interfaces, and management interface transmit or receive buffer overflows, depending on the exact nature of the interaction with your network.

MAC Moves (counter nic_tot_bdg_mac_moved): This issue indicates that the NetScaler appliance is using more than one interface to communicate with the same device (MAC address), because it could not properly determine which interface to use.

Muted interfaces (counter nic_err_bdg_muted): This issue indicates that the NetScaler appliance has detected that it is creating a routing loop due to VLAN configuration issues, and as such, it has shut down one or more of the offending interfaces in order to prevent a network outage.

Interface buffer overflows, typically referring to management interfaces (counter nic_err_tx_overflow): This issue can occur when too much traffic is transmitted over a management interface. Management interfaces on the NetScaler appliance are not designed to handle large volumes of traffic, which can result when network and VLAN misconfigurations cause the NetScaler appliance to use a management interface for production data traffic. This often occurs because the NetScaler appliance has no way to differentiate traffic on the VLAN and subnet of the NSIP (NSVLAN) from regular production traffic. It is highly recommended that the NSIP be on a separate VLAN and subnet from any production devices such as workstations and servers.

Orphan ACKs (counter tcp_err_orphan_ack): This issue indicates that the NetScaler appliance received an ACK packet that it was not expecting, typically on a different interface than the one from which the ACK’d traffic originated. This situation can be caused by VLAN misconfigurations where the NetScaler appliance transmits on a different interface than the one the target device would typically use to communicate with the NetScaler appliance (often seen in conjunction with MAC moves).

High rates of retransmissions or retransmit give-ups (counters: tcp_err_retransmit_giveups, tcp_err_7th_retransmit, various other retransmit counters): The NetScaler appliance attempts to retransmit a TCP packet a total of 7 times before it gives up and terminates the connection. While this situation can be caused by network conditions, it often occurs as a result of VLAN and interface misconfiguration.

High Availability Split Brain: Split Brain is a condition where both high availability nodes believe they are Primary, leading to duplicate IP addresses and loss of NetScaler appliance functionality. It occurs when the two high availability nodes cannot exchange high availability heartbeats (UDP port 3003, sent from the NSIP) with each other across any interface. This is typically caused by VLAN misconfigurations where the native VLAN on the NetScaler appliance interfaces does not have connectivity between NetScaler appliances.
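As a rough aid for reading these counters, the sketch below compares two counter snapshots and reports which of the counters named above have increased. It assumes the snapshots are already available as plain dictionaries of counter name to value; how they are collected is outside the scope of this sketch. The names `COUNTER_HINTS` and `triage` are hypothetical, and the hint text only summarizes the descriptions above.

```python
# Counter names come from the symptom descriptions above; the hint text is a
# short summary of those descriptions, not official diagnostic output.
COUNTER_HINTS = {
    "nic_tot_bdg_mac_moved": "MAC moves: same MAC address reached over more than one interface",
    "nic_err_bdg_muted": "Muted interface: a loop was detected and an interface was shut down",
    "nic_err_tx_overflow": "Transmit overflow: possible data traffic on a management interface",
    "tcp_err_orphan_ack": "Orphan ACKs: ACK received on an unexpected interface",
    "tcp_err_retransmit_giveups": "Retransmit give-ups: connections terminated after repeated retransmits",
    "tcp_err_7th_retransmit": "Seventh retransmit reached: check VLAN and interface configuration",
}


def triage(previous: dict, current: dict) -> list[str]:
    """Return one hint per known counter that increased between two snapshots."""
    hints = []
    for name, hint in COUNTER_HINTS.items():
        delta = current.get(name, 0) - previous.get(name, 0)
        if delta > 0:
            hints.append(f"{name} (+{delta}): {hint}")
    return hints
```

Comparing deltas rather than absolute values matters here, because a counter that is high but no longer increasing may reflect a problem that has already been corrected.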

Best practices for VLAN and network configurations

  1. Each subnet must be associated with a VLAN.

  2. More than one subnet can be associated with the same VLAN (depending on your network design).

  3. Each VLAN should be associated with only one interface (for purposes of this discussion, an LA channel counts as a single interface).

  4. If you require more than one subnet to be associated with an interface, the subnets must be tagged.

  5. Contrary to popular belief, the MAC-Based Forwarding (MBF) feature on the NetScaler appliance is not designed to mitigate this type of issue. MBF is designed primarily for the DSR (Direct Server Return) mode of the NetScaler appliance, which is rarely used in most environments (it is designed to allow traffic to purposely bypass the NetScaler appliance on the return path from the back-end servers). MBF may hide VLAN issues in some instances, but it should not be relied upon to resolve this type of problem.

  6. Every interface on the NetScaler appliance requires a native VLAN (unlike Cisco, where native VLANs are optional), although the TagAll setting on an interface can be used so that no untagged traffic leaves the interface in question.

  7. The native VLAN can be tagged if necessary for your network design (this is the TagAll option for the interface).

  8. The VLAN for the subnet of your NetScaler appliance’s NSIP is a special case. This is called the NSVLAN. The concepts are the same, but the commands to configure it are different, and changes to the NSVLAN require a reboot of the NetScaler appliance to take effect. If you attempt to bind a VLAN to a SNIP that shares the same subnet as the NSIP, you get “Operation not permitted.” This is because you have to use the NSVLAN commands instead. Also, on some firmware versions, you cannot set an NSVLAN if that VLAN number already exists from an add VLAN command. In that case, remove the VLAN and then set the NSVLAN again.

  9. High availability Heartbeats always use the Native VLAN of the respective interface (optionally tagged if the TagAll option is set on the interface).

  10. There must be communication between at least one set of native VLANs on the two nodes of a high availability pair (this can be direct or via a router). The native VLANs are used for high availability heartbeats. If the NetScaler appliances cannot communicate between native VLANs on any interface, this leads to high availability failovers and possibly a split-brain situation where both NetScaler appliances think they are primary (leading to duplicate IP addresses, among other things).

  11. The NetScaler appliance does not participate in spanning tree. As such, it is not possible to use spanning tree to provide for interface redundancy when using a NetScaler appliance. Instead, use a form of Link Aggregation (LACP or manual LAG) for this purpose.

    Note: If you want to have link aggregation between multiple physical switches, you must have the switches configured as a virtual switch, using a feature such as Cisco’s Switch Stack.

  12. High availability synchronization and command propagation use the NSIP/NSVLAN by default. To move them to a different VLAN, you can use the SyncVLAN option of the set HA node command.

  13. There is nothing built-in to the NetScaler appliance default configuration that denotes that a management interface (0/1 or 0/2) is restricted to management traffic only. This restriction must be enforced by the end user through VLAN configuration. The management interfaces are not designed to handle data traffic, so your network design must take this point into account. Management interfaces, contained on the NetScaler appliance motherboard, lack various offloading features such as CRC offload, larger packet buffers, and other optimizations, making them much less efficient in handling large amounts of traffic. To separate production data and management traffic, the NSIP must not be on the same subnet/VLAN as your data traffic.

  14. If you use a management interface to carry management traffic, it is best practice to place the default route on a subnet other than the subnet of the NSIP (NSVLAN).

    In many configurations, the default route is relied upon for workstation communication (in an internet scenario). If the default route is on the same subnet as the NSIP, the NetScaler appliance can use the management interface to send and receive data traffic. This data traffic can overload the management interface.

  15. On an SDX appliance, the SVM, XenServer, and all NetScaler instance NSIPs must be on the same VLAN and subnet. There is no backplane in the SDX appliance that allows for communication between the SVM, XenServer, and the instances. If they are not on the same VLAN, subnet, and interface, traffic between them must leave the physical hardware, be routed on your network, and return.

    This configuration can lead to obvious connectivity issues between the instances and SVM and as such, is not recommended. A common symptom of this is a Yellow Instance State indicator in the SVM for the VPX instance in question, and the inability to use the SVM to reconfigure a VPX instance.

  16. If some VLANs are bound to subnets and some are not, during a high availability failover, GARP packets are not sent for any IP addresses on any of the subnets that are not bound to a VLAN. This configuration can cause dropped connections and connectivity issues during high availability failovers. The cause is that, on NetScaler appliances not configured with a VMAC, the appliance cannot notify the network that MAC ownership of those IP addresses has changed.

    Symptoms of this are that during/after a high availability failover, the ip_tot_floating_ip_err counter increments on the former primary NetScaler appliance for more than a few seconds, indicating that the network did not receive or process GARP packets and the network is continuing to transmit data to the new secondary NetScaler appliance.
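Several of the rules above can be checked mechanically before a design is deployed. The following sketch covers rules 1, 3, 13, and 14 only, and it works on a hand-written description of the intended design rather than on a real appliance configuration. The `Vlan` class, the `check_design` function, and all addresses in the usage example are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Vlan:
    vlan_id: int
    interfaces: list[str]  # an LA channel counts as a single interface
    subnets: list[str] = field(default_factory=list)


def check_design(vlans, nsip_subnet, default_gateway, data_subnets):
    """Return a list of findings; an empty list means no rule was violated."""
    findings = []
    nsip_net = ipaddress.ip_network(nsip_subnet)
    bound = set()
    for vlan in vlans:
        # Rule 3: each VLAN should be associated with only one interface.
        if len(vlan.interfaces) != 1:
            findings.append(
                f"VLAN {vlan.vlan_id} is bound to {len(vlan.interfaces)} "
                "interfaces; expected exactly one"
            )
        bound.update(ipaddress.ip_network(s) for s in vlan.subnets)
    for subnet in data_subnets:
        net = ipaddress.ip_network(subnet)
        # Rule 1: each subnet must be associated with a VLAN.
        if net not in bound:
            findings.append(f"subnet {subnet} is not bound to a VLAN")
        # Rule 13: the NSIP must not share a subnet with data traffic.
        if net.overlaps(nsip_net):
            findings.append(f"data subnet {subnet} overlaps the NSIP subnet")
    # Rule 14: the default route should not be on the NSIP subnet.
    if ipaddress.ip_address(default_gateway) in nsip_net:
        findings.append("default gateway is on the NSIP subnet")
    return findings
```

For example, a design in which the NSIP lives on 192.168.100.0/24, the default gateway is 10.1.10.1, and VLANs 10 and 20 are each bound to one interface and one data subnet produces no findings. Binding VLAN 20 to both 1/1 and 1/2, or moving the default gateway into 192.168.100.0/24, produces one finding each.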
