Product Documentation

Troubleshooting

Aug 01, 2013

If load balancing does not work as expected after you configure it, you can use some common tools to access NetScaler resources and diagnose the problem.

Resources for Troubleshooting Load Balancing

Updated: 2013-08-01

For best results, use the following resources to troubleshoot a load balancing issue on a NetScaler appliance:
  • Latest ns.conf file
  • Relevant newnslog files
  • Packet traces (Ethereal, now Wireshark) recorded on the appliance and on the relevant client, if possible
  • The ns.log file
In addition to the above resources, the following tools expedite troubleshooting:
  • A browser add-on tool that can display HTTP headers. This can be used to troubleshoot persistence-related issues.
  • The Wireshark application customized for the NetScaler trace files.

Troubleshooting Load Balancing Issues

Updated: 2015-06-11

  • Issue

    I created a user script for monitoring, but it is not working.

    Resolution

    Check the number of arguments in the script. The limit is 512. A script with more than 512 arguments might not work properly. Use the nsumon-debug.pl script from the NetScaler command line to debug the script.

  • Issue

    I see a lot of monitor probes, and they seem to be increasing the network traffic unnecessarily. Is there a way to turn off the monitor probes?

    Resolution

    You can turn off the monitor probe connections by disabling the monitor or by setting the healthMonitor parameter in the set service command to NO. With the NO option, the appliance shows the service as UP at all times.
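
    As a minimal sketch, assuming the service is managed from the NetScaler command line (<serviceName> is a placeholder for your own service name), turning off health monitoring and then confirming the resulting state might look like this:

    ```
    > set service <serviceName> -healthMonitor NO
    > show service <serviceName>
    ```

    Because the appliance then reports the service as UP at all times, this approach is likely to suit only services whose availability is tracked by some other means.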

  • Issue

    I have set up monitors for services, but connections are still directed to servers that are DOWN.

    Resolution

    You probably need to decrease the monitor probe intervals. The NetScaler appliance does not detect the DOWN state until the monitor sends a probe.
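
    One possible way to shorten detection time, assuming an HTTP monitor is bound to the affected services (<monitorName> is a placeholder, and the values are illustrative rather than recommended settings):

    ```
    > set lb monitor <monitorName> HTTP -interval 3 -resptimeout 2 -retries 2
    > show lb monitor <monitorName>
    ```

    Here -interval is the time between probes, -resptimeout is how long the appliance waits for a reply, and -retries is the number of probes sent to establish the state of the service after a probe fails. Shorter values detect failures sooner at the cost of more probe traffic.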

  • Issue

    A metric bound to the monitor is present in the local and custom metric tables.

    Resolution

    Add the local prefix to the metric name if the metric is chosen from the local metric table. However, if the metric is chosen from the custom table, you don’t need to add any prefix.

  • Issue

    The monitor probes to a service are not reaching the service.

    Resolution

    Check whether you have set a limit on the number of connections for a service. If yes, exempt monitor-probe connections from this limit by setting the monitorSkipMaxClient parameter to ENABLED.
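
    A sketch of that setting, assuming your release exposes monitorSkipMaxClient as a global load balancing parameter (verify this against the command reference for your build):

    ```
    > set lb parameter -monitorSkipMaxClient ENABLED
    > show lb parameter
    ```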

  • Issue

    I am able to ping the servers, but the state of the services is always shown as DOWN.

    Resolution

    Check the type of monitors configured. For example, if a server is not configured for SSL and you use an HTTPS monitor, the state of the service is marked as DOWN. In this case, using a TCP monitor should change the state of the service to UP.
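
    For example, to test with the built-in tcp monitor (a sketch; <serviceName> and <httpsMonitorName> are placeholders for your own names):

    ```
    > bind service <serviceName> -monitorName tcp
    > unbind service <serviceName> -monitorName <httpsMonitorName>
    > show service <serviceName>
    ```

    If the service changes to UP, the server is reachable on its port, and the original monitor type probably did not match what the server actually serves.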

  • Issue

    Setting a weight for load monitors does not help in deciding the state of the service.

    Resolution

    Load monitors cannot decide the state of the service. Therefore, setting a weight on the load monitors is inappropriate.

  • Issue

    A service is not stable.

    Resolution

    Consider troubleshooting the following components:
    • Verify that a correct server is bound to the service.
    • Verify the type of monitor bound to the service.
    • Verify the reasons for the monitor failures. Open the service from the Services page and, on the Monitors tab of the Configure service dialog box, check the number of probes, the number of failures, and the last response status for the monitor. To display the details, click the configured monitor.
    • If it is a custom monitor, bind a TCP or ping monitor to the service and verify the status of the monitor. If this resolves the issue, there is some problem with the custom monitor and the monitor requires further investigation.
    • You can record packet traces on the NetScaler appliance and verify the monitor probes and server response for further investigation.
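
    The checks above can also be approximated from the command line. The following is a sketch, not a complete procedure; <serviceName> is a placeholder:

    ```
    > show service <serviceName>
    > stat service <serviceName>
    > bind service <serviceName> -monitorName tcp
    > start nstrace -size 0
    > stop nstrace
    ```

    The show service output lists each bound monitor with its state, probe and failure counts, and last response. The trace would be started before reproducing the problem and stopped afterward; -size 0 captures full packets.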
  • Issue

    The virtual IP (VIP) address is not stable or its status is displayed as DOWN.

    Resolution

    Consider troubleshooting the following components:
    • Verify that the load balancing feature is licensed.
    • Verify that the feature is enabled.
    • Verify that an appropriate service is bound to the virtual server.
    • If the status of the VIP address is displayed as DOWN, verify that an administrator has enabled the service. If the service is disabled, its status is displayed as Out-Of-Service. In such a case, enable the service and verify whether the issue is resolved.
    • Verify the services bound to the virtual server and complete the troubleshooting steps listed for the “A service is not stable” issue.
    • If the VIP address is not stable, check whether all the services bound to the virtual server are failing at the same time. If they are, there is a network issue between the NetScaler appliance and the servers.
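
    A command-line sketch of these checks, using the placeholders <vserverName> and <serviceName>:

    ```
    > show ns license
    > show ns feature
    > enable ns feature LB
    > show lb vserver <vserverName>
    > enable service <serviceName>
    ```

    The two enable commands are needed only if the preceding show commands report the feature as disabled or the service as out of service.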
  • Issue

    The site is experiencing uneven load balancing.

    Resolution

    Consider troubleshooting the following components:
    • Verify the load balancing method configured on the appliance.
    • Verify weights associated with the services are as expected.
    • If the load balancing method is other than round robin, verify the number of connections to the server as logged in the newnslog file. To do so, run the following command against the newnslog file:

      # nsconmsg -K <newnslog_file> -s ConLb=2 -d oldconmsg

      Verify the services for the specific virtual server and check for the Response time, Open Established connections (OE), Hits, Persistent Hits and persistent rate (P) to troubleshoot the issue further.

    • If the load balancing method is round robin, verify the Persistent Hits as mentioned in the preceding step. Additionally, verify whether the service is stable. If it is not, complete the troubleshooting steps listed for the “A service is not stable” issue.
    • Verify whether persistence is configured on the appliance.
    • Verify whether any service is not stable. If so, complete the troubleshooting steps listed for the “A service is not stable” issue.
  • Issue

    The service status is displayed as DOWN.

    Resolution

    Consider troubleshooting the following components:
    • Verify whether a SNIP or MIP address is configured.
    • Verify that appropriate monitors are bound to the service.
    • If custom monitors are bound to the service, bind a TCP or ping monitor to the service and verify the status of the monitor. If this resolves the issue, there is some problem with the custom monitor and the monitor requires further investigation.
    • Verify whether the service that is displayed as DOWN is for a server in another subnet. If so, check whether enabling Use Subnet IP (USNIP) mode resolves the issue, because the MIP address might be unable to communicate with the server.
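
    A sketch of the address and mode checks from the command line (no names are assumed here; the commands act on the appliance as a whole):

    ```
    > show ns ip
    > show ns mode
    > enable ns mode USNIP
    ```

    show ns ip lists the configured addresses with their types, which shows whether a SNIP or MIP address exists. The enable command is needed only if show ns mode reports USNIP as off.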
  • Issue

    There is an issue with the response time.

    Resolution

    Consider troubleshooting the following components:
    • Verify the server response time from the service statistics by running the following command:

      # nsconmsg -K <newnslog_file> -s ConLb=2 -d oldconmsg

    • Check whether the “A service is not stable” or “The service status is displayed as DOWN” issue applies, and complete the troubleshooting steps listed for it.
  • Issue

    One of the servers is serving more requests than the other load balanced servers.

    Resolution

    Consider troubleshooting the following components:

    • Verify the load balancing method. Use the round robin method to distribute client requests equally regardless of the load on the servers.
    • Determine whether persistence is enabled for the load balancing configuration. If persistence is enabled, a given server might be carrying a heavier load to maintain its sessions, especially if the persistence sessions are long.
    • Verify whether weights are assigned to each service. Assigning proper weights helps in proper load distribution.
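
    These points can be checked from the command line roughly as follows. This is a sketch; <vserverName> is a placeholder:

    ```
    > show lb vserver <vserverName>
    > show lb persistentSessions <vserverName>
    > set lb vserver <vserverName> -lbMethod ROUNDROBIN
    ```

    The show lb vserver output includes the configured method, the persistence type, and the bound services with their weights. The set command is needed only if you decide to switch to round robin.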
  • Issue

    Connections to a specific load balanced server are stalled. For example, all connections to one Outlook server might be stalled.

    Resolution

    Consider troubleshooting the following components:
    • Verify the load balancing method. If it is round robin, consider changing the method to least connections.
    • Consider reducing the monitor time-out period. A shorter time-out period marks a service as DOWN sooner, so that traffic is directed to servers that are functional.
    • If the connections are stalled for a long period, the surge queue might build up. Consider flushing the surge queue to avoid a sudden spike in load on the server.
    • If the servers are working at their maximum level, consider adding a new server for better performance.
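
    A sketch of the first three suggestions, assuming your release provides the flush ns surgeQ command (<vserverName>, <monitorName>, and <type> are placeholders, and the time-out value is illustrative):

    ```
    > set lb vserver <vserverName> -lbMethod LEASTCONNECTION
    > set lb monitor <monitorName> <type> -resptimeout 1
    > flush ns surgeQ -name <vserverName>
    ```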
  • Issue

    A majority of the connections are directed to a particular server, even when the least connections method for load balancing is configured.

    Resolution

    Determine whether persistence is configured and is of type source IP. If source IP persistence is configured, requests from a given source IP address go to a specific server even with the least connections method. This is most visible when many clients share one source IP address, for example behind a proxy or NAT device. The server's IP address is required for maintaining the session information. Consider using HTTP cookie-based persistence.
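
    A sketch of the check and the change, assuming an HTTP or SSL virtual server (cookie persistence applies only to those types; <vserverName> is a placeholder):

    ```
    > show lb persistentSessions <vserverName>
    > set lb vserver <vserverName> -persistenceType COOKIEINSERT
    ```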

  • Troubleshooting Tips
    For issues not listed above, consider the following tips:
    • If multiple load monitors are bound to a service, the load on the service is the sum of all the values on the load monitors bound to it. For load balancing to work properly, you must bind the same set of monitors to all the services.
    • If you disable a load monitor bound to the service and the service is bound to a virtual server, the virtual server uses the round robin method for load balancing.
    • When you bind a service to a virtual server where the load balancing method is CUSTOMLOAD and the service status is UP, the virtual server initially uses the round robin method for load balancing. It continues to use round robin if the service has no custom load monitors, or if the status of at least one of the custom load monitors is not UP.
    • All the services that are bound to a virtual server where the load balancing method is CUSTOMLOAD must have load monitors bound to them.
    • The CUSTOMLOAD load balancing method also follows startup round robin.
    • If you disable a metric-based binding and this is the last active metric, the specific virtual server uses the round robin method for load balancing. A metric is disabled by setting the metric threshold to zero.
    • When a metric bound to a monitor crosses the threshold value, that particular service is not considered for load balancing. If all the services have reached the threshold, the virtual server uses the round robin method for load balancing and an error message “5xx - server busy error” is displayed.
    • A maximum of 10 metrics from a custom table can be bound to the monitor.
    • The OIDs must be scalar variables.
    • For successful load balancing, the monitor interval must be as low as possible. If the interval is high, load values are retrieved less often. As a result, load balancing takes place using outdated values.
    • A user cannot modify the local table.