Load-Balancing in the WCCP Cluster

Aug 31, 2017

Traffic is distributed among the appliances in the WCCP cluster. If an appliance leaves the cluster (through failure, overload, or being manually disabled), its traffic is rebalanced by distributing it among the surviving members. If an appliance joins the cluster, traffic is rebalanced once more to give the new appliance its fair share.

The Address Mask

Traffic is distributed on the basis of an address mask that is applied to the source and destination addresses of WAN traffic. You must select an appropriate mask for efficient load-balancing; an inappropriate mask can result in load-balancing that is poor to nonexistent. For example, if the mask selects address bits that are identical at all your remote sites, all your WAN traffic is sent to a single appliance, overloading it. This happens if all of your remote sites have addresses of the form 10.0.x.x and your mask bits fall within the 10.0 portion of the address.

The address bits extracted by the address mask form an index that is used (indirectly) to select one of the WCCP caches (appliances). For example, an address mask with two “one” bits yields four possible values, depending on the address. Each of these values can be thought of as a bucket. With two mask bits, you have four buckets, numbered 0-3. The buckets are assigned to WCCP caches. For load-balancing to be effective, there must be at least as many buckets as caches. If you use a two-bit mask and have five or more caches, some caches are idle, because each bucket is assigned to only one cache, and there are not enough buckets to cover all five caches:
Cache     1      2      3      4      5
Buckets   0      1      2      3      -
If there are more buckets than caches, some caches are assigned multiple buckets. For example, if you set three mask bits, creating eight buckets, and you have four caches, two buckets are assigned to each cache. If you have five caches, three caches are assigned two buckets each, and two caches are assigned just one. If each bucket represents the same number of users, you have a 2:1 load imbalance across caches:
Cache     1      2      3      4      5
Buckets   0-1    2-3    4-5    6      7
Increasing the number of set mask bits reduces this imbalance. With four mask bits (16 index values) and five caches, four caches receive three buckets and one cache receives four buckets, resulting in only a 4:3 imbalance. With six set mask bits (the largest number supported), four caches receive 13 buckets and one receives 12, which is only a 13:12 load imbalance.
Cache     1      2      3      4      5
Buckets   0-12   13-25  26-38  39-51  52-63
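
The bucket arithmetic above can be reproduced with a short sketch. It assumes buckets are split as evenly as possible and that the lower-numbered caches take any leftover buckets; the function name `buckets_per_cache` is hypothetical, and the actual assignment made by the router and appliances may order things differently.

```python
def buckets_per_cache(mask_bits: int, num_caches: int) -> list[int]:
    """Return how many buckets each cache receives under an even split.

    Assumption: leftover buckets go to the lowest-numbered caches.
    """
    num_buckets = 2 ** mask_bits
    base, extra = divmod(num_buckets, num_caches)
    # The first `extra` caches take one additional bucket each.
    return [base + 1 if i < extra else base for i in range(num_caches)]


for bits in (2, 3, 4, 6):
    counts = buckets_per_cache(bits, 5)
    # A count of zero means that cache is idle.
    print(f"{bits} mask bits, 5 caches: {counts}")
```

A zero in the result corresponds to an idle cache, and the ratio of the largest to the smallest nonzero count is the load imbalance if every bucket carries the same number of users.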

Ideally, you would like each remote site to be directed to a single appliance in the WCCP cluster, so that all traffic to and from a given site is stored in the same compression history. With this arrangement, any traffic from one user at the site can be used to compress similar traffic from any other user at that site. In other words, for compressibility, load-balancing works best if the address mask selects the bits that differentiate one remote site from another. These are often the least-significant bits of the subnet portion of the IP address. Using these bits tends to allocate the same number of remote sites (not users) per local appliance. A mask that aligns with the host portion of the address instead of the subnet results in a more equal number of remote users (not sites) per appliance, but at the expense of compression effectiveness. (Compression is only effective when connections flow through the same appliances, and splitting traffic from the same remote site between two or more local appliances interferes with this.)
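
To illustrate the difference between subnet-aligned and host-aligned masks, the following sketch groups a handful of addresses by the bits each mask selects. The addresses, the two masks, and the helper `masked_groups` are hypothetical and chosen only for illustration; each group stands for one bucket.

```python
import ipaddress
from collections import defaultdict


def masked_groups(addresses, mask):
    """Group addresses by the bits the mask selects (one group per bucket)."""
    groups = defaultdict(list)
    for addr in addresses:
        groups[int(ipaddress.IPv4Address(addr)) & mask].append(addr)
    return dict(groups)


# Hypothetical remote sites: two /24 subnets with two users each.
users = ["10.0.1.10", "10.0.1.11", "10.0.2.10", "10.0.2.11"]

subnet_mask = 0x00000300  # two low-order bits of the third octet (subnet portion)
host_mask = 0x00000003    # two low-order bits of the fourth octet (host portion)

by_subnet = masked_groups(users, subnet_mask)  # each site stays in one group
by_host = masked_groups(users, host_mask)      # each site is split across groups

print(by_subnet)
print(by_host)
```

With the subnet-aligned mask, both users at a site land in the same group, so their traffic can share one compression history. With the host-aligned mask, each site is split across two groups.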

Finally, for good load-balancing, each "one" bit in the address mask must be set to one on 50% of the remote addresses, and set to zero on 50% of the remote addresses. This is not the case on all address bits, since in most WANs, the highest-order network bits never change at all (such as the 10 in 10.x.x.x). Such bits must never be selected by the address mask.

In addition, many subnets are only sparsely populated. For example, if only 50 addresses are used in the subnet 10.1.2.0/24, and they are assigned sequentially, the two highest-order host bits never change for this subnet, because the range 10.1.2.64-10.1.2.255 is unused. If these two bits are included in the address mask, three-fourths of the buckets receive no traffic.
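
One way to check a candidate mask bit against the 50% guideline is to measure how often that bit is set across your remote addresses. The sketch below is a rough illustration, not a product tool: the helper `bit_set_fraction` and the sample host list (50 sequential addresses starting at 10.1.2.1) are assumptions made for the example.

```python
import ipaddress


def bit_set_fraction(addresses, bit):
    """Fraction of addresses in which the given bit (0 = least significant) is 1."""
    values = [int(ipaddress.IPv4Address(a)) for a in addresses]
    return sum((v >> bit) & 1 for v in values) / len(values)


# Hypothetical subnet: 50 sequentially assigned hosts, 10.1.2.1 through 10.1.2.50.
hosts = [f"10.1.2.{n}" for n in range(1, 51)]

# Report the fraction for each bit of the host octet (bits 0-7).
for bit in range(8):
    print(f"bit {bit}: {bit_set_fraction(hosts, bit):.2f}")
```

Bits whose fraction is close to 0.5 are good mask candidates. Bits that report 0.0 or 1.0 never change in the sample and would leave buckets empty.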

Useful compromises between these two extremes can generally be found.

Follow these rules:
  • The number of “one” bits in the address mask must allow at least as many combinations as there are WCCP caches in the cluster. That is, if you have eight appliances, the address mask must contain at least three “one” bits.
  • The “one” bits in the address mask must each be inside the active address range for most of your remote subnets, or they skew the load-balancing distribution.
  • The mask should split the address range of individual remote sites into as few pieces as possible, for best compression performance.
  • If a remote appliance is faster than the local members of the WCCP cluster, the mask should be designed to divide its traffic between multiple local appliances. For example, a 100 Mbps remote appliance should have its traffic split between two 50 Mbps local appliances by setting a bit inside the remote appliance’s active address range.
  • The “one” bits in the mask are typically contiguous, but this is not required. They can be in any pattern.
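
The first rule can be checked mechanically. The following is a minimal sketch with hypothetical helper names (`min_mask_bits`, `mask_covers_cluster`); it covers only the bucket-count rule, not the rules about address ranges or compression.

```python
def min_mask_bits(num_caches: int) -> int:
    """Smallest number of "one" bits that yields at least num_caches buckets."""
    if num_caches < 1:
        raise ValueError("num_caches must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return (num_caches - 1).bit_length()


def mask_covers_cluster(mask: int, num_caches: int) -> bool:
    """True if the mask defines at least as many buckets as there are caches."""
    one_bits = bin(mask).count("1")
    return 2 ** one_bits >= num_caches


print(min_mask_bits(8))                      # eight appliances need three "one" bits
print(mask_covers_cluster(0x00000F00, 8))    # four "one" bits give 16 buckets
```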

Example: Suppose you set an address mask of 0x0000 0f00, which has four “one” bits. This defines a four-bit field that is extracted from the IP address, yielding 16 possible results (16 buckets). These buckets are in turn assigned to the actual WCCP caches in the WCCP cluster.

Address          Masked Address (mask = 0x0000 0f00)   Bucket
10.0.0.5         0.0.0.0                               0
10.0.1.128       0.0.1.0                               1
155.0.2.55       0.0.2.0                               2
253.100.255.2    0.0.15.0                              15
10.0.15.1        0.0.15.0                              15

Zero bits in the mask are ignored, and the “one” bits are used to define the extracted field. So if the mask is 0x1010 1010, these widely separated “one” bits are extracted into a four-bit field, defining 16 buckets with bucket numbers in the range of 0-15.
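
The extraction described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that the selected bits are packed together in order of significance to form the bucket number; the text does not state the exact ordering used for non-contiguous masks, so treat that part as an illustration only.

```python
import ipaddress


def masked_address(address: str, mask: int) -> str:
    """Return the address with every bit outside the mask cleared."""
    value = int(ipaddress.IPv4Address(address))
    return str(ipaddress.IPv4Address(value & mask))


def extract_bucket(address: str, mask: int) -> int:
    """Pack the address bits selected by the mask into a bucket number.

    Assumption: selected bits keep their relative order, lowest bit first.
    """
    value = int(ipaddress.IPv4Address(address))
    bucket = 0
    out_pos = 0
    for bit in range(32):  # walk from least to most significant bit
        if (mask >> bit) & 1:
            bucket |= ((value >> bit) & 1) << out_pos
            out_pos += 1
    return bucket


mask = 0x00000F00
for addr in ["10.0.0.5", "10.0.1.128", "155.0.2.55", "253.100.255.2", "10.0.15.1"]:
    print(addr, masked_address(addr, mask), extract_bucket(addr, mask))
```

The same function handles a non-contiguous mask such as 0x10101010, where the four selected bits still produce a bucket number from 0 to 15.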

If the mask value is set to zero, a default value of 0x0000 0f00 is used.