Clustered pools

Clustering provides extra features that are required for resource pools that use GFS2 SRs. For more information about GFS2, see Configure storage.

A cluster is a pool of XenServer hosts that are more closely connected and coordinated than the hosts in non-clustered pools. The hosts in the cluster maintain constant communication with each other on a selected network. All hosts in the cluster are aware of the state of every host in the cluster. This host coordination enables the cluster to control access to the contents of the GFS2 SR.

Quorum

Each host in a cluster must always be in communication with at least half of the hosts in the cluster (including itself). This state is known as a host having quorum.

For a pool that contains an odd number of hosts (n), the quorum value is (n+1)/2. For a pool that contains an even number of hosts, the quorum value is n/2. For example, a 5-host pool has a quorum value of 3, and a 4-host pool has a quorum value of 2.
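
As a quick check of this arithmetic, the following shell sketch (with hypothetical host counts) shows the quorum values that result from integer division:

    n=5; echo $(( (n + 1) / 2 ))   # odd pool of 5 hosts: quorum value is 3
    n=4; echo $(( n / 2 ))         # even pool of 4 hosts: quorum value is 2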

For a pool with an even number of hosts, it is possible for the running cluster to split exactly in half. The running cluster decides which half of the cluster self-fences and which half of the cluster has quorum. When a clustered pool with an even number of hosts powers up from a cold start, (n/2)+1 hosts must be available before the hosts have quorum. After the hosts have quorum, the cluster becomes active.

If a host does not have quorum, that host self-fences.

Self-fencing

If a host detects that it does not have quorum, it self-fences within a few seconds. When a host self-fences, it restarts immediately. Because the restart is a hard shutdown, all VMs running on the host are killed. In a clustered pool that uses high availability, XenServer restarts those VMs on other pool members according to their restart configuration. The host that self-fenced then restarts and attempts to rejoin the cluster.

If the number of live hosts in the cluster becomes less than the quorum value, all the remaining hosts lose quorum.

In an ideal scenario, your clustered pool always has more live hosts than are required for quorum, and hosts never self-fence. To make this scenario more likely, consider the following recommendations when setting up your clustered pool:

  • Ensure that you have good hardware redundancy.

  • Use a dedicated bonded network for the cluster network. Ensure that the bonded NICs are on the same L2 segment. For more information, see Networking.

  • Configure storage multipathing between the pool and the GFS2 SR. For more information, see Storage multipathing.

  • Configure high availability on the clustered pool. In clustered pools, the heartbeat SR must be a GFS2 SR. For more information, see High availability. An example command is shown after this list.
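
For example, high availability can then be enabled from the xe CLI (a sketch; <sr_uuid> is a placeholder for the UUID of your GFS2 SR):

    xe pool-ha-enable heartbeat-sr-uuids=<sr_uuid>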

Create a clustered pool

Before you begin, ensure the following prerequisites are met:

  • All XenServer hosts in the clustered pool must have at least 2 GiB of control domain memory.
  • All hosts in the cluster must use static IP addresses for the cluster network.
  • Citrix recommends that you use clustering only in pools containing at least three hosts, because a pool of two hosts is sensitive to the entire pool self-fencing.
  • If you have a firewall between the hosts in your pool, ensure that hosts can communicate on the cluster network using the following ports (a firewall rule sketch follows this list):
    • TCP: 8892, 21064
    • UDP: 5404, 5405

    For more information, see Communication Ports Used by Citrix Technologies.

  • If you are clustering an existing pool, ensure that high availability is disabled. You can enable high availability again after clustering is enabled.
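
As a minimal sketch of the firewall prerequisite, the following iptables rules (assuming the iptables firewall in the control domain; persist the rules as appropriate for your setup) allow traffic on the clustering ports:

    iptables -A INPUT -p tcp --dport 8892 -j ACCEPT    # clustering TCP port
    iptables -A INPUT -p tcp --dport 21064 -j ACCEPT   # clustering TCP port
    iptables -A INPUT -p udp --dport 5404 -j ACCEPT    # clustering UDP port
    iptables -A INPUT -p udp --dport 5405 -j ACCEPT    # clustering UDP port

If you are clustering an existing pool, you can disable high availability first by running xe pool-ha-disable.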

You can set up clustering on your pool by using XenCenter. For more information, see the XenCenter Help.

To use the xe CLI to create a clustered pool:

  1. Create a resource pool.

    For more information, see Hosts and resource pools.
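
    For example, each member host can be joined to the pool by running the following command on that host (a sketch; replace the placeholders with your pool coordinator's address and credentials):

    xe pool-join master-address=<coordinator_address> master-username=<username> master-password=<password>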

  2. Create a bonded network or choose an existing bonded network to use as the clustering network.

    For more information, see Networking.
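
    For example, a new bonded network can be created as follows (a sketch; <pif1_uuid> and <pif2_uuid> are placeholders for the UUIDs of the NICs to bond, and xe network-create prints the <network_uuid> to use in the next command):

    xe network-create name-label=cluster_network
    xe bond-create network-uuid=<network_uuid> pif-uuids=<pif1_uuid>,<pif2_uuid>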

  3. Open a console on a XenServer host in your pool.

  4. For every PIF that belongs to this network, set disallow-unplug=true:

    xe pif-param-set disallow-unplug=true uuid=<pif_uuid>
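
    If you need the PIF UUIDs for this command, you can list the PIFs that belong to the network, for example:

    xe pif-list network-uuid=<network_uuid> params=uuid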
    
  5. Enable clustering on your pool:

    xe cluster-pool-create network-uuid=<network_uuid>
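
    After the command succeeds, you can verify the cluster from the CLI, for example (assuming the cluster-list and cluster-host-list commands in your XenServer version):

    xe cluster-list
    xe cluster-host-list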
    

Manage your clustered pool

When managing your clustered pool, the following practices can decrease the risk of the pool losing quorum.

Ensure that hosts are shut down cleanly

When a host is cleanly shut down, it is temporarily removed from the cluster until it is started again. While the host is shut down, it does not count toward the quorum value of the cluster, and its absence does not cause other hosts to lose quorum.

However, if a host is forcibly or unexpectedly shut down, it is not removed from the cluster before it goes offline. This host does count toward the quorum value of the cluster. Its shutdown can cause other hosts to lose quorum.
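
For example, a cluster member can be shut down cleanly from the xe CLI with a sequence like the following (a sketch; <host_uuid> is a placeholder for the UUID of the host):

    xe host-disable uuid=<host_uuid>     # prevent new VMs from starting on the host
    xe host-evacuate uuid=<host_uuid>    # migrate running VMs to other hosts in the pool
    xe host-shutdown uuid=<host_uuid>    # cleanly shut down the host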

Use maintenance mode

Before performing an action on a host that might cause that host to lose quorum, put the host into maintenance mode. When a host is in maintenance mode, running VMs are migrated off it to another host in the pool. Also, if that host was the pool master, that role is passed to a different host in the pool. If your actions cause a host in maintenance mode to self-fence, you don’t lose any VMs or your XenCenter connection to the pool.

Hosts in maintenance mode still count towards the quorum value for the cluster.

You can only change the IP address of a host that is part of a clustered pool when that host is in maintenance mode. Changing the IP address of a host causes the host to leave the cluster. When the IP address has been successfully changed, the host rejoins the cluster. After the host rejoins the cluster, you can take it out of maintenance mode.

Recover hosts that have self-fenced or are offline

It is important to recover hosts that have self-fenced. While these cluster members are offline, they count towards the quorum value of the cluster and decrease the number of cluster members that are contactable. This situation increases the risk of a subsequent host failure causing the cluster to lose quorum and shut down completely.

Having offline hosts in your cluster also prevents you from performing certain actions. In a clustered pool, every member of the pool must agree to every change of pool membership before the change can be successful. If a cluster member is not contactable, XenServer prevents operations that change cluster membership (such as host add or host remove).

Mark hosts as dead

If one or more offline hosts cannot be recovered, you can mark them as dead to the cluster. Marking hosts as dead removes them permanently from the cluster. After hosts are marked as dead, they no longer count towards the quorum value.
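
For example, an offline host can be marked as dead from the xe CLI (assuming the xe host-declare-dead command in your XenServer version; this action cannot be undone):

    xe host-declare-dead uuid=<host_uuid>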

Constraints

  • Clustered pools support up to 16 hosts per pool.
  • If a network has been used for both management and clustering, you cannot separate the management network without recreating the cluster.
  • Using DHCP is highly discouraged, as a dynamic IP address change on a live cluster can expose the cluster to a rare chance of data loss.
  • Changing the IP address of the cluster network by using XenCenter requires clustering and GFS2 to be temporarily disabled.
  • Do not change the bonding of your clustering network while the cluster is live and has running VMs. This action can cause the cluster to fence.
