Disaster recovery and backup
The Citrix Hypervisor Disaster Recovery (DR) feature allows you to recover virtual machines (VMs) and vApps from a hardware failure that destroys a whole pool or site. For protection against single-server failures, see High availability.
You must be logged on with your root account or have the role of Pool Operator or higher to use the DR feature.
Understanding Citrix Hypervisor DR
Citrix Hypervisor DR works by storing all the information required to recover your business-critical VMs and vApps on storage repositories (SRs). The SRs are then replicated from your primary (production) environment to a backup environment. When a protected pool at your primary site goes down, you can recover the VMs and vApps in that pool from the replicated storage recreated on a secondary (DR) site with minimal application or user downtime.
The Disaster Recovery settings in XenCenter can be used to query the storage and import selected VMs and vApps to a recovery pool during a disaster. When the VMs are running in the recovery pool, the recovery pool metadata is also replicated. The replication of the pool metadata allows any changes in VM settings to be populated back to the primary pool when the primary pool recovers. Sometimes, information for the same VM can be in several places: for example, in storage from the primary site, in storage from the disaster recovery site, and in the pool that the data is to be imported to. If XenCenter finds that the VM information is present in two or more places, it ensures that it uses only the most recent information.
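The "most recent information wins" rule can be pictured with a short sketch. This is only an illustration of the idea, not how XenCenter implements it: the `VmRecord` type, its `source` labels, and the `last_updated` timestamp are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VmRecord:
    """Hypothetical stand-in for one copy of a VM's metadata."""
    vm_uuid: str
    source: str              # illustrative label, e.g. "primary-sr" or "dr-sr"
    last_updated: datetime   # assumed timestamp of the last metadata change


def pick_most_recent(records):
    """Return a dict of VM UUID -> newest record seen for that UUID."""
    newest = {}
    for rec in records:
        current = newest.get(rec.vm_uuid)
        # Keep this copy if it is the first one seen or strictly newer.
        if current is None or rec.last_updated > current.last_updated:
            newest[rec.vm_uuid] = rec
    return newest
```

Given copies of the same VM from the primary-site storage and the DR-site storage, the function keeps whichever copy carries the later timestamp and ignores the rest.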
The Disaster Recovery feature can be used with XenCenter and the xe CLI. For CLI commands, see Disaster recovery commands.
You can also use the Disaster Recovery settings to run test failovers for non-disruptive testing of your disaster recovery system. In a test failover, all the steps are the same as failover. However, the VMs and vApps are not started up after they have been recovered to the disaster recovery site. When the test is complete, cleanup is performed to delete all VMs, vApps, and storage recreated on the DR site.
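As a rough sketch of how a test failover differs from a real one, the following function lists the steps for each mode. The step descriptions paraphrase the text above and the function name is hypothetical; it does not correspond to any XenCenter or xe API.

```python
def failover_steps(test: bool):
    """Return the ordered steps for a failover or a test failover."""
    # Steps shared by both modes.
    steps = [
        "query replicated storage for pool metadata",
        "select VMs and vApps to recover",
        "recreate selected VMs and vApps in the recovery pool",
    ]
    if test:
        # A test failover leaves workloads stopped, then cleans up.
        steps.append("leave recovered VMs and vApps stopped")
        steps.append("delete VMs, vApps, and storage recreated on the DR site")
    else:
        steps.append("start recovered VMs and vApps")
    return steps
```

The two modes share every step up to the point where VMs would be started, which is what makes the test non-disruptive.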
Citrix Hypervisor VMs consist of two components:
Virtual disks that are being used by the VM, stored on configured storage repositories (SRs) in the pool where the VMs are located.
Metadata describing the VM environment. This information is required to recreate the VM if the original VM is unavailable or corrupted. Most metadata configuration data is written when the VM is created and is updated only when you change the VM configuration. For VMs in a pool, a copy of this metadata is stored on every server in the pool.
In a DR environment, VMs are recreated on a secondary site using the pool metadata and configuration information about all VMs and vApps in the pool. The metadata for each VM includes its name, description and Universally Unique Identifier (UUID), and its memory, virtual CPU, and networking and storage configuration. It also includes VM startup options – start order, delay interval, high availability, and restart priority. The VM startup options are used when restarting the VM in a high availability or DR environment. For example, when recovering VMs during disaster recovery, VMs within a vApp are restarted in the DR pool in the order specified in the VM metadata, and using the specified delay intervals.
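To make the role of start order and delay interval concrete, here is a minimal sketch that turns those two values into a start schedule. It assumes that lower order values start first and that each VM's delay is the wait before the next VM is started; VMs that share an order value are simply handled one after another here, which may differ from the product's actual behavior. The `VmStartup` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmStartup:
    """Hypothetical subset of a VM's startup options."""
    name: str
    order: int      # assumed: lower values start first
    delay_s: int    # assumed: seconds to wait before starting the next VM


def startup_schedule(vms):
    """Return (offset_seconds, vm_name) pairs in start order."""
    schedule = []
    offset = 0
    for vm in sorted(vms, key=lambda v: v.order):
        schedule.append((offset, vm.name))
        offset += vm.delay_s  # the wait pushes back every later VM
    return schedule
```

For a vApp with a database (order 0, delay 30), an application server (order 1, delay 10), and a web front end (order 2, delay 0), the sketch starts the database immediately, the application server 30 seconds later, and the web front end at 40 seconds.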
DR infrastructure requirements
Set up the appropriate DR infrastructure at both the primary and secondary sites to use Citrix Hypervisor DR.
Storage used for pool metadata and the virtual disks used by the VMs must be replicated from the primary (production) environment to a backup environment. Storage replication methods, such as mirroring, vary between devices, so consult your storage solution vendor about how to set up storage replication.
After the VMs and vApps that you recovered to a pool on your DR site are up and running, the SRs containing the DR pool metadata and virtual disks must be replicated. Replication allows the recovered VMs and vApps to be restored back to the primary site (failed back) when the primary site is back online.
The hardware infrastructure at your DR site does not have to match the primary site. However, the Citrix Hypervisor environment must be at the same release and patch level. In addition, sufficient resources must be configured in the target pool to allow all the failed over VMs to be recreated and started.
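The requirements in the previous paragraph can be expressed as a simple pre-failover check. The sketch below is an assumption-laden illustration: `PoolInfo` and its fields are hypothetical, and memory is used as the only stand-in for "sufficient resources", whereas a real check would also consider CPU, storage, and networking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolInfo:
    """Hypothetical summary of a pool, for illustration only."""
    release: str
    patches: frozenset
    free_memory_mib: int


def check_target_pool(primary, target, required_memory_mib):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if primary.release != target.release:
        problems.append("release mismatch")
    if primary.patches != target.patches:
        problems.append("patch level mismatch")
    if target.free_memory_mib < required_memory_mib:
        problems.append("insufficient memory for failed-over VMs")
    return problems
```

Note that the check deliberately says nothing about hardware models, because the DR site's hardware does not have to match the primary site.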
The Disaster Recovery settings do not control any Storage Array functionality.
Users of the Disaster Recovery feature must ensure that the metadata storage is replicated in some way between the two sites. Some Storage Arrays contain “Mirroring” features to achieve the replication automatically. If you use these features, you must disable the mirror functionality (“mirror is broken”) before restarting VMs on the recovery site.
Review the following steps before enabling Disaster Recovery.
Steps to take before a disaster
The following section describes the steps to take before a disaster.
Configure your VMs and vApps.
Note how your VMs and vApps are mapped to SRs, and the SRs to LUNs. Take particular care with the naming of the name_label and name_description parameters. Recovering VMs and vApps from replicated storage is easier if the names of SRs capture how VMs and vApps are mapped to SRs, and SRs to LUNs.
Arrange replication of the LUNs.
Enable pool metadata replication to one or more SRs on these LUNs.
Ensure that the SRs you are replicating the primary pool metadata to are attached to only one pool.
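The last two steps above can be combined into a small helper that refuses to proceed when an SR is attached to more than one pool. This is a sketch under stated assumptions: the input mapping of SR UUIDs to pool names is hypothetical and must be gathered by you, and the command string uses the xe `sr-enable-database-replication` command as it is understood here; confirm the exact syntax in Disaster recovery commands before running anything. The helper only builds command strings and does not execute them.

```python
def replication_commands(sr_attachments):
    """Build xe commands to enable pool metadata replication.

    sr_attachments maps an SR UUID to the set of pool names the SR is
    attached to (a hypothetical input for this example).
    """
    commands = []
    for sr_uuid, pools in sorted(sr_attachments.items()):
        # The SR holding primary pool metadata must belong to one pool only.
        if len(pools) != 1:
            raise ValueError(
                f"SR {sr_uuid} is attached to {len(pools)} pools; expected exactly 1"
            )
        commands.append(f"xe sr-enable-database-replication uuid={sr_uuid}")
    return commands
```

Raising an error instead of skipping the offending SR is a deliberate choice in this sketch, so that a misconfigured SR is noticed before a disaster rather than during recovery.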
Steps to take after a disaster
The following section describes the steps to take after a disaster has occurred.
Break any existing storage mirrors so that the recovery site has read/write access to the shared storage.
Ensure that the LUNs you want to recover VM data from are not attached to any other pool, or corruption can occur.
If you want to protect the recovery site from a disaster, you must enable pool metadata replication to one or more SRs on the recovery site.
Steps to take after a recovery
The following section describes the steps to take after a successful recovery of data.
Resynchronize any storage mirrors.
On the recovery site, cleanly shut down the VMs or vApps that you want to move back to the primary site.
On the primary site, follow the same procedure as for the failover in the previous section, to fail back selected VMs or vApps to the primary site.
To protect the primary site against future disasters, you must re-enable pool metadata replication to one or more SRs on the replicated LUNs.