Design methodology hardware layer
This section covers hardware sizing for the virtual infrastructure servers, virtual desktops, and virtual application hosts. The sizing of these servers is typically done in two ways.
- The first and preferred way is to plan ahead and purchase hardware based on the workload requirements.
- The second way is to use existing hardware in the best configuration to support the different workload requirements.
This section will discuss decisions related to both methods.
Decision: Workload Separation
When implementing a XenApp and XenDesktop deployment, the infrastructure, XenDesktop, and XenApp workloads can be separated into dedicated resource clusters or mixed on the same physical hosts. Citrix recommends using resource clusters to separate the workloads, especially in an enterprise deployment. This allows more accurate host sizing, because each workload has unique requirements such as overcommit ratios and memory usage.
In smaller environments where resource clusters are cost prohibitive, the workloads may be mixed in a manner that still allows for a highly available environment. Citrix leading practice is to separate the workloads; however, mixing workloads is a cost-based business decision.
Decision: Physical Processor (pCPU)
The following table provides guidance on the number of virtual desktops that can be supported per physical core for light, medium, and heavy workloads. Each desktop correlates to a single concurrent user, assuming the operating system has been optimized.
|Operating System||Users per Physical Core|
The “Users per Physical Core” estimate is a baseline figure based on users running Microsoft Office 2010. The baseline must be adjusted for specific infrastructure requirements. As a general guideline, the following characteristics reduce server density relative to the baseline:
|Characteristic||Server Density Impact|
|Antivirus (not optimized)||25% decrease|
|Real-time Monitoring||15% decrease|
|Office 2013||20% decrease|
|Office 2016||25% decrease|
To estimate the total number of physical cores required for the XenApp and XenDesktop workload, use the following formula for each user group:
∑ represents the sum of all user group combinations “i”.
Usersi = Number of concurrent users in user group i
UsersPerCorei = Number of users per physical core for user group i
AV = Antivirus impact (default = 0.25)
Mon = Monitoring tools impact (default = 0.15)
Off13 = Office 2013 impact (default = 0.20)
Off16 = Office 2016 impact (default = 0.25)
HT = Hyper-Threading impact (default = 0.20)
If the XenApp and XenDesktop workloads will be separated, calculate the formula twice: once for all XenDesktop users and once for all XenApp users.
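A minimal Python sketch of the per-core estimate, using the variable names and default impacts defined above. How the factors combine is an assumption made for illustration: each applicable impact multiplies the core count by (1 + factor), and Hyper-Threading divides the result by (1 + HT). The `UserGroup` class and the group sizes in the example are hypothetical; adjust the combination to match the sizing formula you actually use.

```python
from dataclasses import dataclass

# Default impacts from the definitions above.
AV, MON, OFF13, OFF16, HT = 0.25, 0.15, 0.20, 0.25, 0.20


@dataclass
class UserGroup:
    """Hypothetical container for one user group i."""
    users: int                # Users_i: concurrent users in the group
    users_per_core: float     # UsersPerCore_i: baseline density for the group
    antivirus: bool = False   # non-optimized antivirus installed
    monitoring: bool = False  # real-time monitoring tools installed
    office2013: bool = False
    office2016: bool = False


def group_cores(group: UserGroup, hyper_threading: bool = True) -> float:
    """Physical cores for one user group (assumed combination of factors)."""
    cores = group.users / group.users_per_core
    impacts = (
        (group.antivirus, AV),
        (group.monitoring, MON),
        (group.office2013, OFF13),
        (group.office2016, OFF16),
    )
    # Assumption: each applicable impact scales the core count by (1 + impact).
    for applies, impact in impacts:
        if applies:
            cores *= 1 + impact
    # Assumption: Hyper-Threading reduces the requirement by a factor of (1 + HT).
    if hyper_threading:
        cores /= 1 + HT
    return cores


def total_pcpu(groups, hyper_threading: bool = True) -> float:
    """The sum over all user groups i."""
    return sum(group_cores(g, hyper_threading) for g in groups)


# Hypothetical XenDesktop user group: 1,000 users at 10 users per core,
# with non-optimized antivirus and Office 2016.
desktops = [UserGroup(1000, 10, antivirus=True, office2016=True)]
print(round(total_pcpu(desktops), 1))  # 130.2
```

When the workloads are separated, call `total_pcpu` once with the XenDesktop groups and once with the XenApp groups.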
Decision: Physical Memory (pRAM)
The recommended method for sizing the memory of a physical host is to base it on the total memory required to support the virtual machines and on the CPU capacity of the host. To calculate the total memory required for XenApp and XenDesktop, multiply the number of virtual machines in each machine catalog by the amount of memory allocated to each virtual machine. The sum across all machine catalogs is the total RAM required for the XenApp and XenDesktop hosts, as shown in the formula below.
$$\text{Total pRAM} = \sum_i \left( VM_i \times vRAM_i \right)$$
∑ represents the sum of all user group combinations “i”.
VMi = Number of virtual machines for user group i
vRAMi = Amount of RAM assigned to each virtual machine
If workloads will be separated onto different hosts (XenApp and XenDesktop workloads), the formula should be calculated twice, once for all XenDesktop users and the second for all XenApp users.
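The memory formula translates directly into code. A short Python sketch, assuming each machine catalog is described by a hypothetical `(number of VMs, vRAM per VM in GB)` tuple:

```python
from typing import Iterable, Tuple


def total_pram_gb(catalogs: Iterable[Tuple[int, float]]) -> float:
    """Sum of VM_i x vRAM_i over all machine catalogs (result in GB)."""
    return sum(vms * vram_gb for vms, vram_gb in catalogs)


# Hypothetical catalogs, calculated separately for each workload.
xendesktop = [(400, 4), (100, 8)]  # 400 VMs with 4 GB, 100 VMs with 8 GB
xenapp = [(20, 24)]                # 20 server VMs with 24 GB each

print(total_pram_gb(xendesktop))  # 2400
print(total_pram_gb(xenapp))      # 480
```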
Decision: Physical Host (pHost)
In most situations, the number of physical hosts (pHost) needed to support the XenApp and XenDesktop workloads will be limited by the number of processor cores available. The following formulas provide an estimate for the number of hosts required for the user workloads. They are based on the best practice of separating the XenApp and XenDesktop workloads due to the different recommended CPU overcommit ratios for each.
XenDesktop pHosts = (Total XenDesktop pCPU / Cores per pHost) + 1
XenApp pHosts = (Total XenApp pCPU / Cores per pHost) + 1
Once the number of physical hosts has been determined based on processor cores, the amount of RAM for each host is calculated.
XenDesktop pRAM per pHost = HypervisorRAM + (Total XenDesktop pRAM / (XenDesktop pHosts - 1))
XenApp pRAM per pHost = HypervisorRAM + (Total XenApp pRAM / (XenApp pHosts - 1))
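A Python sketch of the host and per-host RAM formulas. Two details are assumptions not stated above: the core-based host count is rounded up to a whole host before the extra host is added, and all inputs (including the hypervisor RAM in the example) are hypothetical. Dividing total pRAM by (pHosts - 1) means the remaining hosts can still hold every VM's memory if one host is out of service, which matches the extra host added to the core-based count.

```python
import math


def hosts_required(total_pcpu: float, cores_per_host: int) -> int:
    """(Total pCPU / Cores per pHost) + 1, rounding the first term up (assumption)."""
    return math.ceil(total_pcpu / cores_per_host) + 1


def ram_per_host_gb(total_pram_gb: float, hosts: int, hypervisor_ram_gb: float) -> float:
    """HypervisorRAM + Total pRAM / (pHosts - 1)."""
    if hosts < 2:
        raise ValueError("the formula divides by (pHosts - 1), so at least 2 hosts are needed")
    return hypervisor_ram_gb + total_pram_gb / (hosts - 1)


# Hypothetical XenDesktop cluster: 130.2 cores of demand on 32-core hosts,
# 2,400 GB of total pRAM and 8 GB reserved for the hypervisor.
hosts = hosts_required(130.2, 32)
print(hosts)                            # 6
print(ram_per_host_gb(2400, hosts, 8))  # 488.0
```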
Hosts used to deliver graphical workloads require graphics processors to deliver a high-end user experience. Specific hardware hosts and graphics cards are required to support high-end graphics using HDX 3D Pro. An updated list of tested hardware is available in a knowledge base article. Hosts that deliver desktops and applications to high-end graphics users should be sized on GPU requirements first, and then checked to ensure they have adequate CPU and memory resources to support the workload.
NVIDIA GRID cards can be leveraged with vGPU profiles to support multiple users. NVIDIA provides the sizing guidelines in the table below.
In the table, Y indicates that application certifications are available.
|NVIDIA GRID Graphics Board||Virtual GPU Profile||Application Certifications||Graphics Memory||Max Displays Per User||Max Resolution Per Display||Max vGPU Per Graphics Board||Use Case|
|GRID K2||K280Q||Y||4,096 MB||4||2560 x 1600||2||Designer|
|GRID K2||K260Q||Y||2,048 MB||4||2560 x 1600||4||Designer / Power User|
|GRID K2||K240Q||Y||1,024 MB||2||2560 x 1600||8||Designer / Power User|
|GRID K2||K220Q||Y||512 MB||2||2560 x 1600||16||Knowledge Worker|
|GRID K2||K200|| ||256 MB||2||1920 x 1200||16||Power User|
|GRID K1||K180Q||Y||4,096 MB||4||2560 x 1600||4||Power User|
|GRID K1||K160Q||Y||2,048 MB||4||2560 x 1600||8||Power User|
|GRID K1||K140Q||Y||1,024 MB||2||2560 x 1600||16||Power User|
|GRID K1||K120Q||Y||512 MB||2||2560 x 1600||32||Power User|
|GRID K1||K100|| ||256 MB||2||1920 x 1200||32||Knowledge Worker|
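Because GPU capacity drives the sizing of these hosts, a quick first step is to work out how many graphics boards a user population needs. A Python sketch using the "Max vGPU Per Graphics Board" column from the table, assuming every user on a board gets one vGPU of the same profile; the user count and the boards-per-host value are hypothetical.

```python
import math

# Max vGPUs per graphics board, taken from the NVIDIA GRID table above.
MAX_VGPU_PER_BOARD = {
    ("GRID K2", "K280Q"): 2,
    ("GRID K2", "K260Q"): 4,
    ("GRID K2", "K240Q"): 8,
    ("GRID K2", "K220Q"): 16,
    ("GRID K2", "K200"): 16,
    ("GRID K1", "K180Q"): 4,
    ("GRID K1", "K160Q"): 8,
    ("GRID K1", "K140Q"): 16,
    ("GRID K1", "K120Q"): 32,
    ("GRID K1", "K100"): 32,
}


def boards_required(board: str, profile: str, users: int) -> int:
    """Boards needed when every user gets one vGPU of the given profile."""
    return math.ceil(users / MAX_VGPU_PER_BOARD[(board, profile)])


def gpu_hosts_required(boards: int, boards_per_host: int) -> int:
    """Hosts needed to house the boards (boards_per_host is hardware-specific)."""
    return math.ceil(boards / boards_per_host)


boards = boards_required("GRID K2", "K240Q", 50)  # 50 hypothetical designers
print(boards)                         # 7
print(gpu_hosts_required(boards, 2))  # 4
```

CPU and memory for the resulting hosts are then checked against the pCPU and pRAM sizing described earlier.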
Decision: Storage Architecture
The primary storage architectures are as follows:
- Local Storage - Uses hard disks directly attached to the computer system. The disks cannot be shared with other computer systems, but if the computer is hosting pooled or hosted shared desktops, a shared storage solution is not necessary. In many cases local storage can perform as well as shared storage. Scalability is limited to the number of drive bays available in the computer system. For example, many blade servers have just two drive bays, so using local storage to support a XenDesktop deployment may not be optimal.
- DAS - Storage sub-system directly attached to a server or workstation using a cable. It uses block-level storage and can be a hard disk local to the computer system or a disk shelf with multiple disks attached by means of external cabling. Unlike local disks, disk shelves require separate management. Storage shelves can be connected to multiple servers so the data or disks can be shared.
- NAS - Provides file-level storage to computer systems through network file shares. The NAS operates as a file server, and NAS systems are networked appliances which contain one or more hard drives, often arranged into logical, redundant storage containers or RAID arrays. Access is typically provided using standard Ethernet and network file sharing protocols such as NFS, SMB/CIFS, or AFP.
NAS can become a single point of failure. If the network share becomes unavailable, all target devices streamed from the disk will be unavailable as well.
- SAN - Dedicated storage network that provides access to consolidated, block-level storage. SANs allow computers to connect to different storage devices, so no server has ownership of the storage subsystem enabling data to be shared among multiple computers. A SAN will typically have its own dedicated network of storage devices that are generally not accessible through the network by standard means. In order to connect a device to the SAN network a specialized adapter called the Host Bus Adapter (HBA) is required. SANs are highly scalable with no noticeable change in performance as more storage and devices are connected. SANs can be a costly investment both in terms of capital and the time required to learn, deploy and manage the technology.
- Hybrid - A NAS head refers to a NAS which does not have any on-board storage, but instead connects to a SAN. In effect, it acts as a translator between the file-level NAS protocols (NFS, CIFS, etc.) and the block-level SAN protocols (Fibre Channel and iSCSI). Thus it can combine the advantages of both technologies and allows computers without Host Bus Adapters (HBA) to connect to centralized storage.
The following table summarizes the storage options available and rates their suitability for XenDesktop deployments.
|Attribute||Local||DAS||NAS||SAN|
|Performance||High||Med - High||Med - High||High|
|Redundancy||Low - Med||Med - High||Med - High||High|
|Scalability||Low||Med - High||Med - High||High|
|Typical use case||Small to medium production and test environments||Small to medium production environments||Small to medium production environments||Medium to large production environments|
Hyper-V 2008 R2 does not support NAS technology. Hyper-V 2012/2012 R2 only supports NAS solutions that support the SMB 3.0 protocol. For more information, refer to the Hyper-V 2008 R2 and Hyper-V 2012 R2 sections of the handbook.
Local storage is best suited for storing virtual machines that do not have high availability requirements or persistent data attached, such as random (pooled) desktops or hosted shared desktops. Local storage and DAS are suited for storing user data and home directory files. If using Machine Creation Services, master images and any updates must be replicated to each server.
NAS and SAN storage is best suited for infrastructure servers supporting the XenDesktop environment, and virtual machines with persistent data such as static (dedicated) desktops.
Decision: RAID Level
To choose the optimal RAID level, it is necessary to consider the IOPS and read/write ratio generated by a given application or workload in combination with the individual capabilities of each RAID level. For hosting read-intensive workloads, such as the Provisioning Services vDisk store, RAID levels optimized for read operations, such as RAID 1, 5, 6, and 10, are optimal. This is because these RAID levels allow read operations to be spread across all disks within the RAID set simultaneously.
For hosting write-intensive workloads, such as the Provisioning Services write cache and Machine Creation Services differencing disks, RAID levels such as RAID 1 or 10 are optimal, as these are optimized for writes and have a low write penalty.
The following table outlines the key quantitative attributes of the most commonly used RAID levels:
|RAID||Capacity (%)||Fault Tolerance||Read Performance||Write Performance||Minimum number of disks|
|0||100||None||Very High||High (Write Penalty 1)||2|
|1||50||Single-drive failure||Very High||Medium (Write Penalty 2)||2|
|5||67 - 94||Single-drive failure||High||Low (Write Penalty 4)||3|
|6||50 - 88||Dual-drive failure||High||Low (Write Penalty 6)||4|
|10||50||Single-drive failure in each sub array||Very High||Medium (Write Penalty 2)||4|
The write penalty is inherent in RAID data protection techniques, which require multiple disk I/O requests for each application write request, and ranges from minimal (mirrored arrays) to substantial (RAID levels 5 and 6).
Decision: Number of Disks
To determine the number of disks required it is important to understand the performance characteristics of each disk, the characteristics of the RAID level and the performance requirements of the given workload. The basic calculation for determining the total number of disks needed is:
Total # of Disks = (Total Read IOPS + (Total Write IOPS x RAID Penalty)) / Disk Speed IOPS
For example, suppose a workload generates a total of 2,000 IOPS and the disk manufacturer reports 175 raw IOPS per disk. To determine how many disks are required to support this workload with 20% read operations and 80% write operations on RAID 10:
Total # of Disks = ((20% x 2000) + (80% x 2000) x 2) / 175 = 20.57, rounded up to 21 disks
Based on the previous example, the following table shows how the disk count will vary based on the RAID level and the read/write ratio.
|RAID||RAW IOPS (per disk)||Workload IOPS||Read %||Write %||Disk Count|
|1 / 10||175||2000||20||80||21|
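The disk-count formula is easy to script, which makes it simple to compare RAID levels for the same workload. A Python sketch using the write penalties from the RAID table; the inputs (2,000 workload IOPS, 20% reads, 175 IOPS per disk) are the ones from the example above.

```python
import math

# Write penalty per RAID level, from the RAID table above.
RAID_WRITE_PENALTY = {0: 1, 1: 2, 5: 4, 6: 6, 10: 2}


def disks_required(workload_iops: float, read_pct: float, raid_level: int,
                   disk_iops: float) -> int:
    """(Read IOPS + Write IOPS x RAID penalty) / per-disk IOPS, rounded up."""
    read_iops = workload_iops * read_pct / 100
    write_iops = workload_iops - read_iops
    backend_iops = read_iops + write_iops * RAID_WRITE_PENALTY[raid_level]
    return math.ceil(backend_iops / disk_iops)


for level in (10, 5, 6):
    print(f"RAID {level}: {disks_required(2000, 20, level, 175)} disks")
# RAID 10: 21 disks
# RAID 5: 39 disks
# RAID 6: 58 disks
```

The result comes straight from the formula: it is not adjusted to a disk count the RAID level can actually use (RAID 10, for example, needs an even number of disks), and it does not enforce the minimum disk counts in the RAID table.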
Decision: Disk Type
Hard disk drives (HDDs) are the traditional variation of disk drives. These kinds of disks consist of rotating platters on a motor-driven spindle within a protective enclosure. The data is magnetically written to and read from the platter by read/write heads.
Different implementations of this technology are available on the market, which differ in terms of performance, cost and reliability.
- Serial ATA (SATA) disks transmit data serially over two pairs of conductors. One pair is for differential transmission of data, and the other pair is for differential receiving of data. SATA drives are widely found in consumer desktop and laptop computers. Typical SATA drives have transfer speeds ranging from 1,500 to 6,000 Mbps and support hot-swapping by design.
- Small Computer System Interface (SCSI) disks use a buffered, peer-to-peer interface that uses handshake signals between devices. Many SCSI devices require a SCSI initiator to initiate SCSI transactions between the host and SCSI target. SCSI disks are common in workstations and servers and have throughputs ranging from 40 to 5,120 Mbps. iSCSI (Internet Small Computer System Interface) is a mapping of the regular SCSI protocol over TCP/IP, most commonly over Gigabit Ethernet.
- Fibre Channel (FC) disk is the successor to the parallel SCSI disk and is common in SAN storage devices. Fibre Channel signals can run on an electrical interface or fibre-optic cables. Throughput can range from 1 to 20 Gbps, and connections are hot-pluggable.
- Serial Attached SCSI (SAS) disk uses a newer-generation serial communication protocol to allow for higher speed data transfers than SATA disks. Throughput can range from 2,400 to 9,600 Mbps.
In contrast to traditional hard disks, Solid State Disks (SSDs) use microchips to retain data in either NAND non-volatile memory chips (flash) or DRAM, and contain no moving parts. SSDs are less susceptible to physical shock, have lower access times and latency, and have higher I/O rates. SSDs have significantly higher random read performance: an SSD can attain anywhere from 5,000 to 20,000 random reads per second. SSDs are also more expensive per gigabyte (GB) and typically support a limited number of writes over the life of the disk.
Flash memory-based SSDs can be based on either multi-level cells (MLC) or single-level cells (SLC). SLC devices store only one bit of information in each cell, while MLC devices can store multiple bits in each cell. Flash-based SSDs cost less than DRAM-based SSDs but are slower. DRAM-based SSD devices are used primarily to accelerate applications that would otherwise be held back by the latency of flash SSDs or traditional HDDs.
SSDs were previously not viable for enterprise storage solutions because of their high cost, low capacity, and fast wear. Improvements in SSD technology and falling costs are making them increasingly favorable compared with HDDs. Solid state hybrid drives (SSHDs) combine the features of SSDs and HDDs by pairing a large HDD with an SSD cache that improves performance for frequently accessed data.
Comparing SSDs and HDDs is difficult because HDD benchmarks focus on performance aspects such as rotational latency and seek time. Because SSDs neither spin nor seek, they may show huge superiority in such tests. However, SSDs can struggle with mixed read and write workloads, and their performance may degrade over time.
The following table compares the transfer rates of some of the more common storage types available on the market today.
|Technology||Rate (Mbps)|
|iSCSI over Fast Ethernet||100|
|Ultra-2 wide SCSI (16 bits/40 MHz)||640|
|iSCSI over Gigabit Ethernet||1,000|
|SATA rev 3||6,000|
|FCoE over 10 GbE||10,000|
|SATA rev 3.2 - SATA Express||16,000|
|iSCSI over Infiniband||32,000|
SCSI and SATA disks are best suited for storing data that does not have high performance requirements, such as the PVS vDisk store. SAS, Fibre Channel, or SSD drives are best suited for storing data that has high performance requirements, such as the PVS write cache.
Decision: Storage Bandwidth
Storage bandwidth is the connectivity between servers and the storage subsystem. Understanding bandwidth requirements helps determine the proper hardware for delivering data and applications at speeds that support a positive end-user experience. For most datacenters, 10 Gbps Ethernet or 10 Gbps FCoE is sufficient for storage connections, while smaller environments may only need 1 Gbps. In virtualized environments, it is important to consider not only the bandwidth requirements of the physical host and storage subsystem but also the bandwidth required by each virtual machine.
In order to plan for the required bandwidth, it is necessary to determine the throughputs for every individual system that uses a shared component or network path. For example, the following information is provided for an environment with 100 similar virtual machines (hosted on 10 virtualization hosts and connected to one NAS head).
|Metric||Average||Peak|
|Throughput per VM||10 Mbps||30 Mbps|
|Throughput per host||100 Mbps (10 VMs x 10 Mbps)||300 Mbps (10 VMs x 30 Mbps)|
|Throughput per storage||1 Gbps (10 hosts x 100 Mbps)||3 Gbps (10 hosts x 300 Mbps)|
The NIC used for storage communication needs to be a 1Gbps adapter in order to handle the peak load. The NAS head as well as its network connection need to support 3Gbps worth of data traffic in order to support the peak load of all systems.
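The aggregation above can be checked with a few lines of Python. This sketch treats the example's per-VM figures as megabits per second, which is the reading under which a 1 Gbps host NIC covers the peak load; the list of link speeds is a hypothetical example, and the selection adds no headroom beyond the stated peak.

```python
# Hypothetical NIC/link speeds to choose from, in Mbps.
LINK_SPEEDS_MBPS = (1_000, 10_000, 25_000, 40_000)


def storage_throughput(per_vm_mbps: float, vms_per_host: int, hosts: int):
    """Return (throughput per host, throughput at the storage) in Mbps."""
    per_host = per_vm_mbps * vms_per_host
    return per_host, per_host * hosts


def smallest_link(required_mbps: float, speeds=LINK_SPEEDS_MBPS) -> int:
    """Smallest listed link speed that is at least the required throughput."""
    for speed in sorted(speeds):
        if speed >= required_mbps:
            return speed
    raise ValueError("no listed link speed covers the requirement")


# Peak figures from the example, read as Mbps: 30 per VM, 10 VMs per host, 10 hosts.
per_host, at_storage = storage_throughput(30, 10, 10)
print(per_host, at_storage)       # 300 3000
print(smallest_link(per_host))    # 1000
print(smallest_link(at_storage))  # 10000
```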
Decision: Tiered Storage
A one-size-fits-all storage solution is unlikely to meet the requirements of most virtual desktop implementations. The use of storage tiers provides an effective mechanism for offering a range of different storage options differentiated by performance, scalability, redundancy and cost. In this way, different virtual workloads with similar storage requirements can be grouped together and a similar cost model applied.
For example, a XenDesktop implementation using tiered storage may look like the following:
- Tier 1 storage group - Write intensive files such as the write cache and differencing disks are placed in a storage group consisting of SSDs.
- Tier 2 storage group - Mission critical data, or data that requires high availability, are placed in a storage group consisting of less expensive high performing drives.
- Tier 3 storage group - Seldom used data files, read-only files, or other non-mission critical data are placed in a storage group consisting of low cost and lower performing drives.
Decision: Thin Provisioning
Thin provisioning allows more storage space to be presented to the virtual machines than is actually available on the storage repository. This lowers storage costs by allowing virtual machines access to disk space that is often unused. This is particularly beneficial to Machine Creation Services, which uses a linked-clone approach to provisioning virtual machines. Thin provisioning minimizes the storage space required for the master image copies used to build virtual machines. Thin provisioning is possible at the physical storage layer, a feature available with most SAN solutions, and at the virtual layer. NFS-based storage solutions usually have thin provisioning enabled by default.
At the physical storage layer, it is important to ensure that sufficient storage is available to prevent virtual machines from becoming unavailable in a storage “overcommit” scenario when available disk space is exhausted. Organizations should decide whether the cost savings that thin provisioning provides outweigh the associated risk, and consider enabling it if the storage solution supports it.
Virtual machines may not function if disk space is exhausted so it is important to have a process in place, either through alerts or notifications that will give administrators enough time to add more disks to the storage solution so that the XenDesktop environment is not impacted.
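One way to implement the alerting process described above is a simple capacity check run against each storage repository. A Python sketch; the 80% alert threshold and the capacity figures are hypothetical, and in practice the inputs would come from the storage array or hypervisor monitoring.

```python
from dataclasses import dataclass


@dataclass
class ThinPoolStatus:
    overcommit_ratio: float  # provisioned capacity / physical capacity
    utilization: float       # used capacity / physical capacity
    free_gb: float           # physical capacity still available
    alert: bool              # True when utilization reaches the threshold


def check_thin_pool(provisioned_gb: float, physical_gb: float, used_gb: float,
                    alert_at: float = 0.80) -> ThinPoolStatus:
    """Summarize a thin-provisioned pool; alert_at is a hypothetical threshold."""
    utilization = used_gb / physical_gb
    return ThinPoolStatus(
        overcommit_ratio=provisioned_gb / physical_gb,
        utilization=utilization,
        free_gb=physical_gb - used_gb,
        alert=utilization >= alert_at,
    )


status = check_thin_pool(provisioned_gb=20_000, physical_gb=8_000, used_gb=6_800)
print(status.overcommit_ratio, status.free_gb, status.alert)  # 2.5 1200 True
```

Setting the threshold well below 100% is what buys administrators time to add disks before virtual machines are affected.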
Decision: Data De-Duplication
Data de-duplication is a data compression technique whereby duplicate data is replaced with pointers to a single copy of the original item. This reduces storage requirements and costs by improving storage utilization, however it can impact storage performance.
There are two implementations of de-duplication available:
- Post-process de-duplication – The de-duplication is performed after the data has been written to disk. Post-process de-duplication should be scheduled outside business hours to ensure that it does not impact system performance. Post-process de-duplication offers minimal advantages for random desktops, as the write cache/difference disk is typically reset on a daily basis.
- In-line de-duplication – Examines data before it is written to disk so that duplicate blocks are not stored. The additional checks performed before the data is written to disk can sometimes slow performance. If enabled, in-line de-duplication should be carefully monitored to ensure that it is not affecting the performance of the XenDesktop environment.
If the storage solution supports it, enabling post-process data de-duplication is recommended for minimal impact to XenDesktop performance.
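The pointer-based idea behind data de-duplication can be illustrated with a toy block store in Python. This is a sketch of the concept only, assuming fixed 4 KB blocks and SHA-256 fingerprints; storage arrays use their own block sizes, fingerprinting, and collision handling. Because duplicates are detected before a block is kept, the sketch behaves like in-line de-duplication.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size in bytes


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks, keep one copy of each, return (store, pointers)."""
    store = {}     # fingerprint -> the single stored copy of that block
    pointers = []  # one fingerprint per logical block, in original order
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # duplicates only add a pointer
        pointers.append(fingerprint)
    return store, pointers


def rehydrate(store, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(store[p] for p in pointers)


data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096  # four blocks, two unique
store, pointers = deduplicate(data)
print(len(pointers), len(store))           # 4 2
print(rehydrate(store, pointers) == data)  # True
```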