Product Documentation

Design methodology control layer

Apr 26, 2018

The control layer is the fourth layer of the design methodology. 

Active Directory

Decision: Forest Design

Multi-forest deployments, by default, do not have inter-domain trust relationships between the forests. An AD administrator can establish trust relationships between the multiple forests, allowing the users and computers from one forest to authenticate and access resources in another forest.

For forests that have inter-domain trusts, it is recommended that the appropriate settings be configured to allow the Delivery Controllers to communicate with both domains. When the appropriate trusts are not configured, a separate XenDesktop site must be configured for each forest. For more information, please refer to the Citrix article: CTX134971 – Successfully Deploying XenDesktop in a Complex Active Directory Environment.

Decision: Organizational Unit Structure

Infrastructure components for a XenApp and XenDesktop deployment should reside within their own dedicated organizational units (OUs), separating workers and controllers for management purposes. Dedicated OUs give the objects inside greater management flexibility and allow Citrix administrators to be granted delegated control.

A sample Citrix OU structure can be seen below.  

[Image: sample Citrix OU structure]

Decision: User Groups

Whenever possible, permissions and authorization should be assigned to user groups rather than individual users, thereby eliminating the need to edit a large amount of resource permissions and user rights when creating, modifying, or deleting user accounts. Permission application example:

  • An application published to one group of 1,000 users requires the validation of only one object for all 1,000 users. 
  • The same application published to 1,000 individual user accounts requires the validation of all 1,000 objects.  

Database

The majority of Citrix products discussed within this document require a database. The following table outlines the usage on a per product basis:

In this table:

  • Y indicates Available.
  • O indicates Optional. 
Product | Configuration Data | Runtime Data | Audit/Change Log Data | Monitoring Data
XenDesktop | Y | Y | Y | Y
Provisioning Services | Y | O | - | -
Desktop Player | Y | Y | Y | -

Decision: Edition

There are multiple editions of Microsoft SQL Server 2012: Express, Web, Standard, Business Intelligence, and Enterprise. Based on the capabilities of the various SQL Server editions available, the Standard edition is often used for hosting the XenApp and XenDesktop databases in production environments. 

The Standard edition provides an adequate set of features to meet the needs of most environments. For more information on the databases supported with Citrix products, please refer to the Citrix Database Support Matrix. Different versions of Citrix products support different versions of SQL Server; therefore, it is important to check the support matrix to ensure the version of SQL Server used is compatible with the Citrix product being deployed.

Decision: Database Server Sizing

The SQL Server must be sized correctly to ensure the performance and stability of an environment. Since every Citrix product uses SQL Server in a different way, no generic all-encompassing sizing recommendations can be provided. Instead, per-product SQL Server sizing recommendations are provided below.

XenApp and XenDesktop

XenApp and XenDesktop Brokers use the database as a message bus for broker communications and to store configuration data, monitoring data and configuration log data. The databases are constantly in use and the performance impact on the SQL Server can be considered high.

Based on results from Citrix internal scalability testing, the following SQL Server specifications are recommended for a server hosting all XenDesktop databases:

  • 2 Cores / 4 GB RAM for environments up to 5,000 users 
  • 4 Cores / 8 GB RAM for environments up to 15,000 users 
  • 8 Cores / 16 GB RAM for environments with 15,000+ users

The database files and transaction logs should be hosted on separate hard disk subsystems in order to cope with a high number of transactions. For example, registering 20,000 virtual desktops during a 15-minute boot storm causes ~500 transactions / second and 20,000 users logging on during a 30-minute logon storm causes ~800 transactions / second on the XenDesktop Site Database.
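
To put these figures in perspective, the observed rates can be extrapolated to other storm sizes. The following is a rough back-of-envelope sketch in Python; the per-event transaction factors are derived solely from the two data points quoted above and assume the load scales roughly linearly.

    # Rough estimate of Site database transaction rates during boot/logon storms.
    # Per-event factors are derived from the figures quoted above (assumption:
    # transaction load scales linearly with the event rate).
    tx_per_registration = 500 / (20_000 / (15 * 60))   # ~22.5 transactions per VDA registration
    tx_per_logon = 800 / (20_000 / (30 * 60))          # ~72 transactions per user logon

    def storm_tx_per_second(events, window_minutes, tx_per_event):
        """Approximate transactions/second for `events` spread evenly over `window_minutes`."""
        return events / (window_minutes * 60) * tx_per_event

    # Example: a 10,000 desktop boot storm over 15 minutes
    print(round(storm_tx_per_second(10_000, 15, tx_per_registration)))  # ~250 transactions/second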

Provisioning Services

In addition to static configuration data, provisioning servers store runtime and auditing information in the database. Depending on the boot and management pattern, the performance impact on the database can be considered low to medium.

Based on this categorization, a SQL server specification of 4 Cores and 4 GB RAM is recommended as a good starting point. The SQL server should be carefully monitored during the testing and pilot phase in order to determine the optimal configuration of the SQL server.  

Decision: Instance Sizing

When sizing a SQL database, two aspects are important: 

  • Database file – Contains the data and objects such as tables, indexes, stored procedures and views stored in the database. 
  • Transaction log file – Contains a record of all transactions and database modifications made by each transaction. The transaction log is a critical component of the database and, if there is a system failure, the transaction log might be required to bring the database back to a consistent state. The usage of the transaction log varies depending on which database recovery model is used:
    • Simple recovery – No log backups required. Log space is automatically reclaimed, to keep space requirements small, essentially eliminating the need to manage the transaction log space. Changes to the database since the most recent backup are unprotected. In the event of a disaster, those changes must be redone.
    • Full recovery – Requires log backups. No work is lost due to a lost or damaged database data file. Data of any arbitrary point in time can be recovered (for example, prior to application or user error). Full recovery is required for database mirroring.
    • Bulk-logged – Requires log backups. This is an adjunct of the full recovery model that permits high-performance bulk copy operations. It is typically not used for Citrix databases.

For further information, please refer to the Microsoft Developer Network – SQL Server Recovery Models.

In order to estimate storage requirements, it is important to understand the disk space consumption for common database entries. This section outlines the storage requirements on a per product basis and provides sizing calculations. For more information, please refer to Citrix article: CTX139508 – XenDesktop 7.x Database Sizing.

XenDesktop General

XenApp 7.x and XenDesktop 7.x use three distinct databases: 

  • Site Configuration database – Contains static configuration and dynamic runtime data
  • Monitoring database – Contains monitoring data which is accessible via Director
  • Configuration logging database – Contains a record for each administrative change performed within the site (accessible via Studio)

Site Database

Since the database of a XenApp or XenDesktop site contains static configuration data and dynamic runtime data, the size of the database file depends not only on the physical size of the environment but also user patterns. The following factors all impact the size of the database file: 

  • The number of connected sessions 
  • The number of configured and registered VDAs 
  • The number of transactions occurring during logon 
  • VDA heartbeat transactions

The size of the Site Database is based on the number of VDAs and active sessions.  The following table shows the typical maximum database size Citrix observed when scale testing XenApp and XenDesktop with a sample number of users, applications, and desktop delivery methods.

Users | Applications | Desktop Types | Expected Maximum Size (MB)
1,000 | 50 | Hosted Shared | 30
10,000 | 100 | Hosted Shared | 60
100,000 | 200 | Hosted Shared | 330
1,000 | N/A | Hosted Pooled | 30
10,000 | N/A | Hosted Pooled | 115
40,000 | N/A | Hosted Pooled | 390

Note

This sizing information is a guide only. Actual database sizes may differ slightly by deployment due to how databases are maintained.

Determining the size of the transaction log for the Site database is difficult due to factors that can influence the log including: 

  • The SQL Database recovery model 
  • Launch rate at peak times 
  • The number of desktops being delivered

During XenDesktop scalability testing, Citrix observed a transaction log growth rate of approximately 3.5 MB an hour when the system is idle, and a per-user, per-day growth rate of ~32 KB. In a large environment, transaction log usage requires careful management and regular backups to prevent excessive growth. This can be achieved by means of scheduled jobs or maintenance plans.
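
As a planning aid, the growth rates above can be combined into a simple estimate of how much transaction log space accrues between log backups. A minimal sketch, assuming the idle and per-user rates quoted above apply across the whole interval (a simplification):

    # Site database transaction log growth estimate between log backups.
    IDLE_MB_PER_HOUR = 3.5        # observed idle growth
    PER_USER_KB_PER_DAY = 32      # observed per-user, per-day growth

    def tx_log_growth_mb(users, days_between_backups):
        idle_mb = IDLE_MB_PER_HOUR * 24 * days_between_backups
        user_mb = PER_USER_KB_PER_DAY * users * days_between_backups / 1024
        return idle_mb + user_mb

    # Example: 10,000 users with weekly log backups
    print(f"{tx_log_growth_mb(10_000, 7):.0f} MB")  # roughly 2,800 MB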

Monitoring Database

Of the three databases, the Monitoring database is expected to be the largest since it contains historical information for the site. Its size is dependent on many factors including: 

  • Number of Users
  • Number of sessions and connections 
  • Number of workers 
  • Retention period configuration – Platinum customers can keep data for over a year (default 90 days). Non-platinum customers can keep data for up to 7 days (default 7 days). 
  • Number of transactions per second. The Monitoring Service tends to execute updates in batches. It is rare for the number of transactions per second to go above 20.
  • Background transactions caused by regular consolidation calls from the Monitoring Service.
  • Overnight processing carried out to remove data outside the configured retention period.

The following table shows the estimated size of the Monitoring database over a period of time under different scenarios. This data is an estimate based on data seen within scale testing XenApp and XenDesktop (assuming a 5 day working week).

Users | Type | 1 week (MB) | 1 month (MB) | 3 months (MB) | 1 year (MB)
1,000 | HSD | 20 | 70 | 230 | 900
10,000 | HSD | 160 | 600 | 1,950 | 7,700
100,000 | HSD | 1,500 | 5,900 | 19,000 | 76,000
1,000 | VDI | 15 | 55 | 170 | 670
10,000 | VDI | 120 | 440 | 1,400 | 5,500
40,000 | VDI | 464 | 1,700 | 5,400 | 21,500

Estimates with 2 connections and 1 session per user with a 5 day work week

Users | Type | 1 week (MB) | 1 month (MB) | 3 months (MB) | 1 year (MB)
1,000 | HSD | 30 | 100 | 330 | 1,300
10,000 | HSD | 240 | 925 | 3,000 | 12,000
100,000 | HSD | 2,400 | 9,200 | 30,000 | 119,000
1,000 | VDI | 25 | 85 | 280 | 1,100
10,000 | VDI | 200 | 750 | 2,500 | 9,800
40,000 | VDI | 800 | 3,000 | 9,700 | 38,600

Note

The 100,000 HSD tests are based on a test environment consisting of: 

  • 2 Delivery Controllers 
  • 43 Hosted Shared Desktop workers 
  • 3 SQL servers, configured with databases held within one Always On Availability Group.

For more information please see the Citrix Support article –  XenDesktop 7.x Database Sizing.

The size of the transaction log for the Monitoring Database is very hard to estimate, but XenApp and XenDesktop scalability testing showed a growth rate of about 30.5 MB an hour when the system is idle, and a per user per day growth rate of ~9 KB. 

Configuration Logging Database

The Configuration Logging Database is typically the smallest of the three databases. Its size and the size of the related transaction log depends on the daily administrative activities initiated from Studio, Director or PowerShell scripts, therefore its size is difficult to estimate. The more configuration changes are performed, the larger the database will grow. Some factors that can affect the size of the database include: 

  • The number of actions performed in Studio, Director and PowerShell. 
  • Minimal transactions which occur on the database when no configuration changes are taking place. 
  • The transaction rate during updates. Updates are batched whenever possible. 
  • Data manually removed from the database. Data within the Configuration Logging Database is not subject to any retention policy, therefore it is not removed unless done so manually by an administrator. 
  • Activities that have an impact on sessions or users, for example, session logoff and reset. 
  • The mechanism used for deploying desktops.

In XenApp environments not using MCS, the database size tends to fall between 30 and 40MB. For MCS environments, database size can easily exceed 200MB due to the logging of all VM build data.

Temporary Database 

In addition to the Site, Monitoring, and Configuration Logging databases, a system-wide temporary database (tempdb) is provided by SQL Server. This temporary database is used to store Read-Committed Snapshot Isolation data. XenApp 7.x and XenDesktop 7.x use this SQL Server feature to reduce lock contention on the XenApp and XenDesktop databases. Citrix recommends that all XenApp 7.x and XenDesktop 7.x databases use Read-Committed Snapshot Isolation. For more information, please see How to Enable Read-Committed Snapshot in XenDesktop.

The size of the tempdb database will depend on the number of active transactions, but in general it is not expected to grow more than a few MB. The performance of the tempdb database does impact the performance of XenApp and XenDesktop brokering, as any transactions that generate new data require tempdb space. XenApp and XenDesktop tend to have short-lived transactions, which help keep the size of the tempdb small.

The tempdb is also used when queries generate large intermediate result sets. Guidance on sizing the tempdb can be found in the Microsoft TechNet article Optimizing tempdb Performance.

Provisioning Services 

The Provisioning Services farm database contains static configuration and configuration logging (audit trail) data. The record size requirements outlined below can be used to help size the database:

Configuration Item | DB Space Required (KB) | Number of Items (Example) | Total (KB)
Base farm configuration | 112 | - | 112
User group w/ farm access | 50 | 10 | 250
Site | 4 | 5 | 20
Device collection | 10 | 50 | 500
Farm view | 4 | 10 | 40
Farm view to device relationship | 5 | 1 | 5,000
Site view | 4 | 5 | 20
Site view to device relationship | 5 | 1 | 5,000
Device | 2 | 5,000 | 10,000
Device bootstrap | 10 | - | -
Device to disk relationship | 35 | 1 | 175,000
Device printer relationship | 1 | - | -
Device personality data | 1 | - | -
Device status (when booted) | 1 | 5,000 | 5,000
Device custom property | 2 | - | -
vDisk | 1 | 20 | 20
vDisk version | 3 | 5 | 300
Disk locator | 10 | 1 | 200
Disk locator custom property | 2 | - | -
Server | 5 | 10 | 50
Server IP | 2 | 1 | 20
Server status (when booted) | 1 | 20 | 20
Server custom property | 2 | - | -
vDisk store | 8 | 5 | 40
vDisk store to server relationship | 4 | 1 | 40
Connection to XenServer (VirtualHostingPool) | 4 | - | -
vDisk update task | 10 | 10 | 100
Administrative change (auditing enabled) | 1 | 10,000 | 10,000
Total | | | 211,732 KB (~212 MB)
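
The example column totals above can be verified, and adapted to a specific farm, with a short calculation. The following minimal sketch simply sums the populated "Total (KB)" values from the table; substitute counts for your own farm to produce a site-specific estimate.

    # Sums the example "Total (KB)" values from the table above (~212 MB).
    example_totals_kb = {
        "Base farm configuration": 112,
        "User group w/ farm access": 250,
        "Site": 20,
        "Device collection": 500,
        "Farm view": 40,
        "Farm view to device relationship": 5_000,
        "Site view": 20,
        "Site view to device relationship": 5_000,
        "Device": 10_000,
        "Device to disk relationship": 175_000,
        "Device status (when booted)": 5_000,
        "vDisk": 20,
        "vDisk version": 300,
        "Disk locator": 200,
        "Server": 50,
        "Server IP": 20,
        "Server status (when booted)": 20,
        "vDisk store": 40,
        "vDisk store to server relationship": 40,
        "vDisk update task": 100,
        "Administrative change (auditing enabled)": 10_000,
    }
    total_kb = sum(example_totals_kb.values())
    print(f"{total_kb:,} KB (~{total_kb / 1000:.0f} MB)")  # 211,732 KB (~212 MB)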

During the PVS farm setup, a database with an initial file size of 20MB is created. Due to the nature of the data in the PVS farm database the transaction log is not expected to grow very quickly, unless a large amount of configuration is performed.

In contrast to XenApp, which also offers the ability to track administrative changes, the related information is not written to a dedicated database but directly to the Provisioning Services farm database. In order to limit the size of the Provisioning Services database it is recommended to archive the audit trail data on a regular schedule.

Decision: Database Location

By default, the Configuration Logging and Monitoring databases are located within the Site Configuration database. Citrix recommends changing the location of these secondary databases as soon as the configuration of the site has been completed, in order to simplify sizing, maintenance and monitoring. All three databases can be hosted on the same server or on different servers. An ideal configuration would be to host the Monitoring database on a different server from the Site Configuration and Configuration Logging databases since it records more data, changes occur more frequently and the data is not considered to be as critical as the other databases. For more information, please refer to Citrix Docs – Change secondary database locations.

Note

The location of the Configuration Logging database cannot be changed when mandatory logging is enabled.  

Decision: High-Availability

The following highlights the impact to XenApp, XenDesktop and Provisioning Services when there is a database outage:

Site configuration database

Users will be unable to connect or reconnect to a virtual desktop.

Note: Local Host Cache allows users with Hosted Shared Desktops, Hosted Windows and Browser Applications, and Personal Desktops to reconnect to their applications and desktops even when the site database is unavailable.

Monitoring database

Director will not display any historical data and Studio cannot be started. Brokering of incoming user requests and existing user sessions will not be affected.

Configuration logging database

If “Allow changes when the database is disconnected” has been enabled within the XenApp and XenDesktop logging preferences, an outage of the Configuration Logging database will have no impact (other than configuration changes not being logged). Otherwise, administrators will be unable to make any changes to the XenApp and XenDesktop site configuration. Users are not impacted.

Provisioning Services farm database

When offline database support is enabled and the database becomes unavailable, the stream process uses a local copy of the database to retrieve information about the provisioning server and the target devices supported by the server. This allows provisioning servers and the target devices to remain operational. However, when the database is offline, the console and the management functions listed below become unavailable:

• Auto Add target devices

• vDisk creation and updates

• Active Directory password changes

• Stream process startup

• Image update service

• PowerShell and MCLI based management

If offline database support has not been enabled, all management functions become unavailable and the boot and failover of target devices will fail.

Note

Please review HA options for 3rd party databases (for example, App-V, SCVMM or vCenter) with the respective software vendor.

In addition to the built-in database redundancy options, Microsoft SQL Server, as well as the underlying hypervisor (in virtual environments), offers a number of high availability features. These enable administrators to ensure single server outages will have a minimal impact (if any) on the XenApp and XenDesktop infrastructure. The following SQL / hypervisor high availability features are available:

  • VM-level HA – This high availability option is available for virtual SQL servers only, which need to be marked for High Availability at the hypervisor layer. In case of an unexpected shutdown of the virtual machine or the underlying hypervisor host, the hypervisor will try to restart the VM immediately on a different host. While VM-level HA can minimize downtime in power-outage scenarios, it cannot protect from operating system level corruption. This solution is less expensive than mirroring or clustering because it uses a built-in hypervisor feature. However, the automatic failover process is slower, as it can take time to detect an outage and start the virtual SQL server on another host. This may interrupt the service to users.
  • Mirroring – Database mirroring increases database availability with almost instantaneous failover. Database mirroring can be used to maintain a single standby or mirror database for a corresponding principal or production database. Database mirroring runs with either synchronous operation in high-safety mode, or asynchronous operation in high-performance mode. In high-safety mode with automatic failover (recommended for XenDesktop) a third server instance, known as a witness, is required, which enables the mirror server to act as a hot standby server. Failover from the principal database to the mirror database happens automatically and is typically completed within a few seconds. It is a good practice to enable VM-level HA (or a similar automatic restart functionality) for at least the witness to ensure SQL service availability in case of a multi-server outage.

Note

Microsoft is planning to remove mirroring as a high availability option in a future release of SQL Server and discourages its use in new development work. Please refer to the Microsoft article – Database Mirroring (SQL Server) for more information.

  • AlwaysOn Failover Cluster Instances – Failover clustering provides high-availability support for an entire instance of Microsoft SQL Server. A failover cluster is a combination of two or more nodes, or servers, using shared storage. A Microsoft SQL Server AlwaysOn Failover Cluster Instance, introduced in SQL Server 2012, appears on the network as a single computer, but has functionality that provides failover from one node to another if the current node becomes unavailable. The transition from one node to the other is seamless for the clients connected to the cluster. AlwaysOn Failover Cluster Instances require a Windows Server Failover Clustering (WSFC) resource group. The number of nodes supported in the WSFC resource group depends on the SQL Server edition. (Please refer to the table in Decision: Edition earlier in this chapter.) For more information, please refer to MSDN – AlwaysOn Failover Cluster Instances (SQL Server).
  • AlwaysOn Availability Groups – AlwaysOn Availability Groups is an enterprise-level high-availability and disaster recovery solution introduced in Microsoft SQL Server 2012, which enables administrators to maximize availability for one or more user databases. AlwaysOn Availability Groups require that the Microsoft SQL Server instances reside on Windows Server Failover Clustering (WSFC) nodes. Similar to failover clustering, a single virtual IP / network name is exposed to the database users. In contrast to failover clustering, shared storage is not required since the data is transferred using a network connection. Both synchronous and asynchronous replication to one or more secondary servers is supported. As opposed to mirroring or clustering, secondary servers can be actively used for processing incoming read-only requests, backups or integrity checks. This feature can be used to offload user resource enumeration requests to a secondary SQL server in XenDesktop environments to essentially scale out a SQL Server infrastructure. Since the data on active secondary servers can lag multiple seconds behind the primary server, the read-only routing feature cannot be used for other XenDesktop database requests at this point in time. For more information, please refer to MSDN – AlwaysOn Availability Groups (SQL Server).

The following table outlines the recommended high availability features for Citrix databases:  

In the table:

  • Y indicates Recommended.
  • o indicates Viable.
  • N indicates Not Supported.
  • T indicates for test environments only. 
Component | VM-level HA | Mirroring | AlwaysOn Failover Cluster Instances | AlwaysOn Availability Groups
Site database | T | Y | o | o
Configuration logging database | T | o | o | o
Monitoring database | T | Y | o | o
Provisioning Services farm database | T | Y | o | N
DesktopPlayer database | T | N | o | o

Citrix Licensing

Citrix offers organizations the flexibility of multiple licensing models that align with common usage scenarios. The different licensing models vary based on the Citrix product used, but can include per user/device and per concurrent user. Several Citrix products use the license server, while other products require a license to be installed on the product itself.

Citrix License Server:

  • XenDesktop
  • XenApp
  • Provisioning Services
  • XenServer

On the product:

  • NetScaler
  • NetScaler Gateway

For more information on XenDesktop 7.x licensing, please refer to CTX128013 - XenDesktop Licensing.

For more information on Microsoft Licensing, please refer to the Microsoft document – Licensing Microsoft’s Virtual Desktop Infrastructure Technology.  

Decision: Sizing

Internal scalability testing has shown that a single virtual license server with two cores and 2 GB of RAM can issue approximately 170 licenses per second, or 306,000 licenses per half hour. If necessary, the specification of the license server can be scaled up to support a higher number of license requests per second.
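
The two figures quoted above are consistent with each other, as a one-line check shows:

    # 170 licenses/second sustained for 30 minutes
    print(f"{170 * 30 * 60:,} licenses per half hour")  # 306,000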

Decision: High Availability

For a typical environment, a single license server is sufficient. Should the license server become unavailable, dependent Citrix products will enter a 30-day grace period, which provides more than enough time to resolve connectivity issues and/or restore or rebuild the license server.

Note

  • If the license server and the Citrix product do not communicate within 2 heartbeats (5-10 min), the Citrix product will enter a grace period and will allow connections for up to 30 days. Once communication with the license server is re-established, the license server will reconcile the temporary and actual licenses.
  • A CNAME record in DNS is a convenient way to reference the license server. Using CNAMEs allows the license server name to be changed without updating the Citrix products.

If additional redundancy is required, Citrix supports the following high availability solutions for the license server. 

  • Windows Clustering – Cluster servers are groups of computers that work together in order to increase availability. Clustering allows the license server role to automatically failover in the event of a failure. For more information on clustering, please see the Citrix Docs article – Clustered License Servers
  • Duplication of license server – Create a VM level backup of the license server. This backup should not be stored on the same host as the license server. Instead, it should be stored in a safe location, such as a highly available storage solution, or backed up to tape or disk. The duplicate server is not active, and will remain on standby until the need arises to restore the active license server. Should the license server be restored using this backup, any new licenses must be re-downloaded to the server.

For more information, please refer to Citrix eDocs – Licensing Architecture Overview.

Each method allows an administrator to exchange a single license server for another without an interruption in service, assuming that the change occurs during the grace period and that the following limitations are considered.

  • License files will reference the server specified during the allocation process. This means that the license files can only be used on a server with the same binding information (Hostname) as the server that was previously specified. 
  • Two Windows-based, domain joined license servers cannot share the same name and be active in the environment at the same time. 
  • Because license servers do not communicate with each other, any additional licenses must be placed on both the active and backup license server.  

Decision: Optimization

License server performance can be optimized by tuning the number of “receive” and “processing” threads. If the thread count is set too low, requests will be queued until a thread becomes available. Conversely, if the thread count is set too high, the license server will become overloaded.

The optimal values are dependent on the server hardware, site configuration, and license request volume. Citrix recommends testing and evaluating different values to determine the proper configuration. Setting the maximum number of processing threads to 30 and the maximum number of receiving threads to 15 is a good starting point for large scale deployments.

This optimization will improve the Citrix License Server's ability to provide licenses by increasing its capacity to receive and process license requests.

For more information, please refer to the Citrix Docs – Improving Performance by Specifying Thread Use.

Delivery Controllers

Decision: Server Sizing

Delivery Controller scalability is based on CPU utilization. The more processor cores available, the more virtual desktops a controller can support. Each desktop startup, registration, enumeration and launch request impacts the controller’s processor. As the storm increases in intensity, the CPU utilization of the controller will increase. If the CPU reaches a critical threshold, roughly 80%, the site will need to either scale up or scale out.

Adding additional CPU cores to a Delivery Controller will lower the overall CPU utilization, thus allowing for greater numbers of desktops supported by a single controller. This is really only feasible when dealing with virtualized controllers as adding virtual CPUs is fairly easy and straightforward. The other alternative is to add another controller into the site configuration. The controller would have the same configuration as other controllers, and the load would be evenly distributed across all controllers, thus helping to reduce the overall load on each single controller.

Testing has shown that a single Delivery Controller, using the following configuration, can support more than 5,000 desktops.

Component | Specification
Processor | 4 vCPU
Memory | 4 GB RAM
Network | Bonded virtual NIC
Host Storage | 40 GB shared storage
Operating System | Windows Server 2012 R2
XenDesktop | 7

The following formula can be used to calculate the number of Delivery Controllers required for a Citrix site.

[Image: Delivery Controller sizing formula]
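
That formula image is not reproduced here. As a rough stand-in, the following sketch combines the tested figure of roughly 5,000 desktops per controller (from the table above) with the N+1 redundancy recommended in the next decision; it is an assumption-based estimate, not the official Citrix formula, and should be validated against your own scalability testing.

    import math

    DESKTOPS_PER_CONTROLLER = 5_000   # per the tested specification above

    def delivery_controllers_required(total_desktops, redundancy=1):
        """Controllers needed to carry the load, plus N+1 spare capacity."""
        return math.ceil(total_desktops / DESKTOPS_PER_CONTROLLER) + redundancy

    print(delivery_controllers_required(12_000))  # 4 (3 to carry the load + 1 for redundancy)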

Decision: High Availability

If the server hosting the Delivery Controller is unavailable, users will not be able to access their virtual desktops or published applications. Therefore at least two Delivery Controllers (N+1 redundancy) should be deployed per zone on different physical servers to prevent this component from becoming a single point of failure. If one controller fails, the others can manage connections and administer the site.

The locations of all Delivery Controllers are specified on the VDA, allowing it to automatically failover if communication with one Delivery Controller is unavailable. The VDA checks the following locations, in order, stopping at the first place it finds the Delivery Controller:

  1. A persistent storage location maintained for the auto-update feature. This location contains controller information when auto-update is enabled and after the VDA successfully registers for the first time after installation.
    For its initial registration after installation, or when auto-update is disabled, the VDA checks the following locations.
  2. Policy settings (Delivery Controllers, Delivery Controller SIDs).
  3. The Delivery Controller information under the VDA ListofDDCs registry key. The VDA installer initially populates these values, based on the information specified when installing the VDA.
  4. OU-based discovery. This is a legacy method maintained for backward compatibility.
  5. The Personality.ini file created by Machine Creation Services.

Citrix Consulting recommends utilizing the auto-update feature (enabled by default). This feature will simplify management of the environment by keeping VDAs updated when adding and removing Delivery Controllers.
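
When troubleshooting registration, the ListofDDCs value referenced in step 3 can be inspected directly on a VDA. A minimal sketch, assuming the value lives under HKLM\SOFTWARE\Citrix\VirtualDesktopAgent (the typical VDA location; verify the exact path and value name on your own build):

    import winreg

    # Read the controller list a VDA uses for registry-based discovery (assumed path).
    KEY_PATH = r"SOFTWARE\Citrix\VirtualDesktopAgent"

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        ddcs, _ = winreg.QueryValueEx(key, "ListOfDDCs")
        print(ddcs)  # typically a space-separated list of Delivery Controller FQDNs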

Decision: Local Host Cache

Even if the SQL database is highly available, there is the risk of not having access to the database if the network connection between the delivery controller and SQL database fails, which is an important concern for sites that span geographical locations.  

To overcome this risk, the delivery controllers can utilize the local host cache feature that creates a local copy of the SQL database, used only if the delivery controller loses contact with the database.

The following must be considered when using local host cache: 

  • Elections – When a zone loses contact with the SQL database, an election occurs nominating a single delivery controller as master. All remaining controllers go into idle mode. A simple alphabetical order determines the winner of the election.
  • Sizing – When using local host cache mode, a single delivery controller is responsible for all VDA registrations, enumerations, launches and updates. The elected controller must have enough resources (CPU and RAM) to handle the entire load for the zone. A single controller can scale to 10,000 users, which influences the zone design.
    • RAM – The local host cache services can consume 2+GB of RAM depending on the duration of the outage and the number of user launches during the outage.
    • CPU – The local host cache can use up to 4 cores in a single socket.
    • Storage – During local host cache mode, storage space increases by approximately 1 MB every 2-3 minutes with an average of 10 logons per second (see the sizing sketch after this list).
  • Power Options – Powered off virtual resources will not start when the delivery controller is in local host cache mode. Pooled virtual desktops that reboot at the end of a session are placed into maintenance mode. 
  • Consoles – When using local host cache mode, Studio and PowerShell are not available.  
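
For planning purposes, the observations above can be turned into an order-of-magnitude estimate of the additional space consumed while a zone runs in local host cache mode. A rough sketch, assuming the ~1 MB per 2-3 minutes figure (observed at ~10 logons per second) scales linearly with the logon rate:

    # Order-of-magnitude Local Host Cache storage growth during an outage.
    def lhc_storage_growth_mb(outage_minutes, logons_per_second=10):
        # ~1 MB per ~2.5 minutes observed at ~10 logons/second; scaled linearly.
        return (outage_minutes / 2.5) * (logons_per_second / 10)

    print(f"{lhc_storage_growth_mb(240):.0f} MB")  # ~96 MB for a 4-hour outage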

Decision: XML Service Encryption

In a typical session, the StoreFront server passes credentials to the Citrix XML Service on a Delivery Controller. The Citrix XML protocol uses clear text to exchange all data, with the exception of passwords, which are transmitted using obfuscation.

If the traffic between the StoreFront servers and the XenDesktop Controllers can be intercepted, it will be vulnerable to the following attacks: 

  • Attackers can intercept the XML traffic and steal resource set information and tickets. 
  • Attackers with the ability to crack the obfuscation can obtain user credentials. 
  • Attackers can impersonate the XenDesktop Controller and intercept authentication requests.

For most organizations, the Citrix XML traffic will be isolated on a dedicated physical or virtual datacenter network making interception unlikely. However, for safety consider using SSL encryption to send StoreFront data over a secure HTTP connection.  

Decision: Server OS Load Management

Default Load Management policies are applied to all Server OS delivery groups. The default settings specify the maximum number of sessions a server can host at 250 and do not consider CPU and Memory usage. Capping session count does not provide a true indication of load, which can lead to an overburdening of Server OS delivery groups resulting in a degradation of performance or an underutilization of Server OS delivery groups resulting in an inefficient usage of resources.

Citrix Consulting recommends creating unique “custom” Load Management policies for each Delivery Group based on performance and scalability testing. Different rules and thresholds can be applied to each Delivery Group depending on the different resource bottlenecks identified during testing. For more information on the available Load Management policy configurations refer to Citrix Docs – Load Management policy settings.

If adequate testing cannot be performed prior to production, Citrix Consulting recommends implementing the following “custom” Load Management policy, which can be applied to all servers as a baseline: 

  • CPU Usage - Full Load: 80% 
  • CPU usage excluded process priority – Below Normal or Low 
  • Memory Usage - Full Load: 80% 
  • Memory Usage base load – Report zero load (MBs): 786 
  • Maximum number of sessions – X

The “Maximum number of sessions” policy is included for capping purposes – this is considered a best practice for resiliency. Organizations can choose an initial value of 250 (denoted by “X” above). It is highly recommended that this value and others be customized based on the results from scalability testing.

Cloud Connector

The XenApp and XenDesktop Service within Citrix Cloud utilizes a set of services contained within the Citrix Cloud Connector. A redundant set of Cloud Connector virtual machines must be placed in each data center/resource location containing VDA hosts.

Decision: Server Sizing

Cloud Connector scalability is based on CPU utilization. The more processor cores available, the more virtual desktops a cloud connector can support. Each desktop startup, registration, enumeration and launch request affects the cloud connector’s processor. As the storm increases in intensity, the CPU utilization of the cloud connector will increase. If the CPU reaches a critical threshold, roughly 80%, the site will need to either scale up or scale out.

Testing has shown that a single Cloud Connector, using the following configuration, can support 5,000 desktops.

Component | On Premises Specifications | Azure Hosted Specifications
Number of VMs (with N+1 Fault Tolerance) | 3 | 6 Standard_A2_V2 instances
Processors per VM | 4 vCPU | 2 vCPU
Memory per VM | 4 GB RAM | 4 GB RAM
Host Storage per VM | 40 GB shared storage | 200 GB temp storage
Operating System | Windows Server 2012 R2 | Windows Server 2012 R2

Provisioning Services

Citrix Provisioning Services (PVS) uses streaming technology to simplify the deployment of virtual and physical machines. Computers are provisioned and re-provisioned in real-time from a single shared-disk image. In doing so, administrators can completely eliminate the need to manage and patch individual systems. Instead, all image management is performed on the master image.

Decision: Topology

A Provisioning Services farm represents the top level of the Provisioning Services infrastructure, which can be further broken down into sites. All provisioning servers in a farm share the same SQL database and Citrix license server.

Each site is a logical entity containing provisioning servers, vDisk pools and target device collections. Although all sites within a farm share the same database, target devices can only fail over to other provisioning servers within the same site.

[Image: Provisioning Services farm and site hierarchy]

There are factors that must be considered when determining the overall Provisioning Services topology: 

  • Network – Provisioning servers are constantly communicating with the farm database to retrieve system configuration settings. Therefore, separate farms should be created for each physical location where target devices reside, unless they are connected to the database server by a fast and robust connection. 
  • Administration – Organizations may need to maintain the separation of administrative duties at a departmental, regional or countrywide basis. Additional Provisioning Services farms will add some complexity to the management of the environment. However, this overhead is typically limited to initial configuration, desktop creation and image updates. 
  • Organization – A practical reason for building multiple sites is due to organizational changes. For example, two companies may have recently merged through acquisition, but need to keep resources separate while integration takes place. Configuring the organization to use separate sites is one way to keep the businesses separate but managed centrally through the Provisioning Services console.

Only create additional sites if the business requirements warrant it. A single site per farm is easier to manage and requires no additional configuration.  

Decision: Device Collections

Device collections provide the ability to create and manage logical groups of target devices. Creating device collections simplifies device management by allowing actions to be performed at the collection level rather than the target device level.

[Image: device collections]

Device collections can represent physical locations, subnet ranges, chassis or different departments within an organization. Collections can also be used to logically separate production target devices from test and maintenance ones.

Consider creating device collections based on vDisk assignment so that the status of all target devices assigned to a particular vDisk can be quickly identified.  

Decision: High Availability

Provisioning Services is a critical component of the virtual desktop infrastructure. The following recommendations should be followed to eliminate single points of failure: 

  • Provisioning Server – A minimum of two provisioning servers should always be implemented per site. Sufficient redundancy should be incorporated into the design so that a single server failure does not reduce the total number of target devices that can be supported per site. The Provisioning Services boot file should be configured for high availability. Up to four Provisioning Servers may be listed in the boot file. Target devices will try to contact the servers in the order that they are listed. The server that responds may not necessarily be the server that will provide streaming services to the target device. If Load Balancing is enabled, the target device may be reassigned to another server in the site that is less loaded than the others. 
  • vDisks and Storage – For vDisk stores hosted on local, Direct Attached Storage (DAS) or Storage Area Network (SAN), replication should be used to synchronize the vDisks. If using Network Attached Storage (NAS), ensure that the vDisks are hosted on a highly available network share. 
  • Networking – The provisioning servers should have redundant NICs. If the provisioning server is deployed as a physical server, redundant NICs should be teamed and if the provisioning server is deployed as a virtual server, the underlying hypervisor should incorporate redundant NICs.

Note

The target devices will only failover to NICs that are on the same subnet as the PXE boot NIC.

 

Trivial File Transfer Protocol (TFTP) is a communications protocol used for transferring configuration or boot files between machines. Provisioning services can use TFTP to deliver the bootstrap file to target devices. There are several options available to make the TFTP service highly available. Some of the more commonly used options are: 

  • DNS Round Robin – A DNS entry is created for the TFTP service with multiple A records corresponding to the TFTP services running on the provisioning servers in the farm. This method is not recommended since the state of the TFTP service is not monitored. Clients could potentially be sent to a non-functioning server.
  • Hardware load balancer – Use a hardware load balancer, such as Citrix NetScaler, to create virtual IPs that correspond to the provisioning servers. The NetScaler can intelligently route traffic between the provisioning servers. In the event that one of the servers becomes unavailable, NetScaler will automatically stop routing TFTP requests to that server. This is the best method for making TFTP highly available, but can be complicated to set up. 
  • Multiple DHCP Option 66 entries – This method is easy to implement but requires a DHCP service that supports entering multiple entries in option 66. Microsoft DHCP server allows only one option 66 entry, so this method would not be feasible in environments with Microsoft DHCP services. If using a non-Microsoft DHCP server or appliance, check with the manufacturer to verify that multiple option 66 entries are supported.

There are other options available that can achieve the same result without having to use TFTP: 

  • Proxy DHCP – Use the provisioning servers PXE service to provide the bootstrap information. If one of the servers is down, the next available server in the farm can provide the bootstrap information. This method requires the provisioning servers to be on the same broadcast domain as the target devices. If there are other PXE services running on the network (Altiris, SCCM, etc.) then multiple VLANs may be required to keep the PXE services from interfering with each other. 
  • Boot Device Manager – Use the Boot Device Manager to create a bootstrap file that is either placed on the local hard drive, or used as a bootable ISO file. If the ISO file is used, configure the target devices to boot from the CD/DVD-ROM drive, and place the ISO file on a highly available shared network location or local storage of each target device. When either method is utilized, the TFTP service is not used at all.

High availability should always be incorporated into the Provisioning Services design. Although high availability may require additional resources and increased costs, it will provide a highly stable environment so that users experience minimal impact due to service outages.  

Decision: Bootstrap Delivery

A target device initiates the boot process by first loading a bootstrap program which initializes the streaming session between the target device and the provisioning server. There are three methods in which the target device can receive the bootstrap program: 

Using DHCP Options

  1. When the target device boots, the target device sends a broadcast for IP address and boot information. DHCP will process this request and provide an IP as well as scope option settings 66 (the name or IP address of the Provisioning Services TFTP server) and 67 (the name of the bootstrap file).

Note

If using a load balancer for the TFTP service then the address of the load balancer is entered in option 66.

  2. Using TFTP, a request for the bootstrap file is sent from the target device to the provisioning server. The target device downloads the boot file from the provisioning server.

  3. The target device boots the assigned vDisk image.

Note

Requires UDP/DHCP Helper to be configured when targets are not on the same subnet as the DHCP servers in order to receive PXE broadcasts.

Using PXE Broadcasts

1. When a target device boots from the network, the target device sends a broadcast for an IP address and boot information. DHCP will process this request and provide an IP address. In addition, all provisioning servers that receive the broadcast will return boot server and boot file name information. The target device will merge the information received and start the boot process.

2. Using TFTP, a request for the bootstrap file is sent from the target device to the provisioning server which responded first. The target device downloads the boot file from the provisioning server.

Note

  • Make sure no other PXE services, such as the Altiris PXE service, are in use on the same subnet, or isolate them using VLANs; otherwise, conflicts may occur with Provisioning Services.
  • Requires UDP/DHCP Helper to be configured when targets are not on the same subnet as the DHCP and PVS servers in order to receive PXE broadcasts. 

Using Boot Device Manager – The Boot Device Manager (BDM) creates a boot file that target devices obtain through a physical CD/DVD, a mounted ISO image or as a virtual hard disk assigned to the target device. A BDM partition can be upgraded in one of three ways:

  • By collection
  • By a group of highlighted devices
  • By a single device 

A summary of the advantages and disadvantages for each delivery method is listed in the following table:

Delivery Method | Advantages | Disadvantages
DHCP Options | Easy to implement | Requires changes to production DHCP service. DHCP service may only allow one option 66 entry. Requires UDP/DHCP helper for targets on different subnets.
PXE | Easy to implement | Can interfere with other PXE services running on the same subnet. Requires UDP/DHCP helper for targets on different subnets.
BDM ISO | Does not require PXE or TFTP services | Extra effort required to boot physical target devices. BDM ISO is regarded as a single point of failure if a single file is used.
BDM Partition | Does not require PXE, TFTP, or TSB; the single-stage bootloader automatically finds all relevant PVS server information at boot time without external services. | An extra 8 MB partition is created for each target device.

Note

When configuring the bootstrap file, up to four provisioning servers may be listed. The order in which the provisioning servers appear in the list determines the order in which the provisioning servers are accessed. If the first server does not respond, the next server in the list is contacted. 

Decision: vDisk Format

Provisioning Services supports the use of fixed-size or dynamic vDisks: 

  • Fixed-size disk – For vDisks in private mode, fixed-size prevents disk fragmentation, and offers improved write performance over dynamic disks. 
  • Dynamic disk – Dynamic disks require less storage space than fixed-size disks, but offer significantly lower write performance. Although vDisks in Shared mode do not perform writes to the vDisk, the time required to complete vDisk merge operations will increase with dynamic disks. This is not a common occurrence as more environments choose to create new vDisks when updating.

Since most reads will be to the System Cache in RAM, there is no significant change in performance when utilizing fixed-size or dynamic disks. In addition, dynamic disks require significantly less storage space. Therefore, dynamic disks are recommended.  

Decision: vDisk Replication

vDisks hosted on a local, Direct Attached Storage or a SAN must be replicated between vDisk stores whenever a vDisk is created or changed. Provisioning Services supports the replication of vDisks from stores that are local to the provisioning server as well as replication across multiple sites that use shared storage. The replication of vDisks can be performed manually or automatically: 

  • Manual – Manual replication is simple, but can be time consuming, depending on the number of vDisks and vDisk stores. If errors occur during the replication process, administrators can catch them straight away and take the appropriate steps to resolve them. The risk of manual replication is vDisk inconsistency across the provisioning servers, which will prevent load balancing and failover from working properly. For example, if a vDisk is replicated across three servers and then one of the vDisks is updated, that vDisk is no longer identical and will not be considered if a server failover occurs. Even if the same update is made to the other two vDisks, the timestamps on each will differ, and therefore the vDisks are no longer identical. 
  • Automated – For large environments, automated replication is faster than the manual method due to the number of vDisks and vDisk stores required. Some automated tools, such as Microsoft DFS-R, support bandwidth throttling and Cross File Remote Differential Compression (CF-RDC), which uses heuristics to determine whether destination files are similar to the file being replicated. If so, CF-RDC will use blocks from these files to minimize the amount of data transferred over the network. The risk of automated replication is that administrators do not typically monitor replication events in real-time and do not respond quickly when errors occur, unless the automation tool has an alerting feature. Some tools can be configured to automatically restart the copy process in the event of a failure. For example, Robocopy supports “resume copying” in the event that the network connection is interrupted.

For medium and large projects, use a tool to automate vDisk replication. Select a tool that is capable of resuming from network interruptions, copying file attributes and preserving the original timestamp.

Note

Load balancing and high availability will not work unless the vDisks have identical timestamps.

Decision: Server Sizing

Generally, a Provisioning Server is defined with the following specifications:

Component | Specification
Model | Virtual
Processor | 4 - 8 vCPU
Memory | 2 GB + (# of vDisks * 2 GB)
Network | 10 Gbps NIC
Host Storage | 40 GB shared storage
vDisk Storage | Depending on number of images/revisions
Operating System | Windows Server 2012 R2

Model

Citrix Provisioning Services can be installed on virtual or physical servers: 

  • Virtual – Offers rapid server provisioning, snapshots for quick recovery or rollback scenarios and the ability to adjust server resources on the fly. Virtual provisioning servers allow target devices to be distributed across more servers helping to reduce the impact from server failure. Virtualization makes more efficient use of system resources. 
  • Physical – Offers higher levels of scalability per server than virtual servers. Physical provisioning servers mitigate the risks associated with virtual machines competing for underlying hypervisor resources. 
     
    In general, virtual provisioning servers are preferred when sufficient processor, memory, disk and networking resources can be made available and guaranteed to be available.

Note

For high availability, ensure that virtual Provisioning Servers are distributed across multiple virtualization hosts. Distributing the virtual servers across multiple hosts will eliminate a single point of failure and not bring down the entire Provisioning Services farm in the event of a host failure.

CPU

Provisioning Services is not CPU intensive. However, under-allocating the number of CPUs does impact the optimization of the network streams. The number of streams that a Provisioning Services server can run concurrently can be determined by the following formula:

Max Number of Streams = Number of Ports × Threads per Port

By default, the Streaming Service is configured with 20 sequential network ports and 8 threads per port. Therefore, by default, a provisioning server can support 160 concurrent targets. If more than 160 streams are required, Provisioning Services continuously switches between streaming different target devices.
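
The calculation is straightforward; a short sketch using the defaults described above:

    # Concurrent streaming capacity of a provisioning server.
    def max_concurrent_streams(ports=20, threads_per_port=8):
        return ports * threads_per_port

    print(max_concurrent_streams())          # 160 with the default configuration
    print(max_concurrent_streams(ports=40))  # 320 if the port range is doubled in the console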

Ideally, if the environment needs to support more than 160 concurrent targets, the number of ports, and threads per port can be adjusted in the Provisioning Services console. Best performance is attained when the threads per port is not greater than the number of cores available on the provisioning server. If the provisioning server does not have sufficient cores, the server will show a higher CPU utilization, and target devices waiting for requests to be processed will have a higher read latency. 

Even though Provisioning Services is not CPU intensive, allocating 2 CPUs will require a larger contiguous network port range.  

  • For small environments (up to approximately 500 virtual machines), 4 vCPUs are recommended. 
  • For larger environments, 8 vCPUs are recommended.

RAM

The Windows operating system hosting Provisioning Services partially caches the vDisks in memory (system cache) reducing the number of reads required from storage. Reading from storage is significantly slower than reading from memory. Therefore, Provisioning Servers should be allocated sufficient memory to maximize the benefit from this caching process.

The following formula can be used to determine the optimal amount of memory that should be allocated to a provisioning server:

Total Server RAM = 2 GB + (Number of vDisks × 2 GB)
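
As a worked example of the formula above:

    # Provisioning server memory sizing per the formula above.
    def pvs_server_ram_gb(vdisks):
        return 2 + vdisks * 2

    print(pvs_server_ram_gb(5))  # 12 GB for a server actively streaming 5 vDisks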

Network

Unlike most other XenApp and XenDesktop components, Provisioning Services does not bottleneck the CPU.  Provisioning Services scalability is based on network throughput. 

The following table shows the approximate amount of data that Provisioning Services requires to boot different operating systems:

Operating System | Avg Boot Data Usage (MB)
Windows 10 x64 | 240
Windows 8 x86 | 178
Windows 8 x64 | 227
Windows 7 x86 | 166
Windows 7 x64 | 210
Windows 2012 | 225
Windows 2012 R2 | 232
Windows 2008 R2 | 251
Windows Vista x86 | 190
Windows Vista x64 | 240

Determining how much time will be required to boot the target devices can be estimated using the following formula:

[Image: target device boot time estimation formula]

Operating System | Number of VMs | Network Throughput | Time to Boot
Windows 10 x64 | 500 | 1 Gbps | 960 seconds (16 minutes)
Windows 10 x64 | 500 | 10 Gbps | 96 seconds (1 minute, 36 seconds)
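
The formula image above is not reproduced here, but both rows of the table are consistent with a simple throughput calculation: total boot data divided by available bandwidth. The following sketch reflects that assumption rather than the exact published formula.

    # Boot storm duration estimate: total boot data divided by available bandwidth.
    def seconds_to_boot(targets, boot_data_mb, throughput_gbps):
        total_megabits = targets * boot_data_mb * 8
        return total_megabits / (throughput_gbps * 1000)

    print(seconds_to_boot(500, 240, 1))   # 960.0 seconds (~16 minutes)
    print(seconds_to_boot(500, 240, 10))  # 96.0 seconds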

A 10Gbps network is recommended for use with Provisioning Services. If a 10Gbps network is not available, consider link aggregation to provide additional bandwidth to the provisioning servers, or a dedicated physical streaming network.

Tip

Firewalls can add latency and create bandwidth bottlenecks in Provisioning Services environments. If the use of firewalls cannot be avoided, refer to the Citrix whitepaper CTX101810 – Communication Ports Used By Citrix Technologies, for the list of ports that should be enabled for full functionality.

Growth

As the farm grows, administrators will need to decide whether to add more resources to the provisioning servers or to add more provisioning servers to the farm.

There are a number of environmental factors that need to be considered when determining whether the Provisioning Servers should be scaled up or scaled out: 

  • Redundancy – Spreading user load across additional less-powerful servers helps reduce the number of users affected by a single provisioning server failure. If the business is unable to accept the loss of a single high-specification server, consider scaling out.
  • Failover times – The more target devices connected to a single provisioning server, the longer it will take for them to fail over in the event of a server failure. Consider scaling out to reduce the time required for target devices to fail over to another server. 
  • Data center capacity – The data center may have limited space, power and/or cooling available. In this situation, consider scaling up. 
  • Hardware costs – Initially, it may be more cost effective to scale up. However, there will be a point where scaling out actually becomes more cost effective. A cost analysis should be performed to make that determination.  
  • Hosting costs – There may be hosting and/or maintenance costs based on the number of physical servers used. If so, consider scaling up to reduce the long-term cost of these overheads.

Decision: Network Configuration

As mentioned before, it is essential that the network is sized correctly to prevent network bottlenecks from causing high disk access times that directly affect virtual desktop performance. The following diagram outlines a common Provisioning Services network infrastructure:

localized image

The following network configuration is recommended for the network sections outlined within the diagram:

  • PVS Uplink – All disk access from the target devices will be transferred via the PVS network uplink. This means hundreds or even thousands of devices will use this network connection. Therefore, it is vital that this connection is redundant and can fail over without any downtime. Furthermore, Citrix recommends a minimum bandwidth of 1Gbps per 500 target devices. For virtual provisioning servers, an appropriate QoS quota or a dedicated physical network uplink should be configured to ensure best performance.
  • Hypervisor Uplink – Used by all PVS target devices hosted on a particular hypervisor host. Therefore, redundancy with transparent failover is strongly recommended. Unless the target devices run a very I/O intensive workload or perform I/O intensive tasks (e.g. booting) simultaneously, a bandwidth of 1Gbps is sufficient for this uplink.
  • VM Uplink – All network traffic for a virtual machine, including PVS streaming traffic, will traverse this virtual network connection. Unless the workload is extremely I/O intensive, a bandwidth of 100 Mbps is sufficient to handle even peak loads during I/O intensive tasks, such as booting from vDisk. For example, a Windows 2012 R2 Server will read approximately 232MB from the vDisk during a period of 90 seconds until the Windows logon screen is shown. During this period an average data rate of 20.5 Mbps with peaks up to 90 Mbps can be observed (a worked calculation follows this list).
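
The average data rate quoted for the VM uplink can be derived from the boot data figures. A minimal sketch, assuming approximately 232MB read over a 90-second boot:

    # Sketch: average streaming data rate during boot for a single target device.
    def avg_boot_rate_mbps(boot_data_mb: float, boot_seconds: float) -> float:
        return (boot_data_mb * 8) / boot_seconds

    print(round(avg_boot_rate_mbps(232, 90), 1))  # ~20.6 Mbps, in line with the figure above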

The following switch settings are recommended for Provisioning Services:

  • Disable Spanning Tree or Enable PortFast – In a switching environment, the Spanning Tree Protocol (STP) places ports into a blocked state while it transmits Bridged Protocol Data Units (BPDUs) and listens to ensure the BPDUs are not in a loopback configuration. The port is not placed in a forwarding state until the network converges, which, depending on the size of the network, may take long enough to cause Preboot Execution Environment (PXE) timeouts. To eliminate this issue, disable STP on edge ports connected to clients or enable PortFast.
  • Storm Control – Storm Control is a feature available on Cisco switches that allows a threshold to be set at which multicast, broadcast, or unicast traffic is suppressed. Its purpose is to prevent malicious or erroneous senders from flooding a LAN and affecting network performance. PVS servers may, by design, send a large amount of traffic that falls within a storm control threshold; therefore, the feature should be configured accordingly.
  • Broadcast Helper – The broadcast helper is required to direct broadcasts from clients to servers that would otherwise not be routed. In a PVS environment, it is necessary to forward PXE boot requests when clients are not on the same subnet as the servers. If possible, the recommended network design is to have PVS servers reside on the same subnet as the target devices. This mitigates the risk of service degradation due to other networking infrastructure components.

The following network interface features should be taken into consideration when selecting a network interface for Provisioning Services:

  • TCP Offloading – Offloading I/O tasks to the network interface reduces CPU usage and improves overall system performance. However, the PVS Streaming Service can be negatively impacted when Large Send Offload is enabled, due to the extra work placed on the network adapter. Many network adapters have Large Send Offload and TCP checksum offload enabled by default.

Note

If Large Send Offload is enabled and the switch that the traffic is passing through does not support the frame size sent by the Large Send Offload engine, the switch will drop the frame causing data retransmission. When retransmitting, the operating system will segment the frames instead of the network adapter, which can lead to severe performance degradation.

  • Receive Side Scaling (RSS) – Receive Side Scaling enables packets received from a network adapter to be balanced across multiple CPUs, allowing incoming TCP connections to be load balanced and preventing bottlenecks on a single CPU. In Windows Server 2008 R2 and Windows Server 2012/2012 R2, RSS is enabled by default.

Note

For more information on PVS networking best practices please refer to Best Practices for Configuring Provisioning Services Server on a Network.

For Provisioning Services implementations on low bandwidth networks (1Gbps or slower), performance may be improved by isolating streaming traffic from other network traffic on the LAN.

Microsoft does not support NIC teaming with Hyper-V on Windows Server 2008 R2; however, third party solutions are available. Microsoft does support NIC teaming with Hyper-V on Windows Server 2012/2012 R2. All support queries regarding teaming with Hyper-V should be directed to the NIC OEM.  

Decision: Subnet Affinity

The Provisioning Services Subnet Affinity is a load balancing algorithm that helps to ensure target devices are connected to the most appropriate provisioning server. When configuring subnet affinity, the following options are available:

  • None – Ignore subnets; uses the least busy server.
  • Best Effort – Uses the least busy server/NIC combination from within the same subnet. If no server/NIC combination is available within the subnet, select the least busy server from outside the subnet. If more than one server is available within the selected subnet, perform load balancing between those servers. This is the default setting.
  • Fixed – Use the least busy server/NIC combination from within the same subnet. Perform load balancing between servers within that subnet. If no server/NIC combination exists in the same subnet, do not boot target devices assigned to this vDisk.

The following examples show common network configurations for physical provisioning servers. Similar configurations can be implemented for virtual provisioning servers without compromising on performance or functionality.

Blade Design

The provisioning servers and the target devices that they support reside within the same chassis. In most cases, the chassis will have a dedicated 10Gbps switch shared among all blade servers within the chassis.

localized image

The “Best Effort” subnet affinity option is used to keep Provisioning Services traffic within the same chassis. Should the provisioning server become unavailable, the targets fail over to the second provisioning server, which resides in the second chassis but within the same Provisioning Services site.

Rack Design

The second example is based on a rack design that uses rack switches to keep the provisioning traffic within the rack.

localized image

As opposed to the blade chassis design, the subnet affinity feature is not used. Instead, a Provisioning Services site with two provisioning servers is configured per server rack. This ensures that the target devices are streamed from provisioning servers within the same rack.

Experience from the Field

Manufacturing – A manufacturing company is designing a Provisioning Services solution to support five thousand virtual desktops. The company has concerns that Provisioning Services streaming traffic will create a bottleneck on the network affecting other applications. The company chose to build the environment on blade servers so that provisioning traffic is contained within the blade enclosure and will not impact other traffic on the network. 

Decision: Read Cache

PVS Accelerator enables a PVS proxy to reside in the XenServer Control Domain on a host, where streaming of a Provisioning Services vDisk is cached at the proxy before being forwarded to the virtual machine. Using the cache, subsequent boots (or any I/O requests) of virtual machines on the same host can be streamed from the proxy rather than from the server over the network. PVS Accelerator consumes additional local resources on the XenServer host, but the reduction in streaming traffic from the server over the network saves resources and effectively improves performance.

localized image

PVS Accelerator is a XenServer only capability. Utilizing this integrated technology reduces the load on the PVS server, reduces the overall network utilization and reduces the time it takes to boot a virtual machine. 

localized image

For more information on the relationship between XenServer and Provisioning Services, see the blog XenServer and PVS: Better Together.

Decision: Write Cache

Because the master image is read-only, each virtual machine has a writable disk to store all changes.  The administrator must decide where to store the write cache disk.

PVS Server – Local Storage

The Provisioning Services local storage holds the write cache drives for each target virtual machine. Although this is the default setting, it does increase network bandwidth requirements and increases the utilization of the Provisioning Services server. 

localized image

PVS Server – Shared Storage 

Shared storage associated with the Provisioning Services server holds the write cache drives for each target virtual machine. This option does increase network bandwidth requirements and increases the utilization of the Provisioning Services server. It also places temporary data (write cache) on expensive shared storage. 

localized image

VM – Local Storage

Local storage associated with the virtual machine holds the write cache drives for each target virtual machine. This option uses low-cost local storage and does not consume additional resources on the Provisioning Services server. However, the local storage must be capable of supporting the IOPS of all virtual machines on the host. 

localized image

VM – Cache in RAM 

RAM associated with the virtual machine holds the write cache drives for each target virtual machine. This option provides high performance due to the speed of RAM.  However, if the RAM cache runs out of space, the virtual machine will become unusable. In order to use this option, significant amounts of RAM must be allocated to each virtual machine, increasing the overall cost. 

localized image

VM – Cache in RAM with Overflow to Disk

A combination of RAM and local storage is used for the write cache. First, writes are stored within the RAM cache, providing high performance. As the RAM cache is consumed, large blocks are removed from the RAM cache and placed onto the local storage write cache disk. This option provides high-levels of performance with the low cost of local storage. 

Utilizing this integrated technology reduces write IOPS by 95%.

localized image

Cache in RAM with Overflow to Disk is the recommended option.

localized image

Decision: Antivirus

By default, most antivirus products scan all files and processes, which has a significant impact on Provisioning Services performance. For details on how antivirus software can be optimized for Provisioning Services, please refer to CTX124185 – Provisioning Services Antivirus Best Practices.

Antivirus software can cause file-locking issues on provisioning servers. The vDisk Store and write cache should be excluded from antivirus scans in order to prevent file contention issues.

When a virtual disk is running in standard mode and needs to be restarted, it downloads all of the previously loaded virus definitions. This can cause performance degradation when restarting several target devices at a time, often causing network congestion while the operation persists. In extreme cases, the target device and provisioning server can become sluggish and consume more resources than necessary. If the antivirus software supports it, definition files should be redirected to the write cache drive so that they are preserved between reboots.

Machine Creation Services

Machine Creation Services (MCS) uses disk-cloning technology to simplify the deployment of virtual machines. Computers are provisioned and re-provisioned in real-time from a single shared-disk image. In doing so, administrators can eliminate the need to manage and patch individual systems. Instead, administrators perform all image management on the master image.

Decision: Storage Location

Machine Creation Services allows administrators to break up a virtual desktop into multiple components and store those pieces on different storage arrays.  

Shared Storage 

The first option utilizes shared storage for the operating system disk and the differencing disk. 

localized image

Although this option allows the sharing of the master image across multiple hypervisor hosts, it puts more strain on the storage array because it must also host the differencing disk, which is temporary data. 

Hybrid Storage

The second option uses shared storage for the operating system disk and local hypervisor storage for the differencing disk.

localized image

This is the most common option, giving the administrator the benefit of sharing the master image across multiple hypervisor hosts while offloading expensive, temporary write IOPS to low-cost local hypervisor storage. 

XenServer IntelliCache Storage

The third option uses shared storage for the operating system disk, local hypervisor storage for the differencing disk, and local XenServer storage for a cache of the operating system disk. 

This is only an option for XenServer implementations.  It provides the same value as the hybrid storage approach while also reducing read IOPS from shared storage.  IntelliCache can coexist with the XenServer RAM-based read cache, if XenServer RAM is limited.

localized image

Decision: Cloning Type

Machine Creation Services incorporates two types of cloning techniques.   

  • Thin – Every VM within the catalog utilizes a single, read-only virtual disk for all reads. A second virtual disk, unique to each VM, captures all write IO activity. 
  • Full – Every VM within the catalog receives a full copy of the master disk image. Each VM fully owns the disk, allowing for read/write activity. Full cloning technology is only available for personal virtual desktops, where a dedicated virtual machine saves all changes to a local disk. 

Administrators should consider the following when deciding between thin and full cloning technologies:

Storage space requirements

  • Thin Clone: Greatest storage space savings. A single master disk image is shared across multiple VMs; only the differencing disk (writes) consumes space, which continues to grow until the VM reboots.
  • Full Clone: High storage space requirements. Each VM receives a full copy of the master image, and the size continues to grow as changes are made to the VM.

Backup/Restore

  • Thin Clone: Difficult. Many 3rd party backup/DR solutions do not support snapshot/delta disks, making thin-provisioned VMs hard or impossible to back up or move to other storage arrays.
  • Full Clone: Easy. The VM exists within a single virtual disk, making it easy to back up and restore.

Provisioning Speed

  • Thin Clone: Fast. Only requires a single disk image.
  • Full Clone: Slow (can be mitigated). Each VM requires a full copy of the master image; storage optimization technologies can help mitigate.

Performance

  • Thin Clone: Slower. A read I/O can occur twice, once for the master disk and once for the differencing disk, increasing read IOPS.
  • Full Clone: Faster. All reads and writes go directly to a single disk.

Boot Storm

  • Thin Clone: High impact. In a boot storm, all differencing disks resize to hold all writes from Windows startup, placing a high load on the storage because it happens all at once.
  • Full Clone: Low impact.

Decision: Read Cache

During boot and logon, virtual desktops incur high levels of storage read IOPS, which can put a strain on the underlying storage subsystem. When deployed on Citrix XenServer, Shared and Pooled VDI modes utilize a RAM-based read cache hosted on each XenServer. 

localized image

Utilizing this integrated technology reduces read IOPS by 50-80%.

localized image

Decision: Write Cache

During steady state, virtual desktops incur high levels of storage write IOPS, which can put a strain on the underlying storage subsystem. Shared and Pooled VDI modes can utilize a RAM-based write cache by using non-paged pool RAM from the virtual machine's operating system. 

localized image

Utilizing this integrated technology reduces write IOPS by 95%.

localized image

Security

Depending on the requirements of the organization, different security standards should be implemented within the solution. It is advisable to refer to the following papers: