Reference Architecture: Citrix DaaS - Azure

  • Contributed By: Gerhard Krenn Special Thanks To: Nitin Mehta

Introduction

This guide describes the architecture and deployment model for Citrix DaaS on Microsoft Azure.

The combination of Citrix Cloud and Microsoft Azure makes it possible to spin up new Citrix virtual resources with greater agility and elasticity, adjusting usage as requirements change. Virtual Machines on Azure support all the control and workload components required for a Citrix DaaS deployment. Citrix Cloud and Microsoft Azure have common control plane integrations that establish identity, governance, and security for global operations.

This document also provides guidance on prerequisites, architecture design considerations, and deployment for customer environments. It highlights the design decisions and deployment considerations across the following five key architectural principles:

  • Operations - Operations includes a wide variety of topics such as image management, service monitoring, business continuity, support, and others. Various tools are available to assist with automation of operations including Azure PowerShell, Azure CLI, ARM Templates, and Azure API.

  • Identity - A cornerstone of any Azure design is the identity of a person and their role-based access control (RBAC). Azure identity is managed through Azure Active Directory (Azure AD) and Azure AD Domain Services. The customer must decide which of these options to use for identity integration.

  • Governance - The key to governance is establishing the policies, processes, and procedures associated with the planning, architecture, acquisition, deployment, and operational management of Azure resources.

  • Security - Azure provides a wide array of configurable security options and the ability to control them so that customers can customize security to meet the unique requirements of their organization's deployments. This section helps you understand how Azure security capabilities can fulfill these requirements.

  • Connectivity - Connecting Azure virtual networks with the customer's local/cloud network is referred to as hybrid networking. This section explains the options for network connectivity and network service routing.

Planning

The three most common scenarios for delivering Citrix Apps and Desktops through Azure are:

  • Greenfield deployment with Citrix Cloud delivering resource locations in Azure. This scenario is delivered via the Citrix DaaS and used when customers prefer to go to a subscription model and outsource control plane infrastructure to Citrix.
  • Extending an on-premises deployment into Azure. In this scenario, the customer has a current on-premises control layer and would like to add Azure as a Citrix resource location for new deployments or migration.
  • Lift and shift. With this scenario, customers deploy their Citrix Management infrastructure into Azure and treat Azure as a site, using Citrix ADC and StoreFront to aggregate resources from multiple sites.

This document focuses on the Citrix Cloud deployment model. Customers can plan and adopt these services based on their organization needs:

Citrix DaaS

Citrix DaaS simplifies the delivery and management of Citrix technologies, helping customers to extend existing on-premises software deployments or move 100 percent to the cloud. Deliver secure access to Windows, Linux, and Web apps and Windows and Linux virtual desktops. Manage apps and desktops centrally across multiple resource locations while maintaining a great end-user experience.

Conceptual Reference Architecture

This conceptual architecture provides common guidelines for deployment of a Citrix Cloud resource location in Azure which will be discussed in the following sections.

Diagram-1: Citrix Cloud Conceptual Reference Architecture

Refer to the design guide on the scalability and economics of delivering Citrix DaaS on Microsoft Azure.

Operations

In the operations subject area, this guide dives deeper into planning for the workspace environment requirements and the hierarchy of foundational services. At the top layer are the subscription, resource group, and regional design considerations, followed by common questions about VM storage, user profile storage, and master image management and provisioning. The guide also covers Reserved Instance optimization with Autoscale and planning for business continuity and disaster recovery.

Naming Conventions

The naming of resources in Microsoft Azure is important because:

  • Most resources cannot be renamed after creation
  • Specific resource types have different naming requirements
  • Consistent naming conventions make resources easier to locate and can indicate the role of a resource

The key to success with naming conventions is establishing and following them across your applications and organizations.

When naming Azure subscriptions, verbose names make understanding the context and purpose of each subscription clear. Following a naming convention can improve clarity when working in an environment with many subscriptions.

A recommended pattern for naming subscriptions is:

Variable | Example | Description
[System] | CTX (Citrix), CORE (Azure) | Three letter identifier for the product, application, or service that the resource supports.
[Role] | XAW (XenApp Workers), VDA (Virtual Delivery Agent), CC (Cloud Connector), CVA (Citrix Virtual Apps) | Three letter identifier for a subsystem of the service.
[Environment] | D, T, P (dev, test, or prod) | Identifies the environment for the resource.
## | 01, 02 | For resources that have more than one named instance (web servers, and so on).
[Location] | WU (West US), EU (East US), SCU (South Central US) | Identifies the Azure region into which the resource is deployed.

When naming resources in Azure, use common prefixes or suffixes to identify the type and context of the resource. While all the information about type, metadata, and context is available programmatically, applying common affixes simplifies visual identification. When incorporating affixes into your naming convention, clearly specify whether the affix is at the beginning of the name (prefix) or at the end (suffix).

A well-defined naming scheme identifies the system, role, environment, instance count, and location of an Azure resource. Naming can be enforced using an Azure Policy.

Service | Scope | Suggested Pattern | Example
Subscriptions | Global | [System][Environment]##[Location]-sub | WSCD01scu-sub
Resource Groups | Global | [System]-[Role]-[Environment]##-[Location]-rg | CTX-Apps-P01-CUS-rg
Virtual Network | Resource Group | [System][Environment]##[Location]-vnet | CTXP01cus-vnet
Subnet | Parent VNET | [Descriptive Context] | DMZ - 10.0.1.0/24, Infrastructure - 10.0.2.0/24
Storage Account | Resource Group | [System][Role][Environment]##[Location] (Note: must be lowercase alphanumeric) | ctxinfd01scu
Container | Storage Account | [Descriptive Context] | vhds
Virtual Machine | Resource Group | [System][Role][Environment]##[Location] (Note: must be 15 characters or less) | CTXSTFD01scu
Network Interface | Resource Group | [vmname]-nic# | CTXSTFD01scu-nic1
Public IPs | Resource Group | [vmname]-pip | CTXSTFD01scu-pip
Virtual Network Gateway | Virtual Network | [System][Environment]##[Location]-vng | WSCD01scu-vng
Local Network Gateway | Resource Group | [System][Environment]##[Location]-lng | WSCD01scu-lng
Availability Sets | Resource Group | [System][Role]-as | CTXSTF-as
Load Balancer | Resource Group | [System][Role]-lb | CTXNSG-lb
Workspaces | Subscription | [System][Environment]-analytics | CTXP-analytics
Tags | Resource | [Descriptive Context] | Finance
Key Vault | Subscription | [System][Environment]-vault | CTXP-vault
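
The patterns in the table can be applied consistently with a small helper. The following is a minimal Python sketch, not part of any Citrix or Azure tooling: the function names are hypothetical, and the 3 to 24 character length check for storage accounts is an Azure rule added here on top of the "lowercase alphanumeric" note in the table.

```python
import re

# Hypothetical helpers that mirror the naming table; not an official tool.
VM_NAME_MAX = 15  # the table notes VM names must be 15 characters or less
# Storage account names: lowercase letters and digits only, 3 to 24 characters.
STORAGE_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def base_name(system, role, environment, instance, location):
    """Build [System][Role][Environment]##[Location], e.g. CTXSTFD01scu."""
    return f"{system.upper()}{role.upper()}{environment.upper()}{instance:02d}{location.lower()}"


def vm_name(system, role, environment, instance, location):
    """Return a VM name, rejecting names longer than 15 characters."""
    name = base_name(system, role, environment, instance, location)
    if len(name) > VM_NAME_MAX:
        raise ValueError(f"VM name {name!r} exceeds {VM_NAME_MAX} characters")
    return name


def storage_account_name(system, role, environment, instance, location):
    """Return a storage account name (same pattern, forced to lowercase)."""
    name = base_name(system, role, environment, instance, location).lower()
    if not STORAGE_NAME_RE.fullmatch(name):
        raise ValueError(f"Storage account name {name!r} is not valid")
    return name


def nic_name(vm, index):
    """Return [vmname]-nic#, e.g. CTXSTFD01scu-nic1."""
    return f"{vm}-nic{index}"


if __name__ == "__main__":
    vm = vm_name("CTX", "STF", "D", 1, "scu")
    print(vm, nic_name(vm, 1), storage_account_name("CTX", "INF", "D", 1, "scu"))
```

Generating names in one place keeps the length and character rules from being rediscovered at deployment time, when most resources can no longer be renamed.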

Subscriptions

Selecting a subscription model is a complex decision that involves understanding the growth of the customer's Azure footprint within and outside the Citrix deployment. Even if the Citrix deployment is small, the customer might still have a large amount of other resources that are reading/writing heavily against the Azure API, which can have a negative impact on the Citrix environment. The reverse is also true, where many Citrix resources can consume an inordinate number of the available API calls, reducing availability for other resources within the subscription.

Single Subscription workspace model

In a single subscription model, all core infrastructure and Citrix infrastructure are located in the same subscription. This is the recommended configuration for deployments that require up to 2,500 Citrix VDAs (session, pooled VDI, or persistent VDI). These limits are subject to change, so check the most up-to-date VDA limits and the latest start-shutdown scale numbers for a single subscription before finalizing the design.

Diagram-2: Azure Single Subscription workspace model

Multi-Subscription workspace model

In this model, core infrastructure and Citrix infrastructure are in separate subscriptions to manage the scalability in large deployments. Often enterprise deployments with multi-region infrastructure designs are broken into multiple subscriptions to prevent reaching Azure subscription limits.

Diagram-3: Azure Multi-Subscription workspace model

The following questions help customers understand the Azure subscription options and plan their resources.

Component | Requirement
Will the Azure subscription contain only Citrix resources? | Determine if the Azure subscription will be used for dedicated Citrix resources or if the Citrix resources will be shared with other systems.
Single or Multiple subscription deployment? | Typically, multiple subscription deployments are for larger deployments where single subscription limitations are an issue and more granular security controls are necessary.
What Azure limits are likely to be reached? | How many resources are in a resource group? Resource groups have limits, and Machine Creation Services (MCS) requires either 2 or 3 disks per VM resource. Review Azure subscription limits while planning the solution.
What permissions are necessary for the Citrix Virtual Apps and Desktops service principal on the Azure subscription? | Citrix DaaS requires the creation of resource groups and resources within the subscription. For example, when the service principal cannot be granted full access to a subscription, it needs to be granted Contributor access to a pre-created resource group.
Will Development and Test environments be created in separate subscriptions from Production? | Isolating Development and Test subscriptions from Production enables the application and change of global Azure services in an isolated environment and silos resource utilization. This practice has benefits for security, compliance, and subscription performance. Creating separate subscriptions for these environments does add complexity to image management. Consider these trade-offs based on the customer's needs.
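
The sizing questions above reduce to simple arithmetic. The following Python sketch estimates how many subscriptions and disks a deployment needs, assuming the 2,500 VDA per subscription figure quoted in this guide (which is subject to change) and the 2 or 3 disks per VM that MCS requires. The function name and the returned fields are hypothetical.

```python
import math

# Figure quoted in this guide; verify the current limit before relying on it.
VDAS_PER_SUBSCRIPTION = 2500


def plan_subscriptions(total_vdas, disks_per_vm=3,
                       vdas_per_subscription=VDAS_PER_SUBSCRIPTION):
    """Estimate subscription count and disk count for an MCS deployment."""
    if total_vdas < 0:
        raise ValueError("total_vdas must not be negative")
    if disks_per_vm not in (2, 3):
        raise ValueError("MCS uses either 2 or 3 disks per VM")
    subscriptions = max(1, math.ceil(total_vdas / vdas_per_subscription))
    return {
        "subscriptions": subscriptions,
        "model": "single" if subscriptions == 1 else "multi",
        "disks": total_vdas * disks_per_vm,
    }


if __name__ == "__main__":
    print(plan_subscriptions(6000))
    print(plan_subscriptions(2500, disks_per_vm=2))
```

The disk total is worth comparing against resource group and subscription limits early, because it grows two to three times faster than the VDA count.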

Azure Regions

An Azure region is a set of data centers deployed within a latency-defined perimeter and connected through a dedicated regional low-latency network. Azure gives customers the flexibility to deploy applications where they need to. Azure is generally available in 59 regions around the world, with plans announced for 19 more regions as of the end of 2022.

A geography is a discrete market, typically containing two or more Azure regions, that preserves data residency and compliance boundaries. Geographies allow customers with specific data-residency and compliance needs to keep their data and applications close.

Availability Zones are physically separate locations within an Azure region. Each Availability Zone is made up of one or more data centers equipped with independent power, cooling and networking. Availability Zones allow customers to run mission-critical applications with high availability and low-latency replication. To ensure resiliency, there's a minimum of three separate zones in all enabled regions.

Consider these factors when choosing your region.

Component | Requirement
Compliance and data residency | Do customers have specific compliance or data-residency requirements? Microsoft can copy customer data between regions within a given Geo for data redundancy or other operational purposes. For example, Azure Geo-Redundant Storage (GRS) replicates Blob and Table data between two regions within the same Geo for enhanced data durability if there is a major data center disaster. Certain Azure services do not enable the customer to specify the region where the service will be deployed. These services can store customer data in any of Microsoft's data centers unless specified. Review the Azure Regions map website for the latest updates.
Service availability | Review service availability within the tentative regions. Service availability by region helps the customer determine which services are available within a region. While an Azure service can be supported in a given region, not all service features are available in sovereign clouds, such as Azure Government, Germany, and China.
Determine the target Azure regions for the Citrix deployment. | Review the proximity of the Azure region to users and customer data centers.
Are multiple Azure regions required? | Multiple Azure regions are typically considered for the following high-level reasons: proximity to application data or end users; geographic redundancy for Business Continuity and Disaster Recovery; Azure feature or service availability.
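
The factors in the table can be applied in order: filter by data residency, then by service availability, then pick the closest remaining region. The following Python sketch illustrates that ordering only. The region data, latency figures, and service labels are made-up sample values, and the function name is hypothetical.

```python
def choose_region(candidates, required_services, allowed_geographies):
    """Return the lowest-latency region that satisfies residency and service needs."""
    eligible = [
        region
        for region, info in candidates.items()
        if info["geography"] in allowed_geographies
        and required_services <= info["services"]  # all required services present
    ]
    if not eligible:
        raise ValueError("No candidate region meets the requirements")
    return min(eligible, key=lambda region: candidates[region]["latency_ms"])


# Illustrative sample data only; measure latency and check availability yourself.
CANDIDATES = {
    "southcentralus": {"geography": "United States", "latency_ms": 28.0,
                       "services": {"vm", "premium-ssd", "availability-zones"}},
    "westus": {"geography": "United States", "latency_ms": 61.0,
               "services": {"vm", "premium-ssd"}},
    "westeurope": {"geography": "Europe", "latency_ms": 12.0,
                   "services": {"vm", "premium-ssd", "availability-zones"}},
}

if __name__ == "__main__":
    print(choose_region(CANDIDATES, {"vm", "premium-ssd"}, {"United States"}))
```

Filtering on residency first means a closer region outside the permitted geography is never selected, even when its latency is better.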

Availability Sets

An Availability Set is a logical grouping capability that can be used in Azure to ensure that the VM resources placed within an Availability Set are isolated from each other when they are deployed within an Azure data center. Azure ensures that the VMs placed within an Availability Set run across multiple physical servers, compute racks, storage units, and network switches. If a hardware or Azure software failure occurs, only a subset of your VMs is impacted, and the overall application stays up and remains available to customers. Availability Sets are an essential capability when customers want to build reliable cloud solutions.

Place each component of a Citrix deployment in its own Availability Set to maximize overall availability for Citrix. For example, Cloud Connectors use one Availability Set, Citrix Application Delivery Controllers (ADC) use another, StoreFront uses another, and so forth.

Once Availability Sets are in place, the next step is to build resiliency around VM downtime within them, which minimizes or eliminates service downtime when VMs are restarted or redeployed by Microsoft. This approach can be extended to planned maintenance events as well. Two features can increase the reliability of the overall service.

These two features do not protect against unplanned maintenance/crashes.

  • Azure Planned Maintenance
  • Azure Scheduled Events

Azure Planned Maintenance

Azure periodically performs updates to improve the reliability, performance, and security of the host infrastructure in Azure. If maintenance requires a reboot, Microsoft sends a notice. With Azure Planned Maintenance, it is possible to capture these notices and proactively act on them on the customer's schedule instead of on Microsoft's schedule.

Make use of the planned maintenance feature by sending email notifications to the service owner of each tier (for manual intervention) and build runbooks to automate the service protection.

Azure Scheduled Events

Azure Scheduled Events is a feature of the Azure Instance Metadata Service that programmatically notifies applications of imminent maintenance. It provides information about upcoming maintenance events (for example, a reboot) so the application administrator can prepare for and limit disruption. While it might sound like planned maintenance, it is not. The key difference is that these events are fired for planned maintenance and sometimes for unplanned maintenance, for example, when Azure is doing host healing activities and needs to move VMs on short notice.

These events are consumed programmatically, and will give the following advance notice:

  • Freeze – 15 Minutes
  • Reboot – 15 Minutes
  • Redeploy – 10 Minutes
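
Because the notice windows are short, the events are usually handled by code rather than by people. The following Python sketch parses a Scheduled Events document and reports how many seconds remain before each affected VM should be drained. It assumes the response shape of the Azure Instance Metadata Service (an `Events` list whose entries carry `EventType`, `Resources`, and `NotBefore`); fetching the document from the metadata endpoint is left out, and the function name and sample payload are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Event types listed in this guide. Other types may exist and are ignored here.
DRAIN_EVENT_TYPES = {"Freeze", "Reboot", "Redeploy"}


def hosts_to_drain(document, now):
    """Map each affected VM name to the seconds left before its event may start.

    Assumes `document` is the already-parsed JSON from the Scheduled Events
    endpoint of the Instance Metadata Service (verify the shape for your API version).
    """
    plan = {}
    for event in document.get("Events", []):
        if event.get("EventType") not in DRAIN_EVENT_TYPES:
            continue
        not_before = event.get("NotBefore")
        if not_before:
            # NotBefore is an HTTP-style date such as "Mon, 19 Sep 2016 18:29:47 GMT".
            delta = parsedate_to_datetime(not_before) - now
            seconds_left = max(0.0, delta.total_seconds())
        else:
            # A blank NotBefore means the event has already started.
            seconds_left = 0.0
        for vm in event.get("Resources", []):
            plan[vm] = min(seconds_left, plan.get(vm, seconds_left))
    return plan


if __name__ == "__main__":
    sample = {"Events": [{
        "EventId": "sample-event",
        "EventType": "Reboot",
        "ResourceType": "VirtualMachine",
        "Resources": ["CTXSTFD01scu"],
        "EventStatus": "Scheduled",
        "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT",
    }]}
    now = datetime(2016, 9, 19, 18, 14, 47, tzinfo=timezone.utc)
    print(hosts_to_drain(sample, now))
```

A runbook could use the returned values to put the affected VDA into maintenance mode or to fail a Cloud Connector over before the window closes.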

Disaster Recovery (DR)

Azure can provide a highly cost-effective DR solution for Citrix customers looking to gain immediate value from cloud adoption today. The deployment model topology determines the DR solution implementation.

Extending the Architecture

Under this topology, the management infrastructure remains on-premises, but workloads are deployed to Azure. If the on-premises data center is not reachable, existing connected users remain connected, but new connections will not be possible because the management infrastructure is unavailable.

To protect the management infrastructure, pre-configure Azure Site Recovery to recover the management infrastructure into Azure. This is a manual process, and once recovery completes, the environment can be made operational. This option is not seamless and cannot recover components such as ADC VPX; however, for organizations with a more flexible recovery time objective (RTO), it can reduce operational costs.

Hosting Architecture

When deploying this topology, the Citrix Management infrastructure is deployed into Azure and treated as a separate site. This provides functional isolation from on-premises deployment in the event of a site failure. Use Citrix ADC and StoreFront to aggregate resources and provide users a near instant failover between Production and Disaster Recovery resources.

The presence of the Citrix Infrastructure in Azure means that no manual processes need to be invoked and no systems need to be restored before users can access their core workspace.

Cloud Services Architecture

When using Citrix Cloud, Azure becomes just another resource location. This topology provides the simplest deployment as the management components are hosted by Citrix as a Service, and Disaster Recovery workloads can be achieved without deploying duplicate infrastructure to support it. The user experience during failover in the event of a disaster can be seamless.

The items in the following table help the customer with their DR planning:

Component | Requirement
What are the RTO and RPO requirements of the Citrix environment? | RTO - Targeted duration of time and a service level within which a business process must be restored after a disaster. RPO - The interval of time that might pass during a disruption before the quantity of data lost during that period exceeds the Business Continuity Plan's maximum allowable threshold or tolerance.
What is the desired outcome when a service disruption occurs in the entire region where your Azure virtual machine application is deployed? | Review these options in alignment with the customer's RTO and RPO for DR. Disaster Recovery of a Citrix environment in Azure can be addressed with Azure Site Recovery, a passive secondary site, or an active site. Azure Site Recovery only supports Server OS (Citrix infrastructure and Server VDAs). Client OS is not supported (for example, persistent desktops created using ARM templates). Also, Machine Catalogs created by MCS (Server or Client VDA) must be recreated using a Recovery Task.

Resource Groups

Resource Groups (RG) in Azure are a collection of assets in logical groups for easy or even automatic provisioning, monitoring, and access control, and for more effective management of their costs. The benefit of using RGs in Azure is grouping related resources that belong to an application together, as they share a unified lifecycle from creation to usage and finally, de-provisioning.

The key to having a successful design of resource groups is understanding the lifecycle of the resources that are included in them.

Resource Groups are tied to Machine Catalogs at creation time and cannot be added or changed later. To add extra Resource Groups to a Machine Catalog, the Machine Catalog must be removed and recreated.

Image Management

Image management is the process of creating, upgrading, and assigning an image that is consistently applied across development, test, and production environments. Consider the following when developing an image management process:

On-Demand Provisioning

The customer needs to determine whether MCS will be used to manage the Azure non-persistent machines or whether they will create their own Azure Resource Manager (ARM) templates. When a customer uses MCS to create machine catalogs, the Azure on-demand provisioning feature reduces storage costs and provides faster catalog creation and faster virtual machine (VM) power operations. With Azure on-demand provisioning, VMs are created only when Citrix DaaS initiates a power-on action, after the provisioning completes. A VM is visible in the Azure portal only when it is running, while in Citrix Studio, all VMs are visible, regardless of power status. Machines created via ARM templates or MCS can be power managed by Citrix using an Azure host connection in Citrix Studio.

Storage Account Containers

The customer needs to decide the organizational structure for storing the source (or golden) images from which to create the virtual machines using Citrix Machine Creation Services (MCS). Citrix MCS images can be sourced from snapshots or from managed or unmanaged disks, and can reside on standard or premium storage. Unmanaged disks are accessed through general-purpose storage accounts and are stored as VHDs within Azure Blob storage containers. Containers act as folders that can be used to separate Production, Test, and Development images.

Image Replication

The customer needs to determine the appropriate process for replicating images across regions and how Citrix App Layering technology might be used within the overall image management strategy. PowerShell scripts can be used with Azure Automation to schedule image replication. More information is available in the Citrix App Layering documentation, but keep in mind that Elastic Layering requires an SMB file share that does not reside on Azure Files. See the File Server Technologies section for SMB share technologies that support Elastic Layering.

File Server Technologies

Azure offers several file server technologies that can be used to store Citrix user data, roaming profile information or function as targets for Citrix Layering shares. These options include the following:

  • Standalone File Server
  • File Servers using Storage Replica
  • Scale Out File Server (SOFS) with Storage Spaces Direct (S2D)
  • Distributed File System – Replication (DFS-R)
  • Third-party storage appliances from Azure Marketplace (such as NetApp, and others)

The customer must select file server technologies that best meet their business requirements. The following table outlines some benefits and considerations for each of the different file serving technologies.

Options | Benefits | Considerations
Standalone File Server | Well known and tested. Compatible with existing backup/restore products. | Single point of failure. No data redundancy. Outage for monthly patching, measured in minutes.
File Servers using Storage Replica | Block Level Replication. SMB 3.0. Storage Agnostic (SAN, Cloud, Local, and so on). Offers Synchronous and Asynchronous Replication. Recommended when multi-region access is required. | Manual failover needed. Uses 2x disk space. Manual failover still has downtime, measured in minutes. DNS dependency.
SOFS on Storage Spaces Direct | Highly available. Multi-node and Multi-disk HA. Scale up or scale out. SMB 3.0 and 3.1. Transparent failover during planned and unplanned maintenance activities. Recommended for user profile storage within Azure. | Uses 2-3x disk space. Third-party backup software support can be limited by the vendor. Does not support multi-region deployment.
Distributed File System – Replication | Proven technology for file-based replication. Supports PowerShell. | Domain-based. Cannot be deployed in an active-active configuration.
Third-party storage applications | Deduplication technologies. Better use of storage space. | Extra cost. Proprietary management tools.

The recommended file server virtual machine types are generally DS1, DS2, DS3, DS4, or DS5, with the appropriate selection depending on customer use requirements. For best performance, ensure that premium disk support is selected. Extra guidance can be found in the Microsoft Azure documentation.

Infrastructure Cost Management

Two technologies can be used to reduce the costs of the Citrix environment in Azure: reserved instances and Citrix Autoscale.

Reserved Instances

Azure Reserved VM Instances (RIs) significantly reduce costs, by up to 72 percent compared to pay-as-you-go prices, with one-year or three-year terms on Windows and Linux virtual machines (VMs). When customers combine the cost savings gained from Azure RIs with the added value of the Azure Hybrid Benefit, they can save up to 80 percent. The 80 percent figure is calculated based on a three-year Azure Reserved Instance commitment for a Windows Server VM compared to the normal pay-as-you-go rate.

While Azure Reserved Instances require making upfront commitments on compute capacity, they also provide flexibility to exchange or cancel reserved instances at any time. A reservation only covers the virtual machine compute costs; it does not reduce any of the additional software, networking, or storage charges. Reservations are a good fit for the Citrix infrastructure and for the minimum capacity needed for a use case (on and off hours).

The Citrix Autoscale feature also supports reserved instances to further reduce costs, so Autoscale can be used for bursting in the cloud. In a delivery group, you can tag the machines that need to be autoscaled and exclude your reserved instances (or on-premises workloads). For more information, see Restrict Autoscale to certain machines in a Delivery Group.
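
The effect of combining the two approaches can be estimated with a simple cost model: reserve the always-on baseline and pay as you go for Autoscale burst capacity. The following Python sketch is illustrative only. The hourly rate, the discount, and the VM counts are hypothetical inputs, 730 is used as an approximate number of hours per month, and only compute cost is modeled because a reservation does not cover software, networking, or storage.

```python
HOURS_PER_MONTH = 730  # approximate average; an assumption of this sketch


def always_on_payg_cost(vms, payg_rate):
    """Monthly compute cost if every VM runs all month at the pay-as-you-go rate."""
    return vms * HOURS_PER_MONTH * payg_rate


def reserved_plus_burst_cost(baseline_vms, burst_vms, burst_hours,
                             payg_rate, reserved_discount):
    """Monthly compute cost with a reserved baseline and pay-as-you-go burst VMs.

    reserved_discount is a fraction, for example 0.5 for a 50 percent discount.
    burst_hours is how many hours per month Autoscale keeps each burst VM running.
    """
    reserved_rate = payg_rate * (1 - reserved_discount)
    baseline = baseline_vms * HOURS_PER_MONTH * reserved_rate
    burst = burst_vms * burst_hours * payg_rate
    return baseline + burst


if __name__ == "__main__":
    # Hypothetical figures: 10 baseline VMs, 20 burst VMs running 200 hours each.
    print(always_on_payg_cost(30, 1.00))
    print(reserved_plus_burst_cost(10, 20, 200, 1.00, 0.5))
```

Running the model with real rates for the chosen instance type shows how much of the capacity is worth reserving before the discount stops paying for idle hours.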

Citrix Autoscale

Autoscale is a feature exclusive to the Citrix DaaS that provides a consistent, high-performance solution to proactively power manage your machines. It aims to balance costs and user experience. Autoscale incorporates the deprecated Smart Scale technology into the Studio power management solution.

Machine Type | Schedule-based | Load-based | Load and schedule-based
Server OS machines hosting published applications or hosted shared desktops (Server VDI) | Supported | Supported | Supported
Desktop OS machines hosting static persistent (dedicated) VDI desktops | Supported. During periods when machines are powered off (for example, after working hours), users can trigger machines to power on through the Citrix Receiver. You can set Autoscale's Power Off Delay so Autoscale does not automatically power machines off before the user can establish a session. | Supported only for unassigned machines. | Supported only for unassigned machines.
Desktop OS machines hosting random non-persistent VDI desktops (pooled VDI desktops) | Supported | Supported. Use the Session Count scaling metric and set the maximum number of sessions to 1. | Supported. Use the Session Count scaling metric and set the minimum number of machines to 1.

Diagram-4: Citrix Autoscale Flow

You can read more about Citrix Autoscale in the official documentation.

Optimizing End-User Experience

Optimizing the end-user experience includes balancing the end user's perception of responsiveness with the business needs of staying within a budget. This section discusses the design concepts and decisions around providing an environment that is correctly sized for the business and the end user.

Defining the User Workspace

Review the following high-level questions to better understand existing use cases and the resources needed for their end users.

Topic | Question
Number of Users | How many users are expected within the environment? Did the assessment phase determine the appropriate VDI Model? (Virtual Apps or Virtual Desktops)
Use Cases | What types of applications will be consumed by the end users? What are the VDA requirements for the applications? How will the applications be delivered best? (Virtual Apps vs Virtual Desktops)
User Group working hours | When will users be accessing the environment? What are the peak hours? What is the expected consumption throughout the day? (The consumption of users during specific hours helps identify workspace requirements for scale automation and Azure Reserved Instance purchasing.)
Location | Where are the end users located? Deploy workspaces across multiple regions or only in a single region?
User and Application Data | Where is the user and application data stored? Will data be contained solely in Azure, only on-premises, or a mix of both? What is the maximum tolerable latency for accessing the user data?

Azure VM Instance Types

Each Citrix component uses an associated virtual machine type in Azure. Each VM series available is mapped to a specific category of workloads (general purpose, compute-optimized, and so forth) with various sizes controlling the resources allocated to the VM (CPU, Memory, IOPS, network, and others).

Most Citrix deployments use the D-Series and F-Series instance types. The D-Series is commonly used for the Citrix infrastructure components and sometimes for the user workloads when they require extra memory beyond what is found in the F-Series instance types. F-Series instance types are the most common in the field for user workloads because of their faster processors which bring with them the perception of responsiveness.

Why D-Series or F-Series? From a Citrix perspective, most infrastructure components (Cloud Connectors, StoreFront, ADC, and so on) use CPU to run core processes. These VM types have a balanced CPU to Memory ratio, are hosted on uniform hardware (unlike the A-Series) for more consistent performance and support premium storage. Certainly, customers adjust their instance types to meet their needs and their budget.

The size and number of components within a customer's infrastructure always depend on the customer's requirements, scale, and workloads. However, Azure provides the ability to scale dynamically and on demand. For cost-conscious customers, starting smaller and scaling up is the best approach. Azure VMs require a reboot when changing size, so plan these events within scheduled maintenance windows only and under established change control policies.

How about Scale-up or Scale-out?

The choice between scaling up and scaling out depends on the customer's use case and the resources needed for their end users. Deciding early also helps customers plan their workloads well in advance.

Scaling up is best when the cost per user per hour needs to be the lowest and a larger impact can be tolerated if the instances fail. Scaling out is preferred when the impact of a single instance failure needs to be minimized. The following table provides some example instance types for different Citrix components.

Component | Recommended Instance Type
Delivery Controllers, Cloud Connectors | Standard DS2_v2 or DS2_v3 with Premium SSD storage
Scale Up Server OS User Workloads | Standard_F16s_v2 VMs with Virtual App were identified to have the lowest $/user/hr cost compared to other instances. Standard_DS5_v2 VMs were also cost competitive compared to other instances.
Scale Out Server OS User Workloads | Standard_F4_v2 and Standard_F8_v2 instances support a lower user count; however, they provide more flexibility of power management operations due to smaller user container sizes. This allows machines to be more effectively deallocated to save costs on Pay-as-You-Go instances. Also, the failure domains are smaller when scaling out.
Desktop OS User Workloads | Standard_F2_v2 has the lowest dual-core cost and performs well with Windows 10.
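
The scale-up versus scale-out trade-off can be made concrete by computing the cost per user per hour alongside the number of users affected by a single VM failure. The following Python sketch uses hypothetical prices and user densities; it is not based on the instance study referenced in this guide, and the class and function names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceOption:
    name: str
    hourly_price: float  # hypothetical pay-as-you-go price per VM per hour
    users_per_vm: int    # hypothetical user density from load testing


def evaluate(option, total_users):
    """Return VM count, cost per user per hour, and users lost if one VM fails."""
    vms = math.ceil(total_users / option.users_per_vm)
    return {
        "vms": vms,
        "cost_per_user_hour": vms * option.hourly_price / total_users,
        "users_lost_per_failure": min(option.users_per_vm, total_users),
    }


if __name__ == "__main__":
    scale_up = InstanceOption("large", hourly_price=0.80, users_per_vm=40)
    scale_out = InstanceOption("small", hourly_price=0.22, users_per_vm=10)
    for option in (scale_up, scale_out):
        print(option.name, evaluate(option, total_users=400))
```

With these sample inputs the larger instance is slightly cheaper per user, while the smaller instance loses a quarter as many sessions when a VM fails, which is the trade-off described above.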

The latest instance type study provides great insight in this area, and we highly recommend reading it. In all cases, customers should evaluate the instance types with their own workloads.

For graphics-intensive workloads, consider the NVv4-series virtual machines. They are powered by AMD EPYC 7002 processors and a virtualized Radeon Instinct MI25 GPU. These virtual machines are optimized and designed for VDI and remote visualization. With partitioned GPUs, NVv4 offers the right size for workloads requiring smaller GPU resources at an optimal price. Alternatively, the NVv3-series is optimized and designed for remote visualization, streaming, gaming, encoding, and VDI scenarios using frameworks such as OpenGL and DirectX. These VMs are backed by the NVIDIA Tesla M60 GPU. For further GPU options, check the other offerings from Azure.

While scaling up is usually a preferred model to reduce the cost, Autoscale can benefit from smaller instances (15–20 sessions per host). Smaller instances host fewer user sessions than larger instances. Therefore, in the case of smaller instances, Autoscale puts machines into drain state much faster because it takes less time for the last user session to be logged off. As a result, Autoscale powers off smaller instances sooner, thereby reducing costs. You can read more about instance size considerations for Autoscale in the official documentation.
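
The scale-up versus scale-out trade-off described above can be made concrete with a little arithmetic. The following is a minimal sketch, not a sizing tool: the prices and session counts are hypothetical placeholders chosen for illustration, and the class and function names are invented for this example. It compares cost per user per hour with the share of users affected when a single host fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceOption:
    name: str             # label only; not an Azure price lookup
    hourly_price: float   # hypothetical pay-as-you-go price in $/hour
    users_per_host: int   # hypothetical sessions one host can carry


def cost_per_user_hour(option: InstanceOption) -> float:
    """Hourly price divided by the sessions a single host supports."""
    return option.hourly_price / option.users_per_host


def hosts_needed(option: InstanceOption, total_users: int) -> int:
    """Ceiling division: hosts required to seat every user."""
    return -(-total_users // option.users_per_host)


def failure_impact(option: InstanceOption, total_users: int) -> float:
    """Fraction of users disconnected if one fully loaded host fails."""
    return min(option.users_per_host, total_users) / total_users


# Hypothetical figures for illustration only; measure your own workloads.
scale_up = InstanceOption("large host", hourly_price=0.80, users_per_host=40)
scale_out = InstanceOption("small host", hourly_price=0.25, users_per_host=10)

for option in (scale_up, scale_out):
    print(
        option.name,
        f"$/user/hr={cost_per_user_hour(option):.4f}",
        f"hosts={hosts_needed(option, 200)}",
        f"failure impact={failure_impact(option, 200):.0%}",
    )
```

With these placeholder numbers the larger host is cheaper per user, while the smaller host limits a single failure to a smaller share of users, which mirrors the guidance above. Real results depend entirely on measured user density and current Azure pricing.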

Storage

Just like any other computer, a virtual machine in Azure uses disks to store an operating system, applications, and data. All Azure virtual machines have at least two disks: a Windows operating system disk and a temporary disk. The operating system disk is created from an image, and both the operating system disk and the image are stored within Azure as virtual hard disks (VHDs). Virtual machines may also have extra disks attached as data disks, also stored as VHDs.

Azure Disks are designed to deliver enterprise-grade durability. Three performance tiers can be selected when creating disks: Premium SSD, Standard SSD, and Standard HDD. Disks may be either managed or unmanaged. Managed disks are the default and, unlike unmanaged disks, are not subject to storage account limitations.

Microsoft recommends Managed Disks over Unmanaged Disks. Consider Unmanaged Disks by exception only. Standard Storage (HDD and SSD) has lower per-disk costs but includes transaction costs (storage I/O) that must be considered. Premium Storage has no transaction costs but has higher per-disk costs and offers an improved user experience.

The disks offer no SLA unless an Availability Set is used. Availability Sets are not supported with Citrix MCS but should be included with Citrix Cloud Connector, ADC, and StoreFront.

Identity

This section focuses on identity controls, workspace user planning, and the end-user experience. The primary design consideration is managing identities within both Azure and Citrix Cloud tenants.

Microsoft Azure Active Directory (Azure AD) is an identity and access management cloud solution that provides directory services, identity governance, and application access management. A single Azure AD directory is automatically associated with an Azure subscription when it is created.

Every Azure subscription has a trust relationship with an Azure AD directory to authenticate users, services, and devices. Multiple subscriptions can trust the same Azure AD directory, but a subscription will only trust a single Azure AD directory.

Microsoft's identity solutions span on-premises and cloud-based capabilities, creating a single user identity for authentication and authorization to all resources, regardless of location. This concept is known as Hybrid Identity. There are different design and configuration options for hybrid identity using Microsoft solutions, and in some cases, it might be difficult to determine which combination will best meet the needs of an organization.

Common Identity Design Considerations

Extending the customer's Active Directory site to Azure usually relies on Active Directory replication to provide identity and authentication for Citrix Workspace. A common step is to use Azure AD Connect to replicate users to Azure Active Directory, which provides the subscription-based activation required for Windows 10.

It is recommended to extend local Active Directory Domain Services to the Azure Virtual Network subnet for full features and extensibility. Azure Role-Based Access Control (RBAC) helps provide fine-grained access management for Azure resources. Too many permissions can expose an account to attackers. Too few permissions mean that employees can't get their work done efficiently. Using RBAC, administrators can give employees the exact permissions they need.

Authentication

Domain Services (either AD DS or Azure AD DS) are required for core Citrix functionality. RBAC is an authorization system built on the Azure Resource Manager that provides fine-grained access management of resources in Azure. RBAC allows you to granularly control the level of access that users have. For example, you can limit a user to only manage virtual networks and another user to manage all resources in a resource group. Azure includes several built-in roles that you can use.

Azure AD Authentication is supported for Citrix Workspace, Citrix DaaS, and Citrix ADC/StoreFront authentication. For full SSON with Azure AD, Citrix Federated Authentication Service (FAS) or Azure AD DS (for core Domain Services) must be used.

Citrix FAS supports single sign-on (SSO) to DaaS in Citrix Workspace. Citrix FAS is typically adopted if you're using one of the following identity providers:

  • Azure Active Directory
  • Okta
  • SAML 2.0
  • Citrix Gateway

Active Directory and Azure Active Directory Outcomes

  • Azure Active Directory Provisioned Tenant
  • List of desired Organizational roles for Azure RBAC with mapping to Built-In or Custom Azure Roles
  • List of desired Admin access levels (Account, Subscription, Resource Group and so on)
  • Procedure to grant access/role to new users for Azure
  • Procedure to assign JIT (just in time) elevation for users for specific tasks

Here is an example architecture of namespace layout and authentication flow.

reference-architectures_virtual-apps-and-desktops-azure_005.png

Diagram-5: Architecture of namespace layout and authentication flow

Citrix Cloud Administration + Azure AD

By default, Citrix Cloud uses the Citrix Identity provider to manage the identity information for all users who access the Citrix Cloud. Customers can change this to use Azure Active Directory (AD) instead. By using Azure AD with Citrix Cloud, Customers can:

  • Use their own Active Directory, so they can control auditing, password policies, and easily disable accounts when needed.
  • Configure multifactor authentication for a higher level of security against the possibility of stolen sign-in credentials.
  • Use a branded sign-in page, so your users know they're signing in at the right place.
  • Use federation to an identity provider of your choice including ADFS, Okta, and Ping, among others.

Citrix Cloud includes an Azure AD app that allows Citrix Cloud to connect with Azure AD without the need for you to be logged in to an active Azure AD session. Citrix Cloud Administrator Login allows Azure AD identities to be used in the customer's Citrix Cloud tenant.

  • Determine whether Citrix Cloud administrators use their Citrix Identity or Azure AD to access Citrix Cloud. The URL follows the format https://citrix.cloud.com/go/{Customer Determined}
  • Identify the Authentication URL for Azure AD authentication into Citrix Cloud

Governance

Azure Governance is a collection of concepts and services that are designed to enable management of your various Azure resources at scale. These services provide the ability to organize and structure your subscriptions in a logical way and to create, deploy, and reuse Azure-native packages of resources. This subject is focused on establishing the policies, processes, and procedures associated with the planning, architecture, acquisition, deployment, operation, and management of Azure resources.

Citrix Cloud Administrator Login

Determine whether Citrix Cloud administrators use their Citrix Identity, Active Directory Identity, or Azure AD to access Citrix Cloud. Azure AD integration enables multifactor authentication into Citrix Cloud for administrators. Identify the Authentication URL for Azure AD authentication into Citrix Cloud. The URL follows the format https://citrix.cloud.com/go/{Customer Determined}.

RBAC permissions and delegation

Using Azure AD, customers can implement their governance policies using Role-Based Access Control (RBAC) of Azure resources. One of the primary tools for the application of these permissions is the concept of a Resource Group. Think of a Resource Group as a bundle of Azure resources that share lifecycle and administrative ownership.

In the context of a Citrix environment, organize Resource Groups in a way that allows for proper delegation between teams and promotes the concept of least privilege. A good example is when a Citrix Cloud deployment uses a Citrix ADC VPX provisioned from the Azure Marketplace for external access. Although a core piece of Citrix infrastructure, the Citrix ADCs might have a separate update cycle, set of admins, and so on. This would call for separating the Citrix ADCs from the other Citrix components into separate Resource Groups so the Azure RBAC permissions can be applied through the administrative zones of tenant, subscription, and resources.

MCS Service Principal

To access resources that are secured by an Azure AD tenant, the entity that requires access must be represented by a security principal. This is true for both users (user principal) and applications (service principal). The security principal defines the access policy and permissions for the user/application in the Azure AD tenant. This enables core features such as authentication of the user/application during sign-in, and authorization during resource access.

Determine the permissions allocated to the Service Principal used by the Citrix MCS service.

Subscription scope service principals have Contributor rights to the applicable subscription used by the Citrix environment. Narrow Scope service principals have granular RBAC applied to the Resource Groups containing the network, Master Images, and VDAs. Narrow Scope Service Principals are recommended to limit the permissions only to the permissions required by the service. This adheres to the security concept of "least privilege".
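
The difference between the two approaches comes down to the scope string that each role assignment targets. The sketch below is illustrative only: the subscription ID, principal ID, resource group names, and role names are hypothetical placeholders and are not the permissions MCS actually requires. It shows how a narrow-scope design produces one assignment per Resource Group instead of a single subscription-wide assignment.

```python
from typing import Dict, List


def subscription_scope(subscription_id: str) -> str:
    """Azure Resource Manager scope string for a whole subscription."""
    return f"/subscriptions/{subscription_id}"


def resource_group_scope(subscription_id: str, resource_group: str) -> str:
    """Azure Resource Manager scope string for one resource group."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def narrow_scope_assignments(
    subscription_id: str, principal_id: str, role_by_group: Dict[str, str]
) -> List[Dict[str, str]]:
    """One role assignment per resource group, never the full subscription."""
    return [
        {
            "principalId": principal_id,
            "role": role,
            "scope": resource_group_scope(subscription_id, group),
        }
        for group, role in role_by_group.items()
    ]


# Hypothetical identifiers and placeholder role names, for illustration only.
SUBSCRIPTION = "00000000-0000-0000-0000-000000000000"
PRINCIPAL = "11111111-1111-1111-1111-111111111111"
assignments = narrow_scope_assignments(
    SUBSCRIPTION,
    PRINCIPAL,
    {
        "rg-citrix-network": "placeholder-network-role",
        "rg-citrix-images": "placeholder-image-role",
        "rg-citrix-vdas": "placeholder-vda-role",
    },
)

for assignment in assignments:
    print(assignment["role"], "->", assignment["scope"])
```

A subscription-scope design would instead use a single assignment at `subscription_scope(SUBSCRIPTION)`, which is simpler to manage but grants the service principal rights over every Resource Group in the subscription.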

Tagging

Customers apply tags to their Azure resources, giving them metadata to logically organize the resources into a taxonomy. Each tag consists of a name and a value pair. For example, they can apply the name "Environment" and the value "Production" to all the resources in production.

The customer can then retrieve all the resources in the subscription with that tag name and value. Tags enable them to retrieve related resources from different resource groups. This approach is helpful when admins need to organize resources for billing or management.

There is a limit of 15 tags per Resource. Citrix MCS creates 2 tags per VM, so a customer is limited to 13 tags for MCS machines. MCS non-persistent machines are deleted during reboot. This removes Azure VM-specific characteristics such as tags and boot diagnostics. If tags are required, it is recommended to create an Azure Append policy and apply it to the applicable MCS Resource Groups.
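
As a rough illustration of the Append policy recommended above, the following sketch builds a policy rule that adds a tag when a resource is created without it. The helper names are invented for this example, and the tag name and value reuse the "Environment"/"Production" example from this section. Treat it as a starting point to adapt, and validate the resulting definition against the Azure Policy documentation before assigning it.

```python
import json


def remaining_tag_slots(limit: int = 15, mcs_tags: int = 2) -> int:
    """Tags left for the customer after the tags MCS creates per VM."""
    return limit - mcs_tags


def append_tag_policy_rule(tag_name: str, tag_value: str) -> dict:
    """Policy rule that appends a tag when the resource does not carry it."""
    field = f"tags['{tag_name}']"
    return {
        "if": {"field": field, "exists": "false"},
        "then": {
            "effect": "append",
            "details": [{"field": field, "value": tag_value}],
        },
    }


rule = append_tag_policy_rule("Environment", "Production")

# "Indexed" mode limits evaluation to resource types that support tags.
print(json.dumps({"mode": "Indexed", "policyRule": rule}, indent=2))
print("Customer tags available on MCS machines:", remaining_tag_slots())
```

Because the policy is evaluated whenever a resource is created or updated, recreated non-persistent machines pick up the tag again without manual work.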

Azure Policy

Azure policies can control aspects such as tagging, permitted SKUs, encryption, Azure region, and naming convention. There are default policies available and the capability to enforce custom policies. Azure policies can be applied at the subscription or Resource Group level. Multiple policies can be defined. Policies applied at the Resource Group level take precedence over Subscription Level policy.

Control and standardize all aspects of Azure across the Citrix environment. A hard quota enforces the policy and does not permit exceptions. A soft quota audits policy compliance and notifies when the policy is not met. Refer to the Azure documentation for more detailed information on defining policies.

reference-architectures_virtual-apps-and-desktops-azure_006.png

Diagram-6: Azure Governance Access Policy and RBAC

Security

Security is integrated into every aspect of Azure. Azure offers unique security advantages derived from global security intelligence, sophisticated customer-facing controls, and a secure hardened infrastructure. This powerful combination helps protect applications and data, support compliance efforts, and provide cost-effective security for organizations of all sizes.

Securing storage accounts provisioned by Citrix Virtual Apps and Desktops service

As stated previously, MCS is the service (within Citrix Virtual Apps and Desktops) responsible for spinning up machines in the customer subscription. MCS uses an Azure AD (AAD) identity, an application service principal, to access Azure resource groups and perform different actions.
For storage account resources, MCS requires the listkeys permission to acquire the key when needed for different actions (write/read/delete).
Per our current implementation, MCS requires the following:

  • Storage account network: access from the public internet.
  • Storage account RBAC: the listkeys permission.

For some organizations keeping the Storage account endpoint public is a concern. Here is an analysis of the assets created and stored when deploying VMs with managed disk (the default behavior).

  • Table Storage: We maintain machine configuration and state data in table storage in the primary storage account (or a secondary one, if the primary one is being used for Premium disks) for the catalog. There is no sensitive information within the tables.
  • Locks: For certain operations (allocating machines to storage accounts, replicating disks), we use a lock object to synchronize operations from multiple plug-in instances. Those files are empty blobs and include no sensitive data.

For machine catalogs created before Oct 15 2020, MCS creates an additional storage account for identity disks:

  • Disk Import: When importing disks (identity, instruction), we upload the disk as a page blob. We then create a managed disk from the page blob and delete the page blob. The transient data does include sensitive data for computer object names and password. This does not apply for all machine catalogs created post Oct 15 2020.

Using a narrow Scope Service Principal applied to the specific resource groups is recommended to limit the permissions only to the permissions required by the service. This adheres to the security concept of "least privilege". Refer to CTX219243 and CTX224110 for more details.

IaaS - Azure Security Center Monitoring

Azure Security Center analyzes the security state of Azure resources. When Security Center identifies potential security vulnerabilities, it creates recommendations that guide the customer through the process of configuring the needed controls. Recommendations apply to the following Azure resource types: virtual machines (VMs) and computers, applications, networking, SQL, and Identity and Access. Follow these best practices:

  • Control VM access and secure privileged access.
  • Provision antimalware to help identify and remove malicious software.
  • Integrate your antimalware solution with Security Center to monitor the status of your protection.
  • Keep your VMs current and ensure at deployment that the images you build include the most recent round of Windows and security updates.
  • Periodically redeploy your VMs to force a fresh version of the OS.
  • Configure network security groups and rules to control traffic to virtual machines.
  • Provision web application firewalls to help defend against attacks that target your web applications.
  • Address OS configurations that do not match the recommended baselines.

Network Design

Network security can be defined as the process of protecting resources from unauthorized access or attack by applying controls to network traffic. The goal is to ensure that only legitimate traffic is allowed. Azure includes a robust networking infrastructure to support your application and service connectivity requirements. Network connectivity is possible between resources located in Azure, between on-premises and Azure hosted resources, and to and from the internet and Azure.

Virtual Network (VNet) Segmentation

Azure virtual networks are similar to a LAN on your on-premises network. The idea behind an Azure virtual network is that you create a single private IP address space–based network on which customers can place all their Azure virtual machines. The best practice is to segment the larger address space into subnets and create network access controls between subnets. Routing between subnets happens automatically, and you don't need to manually configure routing tables.

Use a Network Security Group (NSG). NSGs are simple, stateful packet inspection devices that use the 5-tuple (the source IP, source port, destination IP, destination port, and layer 4 protocol) approach to create allow/deny rules for network traffic. Rules allow or deny traffic to and from a single IP address, to and from multiple IP addresses, or to and from entire subnets.
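
The 5-tuple matching described above can be sketched in a few lines. This is a simplified model and makes several assumptions: the subnets and rule set are hypothetical, ports are single values rather than ranges, and connection state tracking, service tags, and Azure's built-in default rules are not modelled. Rules are evaluated in priority order, lowest number first, and the first match decides the outcome.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    priority: int                            # lower number is evaluated first
    action: str                              # "Allow" or "Deny"
    protocol: str                            # "Tcp", "Udp", or "*"
    source: str                              # CIDR prefix or "*"
    destination: str                         # CIDR prefix or "*"
    source_port: Optional[int] = None        # None means any port
    destination_port: Optional[int] = None   # None means any port


def _address_matches(prefix: str, address: str) -> bool:
    if prefix == "*":
        return True
    return ipaddress.ip_address(address) in ipaddress.ip_network(prefix)


def _port_matches(rule_port: Optional[int], port: int) -> bool:
    return rule_port is None or rule_port == port


def evaluate(rules: Iterable[Rule], protocol: str, src_ip: str, src_port: int,
             dst_ip: str, dst_port: int, default: str = "Deny") -> str:
    """Return the action of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if (rule.protocol in ("*", protocol)
                and _address_matches(rule.source, src_ip)
                and _port_matches(rule.source_port, src_port)
                and _address_matches(rule.destination, dst_ip)
                and _port_matches(rule.destination_port, dst_port)):
            return rule.action
    return default


# Hypothetical subnets: 10.0.1.0/24 for access infrastructure, 10.0.2.0/24 for VDAs.
rules = [
    Rule(100, "Allow", "Tcp", "10.0.1.0/24", "10.0.2.0/24", destination_port=1494),
    Rule(110, "Allow", "Tcp", "10.0.1.0/24", "10.0.2.0/24", destination_port=2598),
    Rule(4096, "Deny", "*", "*", "*"),
]

print(evaluate(rules, "Tcp", "10.0.1.5", 50000, "10.0.2.10", 1494))
print(evaluate(rules, "Tcp", "10.0.3.5", 50000, "10.0.2.10", 1494))
```

The first call matches the priority 100 rule, while the second comes from a subnet that no allow rule covers and falls through to the catch-all deny. Keeping similar components in their own subnets is what makes rule sets this short.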

Customers can create custom, or user-defined, routes called User-defined Routes (UDRs) in Azure to override Azure's default system routes, or to add extra routes to a subnet's route table. In Azure, admins can create a route table, then associate the route table to zero or more virtual network subnets. Each subnet can have zero or one route table associated to it.

NSGs and UDRs are applied at the subnet-level within a Virtual Network. When designing a Citrix Virtual Network in Azure it is recommended to design the virtual network with this in mind, creating subnets for similar components, allowing for the granular application of NSGs and UDRs as needed. An example of this would be segmenting the Citrix infrastructure into its own subnet, with a corresponding subnet for each use case.

Identify the ports and protocols required for Citrix and the supporting technologies. Review to verify these ports are allowed within the Network Security Groups used in the environment. Network Security Groups can limit inbound and outbound communications to a defined set of IP, Virtual Networks, Service Tags, or Application Security Groups.

reference-architectures_virtual-apps-and-desktops-azure_007.png

Diagram-7: Azure Security Center and Network Security using NSG and ASG

Connectivity

Connecting Azure virtual networks with a customer's local or cloud network is referred to as hybrid networking. This section explains the options for network connectivity and network service routing. Customers can connect their on-premises computers and networks to a virtual network using any combination of the following options:

  • Point-to-site virtual private network (VPN): Established between a virtual network and a single computer in a customer network. Each computer that wants to establish connectivity with a virtual network must configure its connection. This connection type is great for just getting started with Azure, or for developers, because it requires little or no changes to the customer's existing network. The communication between your computer and a virtual network is sent through an encrypted tunnel over the internet.
  • Site-to-site VPN: Established between an on-premises VPN device and an Azure VPN Gateway that is deployed in a virtual network. This connection type enables any on-premises resource that the customer authorizes to access a virtual network. The communication between an on-premises VPN device and an Azure VPN gateway is sent through an encrypted tunnel over the internet.
  • Azure ExpressRoute: Established between the customer's network and Azure, through an ExpressRoute partner. This connection is private. Traffic does not go over the internet.

The primary considerations for Azure-to-customer connectivity are bandwidth, latency, security, and cost. Site-to-site VPNs have lower bandwidth limits than ExpressRoute and depend on the performance of the customer's edge router. SLAs are available on the VPN Gateway SKUs. Site-to-site VPNs use IPsec over the internet.

ExpressRoute connections are dedicated and private and do not traverse the internet, which results in lower latency. ExpressRoute can also scale up to 10 Gbps. ExpressRoute is configured through a certified partner, so account for the provider's configuration time during project planning. ExpressRoute costs have a Microsoft component and an ExpressRoute provider component.

Typically these connections are shared across multiple services (database replication, domain traffic, application traffic, and so on). In a hybrid cloud deployment there may be scenarios where internal users require their ICA traffic to go through this connection to reach their Citrix apps in Azure, so monitoring its bandwidth is critical.

With ADC and traditional StoreFront, optimal gateway routing may also be used to direct a user's connection to an ADC using an office's ISP rather than the ExpressRoute or VPN connection to Azure.

User-Defined Routes (UDRs)

Typically customers use a UDR to route Azure traffic to a firewall appliance within Azure or to a specific virtual network. A typical example is North/South traffic from a VDA to the internet. If large amounts of traffic are routed to third-party firewall appliances within Azure, this can create a resource bottleneck or availability risk if these appliances are not sized or configured appropriately. Use NSGs to supplement third-party firewalls where appropriate. Consider Azure Network Watcher if traffic introspection is required.
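
To illustrate how a UDR overrides a default system route, the sketch below models route selection by longest prefix match, with user-defined routes preferred over BGP and system routes when prefixes are equal. The address ranges, the appliance address 10.0.0.4, and the next-hop labels are hypothetical, and the model ignores details such as service endpoints and gateway route propagation.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Route:
    prefix: str      # destination address prefix in CIDR notation
    next_hop: str    # descriptive label for this sketch
    source: str      # "user", "bgp", or "system"


# On equal prefix length, user-defined routes win, then BGP, then system.
_SOURCE_RANK = {"user": 0, "bgp": 1, "system": 2}


def select_route(routes: Iterable[Route], destination: str) -> Optional[Route]:
    """Pick the matching route with the longest prefix, then by source rank."""
    address = ipaddress.ip_address(destination)
    candidates = [r for r in routes if address in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen,
                       _SOURCE_RANK[r.source]),
    )


# Hypothetical route table for a VDA subnet.
routes = [
    Route("10.0.0.0/16", "VnetLocal", "system"),
    Route("0.0.0.0/0", "Internet", "system"),
    Route("0.0.0.0/0", "VirtualAppliance 10.0.0.4", "user"),
]

print(select_route(routes, "203.0.113.10").next_hop)  # internet-bound traffic
print(select_route(routes, "10.0.2.10").next_hop)     # traffic inside the VNet
```

Internet-bound traffic follows the user-defined default route to the firewall appliance, while traffic inside the virtual network still uses the more specific system route. That concentration of traffic on the appliance is why sizing it correctly matters.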

Virtual network peering

Virtual network peering seamlessly connects two Azure virtual networks. Once peered, the virtual networks appear as one, for connectivity purposes. The traffic between virtual machines in the peered virtual networks is routed through the Microsoft backbone infrastructure like traffic is routed between virtual machines in the same virtual network, through private IP addresses only.

Azure supports:

  • VNet peering - connecting VNets within the same Azure region
  • Global VNet peering - connecting VNets across Azure regions

For customers deploying workloads on multiple VNets, consider using VNet peering to enable communication between VMs in different VNets.

reference-architectures_virtual-apps-and-desktops-azure_008.png

Diagram-8: Data center Connectivity and Routes

Citrix ADC

Citrix ADC on Microsoft Azure ensures that organizations have access to secure and optimized applications and assets deployed in the cloud and provides the flexibility to establish a networking foundation that adjusts to the changing needs of an environment. In the event of a data center failure, Citrix ADC automatically redirects user traffic to a secondary site, with no interruptions for users. Load balancing and global server load balancing across several data centers further ensures optimum server health, capacity, and utilization.

Discuss with the customer and define the following use cases for each Resource Location:

Access Method | Considerations
Internal only | A Citrix ADC is not required if only internal access is needed.
External access via Citrix ADC Gateway Service | The Citrix Cloud ADC Gateway Service provides ICA Proxy (secure remote connectivity only).
External access via Citrix ADC VPX deployed in Azure Resource Location | A customer needs to consider a Citrix ADC VPX appliance in Azure if they require the following: 1. Multifactor authentication with full SSON 2. Endpoint scanning 3. Advanced authentication or pre-authentication policies 4. Citrix SmartAccess policies. Note: These requirements prompt the need for authentication to occur at the Citrix ADC rather than the Workspace Experience service. StoreFront is required if authentication is managed by a Citrix ADC Gateway virtual server.

Citrix ADC - Deployment Model

Active-Active deployments use standalone Citrix ADC nodes that can be scaled out using the Azure Load Balancer. Active-Passive pairs facilitate stateful failover of ICA traffic in the event of a node failure; however, they are limited to the capacity of a single VPX. Active-Passive nodes also require Azure Load Balancer.

Multiple NICs are recommended to isolate the SNIP, NSIP, and VIP traffic to maximize the throughput available for Citrix ADC Gateway or other services.

Monitoring DaaS Solution in Azure

Monitoring your Citrix deployment provides you with several benefits: increased performance, higher availability, lower cost, enhanced security, and satisfied users. Both Microsoft and Citrix provide a set of core tools and services to assist with monitoring the environment. This paper contains an overview of the available tools along with recommendations for areas to monitor targeted for Citrix deployments in Azure.

The Microsoft tools and services include the following: Azure Monitor, Azure Advisor, Azure Service Health, Microsoft Sentinel, Azure Network Watcher, and Azure Spend. The Citrix tools and services include the following: Citrix Monitor, Citrix Analytics, and Citrix Managed Services. Some of these services do incur extra charges, but most of them are included with your subscription.

This document provides a list of the recommended tools and a section where we identify baseline values of key elements to monitor. We also recommend configurations for the values to help you successfully deploy Citrix in Azure.

Microsoft

This section covers the Microsoft Azure tools and services that can be used to monitor your Citrix Virtual Apps and Desktops service deployment in Azure.

Azure Monitor

For a Citrix deployment in Azure, Azure Monitor is the best place to start. Azure Monitor helps you improve both the performance and availability of your Citrix deployment. Azure Monitor collects and analyzes the telemetry received from both your on-premises and Azure environments. Using Azure Monitor allows for proactive responses to issues with resources before users need to open a ticket with your help desk. Azure Monitor includes the following services, which can be used with one another to manage your Citrix resources:

  • Metrics: A collection of numerical values that represent a particular aspect of an Azure resource at a point in time.

  • Alerts: A collection of conditions being monitored and acting as triggers to initiate an associated action when the condition occurs.

  • Logs: A collection of data written to logs and available for analysis through Log Analytics queries.

  • Dashboards: A customizable view of information available on monitored resources.

  • Application Insights: A service that monitors your web applications and supports performance optimizations and troubleshooting.

Metrics

Azure Metrics is the single most powerful tool available in Azure Monitor for tracking the health of your Citrix resources. The term “metrics” represents information about a particular aspect of a resource that is distilled to a numerical value. Metrics are tracked over time and reported on at a specific interval. For instance, the number of active sessions on a Citrix VDA host is collected every 30 seconds and displayed in a real-time chart.

Azure Metrics allows for the tracking and alerting of metrics for each of your Citrix resources. Azure Metrics provides metrics for the Citrix virtual machines (VMs) and the underlying virtual machine host. Azure Metrics can also add diagnostic extensions to gather metrics from the guest operating system. Metrics are provided in near real-time and can be viewed through the Metrics Explorer charts. Metrics Explorer charts can compare metrics from different resources and be saved to Dashboards for monitoring the environment.

To monitor Citrix virtual machine resources in Azure, be sure to enable the Guest OS Metrics through the Diagnostic Settings for the virtual machine. This setting automatically does the following:

  • Enables performance counters for CPU, Memory, Disk, and Network at one-minute intervals.

  • Enables event log entry collection (Warning level and above).

  • Provides the option to collect Custom performance counters and event logs.

Guest OS metrics are retained for 93 days when sent to Azure Monitor Metrics.

The following extra settings are recommended for Citrix deployments in Azure:

  • Enable the Sinks > Azure Monitor > Send diagnostic data to Azure Monitor setting. This setting allows the use of Custom counters to collect multi-dimensional metrics and enables alerting on the Guest OS metrics.

  • Enable Crash dump settings when troubleshooting an issue with Citrix or Microsoft Support. This setting places the dump files directly in a storage container where you can easily retrieve them.

Collecting metrics is a powerful way to track the health and performance of your Citrix resources. Azure Metrics can track and alert on any metric that is available as a Windows performance monitor counter. Metrics are the basis for orchestration which uses rules to automate actions within Azure.

Alerts

The primary purpose of monitoring your Citrix infrastructure in Azure is to respond proactively to issues before users are adversely affected. Alerts notify you or take automated action on a condition that needs to be handled quickly. Although not all disruptions provide warning signs, the diligent use of alerts can prevent most common scenarios.

Conditions for an alert can be based on a set of predefined signals that Azure provides or on Guest OS metrics. These signals include metric values (the most common), log search results, Azure Activity log events, or even the health of the Azure platform. Set the alerts at a level that provides advance notice of a potential issue while minimizing the frequency of alerts that require action. An alert rule defines the condition that must be met for the alert to fire when enabled. The alert rule can then run a set of actions defined in an Action Group. The available actions include the following:

  • Notifications by email, SMS, Push, or Voice
  • Triggering of an Automation Runbook, Azure Function, Logic App, Event Hub, or Webhook
  • Creation of an ITSM Ticket

Alerts can be scoped to a particular resource group, region, or resource type. When configuring alerts for multiple targets, only a single condition can be specified and the targets must all support that condition. For metrics-based conditions, the alert rule definition includes the severity level along with the ability to resolve the alert automatically. Once fired, alerts need to be acknowledged when automated responses are not employed to handle the alert condition. Alerts do entail a monthly cost and Azure displays the estimated cost for acceptance when the alert rule is created.
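
The relationship between a metric condition, an alert rule, and an Action Group can be sketched as follows. This is a conceptual model only and does not call any Azure API: the class, the function names, the CPU samples, the 90 percent threshold, and the three-sample window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class AlertRule:
    name: str
    threshold: float          # fire when the aggregated value exceeds this
    window: int               # number of most recent samples to aggregate
    aggregate: Callable[[Sequence[float]], float] = mean
    severity: int = 2


def should_fire(rule: AlertRule, samples: Sequence[float]) -> bool:
    """Evaluate the rule against the most recent window of samples."""
    if len(samples) < rule.window:
        return False  # not enough data to evaluate the condition yet
    return rule.aggregate(samples[-rule.window:]) > rule.threshold


def run_action_group(rule: AlertRule, actions: Sequence[str]) -> List[str]:
    """Return one message per action; a real action group would dispatch them."""
    return [f"{action}: {rule.name} (severity {rule.severity})" for action in actions]


# Hypothetical one-minute CPU samples (percent) from a VDA host.
cpu_samples = [55.0, 60.0, 92.0, 95.0, 97.0]
rule = AlertRule(name="VDA CPU high", threshold=90.0, window=3)

if should_fire(rule, cpu_samples):
    for message in run_action_group(rule, ["email", "webhook"]):
        print(message)
```

Aggregating over a window rather than reacting to a single sample is one way to give advance notice while limiting the number of alerts that need attention, which is the balance described earlier in this section.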

Logs

Sometimes, metrics are not available for a particular event that you want to monitor for within your Citrix deployment. When metrics are not available, logs can be monitored for entries that indicate the event has occurred. Azure Monitor Logs can accept logs from Azure Services, virtual machine agents, or from applications using Application Insights. A Log Analytics workspace is required where the log data can be stored for analysis. These logs can then be aggregated and queried for key entries that indicate conditions which need to be managed. The query results can be viewed through either a dashboard or a workbook.

Azure Monitor Metrics is limited to numerical data only. Azure Monitor Logs can store and analyze different data types, which provide an advantage in some situations. The log analysis requires the use of a query which must be created and maintained. The queries are written in the Kusto Query Language (KQL), which is the same language used by Azure Data Explorer.

Dashboards

Dashboards represent a visual way to monitor your Citrix environment daily. Dashboards consist of tiles that come from any number of gallery selections. The possible tiles include metrics charts, security charts, user information, automation, or a direct link to any resource or resource group. Custom dashboards can be created that focus on a particular role or set of resources. Each dashboard can be shared or private and each portal user can have up to 100 private dashboards and an unlimited number of shared dashboards.

Application Insights

If you have web applications that are hosted in Azure and delivered via Citrix, use Application Insights to monitor your applications that are coded on popular web platforms. Application Insights can integrate with your DevOps process using a software development kit (SDK) or the Application Insights Agent. Application Insights then combines the telemetry provided with performance counters and other diagnostic information. These insights can help with diagnosing issues and provide a deeper understanding of how users interact with your application.

Application Insights delivers the information collected to Azure Monitor. You can use Microsoft Power BI or similar tools to analyze the raw data stored in Azure Monitor. Some of the areas that can be monitored with Insights include the following:

  • What pages are most popular and what time of day they load.

  • What pages are failing to load to help you diagnose resource issues.

  • Load performance for your web application from the perspective of the user’s browser.

  • Any exceptions that occur, whether caused by the server or browser code.

  • Any custom events or metrics that you choose to instrument with the Insights SDK.

The Application Insights console lets you manage the performance of your web applications on Citrix to provide a better end-user experience.

Azure Advisor

Azure Advisor is a service that analyzes your resource configurations in the background and makes recommendations to help improve your Azure deployment. These recommendations are grouped into five categories: Cost, Security, Reliability, Operational Excellence, and Performance. The Security recommendations come from Microsoft Defender for Cloud. For each category, the Advisor lists the resources affected and provides guidance on how to improve the resource configuration. You can filter the recommendations by resource type and subscription.

Azure Advisor supports the configuration of Alerts to monitor for situations where your Azure environment falls outside the best practices recommendations. See the Azure Advisor Alerts section later in this document for recommendations.

Microsoft Defender for Cloud

Defender for Cloud is a service that combines functionality previously found in Azure Security Center and Azure Defender. This service continuously assesses your Azure resources and provides an overall score that indicates the security posture of your deployments. Azure Advisor’s Security recommendations come directly from Defender for Cloud. Defender for Cloud also provides direct guidance on how to resolve any issues the service identifies. The recommendations come from the Azure Security Benchmark, an Azure-specific set of guidelines authored by Microsoft.

Defender for Cloud with enhanced security features can be deployed in a hybrid configuration to support on-premises deployments along with other cloud providers.

For Citrix deployments, enabling Defender for Cloud provides the following features that secure your Citrix resources:

  • Risk assessment for resources being accessed from the internet, such as source IP address and frequency.

  • Just-in-time (JIT) VM access that limits when ports are open for initial inbound connections. Microsoft recommends JIT for all jump box or bastion host connections.

  • Adaptive network hardening (ANH) which further hardens the Network Security Group (NSG) rules. ANH uses machine learning algorithms, trusted configurations, threat intelligence, and other factors to provide recommendations.

  • Fileless attack detection, which periodically scans a running machine’s memory for malicious payloads that run only in memory to evade disk-based detection software.

  • Integration with Microsoft Sentinel.

Microsoft Sentinel

Microsoft Sentinel is both a Security Information and Event Management (SIEM) and a Security Orchestration, Automation, and Response (SOAR) system. Sentinel was designed and built as a cloud-native service. Using sophisticated artificial intelligence, Sentinel continuously monitors all content sources and hunts for suspicious activity.

Sentinel provides a central location for collecting and monitoring data at scale through agents and data connectors. Security incidents are tracked through triggered alerts and automated responses to common tasks. Sentinel can operate across multiple clouds and with your on-premises infrastructure, making it ideal for hybrid Citrix environments.

The Content hub provides a simple interface to enable out-of-the-box pre-packaged solutions for Sentinel. These packages contain Analytics Rules, Hunting Queries, Playbooks, Data Connectors, and Workbooks that are specific to their topics. The following Content hub solutions are recommended for your Citrix deployment in Azure:

  • Azure Firewall to help increase the security of the networking communication.

  • Cybersecurity Maturity Model Certification (CMMC) to meet cybersecurity compliance guidelines within your environment.

  • Microsoft Sentinel Deception to help detect attackers who interact with decoy resources.

  • Microsoft Insider Risk Management to help protect against insider threats.

  • Threat Analysis Response to manage and correlate threat activity.

Data Connectors provide a way to interface Sentinel with other Azure services and third-party systems. The connectors provide the data that is analyzed by Sentinel for potential threats. The following Data Connectors are recommended for your Citrix deployment in Azure:

  • Azure Active Directory for information about user identities, sign-ins, provisioning, and so on.

  • Azure Active Directory Identity Protection for security alerts with identities.

  • Azure Activity for any Azure resource activity.

  • Azure DDoS Protection for information on Distributed Denial of Service attacks through flow logs and DDoS notifications.

  • Azure Firewall for information on firewall activity, network rules and DNS proxies.

  • Azure Key Vault for information on Azure Key Vault activity.

  • Azure Storage Account for information on Azure storage account activity for blobs, queues, tables, files, and resource access.

  • Citrix Analytics for information gathered by Citrix Analytics (see the Citrix Analytics section).

  • Citrix Web App Firewall for Citrix firewall activity.

  • Microsoft Defender for Cloud for security alerts originating from Defender.

  • Microsoft Office 365 for any Office activity, assuming your Office 365 tenant is the same tenant as used for your Citrix deployment.

  • Threat Intelligence – TAXII for identifying and remediating potential threats.

  • Windows Firewall for events generated by Windows Firewall service running on Citrix servers.

  • Windows Security Events via Azure Monitor Agent (AMA) for events from the Windows Security event log on Citrix servers.

Microsoft Sentinel supports data connectors from a wide variety of vendors. These vendors include security, networking, and application vendors. Consider reviewing the available data connectors at least annually to keep Sentinel as effective as possible.

Azure Service Health

Azure Service Health provides an easy way to monitor the Azure infrastructure that is hosting your Citrix deployment. Service Health lets you monitor service issues, view upcoming planned maintenance, and track Health and Security advisories. You can filter the active issues and planned maintenance by subscription, region, and service. Any issues with widespread impact are displayed under the Service Issues blade.

With health alerts, you can monitor the health of your own Azure resources. Use health alerts to configure automated notification of service outages or planned maintenance that affect your resources. See the Azure Service Health alerts section later in this document for recommendations.

If you have other services that you use frequently, we recommend subscribing to those services as well. If you set up your alerts correctly, you receive notification of any outages when they happen and planned maintenance does not catch you off guard.

Azure Network Watcher Traffic Analytics

While Citrix is built to be secure by design, users are still a weak link and login credentials can be compromised. When running Citrix in Azure, one of the best ways to secure access to your applications and data is by monitoring the network traffic. Traffic Analytics is designed to provide you relevant information by analyzing the network traffic flows. By combining raw flow logs with a knowledge of the network topology, Traffic Analytics can provide a comprehensive view of the network communication. The reports include the most active hosts or host pairs, top protocols in use, blocked traffic, open ports, rogue networks, and traffic distribution.

To use Traffic Analytics, your Citrix resources need to be in a region that supports both Network Security Groups (NSGs) and Log Analytics Workspaces. You also need to enable Network Watcher in the same region. For each network security group that includes Citrix resources, create an NSG flow log and enable both Flow Logs Version 2 and Traffic Analytics when creating it. For regulatory compliance, be sure that your Log Analytics Workspace is in the same country as where the NSG flow logs are generated.

NOTE: At a minimum, create NSG flow logs for your Citrix Cloud Connectors, Delivery Controllers, ADC appliances, and StoreFront servers.

Use Traffic Analytics to identify malicious traffic, hot spots, and busy hosts. Always remember that clients are going to a specific set of hosts, so sometimes normal traffic may appear in the “Frequent conversation” list. The geo-map can be used to visualize the communication sources and quickly identify unexpected and possibly malicious traffic sources. Reviewing the traffic flow patterns, open ports, and blocked traffic can provide you insights into potential threats or unprotected attack vectors.

Azure Cost Management

Azure Cost Management and Billing allows you to configure alerts that warn you when your cost limits have been reached. Spend alerts are one of the most effective ways to keep the cost of your Citrix resources under control. For large enterprises, enabling budget, credit, and quota alerts helps you identify any potential misconfiguration or misuse of Azure resources.

  • Budget Alerts: An alert is sent when either the usage or dollar amount reaches a predefined limit based on a previously established budget.

  • Credit Alerts: The system generates credit alerts automatically when 90% and 100% of your prepayment (monetary commitment) has been consumed.

  • Department Spending Quota Alerts: Quota alerts are configured only through the Enterprise Agreement (EA) portal. When triggered, the portal sends an email to department owners when their spend reaches a defined percentage.

Creating a monthly budget with spend alerts provides you advance notice when resources are unexpectedly provisioned. Common reasons for unexpected spend include automation errors, autoscaling misconfiguration, or even malicious intent by trusted insiders. The sooner you are alerted to the additional cost the sooner you can resolve the issue.
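
The percentage-based logic behind budget and credit alerts can be sketched in a few lines of Python. This is only an illustration of the arithmetic and does not call the Azure Cost Management API. The function name `crossed_thresholds` and the sample spend figures are hypothetical; the 90% and 100% levels mirror the credit alert levels listed above.

```python
def crossed_thresholds(spend, budget, levels=(0.9, 1.0)):
    """Return the alert levels (as fractions of the budget) that spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [level for level in levels if ratio >= level]


# Hypothetical monthly figures, for illustration only.
print(crossed_thresholds(spend=4600, budget=5000))  # 92% of the budget
print(crossed_thresholds(spend=5200, budget=5000))  # 104% of the budget
```

For a monthly budget you might add lower levels, such as 50% or 75%, so that unexpected provisioning is noticed earlier in the billing period.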

Baseline Metrics and Alerts for Azure

The key to a good monitoring environment is knowing what is important to monitor and which items require immediate attention. You don’t want to monitor every available metric because you end up storing information that is not useful. Information collection and storage have costs associated with them, so use them wisely. Here we provide a baseline of Metrics/Counters to monitor and suggest alerts that can give you a starting point to monitor your Citrix environment. You can build on this baseline and include other performance counters or events that you feel are helpful for your environment.

Metrics and Alert Thresholds

For a Citrix deployment, we are going to focus primarily on the Guest OS metrics of Citrix virtual machines. Poor server performance metrics typically indicate that the users are about to experience unpleasant issues, if they are not already. For instance, when the Max Input Delay for a user’s session reaches a predefined delay we know that users are experiencing latency. You can configure the Action group to send an email to the Citrix administrators alerting them to the server's issues. By setting the notification alert to fire off when the Max Input Delay approaches a value known to be unacceptable, admins can intervene proactively.

We have provided the performance counters to monitor along with suggested thresholds for alerting on those counters when used in a Citrix deployment. The suggested alert thresholds are likely to provide advance notice of user dissatisfaction. Adjust the values and time periods to meet your business needs:

All Citrix Servers

Here is the list of perfmon counters to monitor for all Citrix servers in the deployment:

  • Processor\%Processor time

    • This counter is the amount of time a processor is not idle.

    • Alert when the average is greater than 80% for a sustained 15 minutes.

    • Determine the processes that are consuming the most CPU and identify the cause of the high CPU usage using Task Manager or Citrix Monitor.

    • If all processes are consuming an expected level of CPU time, then it is time to increase capacity for the server or the Delivery Group.

  • System\Processor queue length

    • This counter is the number of threads in a processor queue waiting to be processed.

    • Alert when greater than 5 × [number of cores] over a 5-minute interval.

    • Determine which processes are consuming the most CPU and identify the cause of the CPU usage using Task Manager or Citrix Monitor.

    • If all processes are consuming an expected level of CPU time, then it is time to increase capacity for the server or the Delivery Group.

  • Memory\Available Bytes

    • This counter is the amount of memory not allocated to processes or cache.

    • Alert when the available amount of RAM is under 20% of the total RAM over a 5-minute interval.

    • Determine which processes are consuming the memory using Task Manager or Citrix Monitor. Identify any configuration changes that can reduce that level of RAM consumption. Use this metric with the Memory Pages/sec and Paging File %usage counters.

    • If all processes are consuming the expected amount of memory, then it is time to increase capacity for the server or the Delivery Group.

  • Memory\Pages/sec

    • This counter is the number of pages per second that are swapped from disk to running memory.

    • Alert when the pages per second are consistently over 10.

    • Look for applications that are causing the page swaps using Task Manager. Investigate possible alternative configurations. Use this metric with the Memory Available Bytes and Paging File\%usage counters.

    • If possible, increase the amount of RAM available to the host. If that is not an option, attempt to isolate the application to a set of dedicated servers.

  • Paging File\%usage

    • This counter is the percentage of the current page file that is in use.

    • Alert when the page file usage is greater than 80% for 60 minutes.

    • Look for applications that are causing the page file usage using Task Manager. Investigate possible alternative configurations. Use this metric with the Memory Available Bytes and Memory Pages/sec counters.

    • If possible, increase the amount of RAM available to the host.

  • LogicalDisk\%Disk Time (_total)

    • This counter represents the amount of time the Logical disk is not idle.

    • Alert when the % disk time is greater than 90% for 15 minutes.

    • Look for applications that are causing the high disk usage using Task Manager or Citrix Monitor. Investigate what might be causing the high disk utilization. Use this metric with other logical and physical disk metrics.

    • If all activity looks normal, look for a way to move the applications to disks with higher performing disk subsystems.

  • LogicalDisk\Current disk queue length

    • This counter represents the number of transactions waiting for the logical disk to process them.

    • Alert when the current disk queue is greater than 3 for 15 minutes.

    • Look for applications that are causing the high disk usage using Task Manager or Citrix Monitor. Investigate what might be causing the high disk utilization. Use this metric with other logical and physical disk metrics.

    • If all activity looks normal, look for a way to move the applications to disks with higher performing disk subsystems.

  • PhysicalDisk\%Disk Time (_total)

    • This counter represents the amount of time the Physical disk is not idle.

    • Alert when the % disk time is greater than 90% for 15 minutes.

    • Look for applications that are causing the high disk usage using Task Manager or Citrix Monitor. Investigate what might be causing the high disk utilization. Use this metric with other logical and physical disk metrics.

    • If all activity looks normal, look for a way to move the applications to disks with higher performing disk subsystems.

  • PhysicalDisk\Current disk queue length

    • This counter represents the number of transactions waiting for the physical disk to process them.

    • Alert when the current disk queue is greater than 3 for 15 minutes.

    • Look for applications that are causing the high disk usage using Task Manager or Citrix Monitor. Investigate what might be causing the high disk utilization. Use this metric with other logical and physical disk metrics.

    • If all activity looks normal, look for a way to move the applications to disks with higher performing disk subsystems.

  • Network Interface\Bytes Total/sec

    • This counter shows the rate at which the network adaptor is processing data packets for the network.

    • Alert when Bytes Total per second are greater than 80% of the NIC’s speed for 5 minutes.

    • Look for applications that are causing the high network usage using Task Manager or Citrix Monitor. Investigate what might be causing the high network utilization. Use this metric with other network interface metrics.

    • If all activity looks normal, look for a way to increase the network bandwidth or increase capacity to the Delivery Group.

  • User Input Delay per Session\Max Input Delay

    • This metric provides the maximum input delay for the session in milliseconds. The metric measures the time between when the user provides mouse or keyboard input and their input is processed by the system.

    • Alert when a session’s input delay is greater than 1000 ms for 2 minutes.

    • Look for applications that are causing high CPU, disk, or network usage using Task Manager or Citrix Monitor.

    • If activity looks normal, the best approach is to increase capacity to the Delivery Group.
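
Several of the thresholds above depend on the size of the machine rather than on a fixed number. The following Python sketch shows one way to derive them and to test a window of samples against a threshold. It is a minimal illustration under stated assumptions, not how Azure Monitor evaluates alert rules: all function names are hypothetical, samples are assumed to arrive once per minute, and a window is treated as breached when its average crosses the threshold.

```python
def window_average_breach(samples, threshold, window, above=True):
    """Compare the average of the last `window` samples with a threshold.

    Returns False when fewer than `window` samples are available.
    """
    if window <= 0 or len(samples) < window:
        return False
    recent = samples[-window:]
    average = sum(recent) / window
    return average > threshold if above else average < threshold


def queue_length_threshold(cores):
    # System\Processor queue length: alert above 5 x [number of cores].
    return 5 * cores


def available_memory_threshold(total_ram_bytes):
    # Memory\Available Bytes: alert when under 20% of the total RAM.
    return total_ram_bytes * 20 / 100


def nic_bytes_threshold(link_speed_bits_per_sec):
    # Bytes Total/sec is reported in bytes, while NIC speed is usually
    # quoted in bits per second, so divide by 8 after taking 80%.
    return link_speed_bits_per_sec * 80 / 100 / 8


# Hypothetical one-minute CPU samples (percent) covering 15 minutes.
cpu = [85, 88, 91, 83, 86, 90, 87, 84, 89, 92, 88, 86, 85, 90, 93]
print(window_average_breach(cpu, threshold=80, window=15))
print(queue_length_threshold(cores=4))
print(nic_bytes_threshold(1_000_000_000))
```

Note the unit conversion for the network counter: a 1 Gbps adapter reaches the 80% level at 100,000,000 bytes per second, not 800,000,000.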

Cloud Connectors

In addition to the counters for all Citrix Servers, enable the following custom performance counters for your Citrix Cloud Connectors. These counters monitor for key failures in the Cloud Connectors:

  • Citrix High Availability Service\Database Transaction Errors/sec

    • This metric represents the number of database transaction failures per second.

    • Ideally, this number is 0.

    • Alert when the counter is greater than 0.

  • Citrix High Availability Service\Failed Leased Enumerations

    • This metric represents the number of failed enumerations for clients.

    • Ideally, this number is 0.

    • Alert when the counter is greater than 0.

  • Citrix High Availability Service\Failed Leased Launches

    • This metric represents the number of failed launches for clients.

    • Ideally, this number is 0.

    • Alert when the counter is greater than 0.

  • Citrix High Availability Service\Registration Rejects/sec

    • This metric represents the number of registrations rejected per second.

    • Ideally, this number is 0.

    • Alert when the counter is greater than 0.

Citrix Virtual Delivery Agent Virtual Machines

In addition to the counters for all Citrix Servers, enable the following custom performance counters for your Citrix Virtual Delivery Agent hosts. These counters monitor for key failures:

  • ICA Session\Latency - Session Average

    • This metric provides the average ICA latency for a user session in milliseconds.

    • Use this metric to monitor the user experience. The value should be under 150 ms for a good user experience, and anything over 300 ms is considered degraded.

    • If you are seeing high latency values, look into enabling Adaptive Transport to help mitigate the effects of the latency.

  • User Input Delay per Session\Max Input Delay

    • This metric provides the maximum input delay for the session (in milliseconds). The metric measures the time between when the user provides mouse or keyboard input and their input is processed by the system.

    • Use this metric to monitor the user experience. The value should be under 500 ms, with under 150 ms considered good and anything over 1000 ms considered unacceptable.

  • Terminal Services\Active Sessions

    • This metric provides the number of active sessions on the Citrix VDA host.

    • Monitor this metric for multi-session hosts.

    • Use this metric to correlate with other metrics by showing active user counts on the graph.

  • CitrixPrinting\Total Jobs Failed

    • This metric represents the total number of print jobs that failed on the Citrix VDA host and should be low.

    • Monitor this metric to see the number of print jobs that are failing on the Citrix hosts.

    • Excessive failed print jobs can point to issues with the Printer Drivers installed on the Citrix host.
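
The latency and input delay bands described above can be expressed as simple classification functions, which is useful when you color a workbook or decide on alert severity. This is a minimal sketch: the function names are hypothetical, and the labels "borderline" and "needs attention" are assumptions for the ranges that the guidance above does not name.

```python
def classify_ica_latency(latency_ms):
    """Classify ICA Session\\Latency - Session Average (milliseconds)."""
    if latency_ms < 150:
        return "good"
    if latency_ms > 300:
        return "degraded"
    return "borderline"  # assumed label for the 150-300 ms range


def classify_input_delay(delay_ms):
    """Classify User Input Delay per Session\\Max Input Delay (milliseconds)."""
    if delay_ms < 150:
        return "good"
    if delay_ms < 500:
        return "acceptable"
    if delay_ms > 1000:
        return "unacceptable"
    return "needs attention"  # assumed label for the 500-1000 ms range


print(classify_ica_latency(120))
print(classify_ica_latency(340))
print(classify_input_delay(700))
```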

Enable the following custom performance counters for your Citrix Virtual Delivery Agent VMs that are running Citrix Profile Management:

  • CitrixProfileManagement\Logon Duration

    • This metric represents the total time in seconds for the user logon event to complete.

    • Monitor this metric to understand the user logon experience. This metric includes the time it takes to load the user profile down to the user’s session.

  • CitrixProfileManagement\Logoff Duration

    • This metric represents the total time in seconds for the user logoff event to complete.

    • Monitor this counter to track how long the user logoff event is taking. This metric includes the time it takes for the user’s data to be written back to the profile location.

  • CitrixProfileManagement\Processed Logoff Files-Above 5MB

    • This metric represents the number of files greater than 5MB that are uploaded to the user profile store during logoff.

    • Monitor this metric to determine if enabling Large File Handling or folder redirection can improve the user logon experience.

  • CitrixProfileManagement\Processed Logon Files-Above 5MB

    • This metric represents the number of files greater than 5MB that are copied down from the user profile storage during logon.

    • Monitor this metric to determine if you must enable profile streaming or Large File Handling to reduce logon times.

Enable Application Log collection on your Citrix Virtual Delivery Agent VMs. Set the following configurations as a baseline:

  • Alert on any RDP Licensing Errors.

  • Alert on these Security Warnings.

    • Event ID 4625: An account failed to log on.

    • Event ID 4771: Kerberos pre-authentication failed.

  • Alert on these Citrix Warning or Error messages.

    • Event ID 1001: The Citrix Desktop Service failed to obtain a list of delivery controllers with which to register.

    • Event ID 1017: The Citrix Desktop Service failed to register with any delivery controller.

    • Event ID 1022: The Citrix Desktop Service failed to register with any controllers in the last 5 minutes.

    • Event ID 6013: System uptime. Use this event to find Citrix servers that are not getting rebooted after patching.
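
Once the event logs are collected into a Log Analytics workspace, a log alert needs a KQL query that finds the events listed above. The Python sketch below only assembles such a query as a string. It assumes the events land in the `Event` table with its `TimeGenerated`, `EventID`, and `Computer` columns; security events such as 4625 may be stored in a different table depending on how collection is configured. The function name `build_event_query` and the one-hour lookback are hypothetical choices.

```python
def build_event_query(event_ids, lookback="1h", table="Event"):
    """Build a KQL query string that counts matching events per computer."""
    if not event_ids:
        raise ValueError("at least one event ID is required")
    id_list = ", ".join(str(int(event_id)) for event_id in event_ids)
    return (
        f"{table}\n"
        f"| where TimeGenerated > ago({lookback})\n"
        f"| where EventID in ({id_list})\n"
        f"| summarize EventCount = count() by Computer, EventID"
    )


# Citrix Desktop Service registration failures from the list above.
CITRIX_REGISTRATION_EVENTS = [1001, 1017, 1022]
print(build_event_query(CITRIX_REGISTRATION_EVENTS))
```

A log alert rule based on this query could fire whenever the result set is not empty. Validate the query in the Logs blade of your own workspace before relying on it.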

Citrix StoreFront Servers

In addition to the counters for all Citrix Servers, enable the following custom performance counters for your Citrix StoreFront servers. The counters monitor for poor performance:

  • ASP.NET\Requests Queued

    • The number of requests ASP.NET has in the queue waiting to be processed.

    • Alert when the values are significantly outside the baseline norms. Establish baselines based on the environment.

  • ASP.NET\Requests Rejected

    • The number of requests rejected because the request queue is full.

    • Alert when the number of rejected requests is greater than one.
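
The guidance for queued requests relies on a baseline rather than a fixed number. One common way to decide whether a value is "significantly outside" a baseline is to compare it with the baseline mean plus or minus a multiple of the standard deviation. The sketch below takes that approach; the three-sigma default, the function name `outside_baseline`, and the sample values are assumptions rather than Citrix recommendations.

```python
from statistics import mean, stdev


def outside_baseline(baseline, current, sigmas=3.0):
    """True if current is more than `sigmas` standard deviations from the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    center = mean(baseline)
    spread = stdev(baseline)
    return abs(current - center) > sigmas * spread


# Hypothetical Requests Queued samples gathered during normal operation.
baseline = [2, 3, 2, 4, 3, 2, 3, 3]
print(outside_baseline(baseline, current=12))
print(outside_baseline(baseline, current=4))
```

Collect the baseline over a representative period, such as several business days, so that normal logon peaks are not flagged.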

Citrix Federated Authentication Service (FAS) Servers

In addition to the counters for all Citrix Servers, enable the following custom performance counters for your Citrix Federated Authentication Service hosts. These metrics monitor for performance-related issues:

  • Citrix Federated Authentication Service\High Load Level

    • This metric tracks the number of certificate signing requests per minute that the Federated Authentication Service accepts.

    • Track this metric because once the High Load level is met, desktops and applications fail to launch.

Azure ExpressRoute Metrics

If you have an ExpressRoute connection to an on-premises data center or to a peered network, monitor that connection. You need to understand your bandwidth needs and to know how much billable egress traffic is leaving Azure. The key metrics to watch are as follows:

  • ExpressRoute circuit\BitsInPerSecond

    • This metric is the number of bits coming into Azure per second. This data is free.

    • Use this metric for ExpressRoute capacity planning.

    • Alert on this metric when it reaches 80% of your available circuit ingress bandwidth.

  • ExpressRoute circuit\BitsOutPerSecond

    • This metric is the number of bits leaving Azure per second. This data is billable.

    • Use this metric for ExpressRoute capacity planning and for budgeting for data egress.

    • Alert on this metric when it reaches 80% of your available circuit egress bandwidth.

  • ExpressRoute circuit\GlobalReachBitsInPerSecond

    • This metric is the number of bits coming into Azure per second to peered ExpressRoute circuits (this data is free).

    • Use this metric for ExpressRoute capacity planning.

    • Alert on this metric when it reaches 80% of your available circuit ingress bandwidth.

  • ExpressRoute circuit\GlobalReachBitsOutPerSecond

    • This metric is the number of bits leaving Azure per second to peered ExpressRoute circuits (this data is billable).

    • Use this metric for ExpressRoute capacity planning and for budgeting for data egress.

    • Alert on this metric when it reaches 80% of your available circuit egress bandwidth.

  • ExpressRoute Gateway Connection\BitsInPerSecond

    • This metric is the number of bits coming into Azure per second for a specific connection to an ExpressRoute circuit (this data is free).

    • Alert on this metric when it reaches 80% of your connection ingress bandwidth.

  • ExpressRoute Gateway Connection\BitsOutPerSecond

    • This metric is the number of bits leaving Azure per second for a specific connection to an ExpressRoute circuit (this data is billable).

    • Alert on this metric when it reaches 80% of your connection egress bandwidth.

  • ExpressRoute Virtual Network Gateway\PacketsPerSecond

    • This metric is the number of inbound packets traversing the ExpressRoute gateway.

    • Alert on this metric when it drops low enough to indicate it is no longer receiving traffic.

  • ExpressRoute Virtual Network Gateway\CPU Utilization

    • This metric is CPU utilization of the gateway instance.

    • High CPU utilization indicates a performance bottleneck.

    • Alert on this metric when CPU utilization exceeds 85%.
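
The ExpressRoute metrics are reported in bits per second, while circuits are sold in Mbps or Gbps and egress is billed by volume. The following Python sketch converts between these units. The function names are hypothetical, 1 Mbps is taken as 1,000,000 bits per second, and 1 GB is taken as 1,000,000,000 bytes; check your own billing terms before using the egress estimate for budgeting.

```python
def bandwidth_alert_threshold(circuit_mbps, percent=80):
    """Return the alert threshold in bits per second for a circuit size in Mbps."""
    return circuit_mbps * 1_000_000 * percent // 100


def egress_gigabytes(avg_bits_per_second, seconds):
    """Estimate the data volume (decimal GB) sent at an average rate over a period."""
    return avg_bits_per_second * seconds / 8 / 1_000_000_000


# A hypothetical 1 Gbps circuit: alert at 80% of its bandwidth.
print(bandwidth_alert_threshold(1000))

# One hour of egress at an average BitsOutPerSecond of 200 Mbps.
print(egress_gigabytes(200_000_000, 3600))
```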

Azure Advisor Alerts

Azure Advisor provides upwards of 280 alerts. This section provides the recommended alerts to configure in Azure Advisor for your Citrix environment. The alerts are categorized for your convenience into Reliability, Cost, Performance, and Operational Excellence. Each alert has a short description that includes why this alert is important to track in a Citrix environment. Several of the alerts can also be enforced via Azure Policy. These alerts only need to be configured one time and take about 30 minutes.

Reliability Alerts

  • Enable Backups on your Virtual Machines: Notifies you when your VMs are not enabled for automatic backup. Routinely back up all your Citrix infrastructure VMs.

  • Enable soft delete for your Recovery Services vaults: Notifies you when your Recovery Services vault data is set for hard or permanent delete instead of a soft delete. Use soft delete to avoid losing your Recovery Services Citrix infrastructure in the case of an accidental deletion.

  • Enable Soft Delete to protect your blob data: Notifies you when your Blob Storage data is set for hard or permanent delete instead of a soft delete. Use soft delete to avoid losing any blob storage data for Citrix applications or users in the case of an accidental deletion.

  • Enable Cross Region Restore for your Recovery Services Vault: Notifies you when your Recovery Services Vault is not enabled for cross-region restore, which means you cannot recover outside of your current region. Use to protect your Recovery Services Citrix infrastructure so it can be brought online in a different region if the primary region is inaccessible.

  • Move to production gateway SKUs from Basic gateways: Notifies you when your Gateways are using the Basic SKU which has lower performance than a Production SKU. Always use production gateway SKUs for Citrix infrastructure and users to provide the best performance and end user experience.

  • Enable Active-Active gateways for redundancy: Notifies you when your gateways are not set up for active-active fault tolerance. Always configure active-active gateways for a fault-tolerant Citrix infrastructure.

  • Implement multiple ExpressRoute circuits in your Virtual Network for cross-premises resiliency: Notifies you when your ExpressRoute circuits are not set up for high availability. Always configure ExpressRoute circuits for high-availability so your Citrix infrastructure is available to all users.

  • Use ExpressRoute GlobalReach to improve your design for disaster recovery: Notifies you when your ExpressRoute circuits are not using Global Reach. Always configure ExpressRoute circuits for Global Reach to improve your disaster recovery design and make it more resilient.

  • Repair your log alert rule: Notifies you when a log alert rule is broken. If you are using Log Alert rules for monitoring your Citrix environment, you want to enable this alert so you know when the rule is broken and not performing correctly.

  • Log alert rule was disabled: Notifies you when a log alert rule was disabled. If you are using Log Alert rules for monitoring your Citrix environment, you want to enable this alert so you know when the rule is disabled and not running at all.

Cost Alerts

  • Right-size or shutdown underutilized virtual machines: Notifies you when the machine instance type for a VM is not being fully used so that you can select a smaller and less-expensive VM to meet your business needs. Use this alert to reduce the costs of your Citrix infrastructure.

  • Repurpose or delete idle virtual network gateways: Notifies you when you have virtual network gateways that are idle and can be removed to reduce costs. Use this alert to reduce costs and complexity of your network infrastructure.

  • Delete ExpressRoute circuits in the provider status of Not Provisioned: Notifies you when you have ExpressRoute circuits that are not fully provisioned. Use this alert to remove incomplete ExpressRoute circuits.

  • Use Standard Storage to store Managed Disks snapshots: Notifies you when you are using more expensive storage to store managed disk snapshots. Use this alert to save money when storing disk snapshots.

Performance Alerts

  • Improve user experience and connectivity by deploying VMs closer to user’s location: Notifies you when users connect to Citrix resources hosted far from their location. Use this alert when choosing data center and site locations so that users are placed close to their Citrix resources.

  • Match production Virtual Machines with Production Disks for consistent performance: Notifies you when your production VMs are not using production disks. Always use production disks for the production VMs in your Citrix deployment.

  • Consider increasing the size of your VPN Gateway SKU to address high CPU: Notifies you when your VPN Gateway SKUs are not optimal for your usage. Enable this alert if you have a high number of VPN users that may be affected by VPN gateway performance when accessing Citrix resources.

  • Consider increasing the size of your VNet Gateway SKU to address consistently high CPU use: Notifies you when your VNet Gateway SKUs are not optimal for your usage. Enable this alert if you have a high number of VNet Gateways that may be affected when routing traffic between VNets for Citrix resources.

  • Upgrade your ExpressRoute circuit bandwidth to accommodate your bandwidth needs: Notifies you when your ExpressRoute circuit bandwidth is not optimal for your current usage. Use this alert when you have one or more ExpressRoute circuits for your Citrix infrastructure.
  • Enable Accelerated Networking to improve network performance and latency: Notifies you when VMs would benefit from the use of Accelerated Networking. Use this alert to identify which Citrix VMs must have accelerated networking enabled.

Operational Excellence Alerts

  • Use Azure Policy to enable certain policies within the Azure environment. Here is a list of alerts that verify the Azure policy is in place:

    • Enforce ‘Add or replace a tag on resources’ in Azure Policy: used to verify that all Citrix resources are properly tagged.

    • Enforce ‘Allowed locations’ in Azure Policy: used to verify that Citrix resources can be deployed only in approved Azure locations, which prevents resources from being created in regions your organization does not trust or use.

    • Enforce ‘Allowed virtual machine SKUs’ in Azure Policy: used to prevent VMs from being created that fall outside the cost parameters for an environment. This policy is useful in preventing bitcoin mining with costly GPU instances.

    • Enforce ‘Inherit a tag from the resource group’ in Azure Policy: used to verify any resources in a resource group also inherit tags assigned to that resource group. This policy is useful for tracking auto-created Citrix resources.

  • Enable Traffic Analytics to view insights into traffic patterns across Azure resources: Notifies you when Traffic Analytics is not enabled for Azure resources. Used to secure the Citrix resources and prevent inadvertent or malicious access to data accessible through Citrix hosts.

  • Implement ExpressRoute Monitor on Network Performance Monitor for end-to-end monitoring: Notifies you when an ExpressRoute circuit is not being monitored by ExpressRoute Monitor. Monitoring the circuit helps identify and prevent accidental or malicious access to data over an ExpressRoute connection.

  • Add Azure Monitor to your virtual machine (VM) labeled as production: Notifies you when a production VM does not have Azure Monitor enabled. Used to identify any Citrix VMs not running Azure Monitor.

  • You have disks which have not been attached to a VM for more than 30 days: Notifies you when disks are not being actively used. Useful for reducing storage costs by removing unused disks.
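
The Azure Policy checks listed above (allowed locations, allowed VM SKUs, and tagging) can be reasoned about with a small local sketch. The following Python example is not an Azure Policy definition and does not call any Azure API. It only illustrates the kind of rule each policy enforces. The `Resource` class, the allowed values, and the rule names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical values for illustration; substitute your own standards.
ALLOWED_LOCATIONS = {"eastus", "westeurope"}
ALLOWED_VM_SKUS = {"Standard_D4s_v5", "Standard_D8s_v5"}
REQUIRED_TAGS = {"environment", "owner"}
VM_TYPE = "Microsoft.Compute/virtualMachines"


@dataclass
class Resource:
    """Minimal stand-in for an Azure resource record (hypothetical shape)."""
    name: str
    resource_type: str
    location: str
    sku: str = ""
    tags: dict = field(default_factory=dict)


def policy_violations(resource: Resource) -> list[str]:
    """Return the names of the policy-style rules that a resource breaks."""
    violations = []
    if resource.location not in ALLOWED_LOCATIONS:
        violations.append("allowed-locations")
    # The SKU rule only applies to virtual machines.
    if resource.resource_type == VM_TYPE and resource.sku not in ALLOWED_VM_SKUS:
        violations.append("allowed-vm-skus")
    # Any required tag that is missing counts as a tagging violation.
    if REQUIRED_TAGS - set(resource.tags):
        violations.append("required-tags")
    return violations


vm = Resource("vda-01", VM_TYPE, "eastus", "Standard_NC24", {"owner": "ctx-team"})
print(policy_violations(vm))  # ['allowed-vm-skus', 'required-tags']
```

In a real deployment, Azure Policy evaluates these rules at resource creation time. A local check like this is only useful for reviewing an inventory export against the same standards.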

Azure Service Health Alerts

This section provides the recommended service health alerts to configure. The list identifies the key services that are used by a Citrix deployment. Each alert has a short description that includes why this alert is important to track. These alerts need to be configured only once, which takes about 15 minutes. We recommend subscribing to notification alerts for the following services used most often for Citrix environments running in Azure:

  • API Management: Used to manage Azure services from the Citrix Cloud.

  • Activity Logs & Alerts: Used to monitor the Citrix server logs and generate alerts.

  • Alerts & Metrics: Used to monitor the Citrix server metrics and generate alerts.

  • Azure Active Directory: Used for authentication to the Citrix servers, the Azure portal and to Citrix Workspaces.

  • Azure Monitor: Used to monitor the Citrix Resources hosted in Azure.

  • Azure Policy: Used to secure access to the Azure resources and enforce business rules across the Citrix environment.

  • Azure Private Link: Used to connect to Azure services from within the Citrix deployment.

  • Azure Sentinel: Used to monitor the security of the Citrix resources in Azure.

  • Backup: Used to back up your Citrix resources in the cloud.

  • ExpressRoute: Used to connect on-premises resources with Citrix deployment in Azure.

  • Key Vault: Used to manage the encryption keys that secure Citrix server volumes and the user data stored at rest.

  • Log Analytics: Used to monitor the logs for events that affect Citrix resources and need alerts.

  • Microsoft Azure Portal: Used to manage the Azure resources where the Citrix deployment is running.

  • Network Infrastructure: Used to monitor the communication between the Citrix resources, the on-premises data centers, and the remote users.

  • Network Watcher: Used to monitor the network traffic between Citrix and Azure resources.

  • Site Recovery: Used for providing high-availability and cross-site disaster recovery capabilities to your Citrix deployment.

  • Storage: Used to host the boot volumes for all Citrix resources in the cloud and to store user data.

  • VPN Gateway / Virtual WAN: Used to connect users and on-premises resources with the Citrix deployment in Azure.

  • Virtual Machines: Used to host the Citrix Workloads in Azure.

  • Virtual Network: Used to communicate between the Citrix resources hosted in the Azure Cloud and remote users in addition to the on-premises data centers.

While configuring these service alerts, watch for any other services that are used in your environment and subscribe to those as well.

Citrix

This section covers the Citrix Tools and Services that can be used to monitor your Citrix Virtual Apps and Desktops service deployment in Azure.

Citrix Monitor

Citrix Monitor is the recommended tool from Citrix to monitor your Citrix Cloud deployment. The tool consists of the following components:

  • Dashboard: The main display that provides a real-time overview of the environment. The Dashboard includes key metrics, such as connection and machine failures, total sessions, average logon duration, and Citrix VDA hosts status. All of the reports and charts provide drill-down capabilities for identified issues.

  • Trends: Provides trend information for the following: Sessions, Failures, Logon Performance, Load Evaluation, Capacity Management, Machine usage, Resource Utilization, and Application Probes.

  • Alerts and Alert Policies: Interface to set up alerts for pre-defined Citrix alert policies.

  • Applications: Console to manage Application and Desktop probes and review the Application analytics.

Trends

Historical data is saved only for the last 90 days and is available to view through the Trends section of Citrix Monitor. The key trends to monitor for your Citrix deployment are as follows:

Connection Failures

Connection failures can point to issues with particular Citrix VDA VMs or with particular users. The failed connection tab provides information on connections that fail because of the following common issues: client connection errors, licensing errors, unavailable capacity, machine failures, or configuration errors. The single-session and multi-session failures show servers that failed to start, hung on boot, or did not register.

Logon Performance

Logon performance provides an overview of how long user logons are taking and breaks them down into the following categories:

  • Brokering Time: The time it takes Citrix to broker the session between the client and the Citrix VDA host. If this time is long, the issue lies with the Citrix infrastructure. Start by verifying that the Cloud Connectors and any StoreFront servers have sufficient capacity.

  • VM Start time: This is the time that elapses between when the user clicks the icon to access their desktop and the time it takes to start the Virtual Machine for them. If this metric seems too long, consider increasing the buffer capacity for the delivery group.

  • HDX Connection Time: The time it takes to set up the HDX connection between the client and the Citrix VDA host. If this metric seems slow, look at the network connections. Verify that packets are not getting dropped excessively and the network bandwidth utilization is under 80%.

  • Authentication Time: The time it takes to complete the authentication for the remote session. If this time is long, research which AD Domain Controllers (DCs) are being used for authentication. Verify that your sites and services are configured so the closest DCs are being used to authenticate and they have the compute capacity to handle the session load.

  • GPO Time: The time it takes to apply the group policy settings (including Citrix policies) to the session. If the metric is too long, you can drill down by clicking the “Detailed Drilldown” link to view each GPO's time. Look at the number of GPOs being applied and either consolidate the GPOs or find a third-party solution that applies GPOs asynchronously instead of synchronously.

  • Logon Scripts Time: The time it takes to run any logon scripts before the Windows Explorer starts. If this metric is too long, investigate the Logon scripts that are being applied through GPO. Look for ways to optimize the logon scripts.

  • Profile Load Time: The time it takes to load the Windows user profile before the interactive session starts. Remember that if you are using Citrix Profile Management, its load time is included in this metric. If you are using another Profile Management solution that relies on Windows profiles, the actual profile load time is included in the Interactive Session metric. To reduce load times, you can use Citrix Profile Management with the “Large file handling” feature enabled or move to streamed profiles.

  • Interactive Session Time: The time it takes to grant the user keyboard and mouse control after the Windows profile loads. This metric includes three phases: pre-userinit, userinit, and shell. This time includes third-party profile solutions that run after the Windows profile loads and before the user is granted control of the desktop.
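
The logon breakdown above lends itself to a simple triage helper. The following Python sketch takes per-phase durations that you have read from the Logon Performance view and returns the slowest phase with the first check suggested by the guidance above. The phase keys, the `FIRST_CHECKS` table, and the function name are hypothetical. The share is computed against the sum of the phases, which is only a rough guide because phases may overlap in practice.

```python
# First check for each logon phase, summarized from the guidance above.
# The dictionary keys are hypothetical labels, not Citrix Monitor field names.
FIRST_CHECKS = {
    "brokering": "Verify Cloud Connector and StoreFront capacity.",
    "vm_start": "Consider increasing the delivery group's buffer capacity.",
    "hdx_connection": "Check for packet loss and bandwidth utilization above 80%.",
    "authentication": "Check which domain controllers are used and their capacity.",
    "gpo": "Drill down into per-GPO times and consolidate GPOs.",
    "logon_scripts": "Review and optimize the logon scripts applied through GPO.",
    "profile_load": "Consider large file handling or streamed profiles.",
    "interactive_session": "Review third-party profile solutions that run before the user gets control.",
}


def slowest_phase(phase_seconds: dict[str, float]) -> tuple[str, float, str]:
    """Return (phase name, share of summed time, first check) for the slowest phase."""
    if not phase_seconds:
        raise ValueError("phase_seconds must contain at least one phase")
    total = sum(phase_seconds.values())
    name = max(phase_seconds, key=phase_seconds.get)
    share = phase_seconds[name] / total if total else 0.0
    return name, share, FIRST_CHECKS.get(name, "No guidance recorded for this phase.")


name, share, check = slowest_phase(
    {"brokering": 1.0, "gpo": 12.0, "profile_load": 5.0, "interactive_session": 2.0}
)
print(f"{name}: {share:.0%} of logon time. {check}")
```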

Resource Utilization

This chart provides a view of the key metrics and a comparison of the previous 24 hours to the current metrics. This chart is useful for determining at a glance where the performance bottleneck might be when you are seeing long logon times or failed connections. If you identify trends with machines, you can use Azure Monitor to investigate further.

Citrix Policy controls Resource Monitoring and enables it by default. Citrix Policy for Process Monitoring is disabled by default because it consumes extra resources, but it provides detailed information for processes.

Alerts

Similar to Azure Alerts, Citrix alerts can be configured to email you alerts for metrics that are important to resolve quickly. Set alert policies for failures to reduce the amount of effort involved with reviewing the site metrics frequently. This frees you up to work on higher priority tasks. With the Premium license, you can set values at Warning and Critical levels to receive emails. When monitoring your Citrix deployment in Azure, the following alerts are recommended:

Site Policies

The Site Policies aggregate alerts across all delivery groups, users, and machines and provide warnings for site-wide events. These alerts are useful to let you know when you have any site resources falling outside the benchmark areas.

  • Connection failure rate: The percentage of connection failures over the past hour. Set an alert after carefully reviewing your baseline values for this counter. All environments have a base failure rate that occurs naturally as users attempt to connect, though 0% is the ideal value.

  • Connection failure count: The number of failed connections over the past hour. Set an alert after carefully reviewing your baseline values for this counter. All environments have a base failure count that occurs naturally as users attempt to connect, though 0 is the ideal value.

  • Failed machines (Single-session OS): The number of failed Single-session OS machines. Set an alert when this counter has a value greater than 1.

  • Failed machines (Multi-session OS): The number of failed Multi-session OS machines. Set an alert when this counter has a value greater than 1.

  • Average logon duration: The average time for a user to log on over the past hour. Citrix recommends a warning when the average logon duration time exceeds 45 seconds. A better metric might be when the average logon duration exceeds 125% of your baseline logon time.
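
As a worked example of the thresholds above, the following Python sketch computes a connection failure rate and decides whether an average logon duration warrants a warning. Triggering on either the fixed 45-second limit or 125% of the baseline is an assumption made for this sketch. Citrix Monitor alert policies take a single threshold value, so in practice you would enter whichever value you settle on. The function names are hypothetical.

```python
def connection_failure_rate(failed: int, attempted: int) -> float:
    """Percentage of failed connections; 0.0 when nothing was attempted."""
    return 100.0 * failed / attempted if attempted else 0.0


def logon_warning(
    average_seconds: float,
    baseline_seconds: float,
    fixed_limit: float = 45.0,
    baseline_factor: float = 1.25,
) -> bool:
    """Warn when the average exceeds the fixed limit or 125% of the baseline."""
    return (
        average_seconds > fixed_limit
        or average_seconds > baseline_seconds * baseline_factor
    )


print(connection_failure_rate(6, 400))  # 1.5
print(logon_warning(37.0, 28.0))        # True: 37 s exceeds 1.25 * 28 s = 35 s
print(logon_warning(30.0, 28.0))        # False
```

With a 28-second baseline, the baseline-relative rule fires at 35 seconds, well before the fixed 45-second limit, which is why a baseline-relative threshold can surface regressions earlier in fast environments.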

Delivery Group and Multi-session Policies

These metrics are aggregated at the Delivery Group, Multi-session, or Single-session machine level. These metrics are useful to watch when you need to focus on a particular set of resources to verify they are performing as expected. For example, you might want to monitor the user experience for the virtual desktops dedicated to executives. In those cases, you might set tighter alerting on failure rates or average logon durations.

  • Connection failure rate: The percentage of connection failures over the past hour. Set an alert after carefully reviewing your baseline values for this counter. All environments have a base failure rate that occurs naturally as users attempt to connect, though 0% is the ideal value.
  • Connection failure count: The number of failed connections over the past hour. Set an alert after carefully reviewing your baseline values for this counter. All environments have a base failure count that occurs naturally as users attempt to connect, though 0 is the ideal value.
  • ICA RTT (Average): Average ICA Round-Trip Time. Citrix recommends that a warning alert be set when 5 or more sessions experience an ICA RTT of 300 ms or longer.

  • Average logon duration: The average time for a user to log on over the past hour. Citrix recommends a warning when the average logon duration time exceeds 45 seconds. A better metric might be when the average logon duration exceeds 125% of your baseline logon time.
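
The ICA RTT (Average) recommendation above combines two numbers: a latency limit and a session count. A minimal Python sketch of that rule follows. The function names are hypothetical, and the sample RTT values are made up for illustration.

```python
def sessions_over_rtt(rtts_ms: list[float], rtt_limit_ms: float = 300.0) -> int:
    """Count sessions whose ICA RTT is at or above the limit."""
    return sum(1 for rtt in rtts_ms if rtt >= rtt_limit_ms)


def ica_rtt_warning(
    rtts_ms: list[float], rtt_limit_ms: float = 300.0, min_sessions: int = 5
) -> bool:
    """Warn when at least min_sessions sessions reach the RTT limit."""
    return sessions_over_rtt(rtts_ms, rtt_limit_ms) >= min_sessions


# Illustrative per-session RTT values in milliseconds.
group_rtts = [120, 310, 450, 300, 95, 305, 600]
print(sessions_over_rtt(group_rtts))  # 5
print(ica_rtt_warning(group_rtts))    # True
```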

User Policies

The user alerts are the only alerts not aggregated across multiple resources. Since they are not aggregated, you can alert on the actual values when they fall outside the acceptable ranges.

  • ICA RTT: ICA/HDX Round-Trip Time (RTT) in milliseconds (ms). Any RTT latency under 50 ms is considered ideal. Typically, the user experience starts to degrade when the RTT latency exceeds 100 ms for an extended period. The alert is triggered when ICA RTT is greater than the threshold set.
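
The per-user ranges above can be expressed as a small classifier. In the following Python sketch, the "acceptable" label for the 50 to 100 ms band and the use of three consecutive samples as a stand-in for "an extended period" are assumptions made for illustration. Neither comes from Citrix documentation.

```python
def classify_rtt(rtt_ms: float) -> str:
    """Classify one ICA RTT sample using the ranges described above."""
    if rtt_ms < 50:
        return "ideal"
    if rtt_ms <= 100:
        return "acceptable"  # hypothetical label for the 50-100 ms band
    return "degraded"


def sustained_degradation(samples_ms: list[float], min_consecutive: int = 3) -> bool:
    """Return True when min_consecutive samples in a row are degraded."""
    run = 0
    for rtt in samples_ms:
        run = run + 1 if classify_rtt(rtt) == "degraded" else 0
        if run >= min_consecutive:
            return True
    return False


print(classify_rtt(42))                                        # ideal
print(sustained_degradation([40, 120, 130, 90, 110, 115, 140]))  # True
print(sustained_degradation([40, 120, 130, 90, 110]))            # False
```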

Applications

The Applications section of Citrix Monitor provides insights into the health and usage information of published desktops and applications. If the Citrix Probe Agent is installed on a machine and configured through the console, the probe results for the last 24 hours are shown. Citrix Monitor shows the probe results along with any other application analytics for faults and errors giving you a summary view of the environment’s health. The probes show the stage in the launch process where the application failed, such as authentication, enumeration, or ICA file download. This information is invaluable when troubleshooting application launch issues. Using the application monitoring allows you to pro-actively address issues before they become outages.

Citrix Analytics

Citrix Analytics is a cloud-based service that aggregates data gleaned from Citrix users across devices, networks, and applications. The sole purpose of Citrix Analytics is to identify relationships and trends that can lead to actionable insights. Analytics relies on built-in Machine Learning (ML) algorithms to find behavioral anomalies that can indicate issues with Citrix users. Citrix Analytics works with third-party providers, including Microsoft, to gather data for analysis and has these offerings:

Citrix Analytics for Security: Focuses on user and application behavior, looking primarily for insider threats or malicious behavior.

Citrix Analytics for Performance: Focuses on the user experience. The performance analytics uses data from virtual applications and desktops to generate a User Experience score from key factors that define the user experience.

Citrix Analytics integrates with the following products to provide comprehensive views:

  • Citrix Virtual Apps and Desktops

  • Citrix Application Delivery Controller (NetScaler)

  • Citrix Secure Workspace Access (Access Control)

  • Citrix Gateway

  • Citrix Content Collaboration

  • Citrix Endpoint Management

  • Citrix Secure Browser

  • Microsoft Graph Security

  • Microsoft Active Directory

Any data collected is retained for 13 months (396 days), or until 90 days after subscription termination.

Data can be integrated into any SIEM service that supports Kafka topics or Logstash-based data connectors, such as Microsoft Sentinel. Data can also be exported in a comma-separated value (CSV) format for analysis on other systems.

Citrix Analytics is accessed through your Citrix Cloud account. Once set up and configured, you gain access to dashboards that provide information and recommendations compiled by Citrix Analytics.

Dashboard | Information Provided | Citrix Analytics Service
Users | User-behavior patterns | Security
User Access | Summary of risky domains and the volume of ingress/egress data | Security
App Access | Summary of the domains, URL, and apps accessed by users | Security
Share Links | Summary of the organizational share link patterns | Security
Access Assurance Location | Summary of the logon and access details for Citrix Virtual Apps and Desktops users | Security
Reports | Custom report creation with available metrics | Security
User Experience | Summary of the key site performance metrics | Performance
Infrastructure | Summary of the status and health of your site virtual machines | Performance

Citrix Analytics-Security provides these reports, risk assessment scores and indicators for the users, share links, and IP address locations. Custom risk indicators can be created in addition to custom policies to refine the conditions used for the risk assessment. You can enable a feature called Request End User Response, which immediately alerts the user when unusual activity is observed. Watchlists is another feature that allows you to monitor specific users who represent a potential threat or higher risk. You receive weekly emails from Citrix Analytics-Security with important risk indicators and users identified.

Sources

The goal of this reference architecture is to assist you with planning your own implementation. To make this job easier, we would like to provide you with source diagrams that you can adapt in your own detailed designs and implementation guides: source diagrams.
