Monitoring DaaS solution in Azure
Monitoring your Citrix deployment provides you with several benefits: increased performance, higher availability, lower cost, enhanced security, and satisfied users. Both Microsoft and Citrix provide a set of core tools and services to assist with monitoring the environment. This paper contains an overview of the available tools along with recommendations on what to monitor in Citrix deployments in Azure.
The Microsoft tools and services include the following: Azure Monitor, Azure Advisor, Azure Service Health, Microsoft Sentinel, Azure Network Watcher, and Azure Cost Management. The Citrix tools and services include the following: Citrix Monitor, Citrix Analytics, and Citrix Managed Services. Some of these services do incur extra charges, but most of them are included with your subscription.
This document provides a list of the recommended tools and a section where we identify baseline values of key elements to monitor. We also recommend configurations for the values to help you successfully deploy Citrix in Azure.
Microsoft
This section covers the Microsoft Azure tools and services that can be used to monitor your Citrix Virtual Apps and Desktops (CVAD) deployment in Azure.
Azure Monitor
For a Citrix deployment in Azure, Azure Monitor is the best place to start. Azure Monitor helps you improve both the performance and availability of your Citrix deployment. Azure Monitor collects and analyzes the telemetry received from both your on-premises and Azure environments. Using Azure Monitor allows for proactive responses to issues with resources before users need to open a ticket with your help desk. Azure Monitor consists of five different services that can be used with one another to manage your Citrix resources:
-
Metrics: A collection of numerical values that represent a particular aspect of an Azure resource at a point in time.
-
Alerts: A collection of conditions that should be monitored and act as triggers to initiate an associated action when the condition occurs.
-
Logs: A collection of data written to logs and available for analysis through Azure Monitor Logs.
-
Dashboards: A customizable view of information available on monitored resources.
-
Application Insights: A service that monitors your web applications and supports performance optimizations and troubleshooting.
Metrics
Azure Metrics is the single most powerful tool available in Azure Monitor for tracking the health of your Citrix resources. The term “metrics” represents information about a particular aspect of a resource that is distilled to a numerical value. Metrics are tracked over time and reported at specific intervals. For instance, the number of active sessions on a Citrix VDA host is collected every 30 seconds and displayed in a real-time chart.
Azure Metrics allows for the tracking and alerting of metrics for each of your Citrix resources. Azure Metrics provides metrics for the Citrix virtual machines (VMs) and the underlying virtual machine host. Azure Metrics can also add diagnostic extensions to gather metrics from the guest operating system. Metrics are provided in near real-time and can be viewed through the Metrics Explorer charts. Metrics Explorer charts can compare metrics from different resources and can be saved to Dashboards for monitoring the environment.
To monitor Citrix virtual machine resources in Azure, be sure to enable the Guest OS Metrics through the Diagnostic Settings for the virtual machine. This setting automatically does the following:
-
Enables performance counters for CPU, Memory, Disk, and Network at one-minute intervals.
-
Enables event log entry collection (Warning level and above).
-
Provides the option to collect Custom performance counters and event logs.
Guest OS metrics are retained for 93 days when sent to Azure Monitor Metrics.
The following additional settings are recommended for Citrix deployments in Azure:
-
Enable the Sinks > Azure Monitor > Send diagnostic data to Azure Monitor setting. This setting allows the use of Custom counters to collect multi-dimensional metrics and enables alerting on the Guest OS metrics.
-
Enable Crash dump settings when troubleshooting an issue with Citrix or Microsoft Support. This setting places the dump files directly in a storage container where you can easily retrieve them.
Collecting metrics is a powerful way to track the health and performance of your Citrix resources. Azure Metrics can track and alert on any metric that is available as a Windows performance monitor counter. Metrics are the basis for orchestration which uses rules to automate actions within Azure.
Alerts
The primary purpose for monitoring your Citrix infrastructure in Azure is to be able to proactively respond to issues before the users are adversely affected. Alerts notify you or take automated action on a condition that needs to be handled quickly. Although not all disruptions provide warning signs, the diligent use of alerts can prevent most common scenarios.
Conditions for an alert can be based on a set of predefined signals that Azure provides or upon Guest OS metrics. These signals include metric values (the most common), log search results, Azure Activity log events, or even the health of the Azure platform. You need to set the alerts at a level that provides advance notice of a potential issue while minimizing the frequency of alerts that require action. An alert rule is a condition that must be met for the alert to fire when enabled. The alert rule can then execute a set of actions defined in an Action Group. The available actions include the following:
- Notifications by email, SMS, Push, or Voice
- Triggering of an Automation Runbook, Azure Function, Logic App, Event Hub, or Webhook
- Creation of an ITSM Ticket
Alerts can be scoped to a particular resource group, region, or resource type. When configuring alerts for multiple targets, only a single condition can be specified and the targets must all support that condition. For metrics-based conditions, the alert rule definition includes the severity level along with the ability to resolve the alert automatically. Once fired, alerts need to be acknowledged when automated responses are not employed to handle the alert condition. Alerts do entail a monthly cost and Azure displays the estimated cost for acceptance when the alert rule is created.
Logs
Sometimes, metrics are not available for a particular event that you want to monitor for within your Citrix deployment. When metrics are not available, logs can be monitored for entries that indicate the event has occurred. Azure Monitor Logs can accept logs from Azure Services, virtual machine agents, or from applications using Application Insights. A Log Analytics workspace is required where the log data can be stored for analysis. These logs can then be aggregated and queried for key entries that indicate conditions which need to be managed. The query results can be viewed through either a dashboard or a workbook.
Azure Monitor Metrics is limited to numerical data only. Azure Monitor Logs can store and analyze different data types, which provides an advantage in some situations. Log analysis requires queries, which must be created and maintained. The queries are written in the Kusto Query Language (KQL), which is the same language used by Azure Data Explorer.
Dashboards
Dashboards represent a visual way to monitor your Citrix environment daily. Dashboards consist of tiles that come from any number of gallery selections. The possible tiles include metrics charts, security charts, user information, automation, or a direct link to any resource or resource group. Custom dashboards can be created that focus on a particular role or set of resources. Each dashboard can be shared or private and each portal user can have up to 100 private dashboards and an unlimited number of shared dashboards.
Application Insights
If you have web applications that are hosted in Azure and delivered via Citrix, use Application Insights to monitor your applications that are coded on popular web platforms. Application Insights can integrate with your DevOps process using a software development kit (SDK) or the Application Insights Agent. Application Insights then combines the telemetry provided with performance counters and other diagnostic information. These insights can help with diagnosing issues and provide a deeper understanding of how users interact with your application.
Application Insights delivers the information collected to Azure Monitor. You can use Microsoft Power BI or similar tools to analyze the raw data stored in Azure Monitor. Some of the areas that can be monitored with Application Insights include the following:
-
What pages are most popular and what time of day they load.
-
What pages are failing to load to help you diagnose resource issues.
-
Load performance for your web application from the perspective of the user’s browser.
-
Any exceptions that occur, whether caused by the server or browser code.
-
Any custom events or metrics that you choose to instrument with the Insights SDK.
The Application Insights console lets you manage the performance of your web applications on Citrix to provide a better end-user experience.
Azure Advisor
Azure Advisor is a service that analyzes your resource configurations in the background and makes recommendations to help improve your Azure deployment. These recommendations are grouped into five categories: Cost, Security, Reliability, Operational Excellence, and Performance. The Security recommendations come from Microsoft Defender for Cloud. For each category, the Advisor lists the resources affected and provides guidance on how to improve the resource configuration. You can filter the recommendations by resource type and subscription.
Azure Advisor supports the configuration of Alerts to monitor for situations where your Azure environment falls outside the best practices recommendations. See the Azure Advisor Alerts section later in this document for recommendations.
Microsoft Defender for Cloud
Defender for Cloud is a service that combines functionality previously found in Azure Security Center and Azure Defender. This service continuously assesses your Azure resources and provides an overall score that indicates the security posture of your deployments. Azure Advisor’s Security recommendations come directly from Defender for Cloud. Defender for Cloud also provides direct guidance on how to resolve any issues the service identifies. The recommendations come from the Azure Security Benchmark, an Azure-specific set of guidelines authored by Microsoft.
Defender for Cloud with enhanced security features can be deployed in a hybrid configuration to support on-premises deployments along with other cloud providers.
For Citrix deployments, enabling Defender for Cloud provides the following features that secure your Citrix resources:
-
Risk assessment for resources being accessed from the internet, such as source IP address and frequency.
-
Just-in-time (JIT) VM access that limits when ports are open for initial inbound connections. Microsoft recommends JIT for all jump box or bastion host connections.
-
Adaptive network hardening (ANH) which further hardens the Network Security Group (NSG) rules. ANH uses machine learning algorithms, trusted configurations, threat intelligence and other factors to provide recommendations.
-
Fileless attack detection, which periodically scans a running machine’s memory for malicious payloads that run only in memory to evade disk-based detection software.
-
Integration with Microsoft Sentinel.
Microsoft Sentinel
Microsoft Sentinel is both a Security Information and Event Management (SIEM) and a Security Orchestration, Automation, and Response (SOAR) system. Sentinel was designed and built as a cloud-native service. Using sophisticated artificial intelligence, Sentinel continuously monitors all content sources and hunts for suspicious activity. Sentinel provides a central location for collecting and monitoring data at scale through agents and data connectors. Security incidents are tracked through triggered alerts and automated responses to common tasks. Sentinel can operate across multiple clouds and with your on-premises infrastructure, making it ideal for hybrid Citrix environments.
The Content hub provides a simple interface to enable out-of-the-box pre-packaged solutions for Sentinel. These packages contain Analytics Rules, Hunting Queries, Playbooks, Data Connectors, and Workbooks that are specific to their topics. The following Content hub solutions are recommended for your Citrix deployment in Azure:
-
Azure Firewall to help increase security of the networking communication.
-
Cybersecurity Maturity Model Certification (CMMC) to meet cybersecurity compliance guidelines within your environment.
-
Microsoft Sentinel Deception to protect against all threats.
-
Microsoft Insider Risk Management to help protect against insider threats.
-
Threat Analysis Response to manage and correlate threat activity.
Data Connectors provide a way to interface Sentinel with other Azure services and third-party systems. The connectors provide the data that is analyzed by Sentinel for potential threats. The following Data Connectors are recommended for your Citrix deployment in Azure:
-
Azure Active Directory for information about user identities, sign-ins, provisioning, and so on.
-
Azure Active Directory Identity Protection for security alerts with identities.
-
Azure Activity for any Azure resource activity.
-
Azure DDoS Protection for information on Distributed Denial of Service attacks through flow logs and DDoS notifications.
-
Azure Firewall for information on firewall activity, network rules and DNS proxies.
-
Azure Key Vault for information on Azure key vault activity.
-
Azure Storage Account for information on Azure storage account activity for blobs, queues, tables, files, and resource access.
-
Citrix Analytics for information gathered by Citrix Analytics (see the Citrix Analytics section).
-
Citrix Web App Firewall for Citrix firewall activity.
-
Microsoft Defender for Cloud for security alerts originating from Defender.
-
Microsoft Office 365 for any Office activity, assuming your Office 365 tenant is the same tenant as used for your Citrix deployment.
-
Threat Intelligence – TAXII for identifying and remediating potential threats.
-
Windows Firewall for events generated by Windows Firewall service running on Citrix servers.
-
Windows Security Events via Azure Monitor Agent (AMA) for events from the Windows Security event log on Citrix servers.
Microsoft Sentinel supports data connectors from a wide variety of vendors. These vendors include security, networking, and application vendors. Consider reviewing the available data connectors at least annually to keep Sentinel as effective as possible.
Azure Service Health
Azure Service Health provides an easy way to monitor the Azure infrastructure that is hosting your Citrix deployment. Service Health lets you monitor service issues, view upcoming planned maintenance, and track Health and Security advisories. You can filter the active issues and planned maintenance by subscription, region, and service. Any issues with widespread impact are displayed under the Service Issues blade.
With health alerts, you can monitor the health of your own Azure resources. Use health alerts to configure automated notification of service outages or planned maintenance that affect your resources. See the Azure Service Health alerts section later in this document for recommendations.
If you have other services that you use frequently, we recommend subscribing to those services as well. If you set up your alerts correctly, you receive notification of any outages when they happen and planned maintenance does not catch you off guard.
Azure Network Watcher Traffic Analytics
While Citrix is built to be secure by design, users are still a weak link and login credentials can be compromised. When running Citrix in Azure, one of the best ways to secure access to your applications and data is by monitoring the network traffic. Traffic Analytics is designed to provide you relevant information by analyzing the network traffic flows. By combining raw flow logs with a knowledge of the network topology, Traffic Analytics can provide a comprehensive view of the network communication. The reports include the most active hosts or host pairs, top protocols in use, blocked traffic, open ports, rogue networks, and traffic distribution.
To use Traffic Analytics, your Citrix resources need to be in a region that supports both Network Security Groups (NSGs) and Log Analytics Workspaces. You also need to enable Network Watcher in the same region. For each network security group that includes Citrix resources, create an NSG flow log and enable both Flow Logs Version 2 and Traffic Analytics when creating it. For regulatory compliance, be sure that your Log Analytics Workspace is in the same country as where the NSG flow logs are generated.
NOTE: At a minimum, create NSG flow logs for your Citrix Cloud Connectors, Delivery Controllers, ADC appliances, and StoreFront servers.
Use Traffic Analytics to identify malicious traffic, hot spots, and busy hosts. Always remember that clients are going to a specific set of hosts, so sometimes normal traffic may appear in the “Frequent conversation” list. The geo-map can be used to visualize the communication sources and quickly identify unexpected and possibly malicious traffic sources. Reviewing the traffic flow patterns, open ports, and blocked traffic can provide you insights into potential threats or unprotected attack vectors.
Azure Cost Management
Azure Cost Management and Billing allows you to configure alerts that warn you when your cost limits have been reached. Spend alerts are the best way to manage the cost of your Citrix resources. For large enterprises, enabling budget, credit, and quota alerts helps you identify any potential misconfiguration or misuse of Azure resources.
-
Budget Alerts: An alert is sent when either the usage or dollar amount reaches a predefined limit based on a previously established budget.
-
Credit Alerts: The system generates credit alerts automatically when 90% and 100% of your prepayment (monetary commitment) is achieved.
-
Department Spending Quota Alerts: Quota alerts are configured only through the Enterprise Agreement (EA) portal. When triggered, the portal sends an email to department owners when their spend reaches a defined percentage.
Creating a monthly budget with spend alerts provides you advance notice when resources are unexpectedly provisioned. Common reasons for unexpected spend include automation errors, autoscaling misconfiguration, or even malicious intent by trusted insiders. The sooner you are alerted to the additional cost the sooner you can resolve the issue.
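As a rough illustration of how percentage-based spend alerts behave, the following Python sketch reports which alert levels a given spend has reached. The function name `crossed_thresholds` and the sample figures are hypothetical. The 90% and 100% defaults mirror the credit alert levels described above, and a budget alert would use whatever percentages you configure. This is only a way to reason about the arithmetic, not a description of how Azure evaluates alerts.

```python
def crossed_thresholds(spend, limit, percents=(90, 100)):
    """Return the alert percentages that the current spend has reached.

    spend and limit are amounts in the same currency; percents are the
    alert levels to check (90 and 100 mirror the credit alert levels).
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # spend / limit >= p / 100, rearranged to avoid division.
    return [p for p in sorted(percents) if spend * 100 >= p * limit]


if __name__ == "__main__":
    # Hypothetical example: 4,600 spent against a 5,000 prepayment.
    print(crossed_thresholds(4600, 5000))
```

Passing a longer tuple such as `(50, 75, 90, 100)` models a budget with several notification levels, which gives earlier warning of unexpected provisioning.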
Baseline Metrics and Alerts for Azure
The key to a good monitoring environment is knowing what is important to monitor and which items require immediate attention. You don’t want to monitor every available metric because you end up storing information that is not useful. Information collection and storage has a cost associated with it, so use it wisely. Here we provide a baseline of Metrics/Counters to monitor and suggest alerts that can give you a starting point to monitor your Citrix environment. You can build on this baseline and include other performance counters or events that you feel are helpful for your environment.
Metrics and Alert Thresholds
For a Citrix deployment, we are going to focus primarily on the Guest OS metrics of Citrix virtual machines. Poor server performance metrics typically indicate that the users are about to experience unpleasant issues, if they are not already. For instance, when the Max Input Delay for a user’s session reaches a predefined threshold, we know users are experiencing latency. You can configure the Action group to send an email to the Citrix administrators alerting them to the server’s issues. By setting the notification alert to fire when the Max Input Delay approaches a value known to be unacceptable, admins can intervene proactively.
We have provided the performance counters to monitor along with suggested thresholds for alerting on those counters when used in a Citrix deployment. The suggested alert thresholds are likely to provide advance notice of user dissatisfaction. Adjust the values and time periods to meet your business needs:
All Citrix Servers
Here is the list of perfmon counters to monitor for all Citrix servers in the deployment:
-
Processor\%Processor time
-
This counter is the amount of time a processor is not idle.
-
Alert when the average is greater than 80% for a sustained 15 minutes.
-
Determine the processes that are consuming the most CPU and identify the cause of the high CPU usage using Task Manager or Citrix Monitor.
-
If all processes are consuming an expected level of CPU time, then it is time to increase capacity for the server or the Delivery Group.
-
-
System\Processor queue length
-
This counter is the number of threads in a processor queue waiting to be processed.
-
Alert when greater than 5 × [number of cores] over a 5-minute interval.
-
Determine which processes are consuming the most CPU and identify the cause of the CPU usage using Task Manager or Citrix Monitor.
-
If all processes are consuming an expected level of CPU time, then it is time to increase capacity for the server or the Delivery Group.
-
-
Memory\Available Bytes
-
This counter is the amount of memory not allocated to processes or cache.
-
Alert when the available amount of RAM is under 20% of the total RAM over a 5-minute interval.
-
Determine which processes are consuming the memory using Task Manager or Citrix Monitor. Identify any configuration changes that could reduce that level of RAM consumption. Use this metric with the Memory Pages/sec and Paging File %usage counters.
-
If all processes are consuming the expected amount of memory, then it is time to increase capacity for the server or the Delivery Group.
-
-
Memory\Pages/sec
-
This counter is the number of pages per second that are swapped from disk to running memory.
-
Alert when the pages per second are consistently over 10.
-
Look for applications that are causing the page swaps using Task Manager. Investigate possible alternative configurations. Use this metric with the Memory Available Bytes and Paging File\%usage counters.
-
If possible, increase the amount of RAM available to the host. If that is not an option, attempt to isolate the application to a set of dedicated servers.
-
-
Paging File\%usage
-
This counter is the percentage of the current page file that is in use.
-
Alert when the page file usage is greater than 80% for 60 minutes.
-
Look for applications that are causing the page file usage using Task Manager. Investigate possible alternative configurations. Use this metric with the Memory Available Bytes and Memory Pages/sec counters.
-
If possible, increase the amount of RAM available to the host.
-
-
LogicalDisk\%Disk Time (_total)
-
This counter represents the amount of time the Logical disk is not idle.
-
Alert when the % disk time is greater than 90% for 15 minutes.
-
Look for applications that are causing the high disk usage using Task Manager or Citrix Monitor. Investigate what might be causing the high disk utilization. Use this metric with other logical and physical disk metrics.
-
If all activity looks normal, look for a way to move the applications to disks with higher performing disk subsystems.
-
-
LogicalDisk\Current disk queue length
-
This counter represents the number of transactions waiting for the logical disk to process them.
-
Alert when the current disk queue is greater than 3 for 15 minutes.
-
Look for applications that are causing the high disk usage using Task Manager or Citrix Monitor. Investigate what might be causing the high disk utilization. Use this metric with other logical and physical disk metrics.
-
If all activity looks normal, look for a way to move the applications to disks with higher performing disk subsystems.
-
-
PhysicalDisk\%Disk Time (_total)
-
This counter represents the amount of time the Physical disk is not idle.
-
Alert when the % disk time is greater than 90% for 15 minutes.
-
Look for applications that are causing the high disk usage using Task Manager or Citrix Monitor. Investigate what might be causing the high disk utilization. Use this metric with other logical and physical disk metrics.
-
If all activity looks normal, look for a way to move the applications to disks with higher performing disk subsystems.
-
-
PhysicalDisk\Current disk queue length
-
This counter represents the number of transactions waiting for the physical disk to process them.
-
Alert when the current disk queue is greater than 3 for 15 minutes.
-
Look for applications that are causing the high disk usage using Task Manager or Citrix Monitor. Investigate what might be causing the high disk utilization. Use this metric with other logical and physical disk metrics.
-
If all activity looks normal, look for a way to move the applications to disks with higher performing disk subsystems.
-
-
Network Interface\Bytes Total/sec
-
This counter shows the rate at which the network adaptor is processing data packets for the network.
-
Alert when Bytes Total per second are greater than 80% of the NIC’s speed for 5 minutes.
-
Look for applications that are causing the high network usage using Task Manager. Investigate what might be causing the high network utilization. Use this metric with other network-related metrics.
-
If all activity looks normal, look for a way to increase the network bandwidth or increase capacity to the Delivery Group.
-
-
User Input Delay per Session\Max Input Delay
-
This metric provides the maximum input delay for the session in milliseconds. The metric measures the time between when the user provides mouse or keyboard input and their input is processed by the system.
-
Alert when a session’s input delay is greater than 1000ms for 2 minutes.
-
Look for applications that are causing high CPU, disk, or network utilization using the Task Manager or Citrix Monitor.
-
If activity looks normal, the best approach is to increase capacity to the Delivery Group.
-
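Several of the thresholds above are relative values (a multiple of the core count, a percentage of total RAM, a percentage of NIC speed) or depend on a sustained period. The following Python sketch shows one way to turn them into concrete numbers. All function names are hypothetical, the sustained check assumes one sample per minute, and treating "sustained" as an average over the window is an assumption rather than a statement of how Azure Monitor evaluates alert rules.

```python
from statistics import mean


def processor_queue_limit(cores):
    """Processor queue length alert level: 5 x number of cores."""
    return 5 * cores


def memory_is_low(available_bytes, total_bytes, min_free_percent=20):
    """True when available RAM is under min_free_percent of total RAM."""
    return available_bytes * 100 < min_free_percent * total_bytes


def nic_bytes_limit(link_bits_per_second, percent=80):
    """Bytes Total/sec alert level: percent of the NIC speed, in bytes."""
    return link_bits_per_second // 8 * percent // 100


def sustained_average_exceeds(samples, threshold, window):
    """True when the mean of the last `window` samples is above threshold.

    Assumes one sample per minute, so window=15 covers 15 minutes.
    Returns False until a full window of samples is available.
    """
    if len(samples) < window:
        return False
    return mean(samples[-window:]) > threshold


if __name__ == "__main__":
    print(processor_queue_limit(4))          # 4-core VM
    print(nic_bytes_limit(1_000_000_000))    # 1 Gbps NIC, in bytes/sec
    print(sustained_average_exceeds([85.0] * 15, 80, 15))
```

For a 4-core VM with a 1 Gbps network interface, this gives a processor queue alert level of 20 and a network alert level of 100,000,000 bytes per second.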
Cloud Connectors
In addition to the counters for all Citrix Servers, enable the following custom performance counters for your Citrix Cloud Connectors. These counters monitor for key failures in the Cloud Connectors:
-
Citrix High Availability Service\Database Transaction Errors/sec
-
This metric represents the number of database transaction failures per second.
-
This number should be 0.
-
Alert when the counter is greater than 0.
-
-
Citrix High Availability Service\Failed Leased Enumerations
-
This metric represents the number of failed enumerations for clients.
-
This number should be 0.
-
Alert when the counter is greater than 0.
-
-
Citrix High Availability Service\Failed Leased Launches
-
This metric represents the number of failed launches for clients.
-
This number should be 0.
-
Alert when the counter is greater than 0.
-
-
Citrix High Availability Service\Registration Rejects/sec
-
This metric represents the number of registrations rejected per second.
-
This number should be 0.
-
Alert when the counter is greater than 0.
-
Citrix Virtual Delivery Agent Virtual Machines
In addition to the counters for all Citrix Servers, enable the following custom performance counters for your Citrix Virtual Delivery Agent hosts. These counters monitor for key failures:
-
ICA Session\Latency - Session Average
-
This metric provides the average ICA latency for a user session in milliseconds.
-
Use this metric to monitor the user experience. The value should be under 150ms for a good user experience, and anything over 300ms is considered degraded.
-
If you are seeing high latency values, look into enabling Adaptive Transport to help mitigate the effects of the latency.
-
-
User Input Delay per Session\Max Input Delay
-
This metric provides the maximum input delay for the session (in milliseconds). The metric measures the time between when the user provides mouse or keyboard input and their input is processed by the system.
-
Use this metric to monitor the user experience. The value should be under 500ms, with under 150ms considered good and anything over 1000ms considered unacceptable.
-
-
Terminal Services\Active Sessions
-
This metric provides the number of active sessions on the Citrix VDA host.
-
Monitor this metric for multi-session hosts.
-
Use this metric to correlate with other metrics by showing active user counts on the graph.
-
-
CitrixPrinting\Total Jobs Failed
-
This metric represents the total number of print jobs that failed on the Citrix VDA host and should be low.
-
Monitor this metric to see the number of print jobs that are failing on the Citrix hosts.
-
Excessive failed print jobs could point to issues with the Printer Drivers installed on the Citrix host.
-
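The ICA latency and input delay bands given above can be expressed as simple classification functions, which may help when labeling values on a dashboard or in a report. This Python sketch is illustrative only: the function names are hypothetical, and the labels "borderline", "acceptable", and "poor" are assumptions for ranges that the guidance above does not name.

```python
def classify_ica_latency(ms):
    """Band an ICA Session 'Latency - Session Average' value (milliseconds)."""
    if ms < 150:
        return "good"
    if ms > 300:
        return "degraded"
    return "borderline"  # assumed label; the guidance does not name this band


def classify_input_delay(ms):
    """Band a 'Max Input Delay' value (milliseconds)."""
    if ms < 150:
        return "good"
    if ms < 500:
        return "acceptable"  # assumed label for the under-500 ms target
    if ms <= 1000:
        return "poor"  # assumed label; the guidance does not name this band
    return "unacceptable"


if __name__ == "__main__":
    for value in (120, 200, 350):
        print(value, classify_ica_latency(value))
    for value in (100, 400, 750, 1500):
        print(value, classify_input_delay(value))
```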
Enable the following custom performance counters for your Citrix Virtual Delivery Agent VMs that are running Citrix Profile Management:
-
CitrixProfileManagement\Logon Duration
-
This metric represents the total time in seconds for the user logon event to complete.
-
Monitor this metric to understand the user logon experience. This metric includes the time it takes to load the user profile down to the user’s session.
-
-
CitrixProfileManagement\Logoff Duration
-
This metric represents the total time in seconds for the user logoff event to complete.
-
Monitor this counter to track how long the user logoff event is taking. This metric includes the time it takes for the user’s data to be written back to the profile location.
-
-
CitrixProfileManagement\Processed Logoff Files-Above 5MB
-
This metric represents the number of files greater than 5MB that are uploaded to the user profile store during logoff.
-
Monitor this metric to determine if enabling Large File Handling or folder redirection can improve the user logon experience.
-
-
CitrixProfileManagement\Processed Logon Files-Above 5MB
-
This metric represents the number of files greater than 5MB that are copied down from the user profile storage during logon.
-
Monitor this metric to determine if you need to enable profile streaming or Large File Handling to reduce logon times.
-
Enable Application Log collection on your Citrix Virtual Delivery Agent VMs. Set the following configurations as a baseline:
-
Alert on any RDP Licensing Errors.
-
Alert on these Security Warnings.
-
Event ID 4625: An account failed to log on.
-
Event ID 4771: Kerberos pre-authentication failed.
-
-
Alert on these Citrix Warning or Error messages.
-
Event ID 1001: The Citrix Desktop Service failed to obtain a list of delivery controllers with which to register.
-
Event ID 1017: The Citrix Desktop Service failed to register with any delivery controller.
-
Event ID 1022: The Citrix Desktop Service failed to register with any controllers in the last 5 minutes.
-
Event ID 6013: System uptime, use to find Citrix servers that are not getting rebooted after patching.
-
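Once these event logs are flowing into a Log Analytics workspace, the event IDs above can be searched with a KQL query. The Python sketch below only assembles such a query as text. The helper name `build_event_query` is hypothetical, and the default `Event` table is an assumption about where your Windows event logs are collected; adjust the table name and time range to match your workspace.

```python
def build_event_query(event_ids, lookback="1h", table="Event"):
    """Build a KQL query counting matching events per computer and event ID.

    table="Event" assumes Windows event logs are collected into the
    Log Analytics Event table; adjust it for your workspace.
    """
    if not event_ids:
        raise ValueError("at least one event ID is required")
    id_list = ", ".join(str(int(i)) for i in event_ids)
    return "\n".join([
        table,
        f"| where TimeGenerated > ago({lookback})",
        f"| where EventID in ({id_list})",
        "| summarize EventCount = count() by Computer, EventID",
        "| order by EventCount desc",
    ])


if __name__ == "__main__":
    # Citrix Desktop Service registration failures from the list above.
    print(build_event_query([1001, 1017, 1022]))
```

Event IDs such as 1001 are also used by other event sources, so in practice you would likely add a filter on the event source before alerting on the results.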
Citrix StoreFront Servers
In addition to the counters for all Citrix Servers, enable the following custom performance counters for your Citrix StoreFront servers. The counters monitor for poor performance:
-
ASP.NET\Requests Queued
-
The number of requests ASP.NET has in the queue waiting to be processed.
-
Alert when the values are significantly outside the baseline norms. Establish baselines based on the environment.
-
-
ASP.NET\Requests Rejected
-
The number of requests rejected because the request queue is full.
-
Alert when the number of rejected requests is greater than one.
-
Citrix Federated Authentication Service (FAS) Servers
In addition to the counters for all Citrix Servers, enable the following custom performance counters for your Citrix Federated Authentication Service hosts. These metrics monitor for performance related issues:
-
Citrix Federated Authentication Service\High Load Level
-
This metric tracks the number of certificate signing requests per minute that Federated Authentication Service accepts.
-
Track this metric because once the High Load level is met, desktops and applications fail to launch.
-
Azure ExpressRoute Metrics
If you have an ExpressRoute connection to an on-premises data center or to a peered network, you should monitor that connection. You need to understand your bandwidth needs and to know how much billable egress traffic is leaving Azure. The key metrics to watch are as follows:
- ExpressRoute circuit\BitsInPerSecond
-
This metric is the number of bits coming into Azure per second. This data is free.
-
Use this metric for ExpressRoute capacity planning.
-
Alert on this metric when it reaches 80% of your available circuit ingress bandwidth.
-
-
ExpressRoute circuit\BitsOutPerSecond
-
This metric is the number of bits leaving Azure per second. This data is billable.
-
Use this metric for ExpressRoute capacity planning and for budgeting for data egress.
-
Alert on this metric when it reaches 80% of your available circuit egress bandwidth.
-
-
ExpressRoute circuit\GlobalReachBitsInPerSecond
-
This metric is the number of bits coming into Azure per second to peered ExpressRoute circuits (this data is free).
-
Use this metric for ExpressRoute capacity planning.
-
Alert on this metric when it reaches 80% of your available circuit ingress bandwidth.
-
-
ExpressRoute circuit\GlobalReachBitsOutPerSecond
-
This metric is the number of bits leaving Azure per second to peered ExpressRoute circuits (this data is billable).
-
Use this metric for ExpressRoute capacity planning and for budgeting for data egress.
-
Alert on this metric when it reaches 80% of your available circuit egress bandwidth.
-
-
ExpressRoute Gateway Connection\BitsInPerSecond
-
This metric is the number of bits coming into Azure per second for a specific connection to an ExpressRoute circuit (this data is free).
-
Alert on this metric when it reaches 80% of your connection ingress bandwidth.
-
-
ExpressRoute Gateway Connection\BitsOutPerSecond
-
This metric is the number of bits leaving Azure per second for a specific connection to an ExpressRoute circuit (this data is billable).
-
Alert on this metric when it reaches 80% of your connection egress bandwidth.
-
-
ExpressRoute Virtual Network Gateway\PacketsPerSecond
-
This metric is the number of inbound packets traversing the ExpressRoute gateway.
-
Alert on this metric when it drops low enough to indicate that the gateway is no longer receiving traffic.
-
-
ExpressRoute Virtual Network Gateway\CPU Utilization
-
This metric is CPU utilization of the gateway instance.
-
High CPU utilization indicates a performance bottleneck.
-
Alert on this metric when CPU utilization exceeds 85%.
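Because the ExpressRoute metrics are reported in bits per second, the 80% alert level has to be converted from the provisioned bandwidth. The following sketch shows the arithmetic; the function names are hypothetical, and it assumes bandwidth is quoted in decimal megabits per second (1 Mbps = 1,000,000 bits per second).

```python
def bandwidth_alert_threshold_bps(circuit_mbps: float, fraction: float = 0.80) -> float:
    """Return the bits-per-second value at which to alert (80% of bandwidth by default).

    Assumes decimal units: 1 Mbps = 1,000,000 bits per second.
    """
    if circuit_mbps <= 0:
        raise ValueError("circuit bandwidth must be positive")
    return circuit_mbps * 1_000_000 * fraction


def gateway_cpu_should_alert(cpu_percent: float, limit: float = 85.0) -> bool:
    """Alert when gateway CPU utilization exceeds the limit (85% in the text above)."""
    return cpu_percent > limit
```

For example, a 1 Gbps circuit (1000 Mbps) gives an alert level of 800,000,000 bits per second for both the ingress and egress metrics.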
-
Azure Advisor Alerts
Azure Advisor provides upwards of 280 alerts. This section provides the recommended alerts to configure in Azure Advisor for your Citrix environment. The alerts are categorized for your convenience into Reliability, Cost, Performance, and Operational Excellence. Each alert has a short description that includes why this alert is important to track in a Citrix environment. Several of the alerts can also be enforced via Azure Policy. These alerts only need to be configured one time, which takes about 30 minutes.
Reliability Alerts
-
Enable Backups on your Virtual Machines: Notifies you when your VMs are not enabled for automatic backup. Routinely back up all your Citrix infrastructure VMs.
-
Enable soft delete for your Recovery Services vaults: Notifies you when your Recovery Services vault data is set for hard or permanent delete instead of a soft delete. Use soft delete to avoid losing your Recovery Services Citrix infrastructure in the case of an accidental deletion.
-
Enable Soft Delete to protect your blob data: Notifies you when your Blob Storage data is set for hard or permanent delete instead of a soft delete. Use soft delete to avoid losing any blob storage data for Citrix applications or users in the case of an accidental deletion.
-
Enable Cross Region Restore for your Recovery Services Vault: Notifies you when your Recovery Services Vault is not enabled for cross-region restore, which means you cannot recover outside of your current region. Use to protect your Recovery Services Citrix infrastructure so it can be brought online in a different region if the primary region is inaccessible.
-
Move to production gateway SKUs from Basic gateways: Notifies you when your Gateways are using the Basic SKU which has lower performance than a Production SKU. Always use production gateway SKUs for Citrix infrastructure and users to provide the best performance and end user experience.
-
Enable Active-Active gateways for redundancy: Notifies you when your gateways are not set up for active-active fault tolerance. Always configure active-active gateways for a fault-tolerant Citrix infrastructure.
-
Implement multiple ExpressRoute circuits in your Virtual Network for cross-premises resiliency: Notifies you when your ExpressRoute circuits are not set up for high availability. Always configure ExpressRoute circuits for high availability so your Citrix infrastructure is available to all users.
- Use ExpressRoute GlobalReach to improve your design for disaster recovery: Notifies you when your ExpressRoute circuits are not using GlobalReach. Always configure ExpressRoute circuits for Global Reach to improve your disaster recovery design and make it more resilient.
-
Repair your log alert rule: Notifies you when a log alert rule is broken. If you are using Log Alert rules for monitoring your Citrix environment, you want to enable this alert so you know when the rule is broken and not performing correctly.
-
Log alert rule was disabled: Notifies you when a log alert rule was disabled. If you are using Log Alert rules for monitoring your Citrix environment, you want to enable this alert so you know when the rule is disabled and not running at all.
Cost Alerts
-
Right-size or shutdown underutilized virtual machines: Notifies you when the machine instance type for a VM is not being fully utilized so that you can select a smaller and less-expensive VM to meet your business needs. Use this alert to reduce the costs of your Citrix infrastructure.
-
Repurpose or delete idle virtual network gateways: Notifies you when you have virtual network gateways that are idle and can be removed to reduce costs. Use this alert to reduce costs and complexity of your network infrastructure.
-
Delete ExpressRoute circuits in the provider status of Not Provisioned: Notifies you when you have ExpressRoute circuits that are not fully provisioned. Use this alert to remove incomplete ExpressRoute circuits.
-
Use Standard Storage to store Managed Disks snapshots: Notifies you when you are using more expensive storage to store managed disk snapshots. Use this alert to save money when storing disk snapshots.
Performance Alerts
-
Improve user experience and connectivity by deploying VMs closer to user’s location: Notifies you when users are accessing Citrix resources that are far away from the user. Use this alert when choosing data center and site locations so that users are placed close to their Citrix resources.
-
Match production Virtual Machines with Production Disks for consistent performance: Notifies you when your production VMs are not using production disks. Always use production disks for your production Citrix VMs.
-
Consider increasing the size of your VPN Gateway SKU to address high CPU: Notifies you when your VPN Gateway SKUs are not optimal for your usage. Enable this alert if you have a high number of VPN users that may be affected by VPN gateway performance when accessing Citrix resources.
-
Consider increasing the size of your VNet Gateway SKU to address consistently high CPU use: Notifies you when your VNet Gateway SKUs are not optimal for your usage. Enable this alert if you have a high number of VNet Gateways that may be affected when routing traffic between VNets for Citrix resources.
- Upgrade your ExpressRoute circuit bandwidth to accommodate your bandwidth needs: Notifies you when your ExpressRoute circuit bandwidth is not optimal for your current usage. Use this alert when you have one or more ExpressRoute circuits for your Citrix infrastructure.
- Enable Accelerated Networking to improve network performance and latency: Notifies you when VMs would benefit from the use of Accelerated Networking. Use this alert to identify which Citrix VMs need to have accelerated networking enabled.
Operational Excellence Alerts
-
Use Azure Policy to enable certain policies within the Azure environment. Here is a list of alerts that verify the Azure policy is in place:
-
Enforce ‘Add or replace a tag on resources’ in Azure Policy: used to verify that all Citrix resources are properly tagged.
-
Enforce ‘Allowed locations’ in Azure Policy: used to verify that access to the Citrix resources is restricted to particular locations to prevent malicious intent originating from untrusted locations.
-
Enforce ‘Allowed virtual machine SKUs’ in Azure Policy: used to prevent VMs from being created that fall outside the cost parameters for an environment. This policy is useful in preventing bitcoin mining with costly GPU instances.
-
Enforce ‘Inherit a tag from the resource group’ in Azure Policy: used to verify any resources in a resource group also inherit tags assigned to that resource group. This policy is useful for tracking auto-created Citrix resources.
-
-
Enable Traffic Analytics to view insights into traffic patterns across Azure resources: Notifies you when Traffic Analytics is not enabled for Azure resources. Used to secure the Citrix resources and prevent inadvertent or malicious access to data accessible through Citrix hosts.
-
Implement ExpressRoute Monitor on Network Performance Monitor for end-to-end monitoring: Notifies you when your ExpressRoute circuits are not being monitored by Network Performance Monitor. Use this alert to help secure the Citrix resources. The monitoring helps identify and prevent accidental or malicious access to data over an ExpressRoute connection.
-
Add Azure Monitor to your virtual machine (VM) labeled as production: Notifies you when a production VM does not have Azure Monitor enabled. Used to identify any Citrix VMs not running Azure Monitor.
-
You have disks which have not been attached to a VM for more than 30 days: Notifies you when disks are not being actively used. Useful for reducing storage costs by removing unused disks.
Azure Service Health Alerts
This section provides the recommended service health alerts to configure. The list identifies the key services that are used by a Citrix deployment. Each alert has a short description that includes why this alert is important to track. These only need to be configured one time and take about 15 minutes or so to complete. We recommend subscribing to notification alerts for the following services used most often for Citrix environments running in Azure:
-
API Management: Used to manage Azure services from the Citrix Cloud.
-
Activity Logs & Alerts: Used to monitor the Citrix server logs and generate alerts.
-
Alerts & Metrics: Used to monitor the Citrix server metrics and generate alerts.
-
Azure Active Directory: Used for authentication to the Citrix servers, the Azure portal and to Citrix Workspaces.
-
Azure Monitor: Used to monitor the Citrix Resources hosted in Azure.
-
Azure Policy: Used to secure access to the Azure resources and enforce business rules across the Citrix environment.
-
Azure Private Link: Used to connect to Azure services from within the Citrix deployment.
-
Azure Sentinel: Used to monitor the security of the Citrix resources in Azure.
-
Backup: Used to back up your Citrix resources in the cloud.
-
ExpressRoute: Used to connect on-premises resources with Citrix deployment in Azure.
-
Key Vault: Used to manage the encryption keys that secure Citrix server volumes and the user data stored at rest.
-
Log Analytics: Used to monitor the logs for events that affect Citrix resources and need alerts.
-
Microsoft Azure Portal: Used to manage the Azure resources where the Citrix deployment is running.
-
Network Infrastructure: Used to monitor the communication between the Citrix resources, the on-premises data centers, and the remote users.
-
Network Watcher: Used to monitor the network traffic between Citrix and Azure resources.
-
Site Recovery: Used for providing high-availability and cross-site disaster recovery capabilities to your Citrix deployment.
-
Storage: Used to host the boot volumes for all Citrix resources in the cloud and to store user data.
-
VPN Gateway \ Virtual WAN: Used to connect users and on-premises resources with the Citrix deployment in Azure.
-
Virtual Machines: Used to host the Citrix Workloads in Azure.
-
Virtual Network: Used to communicate between the Citrix resources hosted in the Azure Cloud and remote users as well as the on-premises data centers.
While configuring these service alerts, watch for other services that should be included for your environment.
Citrix
This section covers the Citrix Tools and Services that can be used to monitor your Citrix Virtual Apps and Desktops (CVAD) deployment in Azure.
Citrix Monitor
Citrix Monitor is the recommended tool from Citrix to monitor your Citrix Cloud deployment. The tool consists of the following components:
-
Dashboard: Main display that provides a real-time overview of the environment. The Dashboard includes key metrics, such as connection and machine failures, total sessions, average logon duration, and Citrix VDA hosts status. All of the reports and charts provide drill down capabilities for identified issues.
-
Trends: Provides trend information for the following: Sessions, Failures, Logon Performance, Load Evaluation, Capacity Management, Machine usage, Resource Utilization, and Application Probes.
-
Alerts & Alert Policies: Interface to set up alerts for pre-defined Citrix alert policies.
-
Applications: Console to manage Application and Desktop probes and review the Application analytics.
Trends
Historical data is saved only for the last 90 days and is available to view through the Trends section of Citrix Monitor. The key trends to monitor for your Citrix deployment are as follows:
Connection Failures: Connection failures can point to issues with particular Citrix VDA VMs or with particular users. The failed connection tab provides information on connections that fail because of the following common issues: client connection errors, licensing errors, unavailable capacity, machine failures, or configuration errors. The single-session and multi-session failures show servers that failed to start, hung on boot, or did not register.
Logon Performance: Logon performance provides an overview of how long user logons are taking and breaks them down into the following categories:
-
Brokering Time: This is the time that it takes Citrix to broker the session between the client and the Citrix VDA host. If this time is long, the issue lies with the Citrix infrastructure. Start by verifying that the Cloud Connectors and any StoreFront servers have sufficient capacity.
-
VM Start Time: This is the time that elapses between when the user clicks the icon to access their desktop and when the Virtual Machine has started for them. If this metric seems too long, consider increasing the buffer capacity for the delivery group.
-
HDX Connection Time: The time it takes to set up the HDX connection between the client and the Citrix VDA host. If this metric seems slow, look at the network connections. Verify packets are not getting dropped excessively and the network bandwidth utilization is under 80%.
-
Authentication Time: The time it takes to complete the authentication for the remote session. If this time is long, research which AD Domain Controllers (DCs) are being used for authentication. Verify your sites and services are configured so the closest DCs are being used to authenticate and they have the compute capacity to handle the session load.
-
GPO Time: The time it takes to apply the group policy settings (including Citrix policies) to the session. If this metric is too long, you can drill down by clicking the “Detailed Drilldown” link to view each GPO’s time. Look at the number of GPOs being applied and either consolidate the GPOs or find a third-party solution that applies GPOs asynchronously instead of synchronously.
-
Logon Scripts Time: The time it takes to execute any logon scripts before the Windows Explorer starts. If this metric is too long, investigate the Logon scripts that are being applied through GPO. Look for ways to optimize the logon scripts.
-
Profile Load Time: The time it takes to load the Windows user profile before the interactive session starts. Remember that if you are using Citrix Profile Management, the load time is included in this metric. If you are using another profile management solution that relies on Windows profiles, the actual profile load time is included in the Interactive Session metric. To reduce load times, you can use Citrix Profile Management with the “Large File Handling” feature enabled or move to streamed profiles.
-
Interactive Session Time: The time it takes to grant the user keyboard and mouse control after the Windows profile loads. This metric includes three phases: pre-userinit, userinit, and shell. This time includes third-party profile solutions that run after the Windows profile loads and before the user is granted control of the desktop.
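When a logon is slow, the phase breakdown above suggests where to look first. The sketch below picks the longest phase from a set of per-phase timings and returns the matching first step. The dictionary keys, hint strings, and function name are assumptions for illustration; Citrix Monitor does not expose the data in this exact shape.

```python
# Phase keys mirror the list above; the hints paraphrase its advice.
# All names here are illustrative, not a Citrix Monitor data format.
PHASE_HINTS = {
    "brokering": "check Cloud Connector and StoreFront capacity",
    "vm_start": "consider more buffer capacity for the delivery group",
    "hdx_connection": "check packet loss and network bandwidth utilization",
    "authentication": "check which domain controllers handle authentication",
    "gpo": "review the number of GPOs applied",
    "logon_scripts": "optimize the logon scripts applied through GPO",
    "profile_load": "consider Large File Handling or streamed profiles",
    "interactive_session": "review third-party profile solutions",
}


def slowest_phase(phase_seconds: dict) -> tuple:
    """Return (phase name, seconds, hint) for the longest logon phase."""
    if not phase_seconds:
        raise ValueError("no phase timings supplied")
    unknown = set(phase_seconds) - set(PHASE_HINTS)
    if unknown:
        raise ValueError(f"unknown phases: {sorted(unknown)}")
    name = max(phase_seconds, key=phase_seconds.get)
    return name, phase_seconds[name], PHASE_HINTS[name]
```

The hint is only a starting point; a long phase can have several causes, so correlate it with the Azure Monitor metrics for the same time window.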
Resource Utilization: This chart provides a view of the key metrics and a comparison of the previous 24 hours to the current metrics. This chart is useful for determining at a glance where the performance bottleneck might be when you are seeing long logon times or failed connections. If you identify trends with machines, you can use Azure Monitor to investigate further.
Citrix Policy controls Resource Monitoring and enables it by default. Citrix Policy for Process Monitoring is disabled by default because it consumes extra resources, but it provides detailed information for processes.
Alerts
Similar to Azure Alerts, Citrix alerts can be configured to email you alerts for metrics that are important to resolve quickly. Set alert policies for failures to reduce the amount of effort involved with reviewing the site metrics frequently. This frees you up to work on higher priority tasks. With the Premium licenses, you can set values at Warning and Critical levels to receive emails. When monitoring your Citrix deployment in Azure, the following alerts are recommended:
Site Policies
The Site Policies aggregate alerts across all delivery groups, users, and machines and provide warnings for site-wide events. These alerts are useful to let you know when you have any site resources falling outside the benchmark areas.
-
Connection failure rate: The percentage of connection failures over the past hour. Set an alert after carefully reviewing your baseline values for this counter. All environments have a base failure rate that occurs naturally as users attempt to connect, though 0% is the ideal value.
-
Connection failure count: The number of failed connections over the past hour. Set an alert after carefully reviewing your baseline values for this counter. All environments have a base failure count that occurs naturally as users attempt to connect, though 0 is the ideal value.
-
Failed machines (Single-session OS): The number of failed Single-session OS machines. Set an alert when this counter has a value greater than 1.
-
Failed machines (Multi-session OS): The number of failed Multi-session OS machines. Set an alert when this counter has a value greater than 1.
-
Average logon duration: The average time for a user to log on over the past hour. Citrix recommends warning when the average logon duration time exceeds 45 seconds. A better metric might be when the average logon duration exceeds 125% of your baseline logon time.
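The site policy values above reduce to simple arithmetic, sketched here. The function names are hypothetical. The 45-second limit and the 125% factor come from the text; treating them as alternatives selected by whether a baseline is supplied is an assumption of this example.

```python
from typing import Optional


def connection_failure_rate(failed: int, total: int) -> float:
    """Percentage of failed connections in the window; 0.0 when there were no attempts."""
    if failed < 0 or total < 0 or failed > total:
        raise ValueError("expected 0 <= failed <= total")
    return 0.0 if total == 0 else 100.0 * failed / total


def logon_threshold_seconds(baseline_seconds: Optional[float] = None,
                            fixed_limit: float = 45.0,
                            baseline_factor: float = 1.25) -> float:
    """Return the warning threshold for average logon duration.

    Without a baseline, use the fixed 45-second limit; with one, use 125% of it.
    """
    if baseline_seconds is None:
        return fixed_limit
    return baseline_seconds * baseline_factor
```

With a measured baseline of 20 seconds, for instance, the baseline-relative threshold is 25 seconds, which is considerably tighter than the fixed 45-second warning.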
Delivery Group and Multi-session Policies
These metrics are aggregated at the Delivery Group, Multi-session, or Single-session machine level. They are useful when you need to focus on a particular set of resources to verify they are performing as expected, for example, when you want to monitor the user experience for the virtual desktops dedicated to executives. In those cases, you might set tighter alerting on failure rates or average logon durations.
- Connection failure rate: The percentage of connection failures over the past hour. Set an alert after carefully reviewing your baseline values for this counter. All environments have a base failure rate that occurs naturally as users attempt to connect, though 0% is the ideal value.
- Connection failure count: The number of failed connections over the past hour. Set an alert after carefully reviewing your baseline values for this counter. All environments have a base failure count that occurs naturally as users attempt to connect, though 0 is the ideal value.
-
ICA RTT (Average): Average ICA Round-Trip Time. Citrix recommends that a warning alert be set when 5 or more sessions experience an ICA RTT of 300ms or longer.
-
Average logon duration: The average time for a user to log on over the past hour. Citrix recommends warning when the average logon duration time exceeds 45 seconds. A better metric might be when the average logon duration exceeds 125% of your baseline logon time.
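The ICA RTT warning condition above, five or more sessions at 300 ms or longer, can be expressed as a count over per-session values. This is a minimal sketch; the function name and the input (a plain list of RTT values in milliseconds) are assumptions for illustration.

```python
def rtt_warning(session_rtts_ms: list,
                rtt_limit_ms: float = 300.0,
                min_sessions: int = 5) -> bool:
    """Return True when at least min_sessions sessions have an RTT at or above the limit."""
    slow_sessions = sum(1 for rtt in session_rtts_ms if rtt >= rtt_limit_ms)
    return slow_sessions >= min_sessions
```

For a delivery group that serves latency-sensitive users, lowering either parameter gives the tighter alerting described at the start of this section.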
User Policies
The user alerts are the only alerts not aggregated across multiple resources. Since they are not aggregated, you can alert on the actual values when they fall outside the acceptable ranges.
- ICA RTT: ICA/HDX Round-Trip Time (RTT) in milliseconds (ms). Any RTT latency under 50 ms is considered ideal. Typically, the user experience starts to degrade when the RTT latency exceeds 100 ms for an extended period. The alert is triggered when ICA RTT is greater than the threshold set.
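The per-user guidance above can be sketched as a classifier plus a check for sustained degradation. The label "acceptable" for the 50 to 100 ms range, the function names, and the choice of three consecutive samples to represent "an extended period" are all assumptions of this example.

```python
def classify_rtt(rtt_ms: float) -> str:
    """Classify a single ICA RTT sample using the 50 ms and 100 ms marks from the text."""
    if rtt_ms < 0:
        raise ValueError("RTT cannot be negative")
    if rtt_ms < 50:
        return "ideal"
    if rtt_ms <= 100:
        return "acceptable"  # label chosen for this sketch
    return "degraded"


def sustained_degradation(samples_ms: list, min_consecutive: int = 3) -> bool:
    """Return True when min_consecutive samples in a row are degraded.

    Three consecutive samples is an illustrative stand-in for "an extended period".
    """
    run = 0
    for rtt in samples_ms:
        run = run + 1 if classify_rtt(rtt) == "degraded" else 0
        if run >= min_consecutive:
            return True
    return False
```

Requiring consecutive samples avoids alerting on a single latency spike, which matches the note that the experience degrades only when high RTT persists.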
Applications
The Applications section of Citrix Monitor provides insights into the health and usage information of published desktops and applications. If the Citrix Probe Agent is installed on a machine and configured through the console, the probe results for the last 24 hours are shown. Citrix Monitor shows the probe results along with any other application analytics for faults and errors, giving you a summary view of the environment’s health. The probes show the stage in the launch process where the application failed, such as authentication, enumeration, or ICA file download. This information is invaluable when troubleshooting application launch issues. Using application monitoring allows you to proactively address issues before they become outages.
Citrix Analytics
Citrix Analytics is a cloud-based service that aggregates data gleaned from Citrix users across devices, networks, and applications. The sole purpose of Citrix Analytics is to identify relationships and trends that can lead to actionable insights. Analytics relies on built-in Machine Learning (ML) algorithms to find behavioral anomalies that can indicate issues with Citrix users. Citrix Analytics works with third-party providers, including Microsoft, to gather data for analysis and has these offerings:
Citrix Analytics for Security: Focuses on user and application behavior, looking primarily for insider threats or malicious behavior.
Citrix Analytics for Performance: Focuses on the user experience. The performance analytics uses data from virtual applications and desktops to generate a User Experience score from key factors that define the user experience.
Citrix Analytics integrates with the following products to provide comprehensive views:
-
Citrix Virtual Apps and Desktops
-
Citrix Application Delivery Controller (NetScaler)
-
Citrix Secure Workspace Access (Access Control)
-
Citrix Gateway
-
Citrix Content Collaboration
-
Citrix Endpoint Management
-
Citrix Secure Browser
-
Microsoft Graph Security
-
Microsoft Active Directory
Collected data is retained for 13 months (396 days), or until 90 days after subscription termination.
Data can be integrated into any SIEM service that supports Kafka topics or Logstash-based data connectors, such as Microsoft Sentinel. Data can also be exported in a comma-separated value (CSV) format for analysis on other systems.
Citrix Analytics is accessed through your Citrix Cloud account. Once set up and configured, you gain access to dashboards that provide information and recommendations compiled by Citrix Analytics.
Dashboard | Information Provided | Citrix Analytics Service |
---|---|---|
Users | User-behavior patterns | Security |
User Access | Summary of risky domains and the volume of ingress/egress data | Security |
App Access | Summary of the domains, URL, and apps accessed by users | Security |
Share Links | Summary of the organizational share link patterns | Security |
Access Assurance Location | Summary of the logon and access details for CVAD users | Security |
Reports | Custom report creation with available metrics | Security |
User Experience | Summary of the key site performance metrics | Performance |
Infrastructure | Summary of the status and health of your site virtual machines | Performance |
Citrix Analytics-Security provides these reports, risk assessment scores and indicators for the users, share links, and IP address locations. Custom risk indicators can be created as well as custom policies to refine the conditions used for the risk assessment. You can enable a feature called Request End User Response, which immediately alerts the user when unusual activity is observed. Watchlists is another feature that allows you to monitor specific users who represent a potential threat or higher risk. You receive weekly emails from Citrix Analytics-Security with important risk indicators and users identified.
Citrix Managed Services
If you do not have the resources to monitor your Citrix infrastructure directly, you can reach out and purchase those services from the Citrix Managed Services team. When engaging the Citrix Managed Service team to monitor your Citrix infrastructure you receive the following benefits:
-
Monitoring of your Citrix infrastructure 24x7 with alerts via email or SMS.
-
Automated alerts with fine-tuned thresholds customized for your environment.
-
Stable and reliable environment optimized by Citrix experts remotely (no need for office space).
-
Freedom to work on other higher priority tasks.
-
Cost reduction over bringing consultants on-site.
-
Direct access to Citrix Engineering.
The Citrix Managed Service team works 100% remotely using the Citrix suite of tools to configure remote monitoring and alerting. Diagnostic data is sent to the Citrix team for processing. The Citrix team reviews the counters, logs, and events for trends or patterns that need remediation. You receive real-time alerts and monthly summary reports of important events.
Conclusion
The discussion has included the most popular tools and services available from Microsoft and Citrix to manage your Citrix deployment in Azure. Here are some general recommendations and practices to consider as you use these tools.
-
Tracking performance monitor metrics for virtual machines and the network is easier to do from the Azure Monitor. The Azure Monitor metrics are more granular than what is available within Citrix Monitor. Use Azure Monitor for the performance metrics because you have more control over the metrics collected.
-
Set your monitoring data retention to as short a period as possible for your business requirements. Most monitoring data is only useful for a short period of time. Save costs by not storing monitoring data long-term. Create an automation job to go out and clean up stale data in your storage accounts.
-
Azure includes alerts for metrics, logs, service outages, planned maintenance, monthly cost, and security. Using alerts can be a life saver. We have made a significant number of recommendations around alerts to create for your Citrix deployment. You only need to implement the ones that make the most sense in your environment. Send critical alerts via SMS and email to ensure they are acted upon quickly. Set a reminder on your calendar each quarter to go in and update the alert notification lists.
-
Monitoring and alerting on a metric comes with a monthly cost. Choose wisely which metrics to track. If you do not plan on taking action when an alert fires, then consider if the metric is still necessary to keep around.
-
Set up a custom dashboard for your Citrix resource groups and enable links to key services such as Sentinel, Service Health, Traffic Analytics, and Advisor. Include charts on the dashboard that show the performance of your ExpressRoute or VPN connections, your Cloud Connectors, and your Citrix VDA hosts. Restrict dashboard access to only those individuals who need that information to prevent any sensitive information from inadvertently reaching unintended audiences.
-
When troubleshooting an issue, look at multiple data sources to help correlate the symptoms to the root cause. For instance, if the average logon duration is high, you can view the metrics in Azure to determine where the resource constraints exist.
-
Enabling Traffic Analytics and NSG logs is the best way to see if traffic is originating from unexpected locations. Using this information you can streamline your network communications. Use the information to create Azure policies that block inbound traffic from those unexpected locations.
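For the retention recommendation in the list above, the core of a cleanup job is deciding which items are older than the retention period. The sketch below shows only that selection step. The function name and the input (a mapping of item name to last-modified timestamp) are assumptions; listing and deleting the actual data in a storage account is left to whichever tool or SDK you use.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_items(last_modified: dict,
                retention_days: int,
                now: Optional[datetime] = None) -> list:
    """Return the names of items last modified before the retention cutoff, sorted."""
    if retention_days < 0:
        raise ValueError("retention_days must not be negative")
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return sorted(name for name, stamp in last_modified.items() if stamp < cutoff)
```

Passing `now` explicitly keeps the function easy to test and lets a scheduled job log exactly which cutoff it applied.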