Insights

The Insights panel provides information about session failures and slowness in your environment. Drilling into specific metrics from these insights helps you troubleshoot and resolve session failures or slowness faster. Failure Insights specifically help administrators improve session availability, an important factor in user experience. These insights are designed to aid proactive monitoring of the user experience, so they are displayed for the current failures in the system and are refreshed every 15 minutes.

Director Insights

Clicking the insights icon on the Dashboard displays the Insights pane, with details about each insight and options to drill down to the Machines or Connections view. Administrators can also navigate to the alert configuration from the panel.

Black hole Machines, Zombie Sessions, Overloaded Machines, and Session failure Insights are available on this panel. When expanded, each insight displays a link to the failed sessions or the machines hosting them. The link leads to the Filters view containing the failed machines or sessions. From there, you can drill down further by clicking a specific machine or session to see its detailed metrics.

The expanded view for each insight shows the top failure patterns detected with respect to the site, the Delivery Group, and single-session or multi-session OS machines. These patterns help administrators spot whether a specific cohort of users is experiencing the issue. Where the system cannot highlight any patterns because the affected cohort is distributed, drill down and analyze the failures yourself. The view also shows recommended actions to troubleshoot and resolve the issues.

Black hole machines

Some machines in your environment, though registered and appearing healthy, might not service sessions brokered to them, resulting in failures. Machines that have failed to service four or more consecutive session requests are termed black hole machines. These failures can be caused by various factors that affect the machine, such as insufficient RDS licenses, intermittent networking issues, or instantaneous load on the machine. These failures do not include failures due to capacity or license availability. The presence of black hole machines in the environment increases session failures, resulting in poor session availability. The Black hole machines insight shows the number of black hole machines identified in your environment.
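The "four or more consecutive session requests" rule can be illustrated with a minimal sketch. It is not the product's implementation: the function name, the event format, and the machine names are hypothetical, and it assumes that a successful launch resets a machine's failure streak.

```python
from collections import defaultdict

BLACK_HOLE_THRESHOLD = 4  # consecutive failed session requests, per the definition above


def find_black_hole_machines(launch_events, threshold=BLACK_HOLE_THRESHOLD):
    """Return machines whose current run of consecutive failures meets the threshold.

    launch_events: iterable of (machine_name, succeeded) tuples in time order.
    This event shape is an assumption made for illustration only.
    """
    streak = defaultdict(int)
    for machine, succeeded in launch_events:
        # Assumption: a successful launch resets the streak; a failure extends it.
        streak[machine] = 0 if succeeded else streak[machine] + 1
    return sorted(m for m, count in streak.items() if count >= threshold)


events = [
    ("VDA-01", False), ("VDA-02", True), ("VDA-01", False),
    ("VDA-01", False), ("VDA-02", False), ("VDA-01", False),
]
print(find_black_hole_machines(events))  # ['VDA-01']
```

Here VDA-01 fails four times in a row and is flagged, while VDA-02 has only one failure after a success and is not.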

Black hole

Recommended actions for Black hole machines:

  • Check RDS license on the machine
  • Put the machine in maintenance mode
  • Reboot

The Patterns Detected section shows the top patterns noticed in black hole machines with respect to the following criteria:

  • Number of black hole machines running single-session or multi-session OS
  • Delivery Group with highest number of impacted machines

Zombie Sessions

The Zombie Sessions sub-pane shows information about session failures that have occurred due to zombie sessions in the environment. A zombie session is an abandoned session on a single-session OS machine that causes new session launches on the machine to fail. Attempts to launch sessions on this machine fail with an “Unavailable Capacity” error, and all future launch attempts fail until the abandoned session is terminated. Zombie session insights help you spot machines with abandoned sessions and proactively mitigate these failures.

Zombie Sessions

Recommended actions for Zombie Sessions:

  • You can log the users out of the Zombie sessions using Monitor for Citrix DaaS sites.
  • You can reboot the machines containing Zombie sessions.

The Patterns Detected section shows the top patterns noticed in zombie sessions with respect to the following criteria:

  • Delivery Group with highest number of impacted machines
  • Delivery Group with highest number of impacted sessions
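The two criteria above can rank different Delivery Groups, which a small sketch makes concrete. The function name, record format, and group and machine names are hypothetical, and the sketch assumes at least one failed launch is present.

```python
from collections import Counter


def top_delivery_groups(failed_launches):
    """Return (top group by impacted machines, top group by impacted sessions).

    failed_launches: non-empty list of (delivery_group, machine) tuples,
    one tuple per failed session launch (an assumed record shape).
    """
    # Every failed launch counts as one impacted session.
    sessions_per_group = Counter(group for group, _ in failed_launches)
    # Each distinct (group, machine) pair counts as one impacted machine.
    machines_per_group = Counter(group for group, _ in set(failed_launches))
    return machines_per_group.most_common(1)[0], sessions_per_group.most_common(1)[0]


failures = [
    ("DG-Finance", "VDA-01"), ("DG-Finance", "VDA-01"), ("DG-Finance", "VDA-01"),
    ("DG-Sales", "VDA-07"), ("DG-Sales", "VDA-08"),
]
by_machines, by_sessions = top_delivery_groups(failures)
print(by_machines)  # ('DG-Sales', 2)
print(by_sessions)  # ('DG-Finance', 3)
```

In this example one machine in DG-Finance accounts for most failed sessions, while DG-Sales has more impacted machines.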

Overloaded Machines

The Overloaded Machines insight gives visibility into overloaded resources that cause a poor user experience. A machine is considered overloaded when it has experienced sustained CPU spikes, high memory usage, or both, lasting 5 minutes or more.
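As a rough illustration of "sustained for 5 minutes or more", the sketch below measures the longest unbroken run of samples at or above a threshold. The 80% thresholds, the sample format, and the function names are assumptions for this example and are not taken from the product.

```python
def sustained_minutes(samples, threshold):
    """Longest span, in minutes, over which consecutive samples stay at or above threshold.

    samples: list of (minute, value) tuples in time order (assumed shape).
    """
    longest = 0
    run_start = None
    for minute, value in samples:
        if value >= threshold:
            if run_start is None:
                run_start = minute  # a new run of high samples begins
            longest = max(longest, minute - run_start)
        else:
            run_start = None  # the run is broken by a sample below threshold
    return longest


def is_overloaded(cpu_samples, mem_samples,
                  cpu_threshold=80.0, mem_threshold=80.0, min_minutes=5):
    """True if CPU, memory, or both stay high for min_minutes or more.

    The 80.0 defaults are hypothetical values chosen for illustration.
    """
    return (sustained_minutes(cpu_samples, cpu_threshold) >= min_minutes
            or sustained_minutes(mem_samples, mem_threshold) >= min_minutes)


steady_cpu = [(0, 95), (1, 92), (2, 97), (3, 91), (4, 96), (5, 93)]
spiky_cpu = [(0, 95), (1, 30), (2, 96), (3, 20), (4, 97), (5, 25)]
low_mem = [(minute, 40) for minute in range(6)]

print(is_overloaded(steady_cpu, low_mem))  # True
print(is_overloaded(spiky_cpu, low_mem))   # False
```

Brief, isolated spikes do not count as overload in this sketch because each run is broken before it reaches the minimum duration.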

Overload Machines

The Patterns Detected section shows the top patterns noticed in overloaded machines with respect to the following criteria:

  • Number of overloaded machines running single-session or multi-session OS
  • Delivery Group with highest number of impacted machines
  • Number of overloaded machines with sustained memory or CPU spikes

Connection failures

The Connection Failures insight lists the number of session failures due to issues anywhere from the endpoint (where the user launches the session) to the machine (where the session is brokered). These failures can occur for a variety of reasons, such as incorrect firewall configurations, network communication issues, or machine unavailability.

Connection Failures

The two categories of connection failures are:

  • Client connection failures: lists the sessions where communication errors have occurred on the endpoint.
  • Machine failures: lists the sessions where errors have occurred at the machine.

Additionally, the Connection failures subpanel displays the following recommendations to resolve the errors:

  • Check the firewall settings on the machine and the gateway.
  • Check network connectivity between the components.
  • Ensure that machines are powered on and registered.

The failures are grouped to identify blocked users, that is, users who have not had any successful sessions after their connection failure in the selected time period. The patterns are highlighted for both categories of failures. Clicking the details opens the Connections view, filtered to show all the sessions that failed due to these errors in your environment during the selected time. This view helps you analyze the individual failed sessions and find a possible root cause.
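The notion of a blocked user can be sketched as follows. The record format and names are hypothetical, and the sketch assumes "after their connection failure" refers to the user's most recent failure within the selected period.

```python
def find_blocked_users(attempts):
    """Return users with a failure and no successful session after it.

    attempts: iterable of (user, timestamp, succeeded) tuples for the
    selected time period; timestamps are comparable numbers (assumed shape).
    """
    last_failure = {}
    last_success = {}
    for user, timestamp, succeeded in attempts:
        target = last_success if succeeded else last_failure
        # Keep only the latest timestamp of each kind per user.
        target[user] = max(timestamp, target.get(user, timestamp))
    return sorted(
        user for user, failed_at in last_failure.items()
        if last_success.get(user, float("-inf")) < failed_at
    )


attempts = [
    ("alice", 1, False), ("alice", 5, True),   # recovered after the failure
    ("bob", 2, False), ("bob", 7, False),      # never connected successfully
    ("carol", 3, True),                        # no failure at all
]
print(find_blocked_users(attempts))  # ['bob']
```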

Filter View

The following new filters and columns are added to the Machines view under Filters -> Machines:

  • Consecutive Failures Count: Number of consecutive session launch failures the machine has reported
  • Is Blackhole: Whether the machine has been identified as a black hole machine
  • Is Zombie: Whether the machine has been identified as a zombie machine
  • Is Overloaded: Whether the machine has been identified as an overloaded machine

Filters All Machines

The following new filters and columns are added to the Connections view under Filters -> Connections:

Failed Connections

Alerts

Three new “Advanced Alert Policies” are available and enabled by default. They help administrators by proactively raising alerts when Black hole, Zombie, or Overloaded machines (across all Delivery Groups) are identified in the system.

Note:

Connection failures are not supported under Advanced Alert Policies in the current release.

Director Alerts

Managing Default Policies

To maximize the value of Diagnostic Insights, here are a few key points regarding the default alert policies:

  • Customization: Administrators can customize the alert policy parameters, scope, or notification actions for these default policies.
  • Restrictions: You cannot change the conditions or delete the default policies.
  • Enabling/Disabling: Policies can be disabled or enabled as needed.
  • Notifications: The default policies do not have notifications configured out of the box; they will only result in UI alerts. We strongly advise administrators to update notification preferences on these policies.

Note:

  • The Insights Panel itself relies on enabled policies. If you disable the default policies and have no enabled custom policies, the Insights Panel will stop refreshing.

  • Updating alert parameters also alters the calculation of the corresponding insight on the dashboard.

Creating Custom Policies

You can create additional custom policies for these insights. If you create a custom policy, we recommend modifying the scope of the default policy to exclude any Delivery Groups covered by your custom policy. This prevents overlap and potential duplicate alerts.
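The scoping recommendation amounts to a set difference, sketched below. The function and Delivery Group names are hypothetical and no product API is involved; the result is simply the list of groups to leave in the default policy's scope.

```python
def default_policy_scope(all_delivery_groups, custom_policy_scopes):
    """Return the Delivery Groups the default policy should still cover.

    all_delivery_groups: iterable of Delivery Group names.
    custom_policy_scopes: list of iterables, one per custom policy, each
    holding the Delivery Groups that policy covers (assumed shape).
    """
    covered = set().union(*custom_policy_scopes)
    return sorted(set(all_delivery_groups) - covered)


groups = ["DG-Finance", "DG-Sales", "DG-Engineering"]
custom_scopes = [["DG-Sales"]]
print(default_policy_scope(groups, custom_scopes))  # ['DG-Engineering', 'DG-Finance']
```

With DG-Sales handled by a custom policy, the default policy keeps only the remaining two groups, so no machine is reported by both policies.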

Alert notifications

Following is a quick view of the ‘Citrix Alerts’ raised for the insights:

Alert notifications

Alert for Black Hole Machines

Monitor scans for black hole machines every 15 minutes and sends an alert so that administrators can proactively mitigate the session failures that users face. By default, machines that have failed to service four or more consecutive session requests are termed black hole machines. The alert conditions and re-alert intervals can be customized for the selected alert.

Advanced alert policy

Details of the machines that caused session failures are sent in the alert emails or webhook payload. The Black hole machines alert policy must be enabled to receive these notifications.

Alert for Machines with Zombie Sessions

The Machines with Zombie Sessions alert is generated when a new machine with a zombie session is detected in the environment within a 15-minute interval.

Admins can customize the alert conditions for the Machines with Zombie Sessions alert.

Advanced alert zombie sessions

Details of the machines that caused the Zombie sessions and failures are sent in the alert emails or webhook payload.

Alert for Overloaded Machines

Machines that have experienced sustained CPU spikes, high memory usage, or both for the configured sampling intervals are considered overloaded.

Admins can customize the alert conditions and the re-alert preference for the Overloaded Machines alert.

Overloaded Machines

Re-alerts

For the specific insight alerts, the notification behavior within the configured Re-Alert interval is incremental. If new problematic machines are identified during this interval, the re-alert email or webhook will only contain those new machines. A full alert, listing all problematic machines for a given Delivery Group, will be sent once every Re-Alert interval (which is 24 hours by default).
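The incremental behavior described above can be modeled with a small sketch for a single Delivery Group. The class, its method, and the use of minutes as the time unit are assumptions for illustration and do not reflect the product's internals.

```python
class ReAlertTracker:
    """Models full versus incremental notifications for one Delivery Group."""

    def __init__(self, re_alert_minutes=24 * 60):  # 24 hours, the stated default
        self.re_alert_minutes = re_alert_minutes
        self.last_full_alert = None
        self.already_reported = set()

    def notify(self, now_minutes, problematic):
        """Return ("full" | "incremental", machines) or None if nothing is sent."""
        problematic = set(problematic)
        if not problematic:
            return None
        interval_elapsed = (
            self.last_full_alert is None
            or now_minutes - self.last_full_alert >= self.re_alert_minutes
        )
        if interval_elapsed:
            # Full alert: list every problematic machine and restart the interval.
            self.last_full_alert = now_minutes
            self.already_reported = set(problematic)
            return ("full", sorted(problematic))
        # Within the interval: report only machines not yet reported.
        new_machines = problematic - self.already_reported
        self.already_reported |= new_machines
        return ("incremental", sorted(new_machines)) if new_machines else None


tracker = ReAlertTracker()
first = tracker.notify(0, {"VDA-01", "VDA-02"})
second = tracker.notify(15, {"VDA-01", "VDA-02", "VDA-03"})
third = tracker.notify(30, {"VDA-01", "VDA-02", "VDA-03"})
fourth = tracker.notify(24 * 60, {"VDA-01", "VDA-03"})
print(first)   # ('full', ['VDA-01', 'VDA-02'])
print(second)  # ('incremental', ['VDA-03'])
print(third)   # None
print(fourth)  # ('full', ['VDA-01', 'VDA-03'])
```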
