The Insights panel provides information on the root causes of session failures in your environment. Drilling deeper into specific metrics with these insights helps you troubleshoot and resolve session failures faster. Failure Insights specifically help administrators improve session availability, which is an important factor in determining user experience. These diagnostic insights are designed to aid proactive monitoring of the user experience. Hence, Insights are displayed for a maximum duration of 1 day, even if a 1-month or 1-week time period is selected on the dashboard.
Each Insights subpane shows crucial insights about failures that have occurred on the site. A link to the failed sessions, or to the machines hosting them, leads to the self-service view containing the failed machines or sessions. From there, you can drill down further by clicking a specific machine or session to see the timeline details and the detailed metrics. The top failure patterns detected with respect to the site, Delivery Group, and single-session or multi-session OS machines are displayed. These patterns are intended to help you spot whether a specific cohort of users is experiencing the issue. In cases where the system is unable to highlight any pattern because the cohort is distributed, drill down to the self-service view and analyze the failures yourself. Recommended actions to troubleshoot and resolve the issues are also shown.
Black hole machines
Some machines in your environment, though registered and appearing healthy, might not service sessions brokered to them, resulting in failures. Machines that have failed to service four or more consecutive session requests are termed black hole machines. The reasons for these failures relate to various factors that might affect the machine, such as insufficient RDS licenses, intermittent networking issues, or instantaneous load on the machine. These failures do not include failures due to capacity or license availability. The presence of black hole machines in the environment increases session failures, resulting in poor session availability. The Black hole machines insight shows the number of black hole machines identified in your environment during the selected time period.
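The "four or more consecutive session requests" rule can be illustrated with a minimal sketch. This is not the product's implementation. The function name and the input shape (a time-ordered list of booleans per machine) are assumptions made for illustration, and the caller is assumed to have already excluded failures due to capacity or license availability.

```python
from typing import Iterable

# From the text: four or more consecutive failed session requests.
BLACK_HOLE_THRESHOLD = 4


def is_black_hole(outcomes: Iterable[bool],
                  threshold: int = BLACK_HOLE_THRESHOLD) -> bool:
    """Return True if `outcomes` contains `threshold` or more consecutive failures.

    `outcomes` is a hypothetical, time-ordered sequence for one machine:
    True means the session request was serviced, False means it failed.
    Capacity and license-availability failures are assumed to be filtered
    out before this function is called.
    """
    streak = 0
    for serviced in outcomes:
        # A serviced request resets the streak; a failure extends it.
        streak = 0 if serviced else streak + 1
        if streak >= threshold:
            return True
    return False
```

With this sketch, a machine that services one request and then fails four in a row is flagged, while a machine whose failures are interrupted by a successful session before reaching four is not.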
Clicking View machines opens the Machines-based self-service view, filtered to show all the black hole machines in your environment during the selected time period. Here, you can analyze the individual performance metrics of a machine to identify and understand possible reasons for the machine not accepting session requests. For more information about the performance indicators available in the Machines-based self-service view, see Self-service search for Machines. Further, clicking the machine name opens the Machine Statistics view, which helps correlate the resource performance parameters of the machine with the session performance parameters during the same time period. For more information, see the Machine Statistics view article.
The following steps are recommended to help reduce the number of black hole machines:
- Check the RDS license status.
- Put the machine in maintenance mode.
- Reboot the machine.
For more information about Black Hole Machine Alerts, see the Alerts article.
The Communication Errors subpane lists the number of session failures due to communication errors between the endpoint (where the user launches the session) and the machine. These errors can occur due to incorrect firewall configurations or other errors on the network path.
The two categories of communication errors are:
- Endpoint to machine—lists the sessions where communication errors have occurred between the endpoint and the machine.
- Gateway to machine—lists the sessions where communication errors have occurred between the gateway and the machine.
Additionally, the Communication Errors subpane displays the following recommendations to resolve the errors:
- Check the firewall settings on the machine and gateway.
- Check network connectivity between the machine and gateway.
Clicking the failure number opens the Sessions-based self-service view, filtered to show all the sessions that have failed due to communication errors in your environment during the selected time period. This view helps you analyze the individual sessions that have failed and identify a possible root cause. For more information about the indicators available in the Sessions-based self-service view, see Self-service search for sessions.
The Zombie Sessions subpane shows information on session failures that have occurred due to zombie sessions in the environment. A zombie session is an abandoned session on a single-session OS machine that causes new session launches on the machine to fail. Attempts to launch sessions on this machine fail with an Unavailable Capacity error. All future session launch attempts fail until the abandoned session is terminated. Zombie Sessions insights help you spot the machines with abandoned sessions so that you can proactively mitigate these failures.
Click View machines to go to the Self-service view filtered with the list of machines containing Zombie Sessions.
Here, Failure Count represents the number of session failures that have occurred in the selected interval. The Last Failure Type and Reason help identify the root cause of failures on machines containing zombie sessions.
A Zombie session alert mail is generated when a new machine with a zombie session is detected in the environment within a 15-minute interval. For more information, see Alert for Machines with Zombie Sessions and the Self-service search for sessions article.
Recommended actions for Zombie Sessions
You can either log the users off or reboot the machines containing Zombie sessions.
You can log the users out of the zombie sessions using Monitor for Citrix DaaS sites. For more information, see the Site Analytics article.
You can reboot the machines containing zombie sessions from Performance Analytics. For more information, see the Machine actions article.
The Overloaded Machines Insight gives visibility into overloaded resources that cause a poor user experience. Machines that have experienced sustained CPU spikes, high memory usage, or both, lasting 5 minutes or more and resulting in a poor user experience in the selected duration, are considered overloaded. Other machines in the environment might have high resource usage without impacting the user experience. These machines are not categorized as overloaded.
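As an illustration of the "lasted for 5 minutes or more" condition, the following sketch checks whether a series of resource samples contains a sustained spike. It is an assumption-laden example, not the product's logic. The function name, the sample format, and the `threshold` value are hypothetical (the text does not state a usage threshold), and the sketch covers only the duration check, not the user-experience impact that the definition also requires.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple

# From the text: spikes must last 5 minutes or more to count as sustained.
SUSTAINED = timedelta(minutes=5)


def has_sustained_spike(samples: Sequence[Tuple[datetime, float]],
                        threshold: float,
                        min_duration: timedelta = SUSTAINED) -> bool:
    """Return True if usage stays at or above `threshold` for `min_duration`.

    `samples` is a hypothetical, time-ordered list of (timestamp, usage %)
    pairs for one resource (CPU or memory) on one machine.
    `threshold` is an assumed cut-off supplied by the caller.
    """
    run_start = None
    for timestamp, usage in samples:
        if usage >= threshold:
            if run_start is None:
                run_start = timestamp  # a new high-usage run begins
            if timestamp - run_start >= min_duration:
                return True
        else:
            run_start = None  # the run is broken by a normal sample
    return False
```

For example, six one-minute samples that all sit above the assumed threshold span five minutes and count as sustained, whereas the same series with one normal sample in the middle does not.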
The Overloaded Machines Insight shows the number of overloaded machines and the number of users affected in the selected duration.
Click View Machines to see the overloaded machines listed in the Machines self-service page for Overloaded Machines. Each overloaded machine is listed with the number of Sustained Memory and CPU Spikes that occurred on it during the selected interval.
The timeline graph shows the number of machines that were overloaded over the selected time interval, plotted at 15-minute intervals. You can click a specific machine to see the Machine Statistics view.
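A timeline plotted at 15-minute intervals can be pictured as counting distinct overloaded machines per 15-minute bucket. The sketch below shows one way to do that, assuming a hypothetical list of (timestamp, machine name) overload events. The names and data shape are illustrative and are not taken from the product.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

BUCKET_MINUTES = 15  # from the text: the timeline uses 15-minute intervals


def bucket_start(timestamp: datetime) -> datetime:
    """Round a timestamp down to the start of its 15-minute bucket."""
    minute = timestamp.minute - timestamp.minute % BUCKET_MINUTES
    return timestamp.replace(minute=minute, second=0, microsecond=0)


def machines_per_bucket(
    events: Iterable[Tuple[datetime, str]]
) -> Dict[datetime, int]:
    """Count distinct overloaded machines in each 15-minute bucket.

    `events` is a hypothetical iterable of (timestamp, machine_name) pairs,
    one per detected overload on a machine.
    """
    buckets = defaultdict(set)
    for timestamp, machine in events:
        buckets[bucket_start(timestamp)].add(machine)
    # Sort by bucket start so the result reads like a timeline.
    return {start: len(names) for start, names in sorted(buckets.items())}
```

A machine that is overloaded twice within the same bucket is counted once, which matches the idea of plotting the number of machines rather than the number of spikes.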
The Patterns Detected section shows the top three patterns noticed in overloaded machines with respect to the following criteria:
- Number of overloaded machines in each Delivery Group
- Number of overloaded machines running single-session or multi-session OS
- Number of overloaded machines with Sustained Memory or CPU spikes
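The criteria above amount to grouping overloaded machines by an attribute and keeping the largest groups. A minimal sketch follows, assuming each machine is a dictionary with hypothetical keys such as `delivery_group`, `os_type`, and `spike_type`. These field names are illustrative only, and the text does not specify how ties are broken.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_patterns(machines: Iterable[Dict[str, str]],
                 key: str,
                 n: int = 3) -> List[Tuple[str, int]]:
    """Return the `n` most common values of `key` with their machine counts.

    `machines` is a hypothetical list of overloaded-machine records.
    The default of three reflects the "top three patterns" in the text.
    """
    return Counter(machine[key] for machine in machines).most_common(n)


# Example with made-up records:
overloaded = [
    {"delivery_group": "DG-A", "os_type": "multi-session", "spike_type": "CPU"},
    {"delivery_group": "DG-A", "os_type": "multi-session", "spike_type": "Memory"},
    {"delivery_group": "DG-B", "os_type": "single-session", "spike_type": "CPU"},
]
by_group = top_patterns(overloaded, "delivery_group")
```

Calling the same function with `"os_type"` or `"spike_type"` gives the breakdown for the other two criteria.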
For more information about Overloaded Machine Alerts, see the Alerts article.