Monitor

Like any integrated system, monitoring and maintenance are critical to the overall health of the solution. Without proper support, operations, and health monitoring systems in place, the user experience will gradually degrade.

Process 1: Support

When problems arise, technical support is the first point of contact. This section addresses the proper staffing, organization, training, delegated administration and tools that should be used to maintain the Citrix deployment.

Decision: Support Structure

Multiple levels of support have been found to be the most effective way of addressing support issues. Low-criticality, low-complexity, or frequently occurring issues should be managed and resolved at the lower support levels. High-criticality and complex issues are escalated to more experienced architects or infrastructure owners. The diagram below outlines a common multi-level support structure.

Figure: Common multi-level support structure

If a user encounters an issue, Level-1 support (help desk) is the entry point to the support system. Level-1 should resolve 75% of all issues encountered, the majority of which will be routine problems that require only limited knowledge of the Citrix environment. At this level, issues are resolved quickly, and some can be automated (self-service), for example password resets and resource provisioning.

Non-routine problems that exceed Level-1’s abilities are escalated to Level-2 (Operators). This support level generally consists of administrators supporting the production Citrix environment. Information on the end user’s problem and the troubleshooting steps already attempted is documented at the first level, allowing Level-2 technicians to begin addressing the problem immediately. Level-2 technicians are highly knowledgeable about the Citrix environment and should handle only 20% of support tickets.

Complex issues that exceed Level-2’s abilities should be escalated to Level-3 (Implementers). Level-2 and Level-3 support staff are often both members of the Citrix Support Team, with Level-3 comprising the senior staff who maintain the Citrix environment. Level-3 issues are complicated and often mission-critical, requiring expert knowledge of the virtual desktop and application environment. Level-3 support tickets should amount to no more than 5% of all support issues.
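The 75% / 20% / 5% targets above are simple proportions of the total ticket volume. The following Python sketch is illustrative only: it turns an assumed monthly ticket volume into expected per-level counts and reports how far an actual distribution deviates from the guideline. The function names and the 1,000-ticket example are hypothetical.

```python
# Target share of tickets resolved at each level, per the guideline above.
TARGET_SHARE = {"Level-1": 0.75, "Level-2": 0.20, "Level-3": 0.05}


def expected_tickets(total_tickets: int) -> dict:
    """Approximate number of tickets each level should resolve for a given volume."""
    return {level: round(total_tickets * share) for level, share in TARGET_SHARE.items()}


def share_deviation(actual: dict) -> dict:
    """Actual share minus target share per level (positive means above target)."""
    total = sum(actual.values())
    if total == 0:
        raise ValueError("no tickets recorded")
    return {level: actual.get(level, 0) / total - share for level, share in TARGET_SHARE.items()}


# Example: a month with 1,000 tickets (hypothetical volume).
print(expected_tickets(1000))  # {'Level-1': 750, 'Level-2': 200, 'Level-3': 50}
```

A persistently positive deviation at Level-2 or Level-3 suggests that Level-1 may need more knowledge base articles or training.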

The final level, Level-4 (Architects), is focused on strategic improvements to the solution, testing new technologies, planning migrations, and other high-level changes. Generally, Level-4 is not involved in active support of a production environment.

Should support discover that an issue is related to an application or the underlying infrastructure, the ticket is handed off to the appropriate team for troubleshooting. If a program bug is discovered, the issue is re-escalated and a ticket is opened with the appropriate vendor.

Decision: Support Responsibilities and Skill Set

The following table highlights the recommended characteristics of each support level.

Support level

Level-1 (Help desk)

Description

Provides first-line support for reported issues, initially by servicing support messages and phone calls. This level performs initial issue analysis, problem definition, ticket routing, and simple issue resolution. It often processes requests for application access or assists users with configuring plugins.

Responsibilities

  • Perform issue definition, initial analysis and basic issue resolution
  • Perform initial troubleshooting to determine the nature of the issue
  • Create ticket, collect user information, and log all troubleshooting steps performed
  • Resolve basic Citrix-related issues, connectivity problems, and application-related issues using existing knowledge base articles
  • Escalate issue to Level-2 if advanced skills or elevated permissions are required
  • Isolate the issue as Citrix-related, Microsoft-related, or third-party application-related
  • If it affects the production environment or is potentially causing a system level outage, escalate directly to Level-3
  • Generate requests for additional issue resolution guides as necessary
  • Follow up with end users when a support ticket is closed to ensure the problem has been satisfactorily resolved
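The escalation rules in the Level-1 responsibilities can be expressed as a small routing function. This is a minimal Python sketch, assuming the help desk records three yes/no flags on each ticket; the Ticket fields and the route function are hypothetical and are not part of any Citrix or ticketing product API.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    # Hypothetical flags that Level-1 records while triaging a ticket.
    affects_production: bool          # production impact or potential system-level outage
    needs_advanced_skills: bool       # beyond the existing knowledge base articles
    needs_elevated_permissions: bool  # beyond Level-1 delegated rights


def route(ticket: Ticket) -> str:
    """Return the support level that should own the ticket."""
    if ticket.affects_production:
        return "Level-3"  # escalate directly, bypassing Level-2
    if ticket.needs_advanced_skills or ticket.needs_elevated_permissions:
        return "Level-2"
    return "Level-1"      # resolve at the help desk


print(route(Ticket(affects_production=False, needs_advanced_skills=False,
                   needs_elevated_permissions=True)))  # Level-2
```

The production check comes first so that a potential outage always reaches Level-3, even when it would also qualify for Level-2.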

Skill set

  • General Citrix XenApp/XenDesktop knowledge (CCA, CCA-V)
  • General Windows client OS/server OS knowledge (MCP)
  • General Active Directory knowledge
  • General Networking knowledge (CCNA)

Support level

Level-2 (Operators)

Description

Primarily supports day-to-day operations of the Citrix environment, which may include proactive monitoring and management. This role also performs intermediate-level troubleshooting using the available monitoring and troubleshooting tools, and assists with resolving issues escalated by Level-1 support.

Responsibilities

  • Perform intermediate issue analysis and resolution.
  • Identify root cause of issues.
  • Respond to server alerts and system outages.
  • Create weekly report on number of issues, close rate, open issues, etc.
  • Review vendor knowledge base articles.
  • Respond to out-of-hours helpdesk calls.
  • Respond to critical monitoring alerts.
  • Generate internal knowledge base articles and issue resolution scripts and maintain Level-1 troubleshooting workflows.
  • Perform basic server maintenance and operational procedures.
  • Manage user profiles and data.
  • Escalate ticket to Level-3 or appropriate technology owner if advanced skills or elevated permissions are required.
  • Generate requests for additional issue resolution scripts and knowledge base articles as necessary.
  • Read the built-in Windows and Citrix event logs and perform basic troubleshooting using publicly available information found via Google/Bing.

Skill set

Experience with Microsoft Windows Server including but not limited to:

  • Configuring operating system options
  • Understanding Remote Desktop Services policies and profiles
  • Using Active Directory
  • Creating users/managing permissions and administrator rights
  • Creating and modifying Active Directory group policies

Basic administration skills, including:

  • An understanding of protocols (TCP)
  • An understanding of firewall concepts
  • An understanding of email administration and account creation
  • An understanding of Remote Desktop Services policies and profiles
  • The ability to create shares and give access to shared folders/files

Experience performing the following:

  • Managing, maintaining, monitoring and troubleshooting Citrix solutions
  • Backing up components in Citrix environments
  • Updating components in Citrix environments
  • Creating reports for trend analysis

Support level

Level-3 (Implementer)

Description

Central point for implementing, administering, and maintaining the Citrix desktop and application virtualization infrastructure. This role focuses on deploying new use cases and leading lifecycle management initiatives. Generally, one Implementer can focus on one use case at a time; for example, three concurrent new use cases would require three Implementers. Level-3 escalates issues to vendor-specific technical support and notifies Level-4 of the issue.

Responsibilities

  • Perform advanced issue analysis and resolution.
  • Perform maintenance and environment upgrades.
  • Address high-severity issues and service outages.
  • Manage the Citrix environment.
  • Oversee and lead administrative tasks performed by Level-2.
  • Manage network and storage infrastructure as it relates to the Citrix environment (depending on size of company or Citrix environment).
  • Review periodic reports of server health, resource usage, user experience, and overall environment performance.
  • Review vendor knowledge base articles and newly released updates.
  • Perform policy-level changes and make Active Directory updates.
  • Review change control requests that impact the Citrix environment.
  • Perform advanced server and infrastructure maintenance.
  • Review knowledge base articles and issue resolution scripts for accuracy, compliance, and feasibility.
  • Create knowledge base articles and issue resolution scripts to address Level-2 requests.
  • Escalate ticket to vendor-specific technical support, when necessary, and notify Level-4 of the issue.

Skill set

Knowledge of how the following Windows components integrate with Citrix technologies:

  • Active Directory Domain Services
  • Active Directory Certificate Services
  • Policies
  • Domain Name System (DNS)
  • Dynamic Host Configuration Protocol (DHCP)
  • Group Policy Objects (GPOs)
  • NTFS Permissions
  • Authentication and Authorization
  • Internet Information Services (IIS)
  • Microsoft Windows operating systems: Windows 10, Windows 8.1, Windows 7, Windows Server 2012 R2, and Windows Server 2008 R2
  • Roles and features of Windows Server
  • SQL Server 2008 R2 and newer
  • SQL Server clustering, mirroring, and AlwaysOn Availability Groups
  • General networking (for example, routing and switching)
  • Hypervisors
  • Shared storage configuration and management

Support level

Level-4 (Architect)

Description

The Level-4 team has minimal exposure to administrative tasks but focuses on scoping, planning and executing Citrix-specific service and project requests. An architect translates business requirements into a technical design.

Responsibilities

  • Provide technical leadership for upcoming projects.
  • Lead design updates and architecture revisions.
  • Address high severity issues and service outages.
  • Oversee technology integration workflows.
  • Review periodic reports of server health, resource usage, user experience, and overall environment performance to determine next steps and upgrade paths.
  • Initiate load testing to determine the capacity of the environment.
  • Review frequently recurring helpdesk issues.
  • Ensure technical specifications continue to meet business needs.
  • Update design documentation.

Skill set

Advanced architectural assessment and design skills for:

  • Citrix XenApp
  • Citrix XenDesktop
  • Citrix XenServer, VMware vSphere, Microsoft Hyper-V
  • Citrix Provisioning Services
  • Citrix NetScaler
  • Citrix StoreFront
  • Active Directory
  • Storage solutions
  • Networking
  • Application delivery
  • Disaster recovery
  • Policies/policy structures and security restrictions
  • Licensing
  • Methodology

Intermediate knowledge of:

  • General networking skills
  • Change control process
  • Project management
  • Risk assessment

Support level

Vendor support

Description

Vendor assistance may be necessary should defects in a program be discovered. At this stage, Level-3 engineers need to establish a support ticket with the appropriate vendor to assist with finding a solution.

Support level

Self-service

Description

A self-service portal should be used for noncritical tasks, such as application access, permissions, and password resets. The portal can range from a simple FAQ page to a fully automated process requiring no human interaction. The purpose of the self-service portal is to provide an additional touch point where end users can address basic issues themselves, reducing the number of new support tickets.

Decision: Certifications and Training

The following table details the recommended training, certifications and experience for each support level.

Role: Help Desk (Level-1)
Recommended training: Level-1 support personnel should be provided with basic training on Citrix XenApp, Citrix XenDesktop, and supporting technologies. This can include internal training from subject matter experts or training from a Citrix Authorized Learning Center. The training should focus on the following topics:
  • High-level overview of the XenApp and XenDesktop implementation
  • Using Citrix Director to manage user sessions
  • Troubleshooting Citrix XenApp and XenDesktop sessions
  • Troubleshooting methodology
In addition, regular training should be provided to Level-1 team members on the latest troubleshooting recommendations from the Level-2 and Level-3 teams, as well as details of any relevant changes to the environment. This helps to ensure a good baseline knowledge level across the team and consistent customer service.
Recommended course(s): CXD-105: Citrix XenApp and XenDesktop Help Desk Support
Recommended certification: N/A
Relevant experience: 1+ years (entry level also acceptable)

Role: Operator (Level-2)
Recommended training: Level-2 personnel should conduct regular team training sessions to refine administrative skills and ensure a baseline knowledge level across the team. Formalized training is also essential when there are architectural updates to the environment and the Level-2 team is working with unfamiliar technologies. All members of the Level-2 team should achieve the Citrix Certified Associate (CCA) certification for Citrix XenApp and XenDesktop. Advanced training on Windows concepts will also be essential for Level-2 team members who do not have desktop or server support experience. Finally, on-the-job training along with close integration with Level-3 administrators is essential as the Level-2 roles are formalized and responsibilities are handed over from Level-3 to Level-2.
Recommended course(s): CXD-210: XenApp and XenDesktop 7.1x Administration
Recommended certification: Citrix Certified Associate - Virtualization
Relevant experience: 2-3 years

Role: Implementer (Level-3)
Recommended training: Level-3 support team members should have a minimum of three years of enterprise experience implementing and supporting XenApp, XenDesktop, Provisioning Services, and Windows operating systems. Level-3 staff should also complete the Citrix Certified Professional (CCP) certification track, as this will prepare them to proactively manage the user community and implement Citrix solutions according to Citrix leading practices.
Recommended course(s): CXD-400: Designing App and Desktop Solutions with Citrix XenApp and XenDesktop after completion of Level-3 CXD-310
Recommended certification: Citrix Certified Expert - Virtualization
Relevant experience: 3-4 years

Role: Architect (Level-4)
Recommended training: Experience is essential for Level-4 staff. A qualified Level-4 resource should have a minimum of five years of experience implementing, supporting, and serving in a technology architect role for a XenApp and/or XenDesktop environment, as well as additional administrative experience with integrated technologies such as application and profile management solutions. The ideal candidate will have served in such a capacity at two or more environments for purposes of product exposure, and in at least one environment of over 1,200 concurrent users. A Citrix Certified Expert (CCE) certification or comparable training and experience should be a prerequisite for the role.
Recommended course(s): CXD-400: Designing App and Desktop Solutions with Citrix XenApp and XenDesktop after completion of Level-3 CXD-310
Recommended certification: Citrix Certified Expert - Virtualization
Relevant experience: 5+ years

Decision: Support Staffing

The following table provides guidance on the recommended number of support staff.

Environment sizes:
  • Small: 1 site; fewer than 500 users; 1-2 images
  • Mid-size: 1-2 sites; 1,000-5,000 users; 3-5 images
  • Large: 2+ sites; more than 5,000 users; 5+ images

Recommended staff per role (Small / Mid-size / Large):
  • Help Desk (Level-1): 3 / 5-10 / 15-20
  • Operator (Level-2): 1-2 / 2-3 / 4-5
  • Implementer (Level-3): 1 / 1-2 / 2-3
  • Architect (Level-4): 1 / 1 / 1-2

Note

This table should be used only as a baseline. Support staffing decisions should be evaluated against the defined requirements, projected workloads, and operational procedures of an organization. Multiple levels can be combined. For example, there may be too few design projects to justify a dedicated Architect role, or a senior member of the Citrix team can act as both Operator and Implementer.
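As a rough aid for applying the baseline, the sketch below encodes the staffing table in Python and classifies an environment by user count alone. This is an assumption: the table also considers sites and images, and it leaves 500-999 users unassigned, so the hypothetical classify_by_users function returns None for that range rather than guessing.

```python
from typing import Dict, Optional, Tuple

# Baseline (minimum, maximum) staff per role, copied from the staffing table.
STAFFING: Dict[str, Dict[str, Tuple[int, int]]] = {
    "small": {"Help Desk (Level-1)": (3, 3), "Operator (Level-2)": (1, 2),
              "Implementer (Level-3)": (1, 1), "Architect (Level-4)": (1, 1)},
    "mid-size": {"Help Desk (Level-1)": (5, 10), "Operator (Level-2)": (2, 3),
                 "Implementer (Level-3)": (1, 2), "Architect (Level-4)": (1, 1)},
    "large": {"Help Desk (Level-1)": (15, 20), "Operator (Level-2)": (4, 5),
              "Implementer (Level-3)": (2, 3), "Architect (Level-4)": (1, 2)},
}


def classify_by_users(users: int) -> Optional[str]:
    """Map a user count to a table column; None when the table has no column for it."""
    if users < 500:
        return "small"
    if 1000 <= users <= 5000:
        return "mid-size"
    if users > 5000:
        return "large"
    return None  # 500-999 users falls between the table's columns


def total_staff(size: str) -> Tuple[int, int]:
    """Sum the minimum and maximum headcount across all roles."""
    ranges = STAFFING[size].values()
    return sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges)


print(classify_by_users(2500), total_staff("mid-size"))  # mid-size (9, 16)
```

The totals assume every role is staffed separately; where levels are combined, as the note above allows, the real headcount will be lower.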

Decision: Job Aids

General Support Tools: The following table details tools that should be made available to all support levels.

Ticket Management System: Used to document customer information and issues. A typical ticket management system provides the following functionality:
  • Monitoring the queue of tickets
  • Setting a limit on the number of open tickets
  • Establishing thresholds, such as how long a certain type of ticket should take to be answered
  • Identifying a group of users or individuals who require higher-priority assistance
  • Informing the user when their ticket is opened, updated, or closed
  • Providing an internal knowledge base that support professionals can search for known resolved issues

Call Scripts: First-contact help desk personnel should have documented scripts to ensure that all relevant data is captured while the user is on the phone. This practice also assists with proper triage and allows the next support level to perform research before contacting the customer. A sample call script is provided for reference.

Remote Assistance Tools: Remote assistance tools are useful when troubleshooting user issues, because support technicians and administrators can remotely observe a user’s actions.

Knowledge Base: Documentation should be created and maintained in a knowledge base or library of known issues. Articles should be searchable for quick retrieval. Knowledge bases help support staff resolve known issues quickly and reduce the need for time-consuming research.

Citrix Support Tools

The following table provides recommendations on the Citrix support tools that should be made available to each support level.

Tool

Citrix Director

Description

Citrix Director provides an overview of hosted desktops and application sessions. It enables support teams to monitor and troubleshoot issues.

Products

XenDesktop, XenApp

Support Level

L1, L2, L3, L4

Tool

Citrix Studio

Description

Citrix Studio enables administrators to perform configuration as well as maintenance tasks for a XenApp and XenDesktop site and associated virtual desktops or hosted applications.

Products

XenDesktop, XenApp

Support Level

L1, L2, L3, L4

Tool

Citrix Insight Services

Description

Run from a single Citrix Delivery Controller to capture key data points and CDF traces for selected computers followed by a secure and reliable upload of the data package to Citrix Technical Support for escalation.

Products

XenDesktop, XenApp, Provisioning Services, XenServer

Support Level

L3, L4

Tool

Provisioning Services Console

Description

The Provisioning Services Console enables administrators to perform configuration and maintenance tasks for a Provisioning Services farm.

Products

Provisioning Services

Support Level

L3, L4

HDX Monitor is a tool to validate the operation of the Citrix ICA/HDX stack of a user session. HDX Monitor provides information about client capabilities, network performance and activity, session settings, and more.

Tool

XenCenter

Description

XenCenter enables administrators to perform configuration and maintenance tasks for a XenServer Resource Pool.

Products

XenServer

Support Level

L3, L4


Citrix Insight Services

Administrators can use Citrix Insight Services to simplify support and troubleshooting of the Citrix environment. Citrix Insight Services is run locally to collect environment information; online analysis then evaluates that information and provides recommendations based on the administrator’s Citrix environment and configuration. Additional information about Citrix Insight Services is available in the Citrix Support article CTX131233 - FAQ: Citrix Insight Services.

A full list of the tools provided by Citrix Support to assist with troubleshooting is available in the Citrix Supportability Pack.

Call Script

The following call script can be used as an initial baseline for a Citrix Help Desk team. Citrix Consulting recommends reviewing this sample call script and adding any environment-specific details that may need to be collected.

  1. What is the name and location of the user? This question will identify if the user is accessing the environment from an external or internal network location.
  2. Is the problem always reproducible? If yes, capture the exact steps to reproduce it. This information is very important for the support team when troubleshooting the issue.
  3. Do any other users at the site/location experience the same issue? Can a colleague log on from the same and/or a different workstation? These questions help to determine whether this is a workstation issue or a user issue.
  4. What type of endpoint device is the user utilizing (corporate device, BYOD, thin client, PC, laptop, etc.)? This question will help determine whether the issue is related to the user’s endpoint.
  5. What is the Citrix Receiver version and connection information? This question will verify if the user is using the right version of Receiver (the latest Receiver version or the version standardized by the company).
  6. Can the user see the StoreFront authentication page? This question helps to identify network issues.
  7. What is the name of the application (or virtual desktop) the user is attempting to use? Does the user see the appropriate application or desktop icon on the StoreFront site? These questions help to determine if there is an issue with user access and/or group membership.
  8. Does the application (or desktop) launch when the icon is selected? Does the application logon screen appear (if applicable)? These questions help to determine if a connection is made into the Citrix XenDesktop infrastructure.
  9. Can the user authenticate into the application (if applicable)? Does the issue occur after the application is launched? These questions help to determine whether the issue is with the application rather than the application delivery infrastructure.
  10. What is the specific error seen (if applicable)? This question identifies the specific error. The user should be requested to provide a screenshot, if available.
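When the ticket management system supports custom fields, the call script answers can be captured as a structured record so that incomplete tickets are easy to spot before escalation. The following Python sketch assumes roughly one field per question; the CallScriptRecord class, its field names, and the choice of which answers are optional are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Answers that the call script marks "if applicable" and therefore may stay empty.
OPTIONAL_ANSWERS = {"can_authenticate_to_app", "error_message"}


@dataclass
class CallScriptRecord:
    # One hypothetical field per call script question (None = not yet asked).
    user_name_and_location: Optional[str] = None
    reproducible: Optional[bool] = None
    reproduce_steps: Optional[str] = None
    other_users_affected: Optional[bool] = None
    endpoint_type: Optional[str] = None
    receiver_version: Optional[str] = None
    storefront_page_visible: Optional[bool] = None
    resource_name: Optional[str] = None
    resource_launches: Optional[bool] = None
    can_authenticate_to_app: Optional[bool] = None
    error_message: Optional[str] = None

    def missing_answers(self) -> List[str]:
        """List required questions that have not been answered yet."""
        missing = []
        for f in fields(self):
            if f.name in OPTIONAL_ANSWERS or getattr(self, f.name) is not None:
                continue
            # Reproduction steps are only required for reproducible problems.
            if f.name == "reproduce_steps" and self.reproducible is not True:
                continue
            missing.append(f.name)
        return missing


record = CallScriptRecord(user_name_and_location="user01, branch office", reproducible=False)
print(record.missing_answers())
# ['other_users_affected', 'endpoint_type', 'receiver_version', 'storefront_page_visible', 'resource_name', 'resource_launches']
```

A help desk workflow could refuse to escalate a ticket to Level-2 while missing_answers() is non-empty, which supports the goal of letting the next level start work immediately.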

Decision: Delegated Administration

Each support level must be provided with sufficient rights to effectively perform their role. The following tables provide guidance on the recommended privileges per support level.

XenApp/XenDesktop Delegated Rights

Administrator role Support Level
Help Desk Administrator Level-1
Full Administrator Level-2
Full Administrator Level-3
Full Administrator Level-4

For further information about delegated rights within a XenApp/XenDesktop Site, please refer to Citrix Product Documentation - XenApp and XenDesktop Delegated Administration.

Provisioning Services Delegated Rights

Administrator Role Support Level
N/A Level-1
Site Administrator Level-2
Farm Administrator Level-3
Full Administrator Level-4

For further information about delegated rights within a Provisioning Services Site, please refer to Citrix eDocs - Provisioning Services Managing Administrative Roles.

StoreFront Delegated Rights

Administrator Role Support Level
N/A Level-1
N/A Level-2
Local Administrator on StoreFront Server Level-3
Full Administrator Level-4

Users with local administrator rights have access to view and manage all objects within StoreFront or Web Interface. These users can create new sites and modify existing ones.

Citrix License Server Delegated Rights

Administrator Role Support Level
N/A Level-1
N/A Level-2
Administrator Level-3
Administrator Level-4

By default, the account used during the installation of the license server becomes the administrator for the console. Often the accounts used for installation are not the intended accounts for regular administration tasks. For the steps to change the default administrator, refer to CTX135841 - How to Change the Default Administrator for the Citrix Licensing Server Version 11.10. All users created through this process are full administrators of the Citrix License Server.

XenServer Delegated Rights

Administrator Role Support Level
N/A Level-1
Virtual Machine Operator Level-2
Pool Administrator Level-3
Full Administrator Level-4

For further information about delegated rights within a XenServer Resource Pool, please refer to the XenServer 7.0 Administrator’s Guide (see the chapter Role Based Access Control).

Process 2: Operations

This section defines routine operations for the Citrix environment that help to improve stability and performance.

Decision: Administrative Tasks

The Citrix Support Team should perform regular operations and maintenance tasks to ensure a stable, scalable Citrix environment. Each operation is categorized by the associated component of the solution and by the frequency of the operation (daily, weekly, monthly, or yearly). Tasks have been aligned to the roles described in Decision: Support Responsibilities and Skill Set.

If the administrators performing operations are the same as the support team, the designations are linked as follows:

  • Level 2 Support = Operators
  • Level 3 Support = Implementers

Daily Periodic Tasks

The following table outlines the tasks that should be performed by the Citrix Support Team on a daily basis.

Component: Generic
Task: Review Citrix Director, Windows Performance Monitor, Event Log, and other monitoring software alerts
Description: Check for warnings or alerts within Citrix Director, event logs, or other monitoring software, and investigate the root cause of any alert. Note: A computer and monitor can be set up to display the Citrix Director dashboard as a heads-up display for the Citrix department, so that the status of the environment is clearly visible. Monitoring recommendations for XenDesktop and XenApp 7.x are included in the Monitoring section of the VDI Handbook.
Responsible: Operators

Component: Generic
Task: Verify backups completed successfully
Description: Verify that all scheduled backups have completed successfully. This can include, but is not limited to:
  • User data (user profiles/home folders)
  • Application data
  • Citrix databases
  • StoreFront configuration
  • Web Interface configuration
  • Provisioning Services vDisks (virtual desktops and application servers)
  • XenServer VM/Pool metadata (or equivalent for other hypervisors)
  • Dedicated virtual desktops
  • License files
Responsible: Operators

Component: Generic
Task: Test environment access
Description: Simulate a connection both internally and externally to ensure that desktop and application resources are available before most users log on for the day. This test should be repeated throughout the day and may be automated.
Responsible: Operators

Component: XenApp and XenDesktop
Task: Virtual machine power checking
Description: Verify that the appropriate number of idle desktops and application servers are powered on and registered with the Delivery Controllers to ensure availability for user workloads.
Responsible: Operators

Component: XenApp and XenDesktop
Task: Perform incremental backup of Citrix related databases
Description: Perform incremental backups of the following Citrix databases: Site Database, Configuration Logging Database, and Monitoring Database.
Responsible: Operators, Database team (if Citrix environment is using a shared SQL)

Component: Provisioning Services
Task: Check Citrix Provisioning Server utilization
Description: Check the number of target devices connected to the Citrix Provisioning Servers and balance the load across servers, if required.
Responsible: Operators

Component: Provisioning Services
Task: Perform incremental backup of Citrix PVS database
Description: Perform an incremental backup of the Citrix Provisioning Server database hosted on SQL Server infrastructure.
Responsible: Operators, Database team (if Citrix environment is using a shared SQL)
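The virtual machine power check can be scripted against whatever machine inventory is available. The Python sketch below assumes a hypothetical export in which each machine reports its delivery group, power state, registration state, and whether it hosts a session, and it compares the number of powered-on, registered, unused machines against a per-group target chosen by the team. None of these names correspond to a specific Citrix API, and this definition of "idle" fits single-session desktops; multi-session application servers would need a load-based check instead.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Machine:
    # Hypothetical inventory record, e.g. from a Studio or monitoring export.
    name: str
    delivery_group: str
    powered_on: bool
    registered: bool
    has_session: bool


def idle_shortfall(machines: Iterable[Machine], targets: Dict[str, int]) -> Dict[str, int]:
    """Return {delivery group: machines missing} where idle capacity is below target."""
    idle = Counter(
        m.delivery_group
        for m in machines
        if m.powered_on and m.registered and not m.has_session
    )
    return {group: want - idle[group] for group, want in targets.items() if idle[group] < want}


fleet = [
    Machine("VDI-001", "Finance", True, True, False),
    Machine("VDI-002", "Finance", True, False, False),  # powered on but unregistered
    Machine("VDI-003", "Finance", True, True, True),    # in use
]
print(idle_shortfall(fleet, {"Finance": 2}))  # {'Finance': 1}
```

An unregistered machine that is powered on, such as VDI-002 here, is worth investigating separately, because it consumes resources without adding capacity.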

Weekly Periodic Tasks

The following table outlines the tasks that should be performed by the Citrix Support Team on a weekly basis.

Component: Generic
Task: Review latest hotfixes and patches
Description: Review the latest Citrix hotfixes, determine whether the Delivery Controllers and Server-Based OS / Desktop-Based OS virtual machines require them, and then test and deploy them. Note: Any required hotfixes should be tested using the recommended testing process prior to implementation in production.
Responsible: Operators, Implementers (review process)

Component: Generic
Task: Create Citrix environment status report
Description: Create a report on overall environment performance (server health, resource usage, user experience) and the number of Citrix issues (close rate, open issues, and so on).
Responsible: Operators

Component: Generic
Task: Review status report
Description: Review the Citrix status report to identify any trends or common issues.
Responsible: Implementers, Architect

Component: Generic
Task: Maintain internal support knowledge base
Description: Create knowledge base articles and issue resolution scripts to address Level-1 and Level-2 support requests. Review knowledge base articles and issue resolution scripts for accuracy, compliance, and feasibility.
Responsible: Operators (Level-1 requests), Implementers (Level-2 requests and review process)

Component: XenApp and XenDesktop
Task: Check Configuration Logging report
Description: Confirm that Citrix site-wide changes implemented during the previous week were approved through change control.
Responsible: Auditors

Component: XenApp and XenDesktop
Task: Perform full backup of Citrix related databases
Description: Perform full backups of the following Citrix databases: Site Database, Configuration Logging Database, and Monitoring Database.
Responsible: Operators, Database team (if Citrix environment is using a shared SQL)

Component: Provisioning Services
Task: Check storage capacity (only prior to updating a vDisk)
Description: Review storage utilization (used and free storage space) for the vDisk store and each vDisk. Note: Lack of space within the vDisk repository will be an issue only when vDisks are updated using versioning or when a vDisk is placed in private mode during an update procedure. Storage utilization within each vDisk should also be investigated. For example, a 20 GB vDisk may have only 200 MB of free storage. If the vDisk itself is limited for storage, it needs to be extended. Citrix does not support resizing of a VHD file; refer to the Microsoft documentation for Resize-VHD for information on resizing a VHD file.
Responsible: Operators

Component: Provisioning Services
Task: Perform vDisk updates (as necessary)
Description: Perform a full backup of the vDisk before implementing any updates. Update the master vDisk image files and apply the following:
  • Windows software updates and patches
  • Operating system and application changes
  • Anti-virus pattern and definitions updates
Note: Updates should be tested using the recommended testing process prior to implementation in production.
Responsible: Operators

Component: Provisioning Services
Task: Check auditing reports
Description: Review the Citrix Provisioning Services auditing logs. Note: Provisioning Server auditing is off by default and can be enabled to record configuration actions on components within the Provisioning Services farm. To enable auditing, refer to the Citrix product documentation article Enabling Auditing Information.
Responsible: Auditors

Component: Provisioning Services
Task: Perform full backup of Citrix PVS database
Description: Perform a full backup of the Citrix Provisioning Server database hosted on SQL Server infrastructure.
Responsible: Operators, Database team (if Citrix environment is using a shared SQL)
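The storage capacity check before a vDisk update can be reduced to two comparisons. This Python sketch reuses the 20 GB / 200 MB example from the weekly tasks; the 10% free-space threshold and the rule that the vDisk store should have at least as much free space as the vDisk being updated are assumptions chosen for illustration, not Citrix guidance.

```python
from typing import List

GB = 1024 ** 3
MB = 1024 ** 2


def free_pct(size_bytes: int, free_bytes: int) -> float:
    """Percentage of free space inside a vDisk."""
    return 100.0 * free_bytes / size_bytes


def pre_update_checks(store_free_bytes: int, vdisk_size_bytes: int,
                      vdisk_free_bytes: int, min_free_pct: float = 10.0) -> List[str]:
    """Return warnings to resolve before updating a vDisk (thresholds are assumptions)."""
    warnings = []
    if free_pct(vdisk_size_bytes, vdisk_free_bytes) < min_free_pct:
        warnings.append("extend the vDisk before updating it")
    # Conservative assumption: leave room in the store for a full-size copy.
    if store_free_bytes < vdisk_size_bytes:
        warnings.append("free up space in the vDisk store")
    return warnings


# A 20 GB vDisk with 200 MB free, in a store with 500 GB free (store size is hypothetical).
print(pre_update_checks(500 * GB, 20 * GB, 200 * MB))  # ['extend the vDisk before updating it']
```

The store rule is deliberately conservative; a versioned update usually writes only a differencing disk, so teams that rely solely on versioning may choose a smaller margin.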

Monthly Periodic Tasks

The following table outlines the tasks that should be performed by the Citrix Support Team on a monthly basis.

Component: Generic
Task: Perform capacity assessment
Description: Perform a capacity assessment of the Citrix environment to determine environment utilization and any scalability requirements. Note: Recommendations for performing a capacity assessment are included in Decision: Capacity Management in the Monitoring section below.
Responsible: Architect

Yearly Periodic Tasks

The following table outlines the tasks that should be performed by the Citrix Support Team on a yearly basis.

Component: Generic
Task: Conduct Citrix policy assessment
Description: Review Citrix policies and determine whether new policies are required or existing policies need to be updated.
Responsible: Implementers

Component: Generic
Task: Review software upgrades
Description: Review and assess the requirement for new Citrix software releases or versions.
Responsible: Implementers

Component: Generic
Task: Business Continuity Plan (BCP)/Disaster Recovery (DR) test
Description: Conduct a functional BCP/DR test to confirm DR readiness. This plan should include a yearly restore test to validate that the actual restore process from backup data functions correctly.
Responsible: Architect

Component: Generic
Task: Perform application assessment
Description: Review the usage of applications outside and within the Citrix environment. Assess the validity of adding applications to the Citrix site, removing applications that are no longer required, or upgrading applications to the latest version.
Responsible: Architect

Component: Provisioning Services
Task: Archive audit reports
Description: Archive the Citrix Provisioning Server Audit Trail Information for compliance requirements.
Responsible: Auditors

Decision: Backup Location

The location of backups directly affects the recovery time and reliability of the Citrix environment. It is recommended to store backups of critical data both onsite and at an offsite location. If offsite backups are not possible due to the associated costs or the sensitivity of the data, backups should be placed in separate physical locations within the same datacenter.

Each backup option is discussed further below.

  • Onsite Backups – Onsite backups should be located on a storage device in the datacenter that allows the data to be recovered quickly in the event of a failure. Onsite backups are ideal for issues that affect only a small subset of hardware in the datacenter. Backups can also be stored on a cold storage medium such as tape. While this medium is slower to recover from, it provides additional protection since it is only active during the backup process.
  • Offsite Backups – Although the time to recover is much higher, offsite backups provide additional protection in the event of a disaster. Offsite backups may be transferred over the Internet to a third-party provider, or created onsite and then transported to a remote location on storage media such as tape. It is typical to keep only a limited number of backups offsite, for example one backup a week or one a month.
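The "one backup a week" example above can be expressed as a simple selection rule. The following Python sketch is illustrative only: it assumes daily backups identified by date and sends the newest backup from each ISO week offsite. The function name `select_offsite` and the weekly policy are hypothetical, not features of any particular backup product.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def select_offsite(backups: List[date]) -> List[date]:
    """Hypothetical policy: keep the newest backup from each ISO week for offsite storage."""
    newest_per_week: Dict[Tuple[int, int], date] = {}
    for backup_date in backups:
        iso_year, iso_week, _ = backup_date.isocalendar()
        key = (iso_year, iso_week)
        if key not in newest_per_week or backup_date > newest_per_week[key]:
            newest_per_week[key] = backup_date
    return sorted(newest_per_week.values())


if __name__ == "__main__":
    # Two weeks of daily backups starting Monday, 1 January 2024.
    daily = [date(2024, 1, 1) + timedelta(days=i) for i in range(14)]
    print(select_offsite(daily))
```

For two weeks of daily backups starting Monday, 1 January 2024, this selects the backups from 7 and 14 January. A monthly policy would group by `(year, month)` instead.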

Decision: Testing Process

Regular updates and maintenance are an everyday part of IT operations. Standard processes must be followed to ensure updates do not negatively impact the production environment. This includes maintaining a dedicated testing infrastructure where modifications can be validated prior to being implemented in production.

Since changes to Citrix infrastructure can impact thousands of virtual desktop and application users, multi-phase testing is critical for the reliability and performance of the environment. As such, the process for testing should resemble the following:

Testing progress image

  • Development - The development infrastructure exists outside of the production network. Typically, it consists of short-lived virtual machines whose configuration matches production as closely as possible. The purpose of the development phase is to provide change requestors a non-production environment to perform proof of concepts, determine integration requirements and perform iterative testing as part of a discovery phase. Proposed changes should be documented so they can be applied in the test phase.
  • Testing - The test environment is a standalone 1:1 copy of the production infrastructure and is used to confirm that the proposed changes can be easily repeated before they reach the pre-production staging environment. The changes made should follow the documentation from the development stage. If testing fails within the testing stage, the architect must assess the severity of the failure and determine whether minor updates to the documentation are sufficient or a full development cycle is needed.
  • Pre-production - The pre-production environment should mimic the current production environment. The goal of staging is to implement the proposed changes with little risk or uncertainty. Any changes made to the staging infrastructure are expected to have been tested and documented for repeatability, so no iterations or adjustments should be required within this phase. User Acceptance Testing (UAT) should be performed during this phase, within this environment.
  • Production - The production environment is a fully redundant and scalable solution designed for normal usage by end users. There should be minimal changes to the environment. If possible, all approved changes should be rolled out in stages to the production environment. This process is known as a staged rollout and mitigates risk by allowing changes to be rolled back, if necessary, without impacting the entire environment.
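A staged rollout can be sketched as a loop over waves of machines with a health gate between waves. The Python below is a minimal illustration under stated assumptions: `apply_change`, `is_healthy` and `roll_back` are hypothetical callbacks standing in for whatever tooling actually updates and validates a machine, and the wave sizes are arbitrary.

```python
from typing import Callable, List, Sequence, Tuple


def make_waves(machines: Sequence[str], sizes: Sequence[int]) -> List[List[str]]:
    """Split machines into waves of the given sizes; any remainder becomes the final wave."""
    waves: List[List[str]] = []
    start = 0
    for size in sizes:
        if start >= len(machines):
            break
        waves.append(list(machines[start:start + size]))
        start += size
    if start < len(machines):
        waves.append(list(machines[start:]))
    return waves


def staged_rollout(
    waves: List[List[str]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    roll_back: Callable[[str], None],
) -> Tuple[bool, List[str]]:
    """Apply a change wave by wave. If any machine in a wave fails its health check,
    roll back every machine changed so far (newest first) and stop the rollout."""
    changed: List[str] = []
    for wave in waves:
        for machine in wave:
            apply_change(machine)
            changed.append(machine)
        if not all(is_healthy(machine) for machine in wave):
            for machine in reversed(changed):
                roll_back(machine)
            return False, changed
    return True, changed
```

For example, `make_waves(machines, [1, 3])` yields a single-machine pilot, then three machines, then everything else. Rolling back every machine changed so far, rather than only the failing wave, is a conservative design choice; an organization could instead pause and roll back only the affected wave.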

Decision: Change Control

Standardized processes that manage changes throughout a system’s lifecycle are necessary to ensure consistent and accountable performance. The following change control leading practices should be considered.

  • Use a change control window so that all applicable parties know when there might be downtime. Make sure that all teams are represented in the Change Advisory Board (CAB).
  • Every change should have a roll back plan.
  • If a change fails, hold a “hot wash” (an immediate after-action review) to determine what went wrong.
  • Always use an automated change control system so that support staff can quickly and easily identify changes.
  • When available, ensure configuration logging is enabled to track any changes made to the Citrix environment.

The change control process should be followed closely, starting with a change request. A change request form should be filled out detailing the changes requested, the reasons for the change, and the intended timeframes for the action. The request is then reviewed, and edited if required, by a Change Manager and the advisory board. When the change request has gone through the entire change approval process, it is given to a change implementer, who stages the change for testing and finally conducts the implementation in production. A sample change control process, including detailed steps, is provided in the diagram below:

Change control process image

The process is as follows:

  1. The Change Request (CR) form is completed by any person requesting a change.

  2. After appropriate manager approvals have been acquired, the CR is forwarded to the appropriate Change Manager(s).

  3. The Change Manager validates the CR for completeness and logs the CR information into the Change Control Log for tracking. Incomplete change requests are returned to the requestor for update and re-submission.

  4. The Change Manager assesses the impact of the change in conjunction with subject matter experts and/or managers of the teams associated/affected by this change.

  5. The Change Manager works with the associated/affected teams as well as the change requestor in order to confirm the priority, category and type of the change as well as the proposed rollback plan.

  6. If the change is approved by the Change Manager, the CR is forwarded to the CAB for approval. If the change is rejected, the Change Control Log is updated with the current status and the reason for the rejection, and the CR is sent back to the requestor.

  7. The CAB reviews and validates the change in detail, and discusses and evaluates purpose, reasons, impact, cost and benefits. Each board member represents their department and provides guidance on the change requests. The CAB also reviews multiple requests to coordinate implementations and “package” requests into a single release schedule.

  8. Upon approval the change is sent back to the Change Manager to schedule the change for implementation into the staging environment.

  9. The change is implemented and tests are conducted. The results are sent back to the Change Manager.

  10. If the staging implementation and testing are successful, the change is scheduled for production implementation. If the staging phase is not successful, another staging iteration is conducted.

  11. If possible, the change is rolled out in stages to the production environment. This process is known as a staged rollout and mitigates risk by allowing changes to be rolled back, if necessary, without impacting the entire environment. A rollback plan should be in place if there is an issue implementing a change in the production environment.

  12. The Change Manager reviews the implementation and finally updates the Change Control Log.

  13. On a periodic basis, the Change Manager reviews the Change Control Log to identify trends in the type, frequency and size of changes and forwards the results to the CAB for review.

In an emergency, the process may be expedited. If an issue is declared an emergency, a change request form is still filled out and delivered to the appropriate change management representative. When approved, the requested change is implemented immediately and the advisory board is notified.
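The approval flow in steps 1 to 12 behaves like a state machine, which is how automated change control systems typically enforce it. The Python sketch below is a hypothetical model, not the API of any change management product: the state names are invented, the in-memory list stands in for the Change Control Log, the CAB-rejection transition is an assumption (the steps only describe the approval path), and the expedited emergency path is not modeled.

```python
from enum import Enum, auto
from typing import Dict, List, Set, Tuple


class CRState(Enum):
    SUBMITTED = auto()   # steps 1-2: CR form completed and manager approvals obtained
    RETURNED = auto()    # step 3: incomplete CR sent back for update and re-submission
    LOGGED = auto()      # step 3: validated and recorded in the Change Control Log
    ASSESSED = auto()    # steps 4-5: impact, priority, category, type and rollback plan agreed
    CAB_REVIEW = auto()  # steps 6-7: approved by the Change Manager, forwarded to the CAB
    REJECTED = auto()    # step 6 (or an assumed CAB rejection): reason logged, CR returned
    STAGING = auto()     # steps 8-10: scheduled, implemented and tested in staging
    PRODUCTION = auto()  # steps 10-11: (staged) rollout to production
    CLOSED = auto()      # step 12: implementation reviewed, log updated


# Allowed transitions; STAGING -> STAGING models another staging iteration after a failure.
ALLOWED: Dict[CRState, Set[CRState]] = {
    CRState.SUBMITTED: {CRState.LOGGED, CRState.RETURNED},
    CRState.RETURNED: {CRState.SUBMITTED},
    CRState.LOGGED: {CRState.ASSESSED},
    CRState.ASSESSED: {CRState.CAB_REVIEW, CRState.REJECTED},
    CRState.CAB_REVIEW: {CRState.STAGING, CRState.REJECTED},
    CRState.STAGING: {CRState.STAGING, CRState.PRODUCTION},
    CRState.PRODUCTION: {CRState.CLOSED},
    CRState.REJECTED: set(),
    CRState.CLOSED: set(),
}


class ChangeRequest:
    def __init__(self, cr_id: str) -> None:
        self.cr_id = cr_id
        self.state = CRState.SUBMITTED
        # In-memory stand-in for the Change Control Log: (state, note) entries.
        self.log: List[Tuple[CRState, str]] = [(CRState.SUBMITTED, "submitted")]

    def can_advance(self, new_state: CRState) -> bool:
        return new_state in ALLOWED[self.state]

    def advance(self, new_state: CRState, note: str = "") -> None:
        if not self.can_advance(new_state):
            raise ValueError(f"{self.cr_id}: {self.state.name} -> {new_state.name} not allowed")
        self.state = new_state
        self.log.append((new_state, note))
```

Attempting an out-of-order transition, such as moving a freshly submitted CR straight to production, raises `ValueError`; that kind of guard keeps every change traceable in the log.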

Decision: Availability Testing

Availability testing focuses on ensuring that resources remain available in the event of a component failure. These tests are essential to ensuring users always have access to business-critical resources. Testing should be conducted during non-business hours or during a scheduled maintenance weekend, after end users have been given appropriate notice so they are aware of the work in case any unforeseen issues arise.

The following is a list of the key components that should be tested on a regular basis.

  • StoreFront – StoreFront should be load balanced and health checked by a NetScaler or other load balancing device. To validate this configuration, shut down all but one of the StoreFront servers. This validates that the load balancing device detects the failure and directs users to the functioning server.
  • SQL – SQL Server should be in a high availability configuration. To validate the configuration, take the primary SQL Server offline and then open the Citrix Studio console. Because Citrix Studio is not accessible without a functioning SQL Server, a successful Studio connection confirms that the SQL Server failover mechanisms are functioning properly.
  • Delivery Controllers - Deployed resources should be configured with a list of multiple Delivery Controllers. If one becomes unavailable, desktop and application hosts automatically establish a connection to another server in the list. To validate this, shut down one of the Delivery Controller hosts and check whether the resources initially connected to it automatically register with another server. This can be determined by viewing the registration status of the resources in Citrix Studio.
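Checking Delivery Controller failover by eye can be tedious in larger sites. The sketch below assumes a hypothetical snapshot of machine registrations (each VDA name mapped to the controller it is registered with, or `None` if unregistered), taken after the chosen controller has been shut down and the VDAs have had time to re-register. How the snapshot is gathered, for example from Citrix Studio, is outside its scope, and the host names are placeholders.

```python
from typing import Dict, List, Optional


def failed_failover(registrations: Dict[str, Optional[str]], stopped_controller: str) -> List[str]:
    """Return the VDAs that did not fail over correctly after `stopped_controller` was shut down.

    `registrations` maps each VDA name to the controller it is registered with,
    or None if it is currently unregistered (hypothetical snapshot format).
    """
    stopped = stopped_controller.lower()
    return sorted(
        vda
        for vda, controller in registrations.items()
        if controller is None or controller.lower() == stopped
    )
```

An empty result means every VDA in the snapshot re-registered with a surviving controller.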

Sample Testing Workflow: Citrix Provisioning Services

Prerequisites and configuration requirements:

  • Hypervisor, XenApp, and XenDesktop services are up and running.
  • At least two PVS servers are installed and configured, providing the streamed disk image.
  • Resilient networking and storage infrastructure with multiple links to each server.
  • Test users are active on the XenApp or XenDesktop machines.
Steps | Expected Results
PVS Server Outage: Shut down one of the Provisioning Servers. Validate that PVS continues to function. Restart the PVS server. Validate that connections rebalance between the PVS servers. Repeat with the remaining PVS server(s), one at a time. | Existing XenApp/XenDesktop machines connect to another PVS server. There is limited to no impact on users of the machines streamed from that server. New XenApp/XenDesktop machines boot and start correctly. SCOM reports that the PVS server is down/not available. Live connections are rebalanced between both PVS servers once both are available again.
PVS Bond Disruption: Disable or unplug a NIC in the PVS streaming bond on the PVS server. | The Provisioning Server continues to stream over the remaining NICs in the PVS streaming bond.
SQL Server PVS Database Mirror Failover: The administrator logs on to the principal SQL Server. Initiate a failover of the PVS database. Validate that PVS continues to function. Initiate a failback of the PVS database. Validate that PVS continues to function. | PVS continues to function.
SQL Service Outage: The administrator reboots both the principal and mirror SQL Servers simultaneously. Validate that PVS continues to function but that administration is not possible. Wait for SQL Server to come back online. Validate that PVS administrative functions are possible again. | PVS continues to function. PVS administrative functions are unavailable while SQL Server is offline and become available again once the SQL services are restored.

Sample Testing Workflow: Citrix XenDesktop and XenApp Services

Prerequisites and configuration requirements:

  • Hypervisor, XenDesktop, and StoreFront services are up and running.
  • Network and storage services available.
  • Provisioning Services is providing the streamed disk images.
  • Test users are active on the virtual machines.
  • SQL (Mirroring) and XenDesktop servers are up and running.
  • Ensure multiple StoreFront servers are running.
  • NetScaler load balancing services are available.
Steps | Expected Results
XenApp/XenDesktop 7.x Delivery Controller Citrix Broker Service Outage: Stop the Citrix Broker Service on one of the Delivery Controller servers. Validate that virtual desktops or applications can still be enumerated and launched. Start the Citrix Broker Service on the Delivery Controller server. Shut down one of the Delivery Controllers. Validate that virtual desktops or applications can still be enumerated and launched. With a desktop launched, determine which Controller owns the host connection. Shut that Controller down and verify that another Controller takes over the session. Note: This should be done during the maintenance window. Once complete, reboot the VDI resources so the VDAs are evenly distributed across all Controllers. | StoreFront correctly identifies the service as unavailable and redirects connections to the remaining Delivery Controller. Desktops continue to be enumerated and launch successfully. A launched desktop continues to be supported if its hosting Controller goes down.
SQL Server Database Mirror Failover: The administrator logs on to the principal SQL Server. Initiate a failover of the XenApp/XenDesktop database. Validate that XenApp/XenDesktop continues to function. | The database fails over and Citrix Studio picks up the failover database with no issues. Existing sessions are not impacted. New sessions are possible. Administrative functions are possible.
SQL Service Outage: The administrator restarts both the principal and mirror SQL Servers simultaneously. Validate that XenApp/XenDesktop continues to function but that administration is not possible. Wait for the SQL service to come back online. Validate that administrative functions are possible again. | Existing XenDesktop sessions are not impacted. Recently used applications, hosted shared desktops and assigned VDI can be accessed due to the local host cache. XenDesktop administrative functions are not possible while SQL is unavailable and become possible again once the SQL service is available.

Sample Testing Workflow: Citrix Licensing Services

Prerequisites and configuration requirements:

  • Citrix Licensing Server up and running (with valid licenses installed).
  • Hypervisor, XenApp/XenDesktop and StoreFront services are up and running.
  • Users are active on the Server OS or Desktop OS machines.
Steps | Expected Results
License Server Outage: Shut down the Citrix License Server. Reboot an existing Server OS machine. Log on to Citrix StoreFront and launch a published application. Reboot an existing Desktop OS machine. Log on to Citrix StoreFront and launch a virtual desktop. | A License Server connectivity error is posted in the Event Log. The provisioned Server OS machine boots successfully. Users are able to launch published applications. The provisioned Desktop OS machine boots successfully. Users are able to launch a virtual desktop. Administrators have a 30-day grace period to recover the Citrix License Server.

Process 3: Monitoring

By having an in-depth understanding of current and expected behavior of the Citrix environment and its components, administrators are better equipped to discover an issue before it affects the user community. Furthermore, the data tracked during normal operations is beneficial for trending and capacity planning. This section defines the monitoring recommendations for a Citrix environment as well as some recommended tools.

Decision: Automated Monitoring

Depending on the size and scope of the XenApp and XenDesktop solution, it can take considerable time for an administrator to verify services, events, capacity and performance. Administrators should investigate incorporating automation into their monitoring strategy.

Citrix includes a cloud-hosted monitoring solution called Smart Check, which is a free service for any organization with active Citrix Customer Success Services: Select offering. Smart Check executes the following in a XenApp and XenDesktop environment:

  • Site Health Checks – Evaluates all services within the XenApp and XenDesktop site
  • Apps and Desktops Checks – Verifies delivery group availability
  • Update Checks – Tracks and recommends patches and hotfixes for Delivery Controllers
  • LTSR Checks – Verifies that the Delivery Controllers and VDAs within the site comply with LTSR versions.
  • Custom Checks – Allows administrators to import their own custom scripts to test across their XenApp and XenDesktop site.

For a list of the current Smart Check capabilities, review the Smart Check documentation.
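Smart Check's custom checks run administrator-supplied scripts. Independent of Smart Check, the general pattern of automated monitoring can be sketched in Python as a runner that executes named checks and records the results. This is a hypothetical illustration, not the Smart Check interface; the check names and callables are placeholders for real probes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def run_checks(checks: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run each named check; a check that raises is reported as failed, never silently skipped."""
    results: List[CheckResult] = []
    for name, check in checks.items():
        try:
            results.append(CheckResult(name, bool(check())))
        except Exception as exc:  # broad on purpose: any crash is a failed check
            results.append(CheckResult(name, False, f"{type(exc).__name__}: {exc}"))
    return results
```

Treating an exception as a failure, rather than letting it stop the run, ensures that a broken probe shows up in the results instead of hiding the remaining checks.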

Decision: Performance Monitor Metrics

Monitoring the performance of the overall environment is crucial to making sure all components are available and performing effectively, so that users have a high-quality experience.

Different components within the overall solution require monitoring of unique metrics with appropriately set thresholds. The metrics and thresholds presented are based on real-world experience but may not apply to all environments. Organizations will need to perform their own baselining and validation testing before implementing them within a production environment.

Note

Some hypervisors, such as VMware vSphere and Hyper-V, provide specific performance counters for tracking CPU and memory utilization within virtual machines (e.g. “VM Processor \ % Processor Time”). These performance counters should be used in addition to the general counters listed below.

General

These performance counters should be used to monitor the key performance metrics of the Citrix infrastructure, application servers, and virtual desktops.

Metric | Description | Warning (Yellow) | Critical (Red) | Troubleshooting/Remediation
Processor - % Processor Time | % Processor Time is the percentage of elapsed time that the processor spends executing non-idle threads. It is calculated by measuring the duration for which the idle thread is active in the sample interval and subtracting that time from the interval duration. (Each processor has an idle thread that consumes cycles when no other threads are ready to run.) This counter is the primary indicator of processor activity and displays the average percentage of busy time observed during the sample interval. | 80% for 15 minutes | 95% for 15 minutes | Identify the processes/services consuming processor time using Task Manager or Resource Monitor. If all processes/services are working within normal parameters and the level of CPU consumption is expected behavior, consider adding CPU resources to this system in the future. If a process/service is identified as working outside normal parameters, the process should be killed. Note that killing a process can cause unsaved data to be lost.
System - Processor Queue Length | Processor queue length is the number of threads in the processor queue. Unlike the disk counters, this counter shows ready threads only, not threads that are running. There is a single queue for processor time even on computers with multiple processors. Therefore, if a computer has multiple processors, divide this value by the number of processors servicing the workload. A sustained processor queue of less than ten threads per processor is normally acceptable, depending on the workload. | 5 (per core) for 5 minutes or 6 (per core) for 15 minutes | 10 (per core) for 10 minutes or 12 (per core) for 30 minutes | A long CPU queue is a clear symptom of a CPU bottleneck. Follow the steps outlined for counter “Processor - % Processor Time”.
Memory – Available Bytes | Available memory indicates the amount of memory that is left after nonpaged pool allocations, paged pool allocations, process working sets, and the file system cache have all taken their share. | <30% of total RAM or 20% of physical memory over 6 minutes | <15% of total RAM or 5% of physical memory over 6 minutes | Identify the processes/services consuming memory using Task Manager or Resource Monitor. If all processes/services are working within normal parameters and the level of memory consumption is expected behavior, consider adding memory to this system in the future. If a process/service is identified as working outside normal parameters, the process should be killed. Note that killing a process can cause unsaved data to be lost.
Memory – Pages/sec | Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. | >10 | >20 | A high value for this counter typically indicates a memory bottleneck, unless “Memory – Available Bytes” reports a high value at the same time; in that case, an application is most likely reading a memory-mapped file sequentially. Refer to Microsoft Knowledge Base article KB139609 – High Number of Pages/Sec Not Necessarily Low Memory for further information.
Paging File - %Usage | The percentage of the page file instance in use. | >40% or 80% over 60 minutes | >70% or 95% over 60 minutes | Review this value in conjunction with “Memory - Available Bytes” and “Memory - Pages/sec” to understand paging activity on the affected system.
LogicalDisk/PhysicalDisk - % Free Space | % Free Space is the percentage of total usable space on the selected logical disk drive that is free. | <20% of physical disk or 20% reported after 2 minutes | <10% of physical disk or 15% reported after 1 minute | Identify which files or folders consume disk space and delete obsolete files if possible. If no files can be deleted, consider increasing the size of the affected partition or adding disks.
LogicalDisk/PhysicalDisk - % Disk Time | % Disk Time indicates how busy the disk is. | >70% consistently or 90% over 15 minutes (_Total) | >90% consistently or 95% over 15 minutes (_Total) | Identify the processes/services consuming disk time using Task Manager or Resource Monitor. If all processes/services are working within normal parameters and the level of disk consumption is expected behavior, consider moving the affected partition to a more capable disk subsystem in the future. If a process/service is identified as working outside normal parameters, the process should be killed. Note that killing a process can cause unsaved data to be lost.
LogicalDisk/PhysicalDisk – Current Disk Queue Length | Current disk queue length is a primary measure of disk congestion: the number of transactions waiting to be processed. | >=1 (per spindle) consistently or 3 over 15 minutes (_Total) | >=2 (per spindle) consistently or 10 over 30 minutes (_Total) | A long disk queue typically indicates a disk performance bottleneck. This can be caused either by processes/services generating a high number of I/Os or by a shortage of physical memory. Follow the steps outlined for counter “LogicalDisk/PhysicalDisk - % Disk Time” and counter “Memory – Available Bytes”.
LogicalDisk/PhysicalDisk – Avg. Disk Sec/Read; Avg. Disk Sec/Write; Avg. Disk Sec/Transfer | The Average Disk Second counters show the average time, in seconds, of a read/write/transfer from or to a disk. | >=15 ms consistently | >=20 ms consistently | High disk read or write latency indicates a disk performance bottleneck. Affected systems become slow or unresponsive, and applications or services may fail. Follow the steps outlined for counter “LogicalDisk/PhysicalDisk - % Disk Time”.
Network Interface – Bytes Total/sec | Bytes Total/sec shows the rate at which the network adaptor is processing data bytes. This counter includes all application and file data, in addition to protocol information such as packet headers. | >8 MB/s for a 100 Mbit/s adaptor; >80 MB/s for a 1000 Mbit/s adaptor; or 60% of NIC speed (inbound and outbound traffic) for 1 minute | 70% of NIC speed (inbound and outbound traffic) for 1 minute | Identify the processes/services consuming network bandwidth using Task Manager or Resource Monitor. If all processes/services are working within normal parameters and the level of bandwidth consumption is expected behavior, consider moving the respective process/service to a dedicated NIC (or team of NICs). If a process/service is identified as working outside normal parameters, the process should be killed. Note that killing a process can cause unsaved data to be lost.
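Thresholds such as "80% for 15 minutes" only make sense over a window of samples, and the processor queue length must first be divided by the number of cores. The following Python sketch evaluates those two rows of the table. It assumes evenly spaced samples (60 seconds by default), treats a threshold as breached only when every sample in the trailing window is at or above it, and uses hypothetical function names; real monitoring tools implement sustained-threshold logic in their own ways.

```python
from typing import Sequence


def sustained(samples: Sequence[float], interval_s: int, threshold: float, duration_s: int) -> bool:
    """True if every sample in the trailing `duration_s` window is at or above `threshold`.

    Assumes evenly spaced samples, newest last, collected every `interval_s` seconds.
    """
    needed = max(1, duration_s // interval_s)
    if len(samples) < needed:
        return False  # not enough history yet to call the condition sustained
    return all(value >= threshold for value in samples[-needed:])


def classify_cpu(percent_samples: Sequence[float], interval_s: int = 60) -> str:
    """% Processor Time: warning at 80% for 15 minutes, critical at 95% for 15 minutes."""
    if sustained(percent_samples, interval_s, 95, 15 * 60):
        return "critical"
    if sustained(percent_samples, interval_s, 80, 15 * 60):
        return "warning"
    return "ok"


def classify_queue(queue_samples: Sequence[float], cores: int, interval_s: int = 60) -> str:
    """Processor Queue Length, normalized per core before applying the table's thresholds."""
    per_core = [q / cores for q in queue_samples]
    if sustained(per_core, interval_s, 10, 10 * 60) or sustained(per_core, interval_s, 12, 30 * 60):
        return "critical"
    if sustained(per_core, interval_s, 5, 5 * 60) or sustained(per_core, interval_s, 6, 15 * 60):
        return "warning"
    return "ok"
```

For example, 20 one-minute samples at 85% classify as `warning`, while a queue length of 40 sustained for 10 minutes on a 4-core server (10 per core) classifies as `critical`.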

XenApp/XenDesktop

These performance counters are specific to the Delivery Controllers.

Metric | Description | Warning (Yellow) | Critical (Red) | Troubleshooting/Remediation
Database Avg. Transaction Time | The average time, in seconds, taken to execute a database transaction. A baseline needs to be established in the environment in order to set accurate threshold values. | Based on baseline values | Based on baseline values | If the reported values consistently exceed the baseline response time, a potential performance issue needs to be investigated at the SQL Server level.
Database Connected | Indicates whether this service is in contact with its database (1 is connected; 0 is not connected). | 0 | 0 (for over 30 minutes) | Both thresholds indicate connectivity issues between the XenDesktop Broker service and the database. If issues are reported, verify SQL Server and network availability.
Database Transaction Errors/sec | The rate at which database transactions are failing. | None | > 0 | Failing transactions indicate connectivity issues between the XenDesktop Broker service and the database. If issues are reported, verify SQL Server and network availability.
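The Database Avg. Transaction Time thresholds are "based on baseline values". One common way to derive such thresholds, shown here as an assumption rather than a Citrix recommendation, is the baseline mean plus a multiple of its standard deviation. The multipliers of 2 and 3 below are arbitrary starting points to be tuned against the environment's own baseline, and the function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def baseline_thresholds(
    history: Sequence[float], warn_sigma: float = 2.0, crit_sigma: float = 3.0
) -> Tuple[float, float]:
    """Derive (warning, critical) thresholds as mean + k * standard deviation of a baseline."""
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)  # sample standard deviation
    return mean + warn_sigma * spread, mean + crit_sigma * spread


def classify(value: float, thresholds: Tuple[float, float]) -> str:
    warning, critical = thresholds
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```

With baseline samples of 1, 2, 3, 4 and 5 ms, for instance, the warning and critical thresholds come out at roughly 6.2 ms and 7.7 ms.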

StoreFront

These performance counters are specific to the StoreFront servers.

Metric | Description | Warning (Yellow) | Critical (Red)
ASP.NET – Requests Queued | The number of requests waiting to be processed by ASP.NET. A baseline needs to be established in the environment in order to set accurate threshold values. | Based on baseline values | Based on baseline values
ASP.NET – Requests Rejected | The number of requests rejected because the request queue was full. | None | >= 1
APP_POOL_WAS\Current Application Pool State\Citrix Receiver for Web | Expected value: 3 (Running) | |
APP_POOL_WAS\Current Application Pool State\Citrix Delivery Services Authentication | Expected value: 3 (Running) | |
APP_POOL_WAS\Current Application Pool State\Citrix Delivery Services Resource | Expected value: 3 (Running) | |
Request response time | Response time for any request (authentication, enumeration or subscription) should be 3 to 5 seconds (http://www.perftestplus.com/resources/how_fast.pdf). | |
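The three APP_POOL_WAS rows expect the value 3. Per Microsoft's description of the IIS "Current Application Pool State" counter, 3 means Running (verify against the IIS version in use). The Python sketch below, with hypothetical function names, flags any StoreFront application pool that is not running or has no reading, and flags requests slower than the upper end of the 3 to 5 second response target.

```python
from typing import Dict

# State codes of the IIS "APP_POOL_WAS\Current Application Pool State" counter
# (per Microsoft's counter description; verify against your IIS version).
APP_POOL_STATES = {
    1: "Uninitialized", 2: "Initialized", 3: "Running", 4: "Disabling",
    5: "Disabled", 6: "Shutdown Pending", 7: "Delete Pending",
}

STOREFRONT_POOLS = [
    "Citrix Receiver for Web",
    "Citrix Delivery Services Authentication",
    "Citrix Delivery Services Resource",
]


def unhealthy_pools(readings: Dict[str, int]) -> Dict[str, str]:
    """Map each StoreFront pool not reporting 3 (Running) to a readable state."""
    problems: Dict[str, str] = {}
    for pool in STOREFRONT_POOLS:
        state = readings.get(pool)
        if state is None:
            problems[pool] = "no reading"  # a missing counter is treated as a failure
        elif state != 3:
            problems[pool] = APP_POOL_STATES.get(state, f"unknown state {state}")
    return problems


def response_too_slow(elapsed_s: float, limit_s: float = 5.0) -> bool:
    """Flag a request (authentication, enumeration or subscription) slower than the 3-5 s target."""
    return elapsed_s > limit_s
```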

Citrix License Server

These performance counters are specific to the Citrix License Server.

Metric | Description | Warning (Yellow) | Critical (Red) | Troubleshooting/Remediation
Citrix Licensing – Last Recorded License Check-Out Response Time | Displays the last recorded license check-out response time in milliseconds. | > 2000 ms | > 5000 ms | If the reported values exceed the 5000 ms response time, a potential performance issue needs to be investigated on the Citrix License Server.
Citrix Licensing – License Server Connection Failure | Displays the number of minutes that XenDesktop has been disconnected from the License Server. | > 1 minute | > 1440 minutes | Both thresholds indicate connectivity issues with the License Server. If issues are reported, verify License Server and network availability.
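The License Server thresholds translate directly into code. The Python sketch below applies the warning and critical values from the table and, as an added assumption, estimates how much of the 30-day grace period noted in the License Server availability test remains, assuming the grace period began when the connection-failure counter started accumulating. Function names are hypothetical.

```python
GRACE_PERIOD_MINUTES = 30 * 24 * 60  # 30-day grace period noted in the License Server availability test


def classify_checkout_ms(response_ms: float) -> str:
    """Last Recorded License Check-Out Response Time: warning > 2000 ms, critical > 5000 ms."""
    if response_ms > 5000:
        return "critical"
    if response_ms > 2000:
        return "warning"
    return "ok"


def classify_disconnect(minutes: float) -> str:
    """License Server Connection Failure: warning > 1 minute, critical > 1440 minutes (one day)."""
    if minutes > 1440:
        return "critical"
    if minutes > 1:
        return "warning"
    return "ok"


def grace_minutes_left(minutes_disconnected: float) -> float:
    """Assumes the grace period started when the disconnection counter started accumulating."""
    return max(0.0, GRACE_PERIOD_MINUTES - minutes_disconnected)
```

For example, after 1,440 minutes (one day) of disconnection, roughly 41,760 minutes, or 29 days, of grace remain.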

Decision: Services Monitoring

Windows services that are critical to basic server functionality should be monitored automatically to ensure that they are running properly. The following tables list the common Windows services that should be monitored. When any of these services is restarted or stopped, a warning (Yellow) or critical (Red) alert, respectively, should be raised. The recommended recovery actions for the services listed below are as follows:

  • First failure: Restart the service
  • Second failure: Restart the service
  • Subsequent failures: Put the server in maintenance mode and investigate the root cause
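The recovery actions above amount to a small escalation policy. Windows can restart services natively through each service's recovery settings (the Recovery tab in the Services console, or `sc failure`), so the Python sketch below models only the monitoring side: it counts failures per server and service, returns the action and the alert severity, and treats restarts as warnings and escalation as critical. The 24-hour window after which old failures stop counting is an assumption, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RecoveryPolicy:
    # Assumption: failures older than this window no longer count toward escalation.
    reset_after_s: float = 24 * 3600
    _history: Dict[Tuple[str, str], List[float]] = field(default_factory=dict)

    def on_failure(self, server: str, service: str, timestamp_s: float) -> Tuple[str, str]:
        """Return (action, alert severity) for a detected service failure."""
        key = (server, service)
        recent = [t for t in self._history.get(key, []) if timestamp_s - t < self.reset_after_s]
        recent.append(timestamp_s)
        self._history[key] = recent
        if len(recent) <= 2:
            # First and second failure: restart the service; a restart raises a warning.
            return "restart service", "warning"
        # Subsequent failures: stop auto-recovery, isolate the server, alert as critical.
        return "enable maintenance mode and investigate root cause", "critical"
```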

XenApp/XenDesktop

Service Functionality Administration Risk
Citrix AD Identity Service Manages Active Directory computer accounts. Dependencies: WMI Service Machine Creation Service relies on this service to create virtual machines. Administrators will be unable to create new or modify existing Machine Catalogs. Administrators will be unable to establish new connections to Citrix Studio.
Citrix Broker Service Manages connections to virtual machines and applications. If this service is stopped, administrators will be unable to make changes to the environment or establish new connections to Citrix Studio. Any existing administrator connections to Citrix Studio can also be terminated. If this service is stopped, existing user connections are not affected, but no new connections can be established. Users logging on to StoreFront will be unable to see any resources available for selection. Once the service is restarted, users will need to log on to StoreFront again to establish connections.
Citrix Configuration Logging Service Logs administrator activity and configuration changes in a XenDesktop deployment. If this service is stopped, XenApp/XenDesktop will be unable to communicate with the Configuration Logging Database. Administrators will be unable to make changes to the environment or establish new connections to Citrix Studio.
Citrix Configuration Service Stores service configuration information. Dependencies: WMI Service If this service is stopped, administrators will be unable to make changes to the environment or establish new connections to Citrix Studio.
Citrix Delegated Administration Service Manages configuration of delegated administration permissions. If this service is stopped, XenApp/XenDesktop cannot assign administrative permissions. Administrators will be unable to make changes to the environment or establish new connections to Citrix Studio. Administrators will be unable to establish new connections to Citrix Director, and existing sessions within Citrix Director will be interrupted.
Citrix Diagnostic Facility COM Server Service Manages and controls Citrix diagnostic trace sessions on the system. Dependencies: RPC Service This service has no impact on the production environment. It is used to generate CDF trace files which aid in troubleshooting issues.
Citrix Environment Test Service Manages tests for evaluating the state of a XenDesktop Site. If this service is stopped, administrators will be unable to establish new connections to Citrix Studio. Administrators will also be unable to check the status of the Citrix site configuration, machine catalogs, and delivery groups by running the tests under “Common Tasks” in the Citrix Studio administration console.
Citrix Host Services Manages host and hypervisor connections. Dependencies: WMI Service Administrators will be unable to create new Machine Catalogs or control virtual machine power settings via Citrix Studio. Administrators will be unable to establish new connections to Citrix Studio. Users may experience issues connecting to virtual desktops when this service is not available. If this service is stopped, existing connections are not affected.
Citrix Machine Creation Service Creates new virtual machines. Dependencies: WMI Service Administrators will be unable to create new or modify existing Machine Catalogs or establish new connections to Citrix Studio. Administrators will be unable to establish new connections to Citrix Studio.
Citrix Monitor Service Monitors the FlexCast system. If this service is stopped, XenApp/XenDesktop will be unable to communicate with the Monitoring Database. Citrix Director will be unable to retrieve any data on the environment. Administrators will be unable to establish new connections to Citrix Studio.
Citrix StoreFront Service Manages deployment of StoreFront. Administrators will be unable to establish new connections to Citrix Studio.
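
The Delivery Controller service table lends itself to a simple automated health check. The following is a minimal Python sketch, assuming a monitoring tool already collects each service's state (for example, from the Windows Service Control Manager) into a dictionary. The names `CONTROLLER_SERVICE_RISKS` and `check_services` are hypothetical, and the risk strings are abbreviated paraphrases of the table, not official Citrix text.

```python
from typing import Dict, List

# Hypothetical mapping of Delivery Controller services to a short risk
# summary, condensed from the service table above.
CONTROLLER_SERVICE_RISKS: Dict[str, str] = {
    "Citrix Configuration Service": "no configuration changes; Studio cannot connect",
    "Citrix Delegated Administration Service": "no permission changes; Studio and Director affected",
    "Citrix Diagnostic Facility COM Server Service": "CDF tracing unavailable; no production impact",
    "Citrix Environment Test Service": "site tests unavailable; Studio cannot connect",
    "Citrix Host Services": "no new catalogs or power control; Studio cannot connect",
    "Citrix Machine Creation Service": "catalogs cannot be created or modified",
    "Citrix Monitor Service": "Director cannot retrieve data; Studio cannot connect",
    "Citrix StoreFront Service": "Studio cannot connect",
}


def check_services(states: Dict[str, str]) -> List[str]:
    """Return one alert per monitored service that is not 'Running'.

    `states` maps service names to a state string such as 'Running' or
    'Stopped'. A service absent from `states` is reported as 'Unknown',
    because its state could not be confirmed.
    """
    alerts = []
    for service, risk in CONTROLLER_SERVICE_RISKS.items():
        state = states.get(service, "Unknown")
        if state != "Running":
            alerts.append(f"{service} is {state}: {risk}")
    return alerts


if __name__ == "__main__":
    observed = {name: "Running" for name in CONTROLLER_SERVICE_RISKS}
    observed["Citrix Monitor Service"] = "Stopped"
    for alert in check_services(observed):
        print(alert)
```

Treating a missing service as "Unknown" rather than healthy is a deliberate choice: a collector that silently loses visibility of a service should raise an alert, not hide one.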

Delivery Controller Services Monitoring in Citrix Director

The Infrastructure pane within the Citrix Director dashboard provides the status of the services running on the Delivery Controllers and displays a warning if a service or Controller is unavailable. These alerts can be accessed by clicking the Alert hyperlink within the Infrastructure pane.

Director infrastructure image

Provisioning Services

Service | Functionality | Risk
Citrix PVS PXE Service | Provides the PVS PXE Boot Server functionality. Note: Only applicable when PXE boot is used. | If this service fails, target devices may not be able to boot successfully if PXE booting is used.
Citrix PVS Stream Service | Streams contents of the vDisk to the target device on demand. | If this service is stopped, it will not be possible to stream vDisk images.
Citrix PVS SOAP Service | Provides a framework for external or existing solutions to interface with Provisioning Services. Note: Only impacts console operations; users are unaffected. | If this service fails, PVS Server-to-PVS Server communication and PVS Console-to-PVS Server communication are not possible.
Citrix PVS TFTP Service | Provides the TFTP Server functionality. Note: Only applicable when TFTP is used. | If this service fails, target devices may not be able to boot if this server is used as the TFTP server for the bootstrap.
Citrix PVS Two-Stage Boot Service | Provides the bootstrap functionality for devices booting by means of a BDM ISO file. Note: Only applicable when BDM boot partitions are used. | If this service fails, target devices may not be able to boot if a BDM ISO file is used.
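
Because several PVS services only matter for particular boot methods, the set of services worth monitoring depends on how targets boot. The Python sketch below derives that set from the notes in the table. The function and constant names are hypothetical, and one point is an assumption not stated in the table: PXE-booted targets are assumed to fetch their bootstrap from the same server's TFTP service.

```python
from typing import Dict, Set

# Services the table lists without a boot-method condition.
BASE_SERVICES: Set[str] = {"Citrix PVS Stream Service", "Citrix PVS SOAP Service"}

# Boot-method-specific services, following the "Note:" entries in the table.
# Assumption: PXE-booted targets also fetch their bootstrap from this server's
# TFTP service, so "pxe" includes the TFTP service as well.
BOOT_METHOD_SERVICES: Dict[str, Set[str]] = {
    "pxe": {"Citrix PVS PXE Service", "Citrix PVS TFTP Service"},
    "tftp": {"Citrix PVS TFTP Service"},
    "bdm": {"Citrix PVS Two-Stage Boot Service"},
}


def required_pvs_services(boot_methods: Set[str]) -> Set[str]:
    """Return the PVS services worth monitoring for the boot methods in use."""
    required = set(BASE_SERVICES)  # copy so the module constant is never mutated
    for method in boot_methods:
        key = method.lower()
        if key not in BOOT_METHOD_SERVICES:
            raise ValueError(f"unknown boot method: {method!r}")
        required |= BOOT_METHOD_SERVICES[key]
    return required


if __name__ == "__main__":
    for name in sorted(required_pvs_services({"bdm"})):
        print(name)
```

Rejecting an unknown boot method, rather than ignoring it, keeps a typo in the monitoring configuration from quietly dropping a service that targets depend on.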

StoreFront

Service | Functionality | Risk
Citrix Cluster Join Service | Provides Server Group join services. This service is started when additional StoreFront servers are added to a Server Group. | If this service does not start or is interrupted when this process is initiated, the additional server will be unable to join the indicated Server Group and the process will result in an error.
Citrix Configuration Replication | Provides access to Delivery Services configuration information. This service only exists on the primary StoreFront server of a Server Group. | If this service is stopped, additional StoreFront servers will be unable to join the Server Group, and any changes made to the primary StoreFront server will not be replicated to other servers. This can leave servers within the Server Group out of sync.
Citrix Credential Wallet | Provides a secure store of credentials. Dependencies: Citrix Peer Resolution Service | If this service is stopped, users will be unable to log on to access their desktops or applications. Users logged on to StoreFront will be unable to launch new application or desktop sessions. Existing application or desktop sessions are unaffected.
Citrix Default Domain Services | Provides authentication, change password, and other domain services. | If this service is stopped, users will be unable to log on to access their desktops or applications. Users currently logged on will not be affected.
Citrix Peer Resolution Service | Resolves peer names within peer-to-peer meshes. | If this service fails, both the Citrix Credential Wallet and Citrix Subscriptions Store services are stopped, producing the risks associated with those services.
Citrix StoreFront Privileged Administration Service | Manages privileged operations on StoreFront. |
Citrix Subscriptions Store | Provides a store and replication of user subscriptions. Dependencies: Citrix Peer Resolution Service | If this service is stopped, Citrix Receiver cannot add, remove, or reposition applications within StoreFront. Users will need to re-add applications, and changes made to their selection of applications within the StoreFront store will not be saved or replicated to other sessions. The original user configuration will be restored once the service is restarted.
World Wide Web Publishing Service | Provides web connectivity and administration through the Internet Information Services Manager. Dependencies: HTTP; RPC Service | Access to published applications or published desktops will not be available through StoreFront. Users will be unable to reach the Receiver for Web logon page. Users logged on to StoreFront will be unable to launch new application or desktop sessions and will need to re-enter their credentials when the service is restarted. Existing application or desktop sessions are unaffected.
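
The Dependencies entries in the StoreFront table mean that one failure can stop several services, as the Citrix Peer Resolution Service row describes. A minimal Python sketch of that propagation follows: it inverts the dependency list and walks it breadth-first to find every service affected by a set of failures. `DEPENDS_ON` and `impacted_services` are hypothetical names, and the graph contains only the dependencies the table lists.

```python
from collections import deque
from typing import Dict, List, Set

# Service -> services it depends on, taken from the "Dependencies" entries
# in the StoreFront table above.
DEPENDS_ON: Dict[str, List[str]] = {
    "Citrix Credential Wallet": ["Citrix Peer Resolution Service"],
    "Citrix Subscriptions Store": ["Citrix Peer Resolution Service"],
    "World Wide Web Publishing Service": ["HTTP", "RPC Service"],
}


def impacted_services(failed: Set[str]) -> Set[str]:
    """Return the failed services plus every service that depends on them,
    directly or transitively."""
    # Invert the graph: dependency -> services that need it.
    dependents: Dict[str, Set[str]] = {}
    for service, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    impacted = set(failed)
    queue = deque(failed)
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, set()):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted


if __name__ == "__main__":
    print(sorted(impacted_services({"Citrix Peer Resolution Service"})))
```

Reporting the full impacted set, instead of only the service that failed first, lets an alert carry the combined user-facing risk of every downstream service.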

Web Interface

Service | Functionality | Risk
World Wide Web Publishing Service | Provides web connectivity and administration through the Internet Information Services Manager. Dependencies: HTTP; RPC Service | If the WWW service is not available, published applications and published desktops cannot be accessed through Web Interface.

Citrix License Server

Service | Functionality | Risk
Citrix Licensing Service | Provides licensing services for Citrix products. | The licensing mode changes to the grace period when the service is stopped or the License Server cannot be contacted. If this is not monitored, Citrix products will stop functioning after the grace period expires.
Citrix Licensing Support Service | This service controls reading the license files and updating strings with license trailers (data dictionary functionality). | None
Citrix Licensing WMI | The Citrix License Management Console collects license data using the WMI service. | None
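
Because products keep working during the grace period, an unreachable License Server is easy to miss until functionality stops. The Python sketch below classifies how close a site is to the end of the grace period. The function name and the `warn_days` default are hypothetical. The grace period length is deliberately a required parameter rather than a hard-coded value, since it depends on the product and version; the 30 days used in the usage example is only an illustrative input.

```python
from datetime import datetime, timedelta


def grace_period_status(unreachable_since: datetime, now: datetime,
                        grace_days: int, warn_days: int = 7) -> str:
    """Classify how close a site is to the end of its licensing grace period.

    `grace_days` is a required input because the grace period length depends
    on the product and version; check the licensing documentation for yours.
    `warn_days` (an arbitrary default) controls when the status escalates.
    """
    expires = unreachable_since + timedelta(days=grace_days)
    remaining = expires - now
    if remaining <= timedelta(0):
        return "EXPIRED"
    if remaining <= timedelta(days=warn_days):
        return f"CRITICAL: {remaining.days} day(s) left"
    # Any time spent in the grace period already deserves attention.
    return f"WARNING: {remaining.days} day(s) left"


if __name__ == "__main__":
    lost_contact = datetime(2024, 1, 1)
    print(grace_period_status(lost_contact, datetime(2024, 1, 28), grace_days=30))
```

The function never returns an "OK" state: it is meant to be called only once the License Server has been found unreachable, so even the earliest days of the grace period produce a warning.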

Decision: Events Monitoring

Monitoring the Windows Event Log for unknown or critical events can help to proactively discover issues and allow administrators to understand event patterns:

  • Licensing - Errors in the Event Log dealing with Remote Desktop licensing should be investigated. This might be a result of the installed Citrix product not being able to contact the Remote Desktop Licensing Server or the Citrix Licensing Server. If errors in the Event Log are not reviewed, users might eventually be denied access because they cannot acquire a valid license.
  • Hardware Failure - Any event notification that relates to a hardware failure should be investigated immediately. Any failed device will affect the performance of the system. At a minimum, a hardware failure removes the redundancy of the component.
  • Security Warnings - Customers should investigate security warnings or audit failure events regarding failed logons in the security log. This could be an indication that someone is attempting to compromise the servers.
  • Disk Capacity - As the drives of a Windows system reach 90% of capacity, an event error message will be generated. To ensure continuous service, customers should monitor for these events. Running out of hard disk space puts the system at severe risk, because the server might not have enough space left to service users' requests for temporary file storage.
  • Application / Service errors - Any event notification that relates to application or services errors should be investigated.
  • Citrix errors - All Citrix software components will leverage the Windows Event Log for error logging. A list of the known Event Log warnings and errors issued by Citrix components can be found at the following links:

It is important to periodically check the Event Viewer for Citrix-related warnings or errors. Warnings or errors that repeatedly appear in the logs should be investigated immediately, because they may indicate a problem that could severely impact the Citrix environment if not properly resolved.
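
Spotting repeated warnings and errors by hand does not scale well, but it is simple to automate once events have been exported. The following Python sketch counts warnings and errors by event source and event ID and reports those that occur at least a threshold number of times. The `Event` record layout, the `repeated_problems` name, and the default threshold of 3 are assumptions for illustration; the sample sources in the usage line are made up.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    source: str    # event source (provider) name
    event_id: int
    level: str     # "Error", "Warning", "Information", ...


def repeated_problems(events: Iterable[Event],
                      threshold: int = 3) -> List[Tuple[str, int, int]]:
    """Return (source, event_id, count) for every warning or error that occurs
    at least `threshold` times, most frequent first."""
    counts = Counter(
        (e.source, e.event_id)
        for e in events
        if e.level in ("Error", "Warning")
    )
    return [(source, event_id, n)
            for (source, event_id), n in counts.most_common()
            if n >= threshold]


if __name__ == "__main__":
    sample = [Event("ExampleSource", 1001, "Error")] * 4
    print(repeated_problems(sample))
```

Grouping by source and event ID, rather than by message text, keeps events that differ only in embedded details (user names, timestamps) in the same bucket.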

In multi-server environments, servers are easier to administer when logs can be collected and reviewed from a central location. Most enterprise-grade monitoring solutions provide this functionality. More sophisticated monitoring solutions enable an administrator to correlate event information with other data points, such as performance metrics or availability statistics. If the selected monitoring solution does not provide this functionality, the Event Log subscription feature of Windows Server 2008 R2 or Windows Server 2012/2012 R2 can be used. This feature allows administrators to receive events from multiple servers and view them from a designated collector computer. Please see the Microsoft TechNet article Manage Subscriptions for more information.
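
At its core, central review places events from many servers on a single timeline so that related failures line up. This Python sketch shows only that merging step, using the standard library's `heapq.merge` on per-server lists that are already sorted by timestamp. The function name, the `(timestamp, message)` input layout, and the server names in the usage line are hypothetical; it is not a replacement for Event Log subscriptions or a monitoring product.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterator, List, Tuple

Entry = Tuple[datetime, str]             # (timestamp, message) from one server
MergedEntry = Tuple[datetime, str, str]  # (timestamp, server, message)


def merge_server_logs(logs: Dict[str, List[Entry]]) -> Iterator[MergedEntry]:
    """Merge per-server event lists into one chronological stream.

    Each server's list must already be sorted by timestamp. Only the
    timestamp is used for ordering, so messages are never compared.
    """
    streams = [
        [(ts, server, msg) for ts, msg in entries]
        for server, entries in logs.items()
    ]
    return heapq.merge(*streams, key=lambda entry: entry[0])


if __name__ == "__main__":
    logs = {
        "ddc01": [(datetime(2024, 1, 1, 9, 0), "service stopped")],
        "sf01": [(datetime(2024, 1, 1, 8, 59), "logon failures rising")],
    }
    for ts, server, msg in merge_server_logs(logs):
        print(ts.isoformat(), server, msg)
```

`heapq.merge` consumes its inputs lazily, so the same approach works when each server's log is a streaming iterator rather than a list held in memory.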

XenServer is also capable of sending its logs to a central syslog server. The administrator sets the IP address of the syslog daemon server in the properties of each XenServer in the pool. This configuration allows administrators to capture real-time activity across multiple XenServer hosts. Further information can be found within the XenServer Admin Guide.
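
A central syslog collector typically classifies incoming messages by the priority field at the start of each message, which encodes facility and severity as PRI = facility * 8 + severity (RFC 5424). The Python sketch below decodes only that standard header; it assumes nothing about the rest of XenServer's message format, and the function name and sample text are hypothetical.

```python
import re
from typing import NamedTuple

# Severity names indexed by the numeric severity (0-7) from RFC 5424.
SEVERITIES = ["Emergency", "Alert", "Critical", "Error",
              "Warning", "Notice", "Informational", "Debug"]

_PRI = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


class SyslogRecord(NamedTuple):
    facility: int
    severity: str
    body: str


def parse_syslog(line: str) -> SyslogRecord:
    """Split the <PRI> prefix of a syslog message into facility and severity."""
    match = _PRI.match(line)
    if not match:
        raise ValueError("missing <PRI> prefix")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError(f"PRI out of range: {pri}")
    facility, severity = divmod(pri, 8)
    return SyslogRecord(facility, SEVERITIES[severity], match.group(2))


if __name__ == "__main__":
    print(parse_syslog("<11>example: something failed"))
```

With severity decoded, a collector can forward only Error and above as alerts while still archiving lower-severity messages for troubleshooting.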

Decision: Capacity Management

In addition to the day-to-day monitoring of system-level metrics, performance metrics should be tracked from a historical perspective to help plan for future growth as more users access the environment.

A baseline of the environment's performance should be taken so that it can be compared against performance over time. For example, if a user complains of poor performance, this baseline can be used to determine whether the issue is caused by user load exceeding the capacity of the environment.

Baseline performance metrics for capacity management typically include historical CPU, memory, and network utilization data for the Delivery Controllers and the application servers or desktops.
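
Comparing current readings against the baseline can be automated. The Python sketch below averages each metric's historical samples and flags metrics whose current value exceeds that average by more than a tolerance. The metric names, the `compare_to_baseline` function, and the 20% default tolerance are assumptions for illustration, not recommended thresholds.

```python
from statistics import mean
from typing import Dict, List


def compare_to_baseline(baseline: Dict[str, List[float]],
                        current: Dict[str, float],
                        tolerance: float = 0.20) -> Dict[str, float]:
    """Return metrics whose current value exceeds the baseline average by more
    than `tolerance` (0.20 = 20%), mapped to the relative increase."""
    deviations = {}
    for metric, history in baseline.items():
        if metric not in current or not history:
            continue  # nothing to compare against
        base = mean(history)
        if base == 0:
            continue  # a relative increase over zero is undefined
        increase = (current[metric] - base) / base
        if increase > tolerance:
            deviations[metric] = round(increase, 3)
    return deviations


if __name__ == "__main__":
    history = {"cpu_pct": [40, 50, 60], "memory_pct": [70, 70, 70]}
    print(compare_to_baseline(history, {"cpu_pct": 75, "memory_pct": 72}))
```

A relative threshold adapts to each metric's normal level; an absolute limit (for example, CPU above a fixed percentage) can be layered on top where hard ceilings matter.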

Citrix Director

Administrators can utilize the Trends view within Citrix Director to track different parameters of the Citrix XenApp/XenDesktop deployment over time. These parameters can be leveraged for capacity planning of the Citrix environment.

From the Trends view, administrators can see historical data that is broken up into several categories including:

  • Sessions - Provides concurrent session usage over time, which helps size the environment appropriately.
  • Connection Failures - Gives an overview of the different types of connection failures that have occurred across different Delivery Groups.
  • Failed Desktop OS Machines – Gives an overview of the different problems associated with failures in desktop machines.
  • Failed Server OS Machines - Gives an overview of the different problems associated with failures in server machines.
  • Logon Performance – Shows how long it takes for users to log on to their applications and desktops.
  • Load Evaluator Index – Provides various performance counter-based metrics, including CPU, Memory, and Disk Usage for Server OS machines.
  • Capacity Management – Shows utilization of published applications and desktops.
  • Resource Utilization – Provides information on CPU, Memory and storage resource utilization.
  • Custom Reports – Allows administrators to create custom historical reports on numerous metrics captured by the system.
  • Hosted Application Usage – Details all applications published in the site and can provide detailed usage information about each individual application (concurrent instances, launches, usage duration, and so on). Note: Requires XenApp or XenDesktop Platinum licensing.
  • Network – Network analytics provided through NetScaler HDX Insight.
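
Director calculates session trends itself; the sketch below only illustrates how a concurrent-session peak, the figure most useful for sizing, can be derived from session start and end times exported from any source. It uses a standard sweep over sorted start and end events. The function name and the `(start, end)` input layout are assumptions.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def peak_concurrency(
    sessions: List[Tuple[datetime, datetime]]
) -> Tuple[int, Optional[datetime]]:
    """Return the peak number of simultaneous sessions and when it was first reached.

    Each session is a (start, end) pair. A session that ends at the same
    instant another starts is not counted as overlapping it.
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # session begins
        events.append((end, -1))    # session ends
    # Sort by time; at equal times (-1) sorts before (+1), so ends are applied first.
    events.sort()
    current = peak = 0
    peak_time: Optional[datetime] = None
    for when, delta in events:
        current += delta
        if current > peak:
            peak, peak_time = current, when
    return peak, peak_time


if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    sample = [(day.replace(hour=8), day.replace(hour=12)),
              (day.replace(hour=9), day.replace(hour=11))]
    print(peak_concurrency(sample))
```

Sorting the events costs O(n log n) for n sessions, which is adequate for exported daily or weekly session histories.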

Director trends image

For more information on Citrix Director Trends, please refer to the following.

The creation of the handbook is a time-consuming process and requires real deployment experience across many scenarios. Citrix would like to thank the authors and subject matter experts who contributed to the Citrix VDI Handbook.
