Scalability considerations

Session Recording is a highly scalable system that handles thousands or tens of thousands of sessions. Installing and running Session Recording requires few extra resources beyond what is necessary to run Citrix Virtual Apps and Desktops. However, if you plan to use Session Recording to record a large number of sessions or if the sessions you plan to record can result in large session files (for example, graphically intense applications), consider the performance of your system when planning your Session Recording deployment.

This article explains how Session Recording achieves high scalability and how you can get the most out of your recording system at the lowest cost.

Why Session Recording scales well

There are two major reasons that Session Recording scales well compared with competing products:

  • Small file size

    A recorded session file made with Session Recording is highly compact. It is many orders of magnitude smaller than an equivalent video recording made with solutions that screen-scrape. The network bandwidth, disk space, and disk IOPS required to transport and store each Session Recording file are typically at least 10 times lower than those required for an equivalent video file.

    The small size of recorded session files means faster and smoother rendering of video frames. Recordings are also completely lossless and have none of the pixelation that is common in most compact video formats. Text in recordings is as easy to read during playback as it is in the original sessions. To maintain small file sizes, Session Recording does not record key frames within the files.

  • Low processing required to generate files

    A recorded session file contains the ICA protocol data for a session that is extracted virtually in its native format. This means the file captures the ICA protocol data stream that is used to communicate with Citrix Workspace app. There is no need to run expensive transcoding or encoding software components to change the format of data in real time. The low amount of processing is also important for VDA scalability and ensures the end-user experience is maintained when many sessions are recorded from the same VDA.

    Moreover, only those ICA virtual channels that are capable of being played back are recorded, which results in a further optimization. For example, the printer and client drive mapping channels are not recorded because they can generate high volumes of data without any benefit in video playback.

Estimate data input and processing rates

The Session Recording Server is the central collection point for recorded session files. Each machine that is running a multi-session OS VDA with Session Recording enabled sends recorded session data to the Session Recording Server. Session Recording can handle high volumes of data and can tolerate bursts and faults, but there are physical limits on how much data any one server can handle.

Consider how much data you will be sending to each Session Recording Server and how quickly the servers can process and store this data. The rate at which your system can store incoming data must be higher than the data input rate.

To estimate your data input rate, multiply the number of sessions recorded by the average size of each recorded session and divide by the time for which you are recording sessions. For example, you might record 5,000 Microsoft Outlook sessions of 20 MB each over an 8-hour work day. In this case, the data input rate is approximately 3.5 MBps (megabytes per second): 5,000 sessions times 20 MB, divided by 8 hours, divided by 3,600 seconds per hour. A typical Session Recording Server connected to a 100 Mbps LAN with sufficient disk space to store the recorded data is capable of processing data at around 5.0 MBps based on the physical limits imposed by disk and network IOPS. This is the processing rate. In the example, the processing rate (5.0 MBps) is higher than the input rate (3.5 MBps), so recording the 5,000 Outlook sessions is feasible.
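The estimate above can be reproduced with a few lines of Python. This is a minimal sketch: the function name is illustrative and not part of Session Recording, and the result is in megabytes per second, which is what the arithmetic in the example yields.

```python
def data_input_rate_mb_per_s(sessions: int, avg_session_mb: float, hours: float) -> float:
    """Average data input rate in megabytes per second."""
    return sessions * avg_session_mb / (hours * 3600)


# Figures from the example: 5,000 Outlook sessions of 20 MB over an 8-hour day.
input_rate = data_input_rate_mb_per_s(sessions=5000, avg_session_mb=20, hours=8)
processing_rate = 5.0  # MB per second, the typical server figure quoted in the text

print(f"input rate: {input_rate:.2f} MB/s")
print("feasible" if input_rate < processing_rate else "not feasible")
```

Replace the session count and average file size with figures measured in your own environment, because the 20 MB average applies only to the Outlook example.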

Note that the amount of data per session varies greatly depending on what is being recorded, and other factors such as the screen resolution, color depth, and graphics mode also have an impact. A session running a CAD package where graphics activity is constantly high likely generates a much larger recording than a session in which the end user sends and receives emails in Microsoft Outlook. Therefore, recording the same number of CAD sessions can generate an extremely high input rate and require the use of more Session Recording Servers.

Bursts and faults

The previous example assumes a very simple uniform throughput of data but does not explain how the system deals with short periods of higher activity, known as bursts. A burst might occur when all users log on at the same time in the morning, known as the 9 o’clock rush, or when they receive the same email in their Outlook inbox at once. The 5.0 MBps processing rate of the Session Recording Server is inadequate for dealing with this sudden demand.

The Session Recording Agent running on each VDA uses Microsoft Message Queuing (MSMQ) to send recorded data to the Storage Manager running on the central Session Recording Server. The data is sent in a store-and-forward manner similar to how an email is delivered between the sender, mail server, and the receiver. If the Session Recording Server or the network cannot handle the high rate of data in a burst, the recorded session data is temporarily stored until the backlog of data messages is cleared. The data message might be temporarily stored in the outgoing queue on the VDA if the network is congested, or stored on the Session Recording Server’s receiving queue if the data has traversed the network but the Storage Manager is still busy processing other messages.
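The store-and-forward behavior can be pictured with a toy model in which a queue absorbs whatever cannot be processed in a given second. The sketch below is not how MSMQ is implemented. The burst profile and rates are made-up illustrative values, and the model ignores the extra disk IOPS that a real backlog incurs.

```python
def simulate_backlog(input_rates, processing_rate):
    """Return the queued backlog (in MB) after each one-second step."""
    backlog = 0.0
    history = []
    for rate in input_rates:
        # Data arriving faster than it can be processed accumulates in the queue.
        # Spare processing capacity drains the queue, but never below zero.
        backlog = max(0.0, backlog + rate - processing_rate)
        history.append(backlog)
    return history


# Hypothetical profile: a 30-second burst at 9 MB/s, then a steady 3.5 MB/s.
profile = [9.0] * 30 + [3.5] * 120
history = simulate_backlog(profile, processing_rate=5.0)

peak_mb = max(history)
cleared_after_s = history.index(0.0) + 1  # first second at which the queue is empty
print(f"peak backlog: {peak_mb:.0f} MB, cleared after {cleared_after_s} s")
```

Even in this simplified model, a short burst takes considerably longer to clear than it took to build up, because only the spare capacity (processing rate minus input rate) is available for draining the queue.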

MSMQ also serves as a fault tolerance mechanism. If the Session Recording Server goes down or the link is broken, recorded data is held in the outgoing queue on each VDA. When the fault is rectified, all queued data is sent together. The use of MSMQ also allows you to take a Session Recording Server offline for upgrade or maintenance without interrupting the recording of existing sessions or losing data.

The main limitation of MSMQ is that disk space for the temporary storage of data messages is finite. This limits how long a burst, fault, or maintenance event can last before data is eventually lost. The overall system can continue after data loss, but in this situation, individual recordings have chunks of data missing. A file with missing data is still playable but only up to the point where data was first lost. Note the following:

  • Adding more disk space to each server, especially the Session Recording Server, and making it available to MSMQ can increase the tolerance to bursts and faults.

  • It is important to configure the Message Life setting for each Session Recording Agent to an appropriate level (on the Connections tab in Session Recording Agent Properties). The default value of 7,200 seconds (two hours) means that each recorded data message has two hours to reach the Storage Manager before it is discarded and recording files are damaged. With more disk space available (or fewer sessions to record), you can choose to increase this value. The maximum value is 365 days.
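As a rough illustration of how these two limits interact, the sketch below estimates how long a single VDA can ride out an outage before data is lost. It assumes the outgoing queue fills linearly at a constant rate and that messages are discarded exactly when the Message Life expires. The function name and the sample figures are hypothetical.

```python
def max_outage_seconds(queue_disk_mb: float,
                       vda_output_mb_per_s: float,
                       message_life_s: float = 7200) -> float:
    """Longest outage a VDA can absorb before recorded data is lost.

    The limit is whichever comes first: the MSMQ disk space filling up,
    or the oldest queued message exceeding the Message Life setting.
    """
    seconds_until_disk_full = queue_disk_mb / vda_output_mb_per_s
    return min(seconds_until_disk_full, message_life_s)


# Hypothetical VDA that produces 0.25 MB/s of recorded data.
print(max_outage_seconds(queue_disk_mb=1000, vda_output_mb_per_s=0.25))  # disk space is the limit
print(max_outage_seconds(queue_disk_mb=4000, vda_output_mb_per_s=0.25))  # Message Life is the limit
```

When disk space is the limiting factor, raising the Message Life has no effect. When the Message Life is the limiting factor, adding disk space alone does not extend the tolerance window.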

The other limitation with MSMQ is that when data backlogs, there is extra disk IOPS in the queue to read and write data messages. Under normal conditions, the Storage Manager receives and processes data from the network directly without the data message ever being written to disk. Storing the data involves a single write operation to disk that appends the recorded session file. When data is backlogged, the disk IOPS is tripled: each message must be written to disk, read from disk, and written to file. As the Storage Manager is heavily IOPS bound, the processing rate of the Session Recording Server drops until the backlog of messages is cleared. To mitigate the effects of this extra IOPS, adopt the following recommendations:

  • Ensure that the disk on which MSMQ stores messages is different from the recording file storage folders. Even though IOPS bus traffic is tripled, the drop in the true processing rate is never as severe.

  • Have planned outages at off-peak times only. Depending on budget constraints, follow recognized approaches to building high availability servers. The approaches include the use of UPS, dual NICs, redundant switches, and hot swappable memory and disks.

Design for spare capacity

The data rate of recorded session data is unlikely to be uniform, bursts and faults might occur, and the clearing of message backlogs is expensive in IOPS. For this reason, design each Session Recording Server with plenty of spare capacity. Adding more servers or improving the specification of existing servers, as described in later sections, gains you extra capacity. The general rule of thumb is to run each Session Recording Server at a maximum of 50% of its total capacity. In the earlier example, if the server is capable of processing 5.0 MBps, target the system to run only at 2.5 MBps. Instead of recording 5,000 Outlook sessions that generate 3.5 MBps on one Session Recording Server, reduce this to 3,500 sessions that generate only about 2.5 MBps.
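The 50% rule of thumb can be turned into a session budget. The following sketch uses the figures from the earlier example. The function name is illustrative, and the calculation assumes that sessions are spread evenly over the recording period.

```python
import math


def max_sessions(processing_rate_mb_per_s: float,
                 avg_session_mb: float,
                 hours: float,
                 target_utilization: float = 0.5) -> int:
    """Number of sessions one server can record while staying at the target utilization."""
    target_rate = processing_rate_mb_per_s * target_utilization
    return math.floor(target_rate * hours * 3600 / avg_session_mb)


# A 5.0 MB/s server, 20 MB Outlook sessions, 8-hour work day, 50% target.
budget = max_sessions(processing_rate_mb_per_s=5.0, avg_session_mb=20, hours=8)
print(budget)
```

The exact result is slightly higher than the rounded figure of 3,500 sessions used in the text, which corresponds to a little under 2.5 megabytes per second.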

Backlogs and live playback

Live playback is when a reviewer opens a session recording for playback while the session is still active. During live playback, the Session Recording Agent responsible for the session switches to a streaming mode for that session, and recording data is sent immediately to the Storage Manager without internal buffering. Because the recording file is constantly updated, the Player can continue to be fed with the latest data from the live session. However, data is sent from the Agent to the Storage Manager through MSMQ, so the queuing rules described earlier apply. This can cause a problem: when MSMQ is backlogged, the new recorded data available for live playback is queued like all other data messages. The reviewer can still play the file, but viewing of the latest live recorded data is delayed. If live playback is an important feature for reviewers, ensure a low probability of backlog by designing spare capacity and fault tolerance into your deployment.

Citrix Virtual Apps and Desktops scalability

Session Recording never reduces session performance and never stops sessions in response to recorded data backlogs. Maintaining the end-user experience and single-server scalability is paramount in the design of the Session Recording system. If the recording system becomes irreversibly overloaded, recorded session data is discarded. Extensive scalability testing by Citrix reveals that the impact of recording ICA sessions on the performance and scalability of Citrix Virtual Apps and Desktops servers is low. The size of the impact depends on the platform, memory available, and the graphical nature of the sessions being recorded. With the following configuration, you can expect a single-server scalability impact of between 1% and 5%. In other words, if a server can host 100 users without Session Recording installed, it can host 95–99 users after installation:

  • 64-bit server with 8 GB RAM running a multi-session OS VDA
  • All sessions running Office productivity applications, such as Outlook and Excel
  • The use of applications is active and sustained
  • All sessions are recorded as configured by the Session Recording policies

If fewer sessions are recorded or session activity is less sustained and more sporadic, the impact is less. In many cases, the scalability impact is negligible and user density per server remains the same. As mentioned earlier, the low impact is due to the simple processing requirements of the Session Recording components installed on each VDA. Recorded data is simply extracted from the ICA session stack and sent as-is to the Session Recording Server through MSMQ. There is no expensive encoding of data.

There is a minor overhead of using Session Recording even when no sessions are recorded. Although the impact is low, if you are sure that no sessions will ever be recorded from a particular server, you can disable recording on that server. Removing Session Recording is one way of doing this. A less invasive approach is to clear the Enable session recording for this VDA machine check box on the Session Recording tab in Session Recording Agent Properties. If session recording is required in future, reselect this check box.

Measuring throughput

There are various ways to measure throughput of recorded session data from the sending VDA to the receiving Session Recording Server. One of the simplest and most effective approaches is to observe the size of files that are recorded, and the rate at which disk space on the Session Recording Server is being consumed. The volume of data written to disk closely reflects the volume of network traffic being generated. The Windows Performance Monitor tool (perfmon.exe) has a range of standard system counters that can be observed in addition to some counters provided by Session Recording. Counters can be used to measure throughput, and identify bottlenecks and system problems. The following table outlines some of the most useful performance counters.

| Performance Object | Counter Name | Description |
| --- | --- | --- |
| Citrix Session Recording Agent | Active Recording Count | Indicates the number of sessions that are currently being recorded on a particular VDA. |
| Citrix Session Recording Agent | Bytes read from the Session Recording Driver | The number of bytes read from the kernel components responsible for acquiring session data. Useful for determining how much data a single VDA generates for all sessions recorded on that server. |
| Citrix Session Recording Storage Manager | Active Recording Count | Similar to the Citrix Session Recording Agent counter except for the Session Recording Server. Indicates the total number of sessions currently being recorded for all servers. |
| Citrix Session Recording Storage Manager | Message bytes/sec | The throughput of all recorded sessions. Can be used to determine the rate at which the Storage Manager is processing data. If MSMQ is backlogged with messages, the Storage Manager runs at full speed. This value can be used to indicate the maximum processing rate of the Storage Manager. |
| LogicalDisk | Disk Write Bytes/sec | Can be used to measure disk write-through performance. This is important in achieving high scalability for the Session Recording Server. Performance of individual drives can also be observed. |
| MSMQ Queue | Bytes in Queue | This counter can be used to determine the amount of data backlogged in the CitrixSmAudData message queue. If this value increases over time, the rate of recorded data received from the network is greater than the rate at which the Storage Manager can process data. This counter is useful for observing the effect of data bursts and faults. |
| MSMQ Queue | Messages in Queue | Similar to the Bytes in Queue counter but measures the number of messages. |
| Network Interface | Bytes Total/sec | Can be measured on both sides of the link to observe how much data is generated when sessions are recorded. When measured on the Session Recording Server, this counter indicates the rate at which incoming data is received. Contrasts with the Citrix Session Recording Storage Manager/Message bytes/sec counter that measures the processing rate of data. If network rate is greater than this value, messages build in the message queue. |
| Processor | % Processor Time | Worth monitoring even though CPU is unlikely to be a bottleneck. |
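The disk-consumption approach described at the start of this section can be scripted. The sketch below uses only the Python standard library: it derives a write rate from two disk-usage samples and flags a growing backlog from a series of Bytes in Queue readings. All function names are illustrative. How you collect the queue samples (for example, by exporting them from Performance Monitor) is left open, and the sample values are invented.

```python
import shutil
import time


def write_rate_mb_per_s(used_before: int, used_after: int, elapsed_s: float) -> float:
    """Average rate at which disk space was consumed, in megabytes per second."""
    return (used_after - used_before) / elapsed_s / 1_000_000


def measure_storage_rate(path: str, interval_s: float = 60) -> float:
    """Sample the used space on the volume holding `path` twice and return the rate."""
    before = shutil.disk_usage(path).used
    time.sleep(interval_s)
    after = shutil.disk_usage(path).used
    return write_rate_mb_per_s(before, after, interval_s)


def backlog_is_growing(bytes_in_queue_samples) -> bool:
    """True if the queue holds more data at the end of the window than at the start."""
    return bytes_in_queue_samples[-1] > bytes_in_queue_samples[0]


# Invented sample values: 210 MB written in 60 seconds, and a rising queue.
print(write_rate_mb_per_s(0, 210_000_000, 60))
print(backlog_is_growing([1_000_000, 4_000_000, 9_000_000]))
```

To measure a live system, call `measure_storage_rate` with the path of the recording storage folder. Other files written to the same volume during the interval inflate the result, so a dedicated volume gives the cleanest reading.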

Session Recording Server hardware

You can increase the capacity of your deployment by carefully selecting the hardware used for the Session Recording Server. You have two choices: scaling up (by increasing the capacity of each server) or scaling out (by adding more servers). With either choice, your aim is to increase scalability at the lowest cost.

Scaling up

When examining a single Session Recording Server, consider the following best practices to ensure optimal performance for available budgets. The system depends on IOPS to sustain a high throughput of recorded data from the network onto the disk, so it is important to invest in appropriate network and disk hardware. For a high-performance Session Recording Server, a dual CPU or dual core CPU is recommended but little is gained from any higher specification. 64-bit processor architecture is recommended but an x86 processor type is also suitable. 4 GB of RAM is recommended but again there is little benefit from adding more.

Scaling out

Even with the best scaling up practices, there are limits to performance and scalability that can be reached with a single Session Recording Server when recording a large number of sessions. It might be necessary to add extra servers to meet the load. You can install more Session Recording Servers on different machines to have the Session Recording Servers work as a load balancing pool. In this type of deployment, the Session Recording Servers share the storage and the database. To distribute the load, point the Session Recording Agents to the load balancer that is responsible for the workload distribution.
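A first estimate of how many servers a pool needs follows from the input rate and the 50% rule of thumb described earlier. This is a sketch under simplifying assumptions: the load balancer spreads the load evenly, every server has the same processing rate, and the shared storage is not itself the bottleneck. The function name and the session counts other than 5,000 are illustrative.

```python
import math


def servers_needed(input_rate_mb_per_s: float,
                   per_server_rate_mb_per_s: float,
                   max_utilization: float = 0.5) -> int:
    """Smallest pool size that keeps every server at or below the target utilization."""
    usable_rate = per_server_rate_mb_per_s * max_utilization
    return math.ceil(input_rate_mb_per_s / usable_rate)


def input_rate(sessions: int, avg_session_mb: float, hours: float) -> float:
    """Average input rate in megabytes per second."""
    return sessions * avg_session_mb / (hours * 3600)


for sessions in (5000, 15000):
    rate = input_rate(sessions, avg_session_mb=20, hours=8)
    print(sessions, servers_needed(rate, per_server_rate_mb_per_s=5.0))
```

Treat the result as a starting point and confirm it with the performance counters described under Measuring throughput.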

Network capacity

A 100 Mbps network link is suitable for connecting a Session Recording Server. A Gigabit Ethernet connection might improve performance, but does not result in 10 times greater performance than a 100 Mbps link. In practice, the gain in throughput is significantly less.

Ensure that network switches used by Session Recording are not shared with third-party applications that might compete for available network bandwidth. Ideally, network switches are dedicated for use with the Session Recording Server. If network congestion proves to be the bottleneck, a network upgrade is a relatively inexpensive way to increase the scalability of the system.

Storage

Investment in disk and storage hardware is the single most important factor in server scalability. The faster that data can be written to disk, the higher the performance of the overall system. When selecting a storage solution, take more note of the write performance than the read performance.

Store data on a set of local disks controlled either as RAID by a local disk controller or as a SAN.

Note:

Storing data on a NAS based on file-based protocols such as SMB, CIFS, or NFS has serious performance and security implications. Never use this configuration in a production deployment of Session Recording.

For a local drive setup, aim for a disk controller with built-in cache memory. Caching allows the controller to use elevator sorting during write-back, which minimizes disk head movement and ensures write operations are completed without waiting for the physical disk operation to complete. This can improve write performance significantly at a minimal extra cost. Caching does however raise the problem of data loss after a power failure. To ensure the integrity of data and the file system, consider a battery backup facility for the caching disk controller, which ensures that, if power is lost, the cache is maintained and data is written to disk when power is eventually restored.

Consider using a suitable RAID storage solution. There are many RAID levels available depending on performance and redundancy requirements. The following table specifies each of the RAID levels and how applicable each standard is to Session Recording.

| RAID Level | Type | Minimum Number of Disks | Description |
| --- | --- | --- | --- |
| RAID 0 | Striped set without parity | 2 | Provides high performance but no redundancy. Loss of any disk destroys the array. This is a low cost solution for storing recorded session files where the impact of data loss is low. Easy to scale up performance by adding more disks. |
| RAID 1 | Mirrored set without parity | 2 | No performance gain over one disk, making it a relatively expensive solution. Use this solution only if a high level of redundancy is required. |
| RAID 3 | Striped set with dedicated parity | 3 | Provides high write performance with redundancy characteristics similar to RAID 5. RAID 3 is recommended for video production and live streaming applications. As Session Recording is this type of application, RAID 3 is most highly recommended but it is not common. |
| RAID 5 | Striped set with distributed parity | 3 | Provides high read performance with redundancy but at the cost of slower write performance. RAID 5 is the most common for general purpose usages. But due to the slow write performance, RAID 5 is not recommended for Session Recording. RAID 3 can be deployed at a similar cost but with significantly better write performance. |
| RAID 10 | Mirrored set and striped set | 4 | Provides performance characteristics of RAID 0 with redundancy benefits of RAID 1. An expensive solution that is not recommended for Session Recording. |

RAID 0 and RAID 3 are the most recommended RAID levels. RAID 1 and RAID 5 are popular standards but are not recommended for Session Recording. RAID 10 does provide some performance benefits but is too expensive for the additional gain.

Decide on the type and specification of disk drives. IDE/ATA drives and external USB or Firewire drives are not suitable for use in Session Recording. The main choice is between SATA and SCSI. SATA drives provide reasonably high transfer rates at a reduced cost per MB compared with SCSI drives. However, SCSI drives provide better performance and are more common in server deployments. Server RAID solutions mostly support SCSI drives but some SATA RAID products are now available. When evaluating the specifications of disk drive products, consider the rotational speed of disk and other performance characteristics.

Because the recording of thousands of sessions per day can consume significant amounts of disk space, you must choose between overall capacity and performance. From the earlier example, recording 5,000 Outlook sessions over an 8-hour work day consumes about 100 GB of storage space. To store 10 days’ worth of recordings (that is, 50,000 recorded session files), you need 1,000 GB (1 TB). This pressure on disk space can be eased by shortening the retention period before archiving or deleting old recordings. If 1 TB of disk space is available, a seven-day retention period is reasonable, ensuring disk space usage remains around 700 GB, with 300 GB remaining as a buffer for busy days. In Session Recording, the archiving and deleting of files is supported with the ICLDB utility and has a minimum retention period of two days. You can schedule a background task to run once a day at some off-peak time. For more information about the ICLDB commands and archiving, see Manage your database records.
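The retention arithmetic above is easy to adapt to other workloads. The sketch below reproduces the figures from the example, using decimal units (1 GB = 1,000 MB) as the text does. The function name is illustrative.

```python
def storage_needed_gb(sessions_per_day: int, avg_session_mb: float, retention_days: int) -> float:
    """Disk space consumed by recordings kept for the given retention period."""
    return sessions_per_day * avg_session_mb * retention_days / 1000


disk_gb = 1000  # 1 TB of available storage, as in the example

ten_days = storage_needed_gb(5000, 20, retention_days=10)
seven_days = storage_needed_gb(5000, 20, retention_days=7)

print(f"10-day retention: {ten_days:.0f} GB")
print(f"7-day retention: {seven_days:.0f} GB, buffer: {disk_gb - seven_days:.0f} GB")
```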

The alternative to using local drive and controllers is to use a SAN storage solution based on block-level disk access. To the Session Recording Server, the disk array appears as a local drive. SANs are more expensive to set up, but as the disk array is shared, SANs do have the advantage of simplified and centralized management. There are two main types of SAN: Fibre Channel and iSCSI. iSCSI is essentially SCSI over TCP/IP and is gaining popularity over Fibre Channel since the introduction of Gb Ethernet.

Database scalability

The Session Recording Database requires Microsoft SQL Server 2017, Microsoft SQL Server 2016, Microsoft SQL Server 2014, Microsoft SQL Server 2012, or Microsoft SQL Server 2008 R2. The volume of data sent to the database is small because the database stores only metadata about the recorded sessions. The files of the recorded sessions themselves are written to a separate disk. Typically, each recorded session requires only about 1 KB of space in the database, unless the Session Recording Event API is used to insert searchable events to the session.

The Express Editions of Microsoft SQL Server 2017, Microsoft SQL Server 2016, Microsoft SQL Server 2014, Microsoft SQL Server 2012, and Microsoft SQL Server 2008 R2 impose a database size limitation of 10 GB. At 1 KB per recorded session, the database can catalog about 10,000,000 sessions. Other editions of Microsoft SQL Server have no database size restrictions and are limited only by available disk space. As the number of sessions in the database increases, performance of the database and speed of searches diminishes only negligibly.

If you are not making customizations through the Session Recording Event API, each recorded session generates four database transactions: two when recording starts, one when the user logs on to the session being recorded, and one when recording ends. If you use the Session Recording Event API to customize sessions, each searchable event recorded generates one transaction. Because even the most basic database deployment can handle hundreds of transactions per second, the database is unlikely to be stressed by this load. The impact is light enough that the Session Recording Database can run on the same SQL Server as other databases, including the Citrix Virtual Apps and Desktops data store database.
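To put the transaction counts in perspective, the following sketch estimates the average database load for the 5,000-session example. It assumes that sessions and events are spread evenly across the day, which understates the peaks that occur at logon times. The function name and the figure of 50 events per session are hypothetical.

```python
def db_transactions_per_second(sessions: int,
                               hours: float,
                               events_per_session: int = 0,
                               base_transactions: int = 4) -> float:
    """Average transaction rate: four per session plus one per searchable event."""
    total = sessions * (base_transactions + events_per_session)
    return total / (hours * 3600)


# 5,000 sessions over an 8-hour day, without and with Event API customization.
print(f"{db_transactions_per_second(5000, 8):.2f} transactions/s")
print(f"{db_transactions_per_second(5000, 8, events_per_session=50):.2f} transactions/s")
```

Both results are far below the hundreds of transactions per second that a basic deployment can handle.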

If your Session Recording deployment requires many millions of recorded sessions to be cataloged in the database, follow Microsoft guidelines for SQL Server scalability.