XenApp and XenDesktop

Audio features

You can configure and add the following Citrix policy settings to a policy that optimizes HDX audio features. For usage details plus relationships and dependencies with other policy settings, see Audio policy settings, Bandwidth policy settings, and Multi-stream connections policy settings.


Although it is best to deliver audio using User Datagram Protocol (UDP) rather than TCP, UDP audio encryption using DTLS is available only between NetScaler Gateway and Citrix Receiver. Therefore, sometimes it might be preferable to use TCP transport. TCP supports end-to-end TLS encryption from the VDA to Citrix Receiver.

Audio quality

In general, higher sound quality sends more audio data to user devices, which consumes more bandwidth and server CPU. Sound compression allows you to balance sound quality against overall session performance; use Citrix policy settings to configure the compression levels to apply to sound files.

By default, the Audio quality policy setting is set to High - high definition audio when TCP transport is used, and to Medium - optimized-for-speech when UDP transport (recommended) is used. The High Definition audio setting provides high fidelity stereo audio, but consumes more bandwidth than other quality settings. Do not use this audio quality for non-optimized voice chat or video chat applications (such as softphones), because it may introduce latency into the audio path that is not suitable for real-time communications. The optimized for speech policy setting is recommended for real-time audio, regardless of the selected transport protocol.

When bandwidth is limited, for example on satellite or dial-up connections, reducing audio quality to Low consumes the least possible bandwidth. In this situation, create separate policies for users on low-bandwidth connections so that users on high-bandwidth connections are not adversely impacted.

For setting details, see Audio policy settings. Remember to enable Client audio settings on the user device; see “Audio setting policies for user devices” later in this article.

Client audio redirection

To allow users to receive audio from an application on a server through speakers or other sound devices (such as headphones) on the user device, leave the Client audio redirection setting at its default (Allowed).

Client audio mapping puts extra load on the servers and the network. However, prohibiting client audio redirection disables all HDX audio functionality.

For setting details, see Audio policy settings. Remember to enable Client audio settings on the user device; see “Audio setting policies for user devices” later in this article.

Client microphone redirection

To allow users to record audio using input devices such as microphones on the user device, leave the Client microphone redirection setting at its default (Allowed).

For security, users are alerted when servers that are not trusted by their user devices try to access microphones, and can choose to accept or reject access prior to using the microphone. Users can disable this alert on Citrix Receiver.

For setting details, see Audio policy settings. Remember to enable Client audio settings on the user device; see “Audio setting policies for user devices” later in this article.

Audio Plug N Play

The Audio Plug N Play policy setting allows or prevents the use of multiple audio devices to record and play sound. This setting is Enabled by default. Audio Plug N Play enables audio devices to be recognized even if they are not plugged in until after the user session has been established.

This setting applies only to Windows Server OS machines.

For setting details, see Audio policy settings.

Audio redirection bandwidth limit and Audio redirection bandwidth limit percent

The Audio redirection bandwidth limit policy setting specifies the maximum bandwidth (in kilobits per second) for playing and recording audio in a session. The Audio redirection bandwidth limit percent setting specifies the maximum bandwidth for audio redirection as a percentage of the total available bandwidth. By default, zero (no maximum) is specified for both settings. If both settings are configured, the one with the lowest bandwidth limit is used.
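The way the two limits combine can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Citrix code; the function name and the `session_kbps` parameter (standing in for the "total available bandwidth") are assumptions made for the example.

```python
from typing import Optional


def effective_audio_limit_kbps(limit_kbps: float,
                               limit_percent: float,
                               session_kbps: float) -> Optional[float]:
    """Return the audio bandwidth cap in kbps, or None when no cap applies.

    A value of zero for either setting means "no maximum", as in the policy
    defaults. When both settings are configured, the lower limit is used.
    session_kbps is a hypothetical input for the total available bandwidth.
    """
    candidates = []
    if limit_kbps > 0:
        candidates.append(float(limit_kbps))
    if limit_percent > 0:
        candidates.append(session_kbps * limit_percent / 100.0)
    return min(candidates) if candidates else None
```

For example, with a 2000 kbps session, an absolute limit of 500 kbps and a percentage limit of 10 would yield a cap of 200 kbps, because the percentage limit is the lower of the two.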

For setting details, see Bandwidth policy settings. Remember to enable Client audio settings on the user device; see “Audio setting policies for user devices” later in this article.

Audio over UDP Real-time Transport and Audio UDP port range

By default, Audio over User Datagram Protocol (UDP) Real-time Transport is allowed (when selected at time of installation), which opens a UDP port on the server for connections that use Audio over UDP Real-time Transport. Citrix recommends configuring UDP/RTP for audio to ensure the best possible user experience in the event of network congestion or packet loss. For real-time audio such as softphone applications, UDP audio is preferred over EDT. UDP tolerates packet loss without retransmission, so no latency is added on connections with high packet loss.


Audio data transmitted with UDP is not encrypted when NetScaler Gateway is not in the path. If NetScaler Gateway is configured to access XenApp and XenDesktop resources, audio traffic between the endpoint device and NetScaler Gateway is secured using the DTLS protocol.

The Audio UDP port range specifies the range of port numbers that the Virtual Delivery Agent (VDA) uses to exchange audio packet data with the user device.

By default, the range is 16500–16509.

For setting details about Audio over UDP Real-time Transport, see Audio policy settings; for details about Audio UDP port range, see Multi-stream connections policy settings. Remember to enable Client audio settings on the user device; see “Audio setting policies for user devices” later in this article.

Audio setting policies for user devices

  1. Load the group policy templates by following Configuring the Group Policy Object administrative template.
  2. In the Group Policy Editor, expand Administrative Templates > Citrix Components > Citrix Receiver > User Experience.
  3. For Client audio settings, select Not Configured, Enabled, or Disabled.
    • Not Configured. By default, audio redirection is enabled with high quality audio or previously configured custom audio settings.
    • Enabled. Audio redirection is enabled with selected options.
    • Disabled. Audio redirection is disabled.
  4. If you select Enabled, choose a sound quality. For UDP audio, use Medium (default).
  5. For UDP audio only, select Enable Real-Time Transport and then set the range of incoming ports to open in the local Windows firewall.
  6. To use UDP Audio with NetScaler Gateway, select Allow Real-Time Transport Through gateway. NetScaler Gateway must be configured with DTLS. For more information, see UDP Audio Through a NetScaler Gateway.

As an administrator, if you do not have control over endpoint devices to make these changes, for example in the case of BYOD or home computers, use the default.ica attributes from StoreFront to enable UDP audio.

  1. On the StoreFront machine, open C:\inetpub\wwwroot\Citrix\<Store Name>\App_Data\default.ica with an editor such as Notepad.
  2. Make the entries below under the [Application] section.
; This is to enable Real-Time Transport
; This is to Allow Real-Time Transport Through gateway
; This is to set audio quality to Medium
; UDP Port range

If you enable User Datagram Protocol (UDP) audio by editing default.ica, then UDP audio is enabled for all users who are using that store.
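If you prefer to script the edit rather than use a text editor, the following Python sketch shows one way to add or update entries under a named section of an INI-style file such as default.ica while leaving comments and other sections untouched. It is an illustration only: the function name is invented for this example, and `ExampleKey` and `OtherKey` are hypothetical placeholders, not real ICA attribute names. Substitute the attributes documented for your Citrix Receiver and StoreFront versions.

```python
def upsert_ini_entries(text: str, section: str, entries: dict) -> str:
    """Set key=value pairs inside [section], preserving all other lines.

    Existing keys in the section are overwritten in place; missing keys are
    appended at the end of the section. If the section does not exist, it is
    created at the end of the file.
    """
    header = f"[{section}]"
    pending = dict(entries)      # entries not yet written
    out = []
    in_section = False
    found = False

    def flush_pending():
        out.extend(f"{key}={value}" for key, value in pending.items())
        pending.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_section:
                flush_pending()  # leaving the target section
            in_section = stripped.lower() == header.lower()
            found = found or in_section
            out.append(line)
            continue
        # Lines starting with ';' are comments and are kept as they are.
        if in_section and "=" in stripped and not stripped.startswith(";"):
            key = stripped.split("=", 1)[0].strip()
            if key in pending:
                out.append(f"{key}={pending.pop(key)}")
                continue
        out.append(line)

    if in_section:
        flush_pending()          # target section was the last one in the file
    elif not found:
        out.append(header)
        flush_pending()
    return "\n".join(out) + "\n"


# Hypothetical usage with placeholder keys (not real ICA attributes):
sample = "[WFClient]\nVersion=2\n[Application]\nExampleKey=old\n"
updated = upsert_ini_entries(sample, "Application",
                             {"ExampleKey": "new", "OtherKey": "1"})
```

Plain line-based editing is used here instead of Python's `configparser` because `configparser` discards comments when it writes a file back. Back up default.ica before changing it.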

Avoid echo during multimedia conferences

Users in audio or video conferences might hear an echo. Echoes usually occur when speakers and microphones are too close to each other. For that reason, we recommend the use of headsets for audio and video conferences.

HDX provides an echo cancellation option (enabled by default) that minimizes echo. The effectiveness of echo cancellation is sensitive to the distance between the speakers and the microphone. The devices must not be too close to or too far away from each other.

You can change a registry setting to disable echo cancellation.


Editing the Registry incorrectly can cause serious problems that might require you to reinstall your operating system. Citrix cannot guarantee that problems resulting from the incorrect use of Registry Editor can be solved. Use Registry Editor at your own risk. Be sure to back up the registry before you edit it.

  1. Using the Registry Editor on the user device, navigate to one of the following:
    • 32-bit computers: HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\ICA Client\Engine\Configuration\Advanced\Modules\ClientAudio\EchoCancellation
    • 64-bit computers: HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Citrix\ICA Client\Engine\Configuration\Advanced\Modules\ClientAudio\EchoCancellation
  2. Change the Value data field to FALSE.
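Before changing the setting, it can help to confirm which key applies to a given device and what it currently contains. The Python sketch below selects the key path from the two listed above and reads the values stored under it without modifying anything. It is a hedged illustration: the function names are invented for this example, and the names and types of the values under the key are not stated in this article, so the code only enumerates whatever it finds.

```python
BASE_SUBKEY = (r"Citrix\ICA Client\Engine\Configuration\Advanced"
               r"\Modules\ClientAudio\EchoCancellation")


def echo_cancellation_subkey(is_64bit_windows: bool) -> str:
    """Return the subkey path under HKEY_LOCAL_MACHINE for this device."""
    prefix = r"SOFTWARE\Wow6432Node" if is_64bit_windows else "SOFTWARE"
    return prefix + "\\" + BASE_SUBKEY


def read_echo_cancellation_values(is_64bit_windows: bool) -> dict:
    """Read (not write) every value stored under the EchoCancellation key.

    Works only on Windows, because winreg is a Windows-only module.
    """
    import winreg  # imported here so the module still loads on other systems

    values = {}
    path = echo_cancellation_subkey(is_64bit_windows)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        index = 0
        while True:
            try:
                name, data, _value_type = winreg.EnumValue(key, index)
            except OSError:      # raised when there are no more values
                break
            values[name] = data
            index += 1
    return values
```

The sketch is deliberately read-only; make the actual change with Registry Editor as described in the steps above, after backing up the registry.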


A softphone is software acting as a phone interface. You use a softphone to make calls over the internet from a computer or other smart device. By using a softphone, you can dial phone numbers and carry out other phone-related functions using a screen.

XenApp and XenDesktop support several alternatives for delivering softphones.

  • Control mode. The hosted softphone simply controls a physical telephone set. In this mode, no audio traffic goes through the XenApp or XenDesktop server.
  • HDX RealTime optimized softphone support. The media engine runs on the user device, and Voice over Internet Protocol (VoIP) traffic flows peer-to-peer. For examples, see:
  • Local App Access. A XenApp and XenDesktop feature that allows an application such as a softphone to run locally on the end user Windows device yet appear seamlessly integrated with their virtual/published desktop. This offloads all audio processing to the user device. For more information, see Local App Access and URL redirection.
  • HDX RealTime generic softphone support. VoIP-over-ICA.

Generic softphone support

Generic softphone support enables you to host an unmodified softphone on XenApp or XenDesktop in the data center. The audio traffic goes over the Citrix ICA protocol (preferably using UDP/RTP) to the user device running Citrix Receiver.

Generic softphone support is a feature of HDX RealTime. This approach to softphone delivery is especially useful when:

  • An optimized solution for delivering the softphone is not available and the user is not on a Windows device where Local App Access can be used.
  • The media engine needed for optimized delivery of the softphone has not been installed on the user device or is not available for the operating system version running on the user device. In this scenario, Generic HDX RealTime provides a valuable fallback solution.

There are two softphone delivery considerations using XenApp and XenDesktop:

  • How the softphone application is delivered to the virtual/published desktop.
  • How the audio is delivered to and from the end user headset, microphone, and speakers, or USB telephone set.

XenApp and XenDesktop include numerous technologies to support generic softphone delivery:

  • Optimized-for-Speech codec for fast encode of real-time audio and bandwidth efficiency.
  • Low latency audio stack.
  • Server-side jitter buffer to smooth out the audio when network latency fluctuates.
  • Packet tagging (DSCP and WMM) for Quality of Service.
    • DSCP tagging for RTP packets (Layer 3)
    • WMM tagging for Wi-Fi
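To make the Layer 3 tagging concrete, the following Python sketch shows how an application can generally request a DSCP marking on a UDP socket. It illustrates the mechanism only and is not how the VDA or Citrix Receiver is implemented. The choice of DSCP 46 (Expedited Forwarding, commonly used for voice) and the function names are assumptions for this example, and some operating systems, notably Windows, may ignore the socket option unless a QoS policy allows it.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """Convert a 6-bit DSCP value to the 8-bit IP TOS/Traffic Class byte.

    DSCP occupies the upper six bits; the lower two bits (ECN) are left at 0.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def open_marked_udp_socket(dscp: int = 46) -> socket.socket:
    """Create an IPv4 UDP socket whose outgoing packets request the DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

Network devices still have to be configured to trust and act on the marking; tagging alone does not guarantee priority treatment.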

The Citrix Receiver versions for Windows, Linux, Chrome, and Mac are also VoIP capable. Citrix Receiver for Windows offers these features:

  • Client-side jitter buffer - Ensures smooth audio even when network latency fluctuates.
  • Echo cancellation - Allows for greater variation in the distance between microphone and speakers for workers who do not use a headset.
  • Audio plug-n-play - Audio devices do not need to be plugged in before starting a session. They can be plugged in at any time.
  • Audio device routing - Users can direct ringtone to speakers but the voice path to their headset.
  • Multi-stream ICA - Enables flexible Quality of Service (QoS)-based routing over the network.
  • ICA supports four TCP and two UDP streams. One of the UDP streams supports real-time audio over RTP.
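The idea behind a jitter buffer can be sketched with a toy example in Python. This is a simplified illustration of the general technique, not the Citrix implementation: packets that arrive out of order are held briefly and released in sequence order once the buffer holds more than a fixed number of packets. The class name, the `depth` parameter, and the use of sequence numbers are assumptions made for this example.

```python
import heapq


class JitterBuffer:
    """Hold up to `depth` packets and release them in sequence order."""

    def __init__(self, depth: int):
        self.depth = depth
        self._heap = []  # min-heap of (sequence_number, payload)

    def push(self, seq: int, payload: bytes) -> None:
        """Store an arriving packet, whatever order it arrived in."""
        heapq.heappush(self._heap, (seq, payload))

    def pop_ready(self) -> list:
        """Release the oldest packets while more than `depth` are held."""
        released = []
        while len(self._heap) > self.depth:
            released.append(heapq.heappop(self._heap))
        return released

    def flush(self) -> list:
        """Release everything that is left, in sequence order."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap))
        return released


# Packets arrive out of order; playback order is restored.
buffer = JitterBuffer(depth=2)
played = []
for seq in [2, 1, 3, 5, 4]:
    buffer.push(seq, b"audio")
    played.extend(s for s, _ in buffer.pop_ready())
played.extend(s for s, _ in buffer.flush())
```

A deeper buffer absorbs larger variations in arrival time but adds playback delay, which is the trade-off any real jitter buffer has to manage.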

For a summary of Citrix Receiver capabilities, see Citrix Receiver Feature Matrix.

System configuration recommendations

Client Hardware and Software: For optimal audio quality, we recommend the latest version of Citrix Receiver and a good quality headset with acoustic echo cancellation (AEC). Citrix Receiver versions for Windows, Linux, and Mac support VoIP. Also, Dell Wyse offers VoIP support for ThinOS (WTOS).

CPU Considerations: Monitor CPU usage on the VDA to determine if it is necessary to assign two virtual CPUs to each virtual machine. Real-time voice and video are data intensive. Configuring two virtual CPUs reduces the thread switching latency. Therefore, we recommend that you configure two vCPUs in a XenDesktop VDI environment.

Having two virtual CPUs does not necessarily mean doubling the number of physical CPUs, because physical CPUs can be shared across sessions.

Citrix Gateway Protocol (CGP), which is used for the Session Reliability feature, also increases CPU consumption. On high-quality network connections, you can disable this feature to reduce CPU consumption on the VDA. Neither of the preceding steps might be necessary on a powerful server.

UDP Audio: Audio over UDP provides excellent tolerance of network congestion and packet loss. We recommend it instead of TCP when available.

LAN/WAN configuration: Proper configuration of the network is critical for good real-time audio quality. Typically, you must configure virtual LANs (VLANs) because excessive broadcast packets can introduce jitter. IPv6-enabled devices might generate many broadcast packets. If IPv6 support is not needed, you can disable IPv6 on those devices. Configure the network to support Quality of Service (QoS).

Settings for use with WAN connections: You can use voice chat over Local Area Network (LAN) and Wide Area Network (WAN) connections. On a WAN connection, audio quality depends on the latency, packet loss, and jitter on the connection. If delivering softphones to users on a WAN connection, we recommend using Citrix SD-WAN between the data center and the remote office to maintain a high Quality of Service. Citrix SD-WAN supports Multi-Stream ICA, including UDP. Also, in the case of a single TCP stream, it is possible to distinguish the priorities of various ICA virtual channels to ensure that high priority real-time audio data gets preferential treatment.

With Direct Workload Connection, audio-over-UDP can be encrypted using Citrix SD-WAN after authentication through the Gateway.

Use Director or the HDX Monitor to validate your HDX configuration.

Remote user connections: NetScaler Gateway 11 supports DTLS to deliver UDP/RTP traffic natively (without encapsulation in TCP). You must open firewalls bidirectionally for UDP traffic over Port 443.

Codec selection and bandwidth consumption: Between the user device and the Virtual Delivery Agent (VDA) in the data center, we recommend using the Optimized-for-Speech codec setting, also known as Medium Quality audio. Between the VDA platform and the IP-PBX, the softphone uses whatever codec is configured or negotiated. For example:

  • G.711 provides better voice quality but has a bandwidth requirement of 80–100 kilobits per second per call (depending on network Layer 2 overheads).
  • G.729 provides good voice quality and has a low bandwidth requirement of 30–40 kilobits per second per call (depending on network Layer 2 overheads).
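For capacity planning on the leg between the VDA and the IP-PBX, the per-call figures above can be scaled by the number of concurrent calls. The Python sketch below does only that arithmetic; the function name and the dictionary layout are assumptions for this example, and the per-call ranges are the ones quoted above.

```python
# Per-call bandwidth ranges in kilobits per second, as quoted above.
CODEC_KBPS_PER_CALL = {
    "G.711": (80, 100),
    "G.729": (30, 40),
}


def pbx_leg_bandwidth_kbps(codec: str, concurrent_calls: int) -> tuple:
    """Return the (low, high) bandwidth estimate in kbps for N calls."""
    low, high = CODEC_KBPS_PER_CALL[codec]
    return (low * concurrent_calls, high * concurrent_calls)
```

For example, 10 concurrent G.729 calls would need roughly 300 to 400 kilobits per second, while the same number of G.711 calls would need roughly 800 to 1000.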

Delivering softphone applications to the virtual desktop

There are two methods by which you can deliver a softphone to the XenDesktop virtual desktop:

  • The application can be installed in the virtual desktop image.
  • The application can be streamed to the virtual desktop using Microsoft App‑V. This approach has manageability advantages because the virtual desktop image is kept uncluttered. After being streamed to the virtual desktop, the application runs in that environment as if it had been installed in the usual manner. Not all applications are compatible with App-V.

Delivering audio to and from the user device

Generic HDX RealTime supports two methods of delivering audio to and from the user device:

  • Citrix Audio Virtual Channel. We generally recommend the Citrix Audio Virtual Channel because it’s designed specifically for audio transport.
  • Generic USB Redirection. Useful for supporting audio devices that have buttons or a display (human interface device, HID) if the user device is on a LAN or LAN-like connection back to the XenApp or XenDesktop server.

Citrix audio virtual channel

The bidirectional Citrix Audio Virtual Channel (CTXCAM) enables audio to be delivered efficiently over the network. Generic HDX RealTime takes the audio from the user headset or microphone, compresses it, and sends it over ICA to the softphone application on the virtual desktop. Likewise, the audio output of the softphone is compressed and sent in the other direction to the user headset or speakers. This compression is independent of the compression used by the softphone itself (such as G.729 or G.711). It is done using the Optimized-for-Speech codec (Medium Quality). Its characteristics are ideal for voice-over-IP (VoIP). It features quick encode time, and it consumes only approximately 56 kilobits per second of network bandwidth (28 kbps in each direction), peak. This codec must be explicitly selected in the Studio console because it is not the default audio codec. The default is the HD Audio codec (High Quality). This codec is excellent for high fidelity stereophonic soundtracks but is slower to encode compared to the Optimized-for-Speech codec.
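As a rough planning aid for the ICA leg, the peak figure above (approximately 28 kilobits per second in each direction, 56 in total) can be used to estimate how many simultaneous voice sessions fit within a bandwidth budget reserved for audio. The Python sketch below is simple arithmetic under that assumption; the function name and the budget parameter are hypothetical, and real links also carry other ICA traffic.

```python
PEAK_KBPS_PER_DIRECTION = 28  # Optimized-for-Speech codec, peak, as quoted
PEAK_KBPS_PER_SESSION = 2 * PEAK_KBPS_PER_DIRECTION  # 56 kbps both ways


def max_voice_sessions(audio_budget_kbps: float) -> int:
    """Return how many sessions fit in the budget at peak audio usage."""
    if audio_budget_kbps < 0:
        raise ValueError("budget must not be negative")
    return int(audio_budget_kbps // PEAK_KBPS_PER_SESSION)
```

With 1000 kilobits per second set aside for audio, this estimate allows 17 concurrent sessions at peak, because 18 sessions would need 1008 kilobits per second.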

Generic USB Redirection

Citrix Generic USB Redirection technology (CTXGUSB virtual channel) provides a generic means of remoting USB devices, including composite devices (audio plus HID) and isochronous USB devices. This approach is limited to LAN-connected users because the USB protocol tends to be sensitive to network latency and requires considerable network bandwidth. Isochronous USB redirection works well when using some softphones. This redirection provides excellent voice quality and low latency, but Citrix Audio Virtual Channel is preferred because it is optimized for audio traffic. The primary exception is when using an audio device with buttons such as a USB telephone attached to the user device that is LAN-connected to the data center. In this case, Generic USB Redirection supports buttons on the phone set or headset that control features by sending a signal back to the softphone. This isn’t an issue with buttons that work locally on the device.