Citrix Virtual Apps and Desktops: Zones, Latency, and Brokering Performance
Deployments that span widely dispersed locations connected by a WAN can face network latency and reliability issues. Rather than creating multiple Sites, each of which would require its own SQL Server Site database that you would need to manage separately, you can configure zones within a single Site.
The impact of latency on brokering performance
Most end users enumerate and launch resources every day. With zones, users can now work over higher-latency links to the Site database, as long as there's a local broker in their zone.
This additional latency inevitably affects the end-user experience. For most of the work users do, they'll see the expected slowness caused by round trips between the satellite brokers and the SQL Server database.
However, brokering sessions to launch apps is a particular pain point. The broker must pick the lowest-loaded VDA on which to launch the app, and it does this within a database transaction that needs a snapshot of the current loads of all VDAs in the Delivery Group. To get that snapshot, it takes a lock on all the workers in the Delivery Group. Other users' launch requests must wait for the same locks, so they are serialized, and worker state changes (such as session events) also wait on, and are blocked by, those locks.
With low latency, the delay between taking the locks and releasing them is very small. However, as latency increases, so does the time the locks are held, and so the time to broker sessions increases.
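To see why longer lock hold times cap brokering throughput, the following minimal Python sketch models launches as a closed loop: a fixed number of clients each keep one brokering request outstanding, and every request must hold a single Delivery Group lock for a fixed time, granted in FIFO order. The function `simulate_serialized_brokering` and the fixed-hold, FIFO model are illustrative assumptions, not how the Broker is implemented; in practice the hold time depends on how many database round trips happen inside the transaction.

```python
from collections import deque


def simulate_serialized_brokering(concurrency, lock_hold_s, total_requests):
    """Idealized closed-loop model of serialized brokering (an assumption,
    not the Broker's real implementation).

    `concurrency` clients each keep one brokering request outstanding. Every
    request must hold the same Delivery Group lock for `lock_hold_s` seconds,
    and the lock is granted in FIFO order. Returns (requests per second,
    mean response time in seconds).
    """
    initial = min(concurrency, total_requests)
    waiting = deque([0.0] * initial)  # arrival times of queued requests
    issued = initial
    lock_free_at = 0.0                # time the lock is next released
    responses = []

    while waiting:
        arrival = waiting.popleft()
        start = max(lock_free_at, arrival)      # wait for the lock
        lock_free_at = start + lock_hold_s      # hold it, then release
        responses.append(lock_free_at - arrival)
        if issued < total_requests:
            # The client submits its next request as soon as this one completes.
            waiting.append(lock_free_at)
            issued += 1

    throughput = len(responses) / lock_free_at
    mean_response = sum(responses) / len(responses)
    return throughput, mean_response


# Same lock hold time, five times the concurrency: throughput stays flat
# while the average response time grows roughly fivefold.
for n in (12, 60):
    rps, resp = simulate_serialized_brokering(n, 1 / 3.7, 10_000)
    print(f"{n} concurrent: {rps:.1f} req/s, {resp:.1f} s average response")
```

The hold time of 1/3.7 seconds is only an input chosen to match the throughput measured at 90 ms RTT. In this model, adding concurrency beyond the point where the lock is always busy cannot raise throughput; it only lengthens the queue.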
To measure this, we tested a range of latencies and launch rates. The latencies are round-trip times (RTTs) based on Verizon IP Latency Statistics. Most observed RTTs are lower than the maximum values listed, but we wanted to make sure that we tested with useful RTTs.
Round-trip times of 10 milliseconds (ms) cover most inter-country delays; 45 ms covers North America, Europe, and Japan; 90 ms covers trans-Atlantic links; 160 ms covers trans-Pacific, Latin America, and Asia Pacific; and 250 ms covers EMEA to Asia Pacific.
We tested five levels of concurrent brokering requests: 12, 24, 36, 48, and 60. These are the five columns of each results table.
Note: The VDA sessions are simulated, because the testing focuses on the impact of latency on the broker. There are 57 VDAs in a single Delivery Group, and each test attempted to launch 10,000 users.
10 ms RTT results

| Concurrent requests | 12 | 24 | 36 | 48 | 60 |
|---|---|---|---|---|---|
| Average response time (s) | 0.9 | 1.4 | 1.6 | 2.1 | 2.6 |
| Brokering requests per second | 14 | 17.8 | 22.9 | 23.2 | 22.9 |
| Time to launch 10,000 users | 11m 57s | 9m 24s | 7m 16s | 7m 11s | 7m 17s |
As expected, a 10 ms RTT is fast enough to handle the load placed on the system, and there were no errors. This is the fastest way to launch users. At the maximum launch rate of 60 concurrent requests, the average response time was 2.6 seconds, and launching all 10,000 users took 7 minutes, 17 seconds.
45 ms RTT results

| Concurrent requests | 12 | 24 | 36 | 48 | 60 |
|---|---|---|---|---|---|
| Average response time (s) | 1.7 | 3.1 | 4.3 | 6.4 | 7.3 |
| Brokering requests per second | 7.1 | 7.8 | 8.4 | 7.5 | 8.2 |
| Time to launch 10,000 users | 23m 28s | 21m 19s | 19m 51s | 22m 15s | 20m 19s |
At 45 ms, results were still good, although one or two users saw an error at the highest launch rates.
Note: The impact of serialization can be seen in the response times, which rise from 1.7 seconds at 12 concurrent requests to 7.3 seconds at 60, while throughput stays between 7.1 and 8.4 requests per second. The total time to broker 10,000 users was about 20 to 23 minutes.
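When throughput plateaus, response time has to grow with concurrency. In a closed loop where each of N concurrent requests is replaced as soon as it completes, Little's law gives an average response time of roughly N divided by throughput. The short Python check below applies that to the 45 ms values from the table; the helper name `littles_law_response_time` is hypothetical, and the check ignores errors and ramp-up. The other tables follow the same pattern closely.

```python
# 45 ms RTT results, copied from the table.
CONCURRENCY = [12, 24, 36, 48, 60]
REQUESTS_PER_SECOND = [7.1, 7.8, 8.4, 7.5, 8.2]
MEASURED_RESPONSE_S = [1.7, 3.1, 4.3, 6.4, 7.3]


def littles_law_response_time(concurrency, requests_per_second):
    """Average time in system for a closed loop: N / X (Little's law)."""
    return concurrency / requests_per_second


for n, rps, measured in zip(CONCURRENCY, REQUESTS_PER_SECOND, MEASURED_RESPONSE_S):
    predicted = littles_law_response_time(n, rps)
    print(f"{n:>2} requests: predicted {predicted:.2f} s, measured {measured} s")
```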
90 ms RTT results

| Concurrent requests | 12 | 24 | 36 | 48 | 60 |
|---|---|---|---|---|---|
| Average response time (s) | 2.9 | 6.4 | 9.5 | 12.9 | 16.2 |
| Brokering requests per second | 4.1 | 3.7 | 3.8 | 3.7 | 3.7 |
| Time to launch 10,000 users | 40m 30s | 44m 29s | 44m 11s | 44m 55s | 45m 04s |
The 90 ms results showed few errors. However, the impact of transacting over latency becomes more obvious: the average time to broker a session is an acceptable 2.9 seconds with 12 concurrent requests but rises to a likely unacceptable 16.2 seconds with 60. Because throughput stays at roughly 3.7 to 4.1 requests per second regardless of concurrency, it's actually more advantageous to broker users at a lower rate. Launching all 10,000 users took 40 to 45 minutes.
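One way to act on this is to pick the lowest concurrency that still delivers close to the best observed throughput, because concurrency beyond that point mostly adds queueing. The Python sketch below applies that rule to the measured results; the function name and the 5 percent tolerance are hypothetical choices for illustration, not Citrix guidance.

```python
def lowest_near_peak_concurrency(results, tolerance=0.05):
    """Return the smallest concurrency whose throughput is within
    `tolerance` (a fraction) of the best throughput observed.

    `results` maps concurrent requests -> brokering requests per second.
    """
    peak = max(results.values())
    return min(n for n, rps in results.items() if rps >= peak * (1 - tolerance))


# 90 ms RTT results: throughput is flat, so 12 concurrent requests is enough.
rtt_90ms = {12: 4.1, 24: 3.7, 36: 3.8, 48: 3.7, 60: 3.7}
print(lowest_near_peak_concurrency(rtt_90ms))

# 10 ms RTT results: throughput keeps rising until about 36 requests.
rtt_10ms = {12: 14, 24: 17.8, 36: 22.9, 48: 23.2, 60: 22.9}
print(lowest_near_peak_concurrency(rtt_10ms))
```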
160 ms RTT results

| Concurrent requests | 12 | 24 | 36 | 48 | 60 |
|---|---|---|---|---|---|
| Average response time (s) | 5.7 | 11.4 | 17.3 | 23.2 | 28.0 |
| Brokering requests per second | 2.1 | 2.1 | 2.1 | 2.1 | 2.1 |
| Time to launch 10,000 users | 1h 19m 0s | 1h 19m 27s | 1h 19m 55s | 1h 20m 26s | N/A |
At 160 ms, we start to see significant errors at higher launch rates: 4 percent of requests failed at 48 concurrent requests and 17.7 percent at 60, with response times approaching 30 seconds. Up to 36 concurrent requests, however, the error rate is 0.1 percent, and the average brokering time at 36 requests is about 17 seconds.
Note: It's hard to judge the launch time for 60 requests, because the 17.7 percent failure rate is hard to factor in.
At this latency, we recommend not exceeding 24 concurrent requests. The size of the Site may also be a factor: launching 1,000 users would take about 8 minutes, which scales up to about 1 hour, 20 minutes for 10,000 users. For that reason, we wouldn't recommend a large Site with this level of latency to the database.
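The sizing arithmetic is the number of users divided by the sustained brokering rate. A small Python helper makes it easy to redo for your own Site size; it assumes the rate stays constant and ignores failed requests, and the name `estimated_launch_time` is hypothetical.

```python
def estimated_launch_time(users, requests_per_second):
    """Estimate the time to broker `users` sessions at a steady rate,
    formatted as 'Xh Ym Zs' (hours omitted when zero)."""
    total = round(users / requests_per_second)  # whole seconds
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h {minutes}m {seconds}s" if hours else f"{minutes}m {seconds}s"


# 160 ms RTT: about 2.1 brokering requests per second at every tested concurrency.
print(estimated_launch_time(1_000, 2.1))
print(estimated_launch_time(10_000, 2.1))
```

At 2.1 requests per second this gives just under 8 minutes for 1,000 users and about 1 hour, 19 minutes for 10,000, in line with the measured times.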
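250 ms RTT results

| Concurrent requests | 12 | 24 | 36 | 48 | 60 |
|---|---|---|---|---|---|
| Average response time (s) | 9.3 | 15.4 | 26.7 | N/A | N/A |
| Brokering requests per second | 1.3 | 1.6 | 1.3 | N/A | N/A |
| Time to launch 10,000 users | 2h 8m 33s | 1h 46m 52s | 2h 3m 46s | N/A | N/A |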
With such high latency, a large number of timeouts occurred at higher concurrent launch rates. At 48 requests, 42.8 percent of requests failed. At 60 requests, 99 percent of requests failed, so timeouts were common enough to make the Site unusable. The only acceptable launch rates were 12 and 24 concurrent requests. It would be hard to recommend deploying a large Site with this level of latency: launching 1,000 users would take about 13 minutes with 12 concurrent requests and 11 minutes with 24, and launching 10,000 users took up to 2 hours, 8 minutes.
If you need to work with high latency and find that too many timeouts occur, XenApp and XenDesktop 7.7 added a registry key that limits a Controller to a fixed number of concurrent brokering requests. StoreFront retries requests above the limit after a few seconds. This backs off requests and so reduces lock queuing. However, some users may see extended launch times if their requests are repeatedly backed off.
The key is a DWORD and should be stored in:
If the key doesn't exist, no limit is applied to brokering requests.
Note: The key applies per Delivery Controller, so to limit the total number of requests reaching the SQL Server, divide that total among the remote Controllers.
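The throttling behavior can be pictured as an admission counter on each Controller plus a client that retries after a pause. The Python sketch below is a hypothetical, single-threaded model of that interaction, not Citrix code: `BrokerThrottle`, `broker_with_retry`, the 3-second retry delay, and the attempt limit are illustrative assumptions. The limit stands in for the DWORD value, with `None` modeling the key being absent.

```python
import time


class BrokerThrottle:
    """Hypothetical per-Controller limit on concurrent brokering requests.

    `max_concurrent=None` models the registry key being absent (no limit).
    Single-threaded model only; a real server would need a lock or semaphore.
    """

    def __init__(self, max_concurrent=None):
        self.max_concurrent = max_concurrent
        self.active = 0

    def try_begin(self):
        """Admit a request if under the limit; otherwise reject it."""
        if self.max_concurrent is not None and self.active >= self.max_concurrent:
            return False
        self.active += 1
        return True

    def end(self):
        self.active -= 1


def broker_with_retry(throttle, broker, retry_delay_s=3.0, max_attempts=5,
                      sleep=time.sleep):
    """Client side: if a request is rejected, wait and retry, loosely
    mirroring the StoreFront behavior described above. Returns broker()'s
    result, or None if every attempt was backed off."""
    for attempt in range(max_attempts):
        if throttle.try_begin():
            try:
                return broker()
            finally:
                throttle.end()  # release the slot even if brokering fails
        if attempt < max_attempts - 1:
            sleep(retry_delay_s)  # back off before the next attempt
    return None


# A Controller limited to 2 with two requests already in flight: a third
# request is backed off on every attempt (sleep is a no-op for the demo).
throttle = BrokerThrottle(max_concurrent=2)
throttle.try_begin()
throttle.try_begin()
print(broker_with_retry(throttle, lambda: "launched", max_attempts=3,
                        sleep=lambda s: None))
```

The unlucky-user effect shows up directly: a request that keeps arriving while the Controller is at its limit keeps sleeping, even though the lock queue on the database stays short.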
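Because the limit is set per Controller, a target total for the SQL Server has to be divided across the remote Controllers. A minimal Python sketch of an even split, assuming you want the per-Controller values to sum to the target; the function name `split_brokering_limit` is hypothetical.

```python
def split_brokering_limit(total_limit, controllers):
    """Divide a Site-wide concurrent brokering target across Controllers.

    Returns one limit per Controller; the values differ by at most one and
    sum to `total_limit`.
    """
    if controllers < 1 or total_limit < controllers:
        raise ValueError("need at least one request slot per Controller")
    base, extra = divmod(total_limit, controllers)
    # The first `extra` Controllers each take one of the leftover slots.
    return [base + 1 if i < extra else base for i in range(controllers)]


# For example, a target of 24 concurrent requests with three remote Controllers:
print(split_brokering_limit(24, 3))
```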
Brokering does work over latency, but latency needs to be considered when sizing a remote zone. If a zone is large, it may still be desirable to keep a database local to that zone. If the zone is small, a remote zone may work well and also reduce management cost without affecting the end-user experience.
We recommend that your zones have less than 250 ms RTT; beyond that, you should consider setting up different Sites.
This article has been modified from a blog post written by Chris Gilbert. To read the original blog and to see the comments, go to https://www.citrix.com/blogs/2016/02/09/zones-latency-and-brokering-performance-2/.