Arctera

LLT enhancement for improved cluster resiliency

If the operating system does not schedule the LLT timer context on time, heartbeats from some of the cluster nodes may be lost, and those nodes may be fenced out. This situation typically occurs when CPU load or memory usage is high, or when disk snapshot or VM migration operations are in progress in virtualized environments.

The LLT module is now enhanced to make clusters more resilient to such transient issues: when the OS does not schedule the timer context, LLT sends heartbeats using threads that are bound to every CPU. This feature is enabled by default, and a tunable is provided to disable or re-enable it later.
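The fallback idea can be illustrated with a small toy model. This is only a sketch under stated assumptions, not LLT code: the class `HeartbeatSender`, the 0.5-second interval, the two-CPU setup, and the simulated clock are all hypothetical, and real per-CPU thread binding is replaced by a plain loop over CPU indices. It shows per-CPU checkers sending a heartbeat only when the timer path is overdue.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatSender:
    """Toy model of a heartbeat sender with a fallback path (not LLT code)."""
    interval: float                            # hypothetical heartbeat period, seconds
    last_sent: float = 0.0                     # time of the most recent heartbeat
    log: list = field(default_factory=list)    # (time, source) for each heartbeat

    def _send(self, now: float, source: str) -> None:
        self.last_sent = now
        self.log.append((now, source))

    def timer_tick(self, now: float) -> None:
        """Normal path: the OS ran the timer context on schedule."""
        self._send(now, "timer")

    def fallback_check(self, now: float, cpu: int) -> bool:
        """Fallback path: a per-CPU checker sends only if a heartbeat is due."""
        if now - self.last_sent >= self.interval:
            self._send(now, f"cpu{cpu}")
            return True
        return False


def simulate() -> list:
    """Run a simulated clock in which the timer is starved between t=1.0 and t=3.0."""
    sender = HeartbeatSender(interval=0.5)
    timer_times = {0.5, 1.0, 3.0}              # times at which the timer actually runs
    for step in range(1, 13):                  # t = 0.25, 0.50, ..., 3.00
        now = step * 0.25
        if now in timer_times:
            sender.timer_tick(now)
        for cpu in range(2):                   # two hypothetical CPUs, one checker each
            sender.fallback_check(now, cpu)
    return sender.log


if __name__ == "__main__":
    for when, source in simulate():
        print(f"t={when:.2f}s heartbeat sent by {source}")
```

In this model the heartbeat cadence continues while the timer is starved, and only one checker sends per interval because each send updates the shared `last_sent` timestamp. A real implementation would also need synchronization between threads, which the single-threaded simulation sidesteps.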

This feature is primarily useful in virtualized environments, but it provides the same kind of resiliency in physical environments as well.

For details, refer to the Cluster Server Administrator’s Guide.
