This topic covers the High Availability (HA) deployments and configurations supported by SD-WAN appliances (Standard Edition and Enterprise Edition).
SD-WAN appliances can be deployed in HA configuration as a pair of appliances in Active/Standby roles. There are three modes of HA deployment:
- Parallel Inline HA
- Fail-to-Wire HA
- One-Arm HA
These HA deployment modes are similar to Virtual Router Redundancy Protocol (VRRP) but use a proprietary SD-WAN protocol. Both Client Nodes (Clients) and Master Control Nodes (MCNs) within an SD-WAN network can be deployed in an HA configuration, as long as the selected SD-WAN platform model supports HA.
In HA configuration, one SD-WAN appliance at the site is designated as the Active appliance and is continuously monitored by the Standby appliance. Configuration is mirrored across both appliances. If the Standby appliance loses connectivity with the Active appliance for a defined period, the Standby appliance assumes the identity of the Active appliance and takes over the traffic load. Depending on the deployment mode, this fast failover has minimal impact on the application traffic passing through the network.
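The takeover rule described above can be illustrated with a minimal sketch. It is not the proprietary SD-WAN HA protocol; the class name, the `failover_timeout_s` field (standing in for the "defined period"), and the timing values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Hypothetical model of the Standby appliance watching the Active peer."""
    failover_timeout_s: float      # assumed name for the "defined period"
    last_heard_s: float = 0.0      # last time the Active peer was reachable
    role: str = "standby"

    def heartbeat(self, now_s: float) -> None:
        """Record that the Active appliance was reachable at time now_s."""
        self.last_heard_s = now_s

    def tick(self, now_s: float) -> str:
        """Promote to active once the peer has been silent for the full timeout."""
        silent_for = now_s - self.last_heard_s
        if self.role == "standby" and silent_for >= self.failover_timeout_s:
            self.role = "active"
        return self.role


monitor = StandbyMonitor(failover_timeout_s=1.0)
monitor.heartbeat(now_s=10.0)
print(monitor.tick(now_s=10.5))  # peer silent for 0.5 s, prints "standby"
print(monitor.tick(now_s=11.2))  # peer silent for 1.2 s, prints "active"
```

In this sketch the Standby never demotes itself again; whether the original Active appliance takes the role back depends on the Primary Reclaim setting.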
For MCNs, secondary MCN redundancy is supported. In this mode, one of the Clients is also designated as a Secondary MCN. It will continuously monitor the health of the Primary MCN and, if a catastrophic event occurs, it will assume the role of the MCN.
Configuring High Availability
To configure HA:
1. In the SD-WAN web management interface, navigate to Configuration > Virtual WAN > Configuration Editor > Sites (MCN) > DC. Click Enable High Availability.
2. After the site is configured, configure the HA appliance settings and the interface groups.
- Primary Reclaim: If the Active appliance fails and then comes back up, it can be configured to reclaim the Active status after it is rebooted. This feature is disabled by default. To enable it, select the Primary Reclaim check box in the HA section of the configuration for the site. The Active/Standby states of an HA pair can also be switched manually from the web console of either appliance during run-time operation.
- Fail-to-Wire: Select the Fail-to-Wire check box.
3. Configure interface groups by clicking the + next to HA IP Interfaces. From the Virtual Interface drop-down menu, select the desired interface. This interface monitors the Active appliance for reachability. For One-Arm HA mode, only one interface group is required.
4. Select the Primary and Secondary IP address.
5. For Inline HA mode, additional interface groups are required for External Tracking to monitor the upstream or downstream network infrastructure. External Tracking can detect, for example, a switch port failure and determine whether an HA state change is required.
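The settings from the steps above could be modeled roughly as follows. This is a sketch only: the class and field names are hypothetical and do not reflect the appliance's actual configuration schema, and the addresses come from a documentation range. The only behavior taken from the text is that Primary Reclaim is disabled by default and, when enabled, lets the recovered primary appliance take the Active role back.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address
from typing import List


@dataclass
class HaInterfaceGroup:
    """One entry under HA IP Interfaces (field names are assumptions)."""
    virtual_interface: str
    primary_ip: IPv4Address
    secondary_ip: IPv4Address
    external_tracking: bool = False  # used by Inline HA for up/downstream checks


@dataclass
class HaSiteSettings:
    """HA options for one site (hypothetical structure)."""
    primary_reclaim: bool = False    # disabled by default, per the text
    fail_to_wire: bool = False
    interface_groups: List[HaInterfaceGroup] = field(default_factory=list)

    def active_after_primary_returns(self, current_active: str) -> str:
        """Return which appliance is Active once the primary is back up."""
        return "primary" if self.primary_reclaim else current_active


# One-Arm HA: a single interface group is enough.
site = HaSiteSettings(
    interface_groups=[
        HaInterfaceGroup(
            virtual_interface="VIF-HA",
            primary_ip=IPv4Address("192.0.2.1"),
            secondary_ip=IPv4Address("192.0.2.2"),
        )
    ]
)
print(site.active_after_primary_returns("secondary"))  # prints "secondary"
```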
To monitor HA configuration:
Log in to the SD-WAN web management interface of the Active and Standby MCN appliances for which high availability is implemented. View the high-availability status under the Dashboard tab.
For Network Adapter details of Active and Standby HA appliances, navigate to Configuration > Appliance Settings > Network Adapters > Ethernet tab.
Selecting a High Availability Mode
In One-Arm mode, the HA appliance pair is outside of the data path. Application traffic is redirected to the appliance pair by using Policy Based Routing (PBR). One-Arm mode is implemented when a single insertion point in the network is not feasible or to counter challenges of fail-to-wire. In the following illustration, the Standby appliance can be added to the same VLAN or subnet as the Active appliance and the router.
In One-Arm mode, it is recommended that the SD-WAN appliances do not reside in the data network subnets. This way, virtual path traffic does not have to traverse the PBR, which avoids route loops. The SD-WAN appliance and the router must be directly connected, either through an Ethernet port or by being in the same VLAN.
IP SLA Monitoring for Fall Back
Active traffic continues to flow even if the virtual path is down, as long as one of the SD-WAN appliances is active. The SD-WAN appliance redirects traffic back to the router as Intranet traffic. However, if both the Active and Standby SD-WAN appliances become inactive, the router still tries to redirect traffic to the appliances. IP SLA monitoring can be configured on the router to disable PBR when the next-hop appliance is not reachable. This allows the router to fall back to a normal route lookup and forward packets appropriately.
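The fallback decision the router makes can be sketched as below. This models the logic only and is not router configuration; the function name, the reachability callback (standing in for the IP SLA probe result), and the addresses are all assumptions.

```python
from typing import Callable


def choose_next_hop(
    pbr_next_hop: str,
    is_reachable: Callable[[str], bool],
    routing_table_next_hop: str,
) -> str:
    """Pick the next hop for traffic that matches the PBR policy.

    is_reachable stands in for the IP SLA probe result for the appliance.
    """
    if is_reachable(pbr_next_hop):
        return pbr_next_hop            # PBR applies: redirect to the appliance
    return routing_table_next_hop      # PBR disabled: normal route lookup


appliance_ip = "192.0.2.10"    # hypothetical address of the HA pair
wan_gateway = "198.51.100.1"   # hypothetical next hop from the routing table

# At least one appliance answers the probe: traffic is redirected.
print(choose_next_hop(appliance_ip, lambda ip: True, wan_gateway))
# Both appliances are down: the router falls back to its routing table.
print(choose_next_hop(appliance_ip, lambda ip: False, wan_gateway))
```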
Parallel Inline HA Mode
In Parallel Inline HA mode, the SD-WAN appliances are deployed alongside each other, inline with the data path. Only one path through the Active appliance is used. It is important to note that bypass interface groups are configured to be fail-to-block and not fail-to-wire so that you don't get bridging loops during a failover.
The HA state can be monitored through the inline interface groups, or through a direct connection between the appliances. External Tracking can be used to monitor the reachability of the upstream or downstream network infrastructure (for example, to detect a switch port failure) and to direct an HA state change, if needed.
If both the Active and Standby SD-WAN appliances are disabled or fail, a tertiary path can be used directly between the switch and router. This path must have a higher spanning tree cost than the SD-WAN paths so that it is not used under normal conditions. Failover in Parallel Inline HA mode is very quick and nearly hitless, as no physical state change occurs. Fallback to the tertiary path is not hitless and can block traffic for 5-30 seconds, depending on the spanning tree configuration. If there are out-of-path connections to other WAN links, both appliances must be connected to them.
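The role of the higher spanning tree cost can be illustrated with a simplified sketch. Real spanning tree also compares bridge and port identifiers and takes time to converge; this example only shows lowest-cost selection among the paths that are currently up. The path names and cost values are assumptions.

```python
from typing import Dict, Optional, Set


def forwarding_path(path_costs: Dict[str, int], available: Set[str]) -> Optional[str]:
    """Return the lowest-cost path that is currently available, if any."""
    candidates = {name: cost for name, cost in path_costs.items() if name in available}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical costs: the tertiary path is deliberately the most expensive.
costs = {"via_appliance_a": 4, "via_appliance_b": 4, "tertiary_direct": 100}

# Appliance A is Active; B's bypass ports are fail-to-block, so its path is closed.
print(forwarding_path(costs, {"via_appliance_a", "tertiary_direct"}))  # prints "via_appliance_a"
# Both appliances have failed: only the tertiary path remains.
print(forwarding_path(costs, {"tertiary_direct"}))                     # prints "tertiary_direct"
```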
In more complex scenarios, where multiple routers might be using VRRP, non-routable VLANs are recommended to ensure the LAN side switch and routers are reachable at layer 2.
In Fail-to-Wire mode, the SD-WAN appliances are inline in the same data path. The bypass interface groups should be in Fail-to-Wire mode, with the Standby appliance in a passthrough or bypass state. A direct connection between the two appliances on a separate port must be configured and used for the HA interface group.
- HA switchover in Fail-to-Wire mode takes longer, approximately 10-12 seconds, because of the delay while the ports recover from the Fail-to-Wire state.
- If the HA connection between the appliances fails, both appliances will go into Active state and cause a service interruption. This can be mitigated by assigning multiple HA connections so that there is no single point of failure.
- In HA Fail-to-Wire mode, it is imperative that a separate port on each appliance in the hardware pair be used for the HA control exchange mechanism, to assist in state convergence.
- Due to a physical state change when the SD-WAN appliances switch over from Active to Standby, failover can cause partial loss of connectivity depending on how long the auto-negotiation takes on the Ethernet ports.
- It is recommended that Fail-to-Wire mode not be used on ports that are auto-negotiated, as auto-negotiation increases failover time.
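The benefit of multiple HA connections noted above can be shown with a small sketch. It assumes, for illustration only, that the Standby appliance treats the Active appliance as lost when every HA connection is down; the function and link names are hypothetical.

```python
from typing import Dict


def standby_should_take_over(ha_link_up: Dict[str, bool]) -> bool:
    """Return True when no HA connection to the Active appliance is up."""
    return not any(ha_link_up.values())


# Single HA connection: one cable fault makes the Standby go Active even
# though the Active appliance is still running, so both end up Active.
print(standby_should_take_over({"ha_port_1": False}))                     # prints True

# Two HA connections: the same fault no longer triggers a takeover.
print(standby_should_take_over({"ha_port_1": False, "ha_port_2": True}))  # prints False
```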
The following illustration shows an example of the Fail-to-Wire deployment.
To minimize disruption during failover, the One-Arm HA or Parallel Inline HA configuration is recommended for data centers or sites that forward a high volume of traffic.
If minimal loss of service is acceptable during a failover, then Fail-to-Wire HA mode is a better solution. Fail-to-Wire HA mode protects against appliance failure, and Parallel Inline HA protects against all failures. In all scenarios, HA is valuable for preserving the continuity of the SD-WAN network during a system failure.