Socket HA Keepalive or Split-Brain - Network Playbook

This playbook describes steps to resolve issues for Socket High Availability (HA) sites with keepalive issues or a split-brain condition.

Overview

A split-brain condition is when both Sockets have the Master role at the same time. This can happen due to a LAN connectivity problem between the Sockets that creates a situation where the HA keepalive messages do not reach the secondary Socket.

After the LAN connectivity issue is resolved, the secondary Socket identifies that the primary Socket is the Master, and the secondary Socket returns to Standby status.

For more information about the split-brain condition, see What is Socket High Availability (HA).

This playbook contains steps you can take to:

  1. Verify that the Sockets have a split-brain condition or HA keepalive issue.

  2. Remediate the issue.

  3. Verify that the secondary Socket has the Standby role.

These are the different ways that a Cato Management Application admin can verify that the Sockets have an issue related to Socket HA:

  • When the Cato Cloud identifies a Socket HA site with a split-brain condition, an HA Status is not ready story is generated in the Stories Workbench.

  • You can also use the Sites page to sort the sites according to the HA Status.

    HA_Status_sites.png

Step 1 - Verifying that there is a Split-Brain Condition

You can verify a split-brain condition with the Socket page for the site in the Cato Management Application, and the HA status in the Socket WebUI for each Socket.

Reviewing HA Status in the Socket Page

Use the Socket page for the HA site to verify that the site has a split-brain condition.

  • Both the primary and secondary Sockets are shown with the status Master.

  • The Keepalive condition is shown as Failed, which causes the HA Status to be shown as NOT READY.

Reviewing HA Status in the Socket WebUI

Connect to the Socket WebUI for each Socket and click the Status tab to check the High availability state. When the HA site is working correctly, the states are:

  • Primary Socket is in the state VRRP_STATE_MASTER

  • Secondary Socket is in the state VRRP_STATE_BACKUP (see example below)

SocketWebUI-HA2.png

If the secondary Socket is in VRRP_STATE_MASTER, that indicates a split-brain condition.
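The state check above can be sketched as a small Python function. This is only an illustration of the decision logic: the function name and the returned labels are hypothetical, and the state strings are assumed to be copied by hand from the Status tab of each Socket WebUI.

```python
MASTER = "VRRP_STATE_MASTER"
BACKUP = "VRRP_STATE_BACKUP"


def classify_ha_states(primary_state, secondary_state):
    """Classify the HA states read from the Status tab of each Socket WebUI.

    The returned labels are hypothetical names used only in this sketch.
    """
    if primary_state == MASTER and secondary_state == BACKUP:
        # The state pair this playbook describes for a working HA site.
        return "expected"
    if primary_state == MASTER and secondary_state == MASTER:
        # Both Sockets hold the Master role at the same time.
        return "split-brain"
    # Any other combination is outside the scope of this sketch.
    return "needs review"
```

For example, `classify_ha_states("VRRP_STATE_MASTER", "VRRP_STATE_MASTER")` returns `"split-brain"`, which matches the condition described in this step.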

Step 2 - Remediating the Split-Brain Condition

Once you identify a keepalive or split-brain issue, these are steps that you can take to remediate the issue:

  1. Check for any recent changes to the switch or LAN-side configuration, such as the VLAN configuration or the physical cabling. Make sure that your configuration follows the best practices for Socket HA described in What is Socket High Availability (HA).

  2. Run a packet capture on the LAN interface by following the procedure in How to Take a Packet Capture on a Socket, and check whether the master Socket is sending VRRP keepalive messages to the standby Socket.

  3. If the issue is still not resolved after you perform the checks above, contact Support and provide all necessary information.
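As a rough aid for the packet capture check in step 2, the following Python sketch counts VRRP packets per source IPv4 address in a capture file. It rests on several assumptions that this playbook does not confirm: that the capture is saved in the classic pcap format (not pcapng) with Ethernet framing, and that the keepalive messages are standard VRRP advertisements carried as IP protocol 112. The function name `count_vrrp_senders` is hypothetical.

```python
import struct
from collections import Counter

VRRP_IP_PROTOCOL = 112  # IANA-assigned IP protocol number for VRRP


def count_vrrp_senders(pcap_bytes):
    """Return a Counter mapping source IPv4 address to VRRP packet count.

    Assumes a classic pcap file with Ethernet link type; other formats
    raise ValueError.
    """
    magic = pcap_bytes[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian capture file
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian capture file
    else:
        raise ValueError("not a classic pcap file")

    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled in this sketch")

    senders = Counter()
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        # Per-packet record header: ts_sec, ts_frac, captured length, original length
        _sec, _frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len

        if len(frame) < 14:
            continue
        ethertype = struct.unpack("!H", frame[12:14])[0]
        ip_start = 14
        if ethertype == 0x8100 and len(frame) >= 18:
            # Single 802.1Q VLAN tag: the real EtherType follows the tag
            ethertype = struct.unpack("!H", frame[16:18])[0]
            ip_start = 18
        if ethertype != 0x0800 or len(frame) < ip_start + 20:
            continue  # not IPv4, or truncated IP header

        ip_header = frame[ip_start:ip_start + 20]
        if ip_header[9] == VRRP_IP_PROTOCOL:
            source = ".".join(str(octet) for octet in ip_header[12:16])
            senders[source] += 1
    return senders
```

To use it, read the capture file in binary mode and pass the bytes to the function, for example `count_vrrp_senders(open("capture.pcap", "rb").read())`, where `capture.pcap` is a placeholder file name. If the LAN address of the master Socket does not appear in the result, the capture does not contain keepalive messages from that Socket, which points to the LAN connectivity problem described in the Overview.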

Step 3 - Verifying that the Split-Brain Condition is Resolved

After the secondary Socket returns to the Standby state, go to the Socket page and confirm that the HA Status is Ready.
