This article describes the WAN Recovery feature for Socket sites, which provides resiliency in the unlikely event of a connectivity issue with the Cato Cloud.
The WAN Recovery feature is one of multiple recovery options that provide resiliency if your Socket sites can't communicate using the Cato Cloud. WAN Recovery uses VPN tunnels between Socket sites over the Internet to preserve the connectivity for the WAN traffic between your sites if there is a connectivity issue with the Cato Cloud.
WAN Recovery is based on a full mesh topology and is enabled by default for all Socket sites. Each Socket creates a direct DTLS tunnel to every other Socket over the public Internet. The Sockets regularly send keep-alive messages to keep each tunnel open, which reduces the recovery time. This topology provides maximum resiliency for the Socket sites in your account.
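To get a feel for how a full mesh scales, the following minimal Python sketch lists the site pairs that each need a direct tunnel. The site names and the `full_mesh_tunnels` helper are hypothetical, and the sketch counts one tunnel per pair of sites, ignoring sites with multiple WAN links.

```python
from itertools import combinations


def full_mesh_tunnels(sites):
    """Return every unordered pair of sites; each pair needs one direct tunnel."""
    return list(combinations(sorted(sites), 2))


# Hypothetical account with four Socket sites.
sites = ["A", "B", "C", "D"]
tunnels = full_mesh_tunnels(sites)

# A full mesh of n sites has n * (n - 1) / 2 pairs.
print(len(tunnels))
for left, right in tunnels:
    print(f"{left} <-> {right}")
```

Because the number of pairs grows quadratically with the number of sites, keeping the tunnels open in advance trades a small amount of keep-alive traffic for a shorter recovery time.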
The following diagram shows an example where one Socket is disconnected from the Cato Cloud. WAN Recovery is enabled for that site to provide a direct connection between the two Sockets:
To ensure the smoothest transition to WAN Recovery, you can use a static IP for the site and define the Socket interface Public IP and Static Port settings. These settings make it easier to establish the off-cloud tunnels between the sites.
For accounts where it is difficult to configure the static IP settings for all the Sockets, we recommend that you use static IP settings for a few key sites, such as data centers, that act as hubs for WAN Recovery. The IP address for the hub sites is sent to the PoPs and propagated to the other Sockets in your account that are configured for WAN Recovery.
The Socket keeps an open tunnel for WAN Recovery, so if it loses connectivity with the Cato Cloud, the Socket recovers the connections with the other sites and minimizes the disconnection time. The Socket then immediately starts sending the WAN traffic over the WAN recovery link.
You can use the Cato Management Application to disable WAN Recovery either for a specific site or for the entire account. For more information, see Working with Advanced Configuration for the Account.
Once connectivity to the Cato Cloud is restored, recovery ends and the traffic is sent over the Cato Cloud.
The Socket page for a site shows the Off Cloud Status for the WAN links. When the status is Enabled, the links are ready for WAN Recovery.
We recommend that you use static IP addresses for key sites, such as data centers, that act as hubs for WAN Recovery. Define the off-cloud Public IP and Static Port for each WAN link in the hub sites.
You can use the Monitoring > Best Practices page to confirm that all sites are enabled in the Advanced Configuration settings to support WAN Recovery.
To configure a site for WAN Recovery:
1. From the navigation menu, select Network > Sites, and select the site.
2. From the navigation menu, select Site Configuration > Socket.
3. Configure the WAN link for WAN Recovery:
   a. Click the WAN link. The Edit Socket Interface panel opens.
   b. Set the Traffic Status to Enabled.
   c. (Optional) Define the static Public IP and Static Port for the link. We recommend this setting for key hub sites.
4. Repeat step 3 for all Socket WAN links.
5. Click Apply, and then click Save.
The site is configured for WAN Recovery.
The CMA generates the following events for WAN recovery:
- Off-Cloud Recovery Activated – this event is generated when the Socket starts to send the WAN traffic over the WAN Recovery transport.
- Off-Cloud Recovery Stopped – this event is generated when the connection to the Cato Cloud is restored and the Socket stops sending WAN traffic over the WAN Recovery transport.
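The two events mark the transitions into and out of recovery. As an illustration only, the following Python sketch derives the event sequence from a series of recovery-state samples. The `recovery_events` function and the boolean samples are assumptions made for this example and do not represent how the CMA generates events internally.

```python
def recovery_events(on_recovery_samples):
    """Return the event names produced each time the recovery state flips.

    on_recovery_samples: iterable of booleans, True while WAN traffic is
    sent over the WAN Recovery transport (hypothetical input format).
    """
    events = []
    previous = False  # assume the site starts connected to the Cato Cloud
    for current in on_recovery_samples:
        if current and not previous:
            events.append("Off-Cloud Recovery Activated")
        elif previous and not current:
            events.append("Off-Cloud Recovery Stopped")
        previous = current
    return events


# Connected, then two samples in recovery, then connectivity restored.
print(recovery_events([False, True, True, False]))
```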
WAN Recovery is enabled by default for all Socket sites to provide resiliency using off-cloud traffic. If it is disabled for one or more sites, those sites can't communicate with the other sites during recovery. For example, if WAN Recovery is enabled on sites A and B, but not on site C, then during the recovery site C can't communicate with the other sites, and sites A and B can't communicate with site C.
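The A, B, and C example can be expressed as a simple rule: two sites can exchange off-cloud traffic only when WAN Recovery is enabled on both. The following Python sketch illustrates that rule under this assumption; the `can_communicate` helper and the `enabled` mapping are hypothetical names.

```python
def can_communicate(site_a, site_b, recovery_enabled):
    """During recovery, both sites need WAN Recovery enabled to reach each other."""
    return recovery_enabled.get(site_a, False) and recovery_enabled.get(site_b, False)


# WAN Recovery is enabled on sites A and B, but not on site C.
enabled = {"A": True, "B": True, "C": False}

print(can_communicate("A", "B", enabled))  # both enabled
print(can_communicate("A", "C", enabled))  # C is disabled
print(can_communicate("C", "B", enabled))  # C is disabled
```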
The LAN Firewall policy is not impacted and continues to function normally during WAN Recovery because the Socket applies the policy.
Note: Due to regulatory reasons, WAN Recovery is not supported in China.
During WAN Recovery, make sure that you do NOT reboot the Socket. Rebooting can negatively impact the site, and the Socket might not be able to re-establish connectivity with the other sites.
For all deployments, when WAN Recovery is enabled, each Socket establishes secure DTLS tunnels to the remote Socket site on all WAN interfaces that are enabled for off-cloud traffic. For active/active link configuration, the Socket randomly selects one of the active links for WAN recovery. For active/passive, the Socket uses the active link.
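The link selection described above can be sketched as follows. This is a minimal Python illustration, not the Socket's implementation: the link dictionaries, their `active` and `off_cloud_enabled` fields, and the `select_recovery_link` function are assumptions made for the example.

```python
import random


def select_recovery_link(links, mode, rng=random):
    """Pick the WAN link used for WAN Recovery traffic.

    links: list of dicts with hypothetical keys "name", "active",
           and "off_cloud_enabled".
    mode:  "active/active" or "active/passive".
    """
    candidates = [
        link for link in links if link["active"] and link["off_cloud_enabled"]
    ]
    if not candidates:
        return None  # no link is available for off-cloud traffic
    if mode == "active/active":
        # One of the active links is chosen at random.
        return rng.choice(candidates)["name"]
    # active/passive: only the active link is a candidate.
    return candidates[0]["name"]


active_active = [
    {"name": "wan1", "active": True, "off_cloud_enabled": True},
    {"name": "wan2", "active": True, "off_cloud_enabled": True},
]
active_passive = [
    {"name": "wan1", "active": True, "off_cloud_enabled": True},
    {"name": "wan2", "active": False, "off_cloud_enabled": True},
]

print(select_recovery_link(active_active, "active/active"))
print(select_recovery_link(active_passive, "active/passive"))
```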
During WAN Recovery, the Cato Management Application (CMA) does not receive all site data because the Socket is not connected to the PoP. As a result, the CMA is not aware of the status of the impacted sites.
You can log in to the Socket WebUI and use the SD-WAN tab to monitor traffic and off-cloud tunnels. This is an example of monitoring traffic with the Socket WebUI:
Traffic that is passed over the WAN Recovery off-cloud transport isn’t processed by PoPs in the Cato Cloud. This means that during WAN Recovery, the PoP services are not applied to traffic, including the following items:
- Security
  - WAN and Internet firewall policies
  - Threat Prevention services (e.g., IPS, Anti-Malware)
- Networking
  - NAT policy
  - Complex Network Rules
  - DNS Forwarding
  - DHCP Relay
  - Static Range Translation (SRT)
- Access
  - Client Access (e.g., Client Connectivity policy)
  - Device Posture
For accounts that enable recovery via Alt. WAN (e.g., MPLS), if the Socket disconnects from the Cato Cloud, the Alt. WAN link has a higher priority than WAN Recovery. Therefore, the Socket first moves the traffic to the Alt. WAN link. If the Alt. WAN link is unavailable, the Socket then moves the WAN traffic to the WAN Recovery link. Generally, WAN Recovery has the lowest priority as a transport option, and it is only used when the other transport options are unavailable.
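The priority order described above can be sketched as a simple ordered lookup. The transport labels and the `select_transport` function below are hypothetical names chosen for this illustration; only the ordering (Cato Cloud, then Alt. WAN, then WAN Recovery) comes from the text.

```python
# Highest priority first; WAN Recovery is the last resort.
TRANSPORT_PRIORITY = ["cato_cloud", "alt_wan", "wan_recovery"]


def select_transport(available):
    """Return the highest-priority transport that is currently available."""
    for transport in TRANSPORT_PRIORITY:
        if transport in available:
            return transport
    return None  # no transport is available for WAN traffic


# Cato Cloud is down, Alt. WAN is up: Alt. WAN wins over WAN Recovery.
print(select_transport({"alt_wan", "wan_recovery"}))
# Cato Cloud and Alt. WAN are both down: fall back to WAN Recovery.
print(select_transport({"wan_recovery"}))
```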
WAN Recovery relies on NAT punching to establish the WAN connectivity between your sites. When a Socket connects to the Cato Cloud, the PoP informs the Socket about all the other endpoints, and the Socket opens a DTLS tunnel to each one of them. The Socket uses the NAT punching technique to establish a direct connection with the other Sockets.
Note: The negotiation of the NAT punching starts over the Cato Cloud. Therefore, the Sockets must be connected to the Cato Cloud to allow the NAT punching.
The following diagram shows the flow to establish a direct connection between two Sockets for WAN Recovery:
The NAT punching technique works for each pair of Sockets in the following way:
1. The PoP selects one of the Sockets as the initiator to establish a direct connection (Socket 1) based on the site ID (the site with the highest ID value is the initiator).
2. The initiator Socket (Socket 1) sends a request to the Cato Cloud for its IP address and port number, for example, IP address 82.128.1.1 and port number 4444 (Step #2 in the diagram).
3. The Cato PoP sends the source IP address and port to Socket 1.
4. Socket 1 sends its IP address and port to Socket 2 over the Cato tunnel.
5. Socket 2 sends a request to the Cato Cloud for its IP address and port.
6. The Cato PoP sends the source IP address and port to Socket 2.
7. Socket 2 sends its IP address and port to Socket 1 over the Cato tunnel.
8. Socket 1 sends 32 packets to Socket 2 in the range of the source port, each packet with a different port number.
9. Socket 2 sends 32 packets to Socket 1 in the range of the source port, each packet with a different port number.
10. Once the correct port is found, the Sockets open a DTLS tunnel with the source IP address and the port number. When Socket 2 connects with Socket 1, the router adds the NAT entry to its routing table.
11. From that point on, the Sockets send keep-alive messages every 15 seconds to keep the connection open.
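Two parts of the flow above lend themselves to a short illustration: choosing the initiator and building the list of 32 ports to probe. The following Python sketch is not Cato's implementation. In particular, the article does not say how the 32 ports are positioned around the reported source port, so the sketch assumes 32 consecutive ports starting at the reported port. All function and constant names are hypothetical.

```python
PROBE_COUNT = 32           # packets sent by each Socket, per the steps above
KEEPALIVE_INTERVAL_S = 15  # keep-alive period once the tunnel is open


def pick_initiator(site_id_a, site_id_b):
    """The site with the highest ID value acts as the initiator."""
    return max(site_id_a, site_id_b)


def candidate_ports(reported_port, count=PROBE_COUNT):
    """Ports to probe on the peer.

    ASSUMPTION: consecutive ports starting at the port the PoP reported.
    The real positioning of the range is not documented in the article.
    Ports above 65535 are dropped, so fewer than `count` may be returned.
    """
    return [p for p in range(reported_port, reported_port + count) if p <= 65535]


print(pick_initiator(101, 205))
ports = candidate_ports(4444)
print(ports[0], ports[-1], len(ports))
```

Probing several ports in parallel is a common way to cope with NAT devices that do not reuse the same external port for a new destination.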
After NAT punching succeeds, the Socket saves this NAT data. In the case of a Socket restart, it can immediately reconnect to the other Sockets with that NAT data. Saving the NAT data significantly reduces the Socket reconnection time. For Sockets that are behind a network firewall or a router, if your firewall or router restarts, the NAT entries are changed. The NAT data is no longer relevant, and the Sockets must perform the NAT punching process again.
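The effect of saving the NAT data can be sketched as a small cache. This is an illustration under stated assumptions: the `NatCache` class, its methods, and the `reconnect_action` function are hypothetical and only model the behavior described in the paragraph above.

```python
class NatCache:
    """Hypothetical store of punched NAT mappings, keyed by peer site."""

    def __init__(self):
        self._entries = {}

    def save(self, peer, ip, port):
        """Remember the address that worked for this peer."""
        self._entries[peer] = (ip, port)

    def lookup(self, peer):
        """Return the saved (ip, port) for the peer, or None."""
        return self._entries.get(peer)

    def clear(self):
        """Model a firewall or router restart: saved mappings are stale."""
        self._entries.clear()


def reconnect_action(cache, peer):
    """Decide how to reconnect to a peer after a restart."""
    if cache.lookup(peer) is not None:
        return "reuse saved NAT data"
    return "repeat NAT punching"


cache = NatCache()
cache.save("site-2", "82.128.1.1", 4444)
print(reconnect_action(cache, "site-2"))  # Socket restart: data still valid

cache.clear()  # the router in front of the Socket restarted
print(reconnect_action(cache, "site-2"))
```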
4 comments
Missing information is the dynamic port range that Cato is using during setup of the tunnel.
Hi,
Why isn't Cato allowing to choose between sites and sdp clients to activate the security features like IPS, NGFW, antimalware or not. Now we have to choose those features on all sites and sdp clients. This way we can't even evaluate and compare Cato with other security tooling we have already in place.
Kind regards,
Jan
Hello Bert-Jan!
My apologies that neither of your comments have been responded to yet! I will get some answers for you and, if required, ensure the KB article is updated accordingly.
Kind Regards,
Dermot Doran (Cato Community Manager)
Hi, Cato Team,
Thank you for a useful article. Let me ask you one question.
How do DNS and DHCP features work when the WAN Recovery is working?
[Scenario]
- Normal
・Account's DNS setting is Primary: 10.254.254.1, Secondary: 8.8.8.8.
・To resolve the internal domain, the DNS forwarding setting specifies the DNS server under the site as the forwarding destination.
- While the recovery function is running
・As a DHCP server, is the value distributed by Socket different from normal time?
・As a DNS server, how does Socket provide the function?
Regards,
Yoshihiro Toyomasu