How is traffic distributed across the WAN links of a Socket in Active/Active mode?
Traffic is distributed across the WAN links based on health metrics, link preference, and the proportional ratio of the bandwidths configured for each link. The distribution is flow-based rather than packet-based.
A flow is a conversation between a client and a server in which all the packets share the same 5-tuple:
- Source IP address
- Source port
- Protocol
- Destination IP address
- Destination port
Example Flow
A user with IP address 192.168.1.20 connects to a mail server with IP address 10.10.10.2 to send an email. A flow is created with the following 5-tuple:
- Source IP: 192.168.1.20
- Source port: 34579
- Protocol: TCP
- Destination IP: 10.10.10.2
- Destination port: 25
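To make the idea of a flow concrete, here is a minimal Python sketch that models the 5-tuple as a hashable key, so every packet of the same conversation maps to the same flow. The names `FlowKey` and `flow_key_of`, and the dictionary layout of a packet, are hypothetical and used for illustration only; they are not part of any Cato API.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The 5-tuple that identifies a flow (illustrative type, not a Cato API)."""
    src_ip: str
    src_port: int
    protocol: str
    dst_ip: str
    dst_port: int


def flow_key_of(packet: dict) -> FlowKey:
    """Build the flow key from a packet represented as a plain dict (assumed layout)."""
    return FlowKey(
        packet["src_ip"],
        packet["src_port"],
        packet["protocol"],
        packet["dst_ip"],
        packet["dst_port"],
    )


# The example flow from the article: a client sending email over SMTP.
smtp_flow = FlowKey("192.168.1.20", 34579, "TCP", "10.10.10.2", 25)
```

Because the key is a tuple, it can be used directly as a dictionary key in a flow table, which is how a flow-based (rather than packet-based) distributor would keep all packets of one conversation on the same link.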
How Distribution Works
Each flow is distributed in the following order:
- Every second, Cato evaluates the link quality of each active link by calculating a score based on health metrics that include packet loss, jitter, and latency. The following are the minimal link quality metrics, which are configurable under the Site's Advanced Configuration:
  - Packet loss - 3%
  - Jitter - 30 ms
  - Latency - 600 ms

  If there is no internet connectivity, the link will not be selected for traffic distribution. Also, if a link can't meet one of the above metrics, it receives a lower priority and is less likely to be used for traffic distribution.

  Cato constantly checks that the link quality of the interface chosen is still healthy (based on the 1-second checkup results). If it is healthy, packets continue to flow through the link. If the link is not healthy, Cato immediately does a new link selection for each flow and chooses the best link for the traffic.
- If all active links satisfy link quality metrics, traffic distribution follows the preferred interface selected in the network rule.
- If there's no preferred interface (automatic role) in the network rule, Cato considers the active/active bandwidth configuration from CMA.
In the example below, the WAN1 link is configured with a bandwidth of 100 Mbps down/up, and the WAN2 link is configured with 20 Mbps down/up.
In this configuration, the WAN1:WAN2 bandwidth ratio is 100:20 or 5:1 for both upstream and downstream traffic. Therefore, five flows will be sent to WAN1 for every one flow that is sent to WAN2.
The Socket and the PoP that it's connected to share the duties of flow distribution. The Socket itself only takes the upstream bandwidth into consideration because it only controls the traffic sent from the Socket to the PoP (upstream traffic). The PoP takes the downstream bandwidth of the Socket into consideration because it controls the traffic sent from the PoP to the Socket (downstream traffic).
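The selection order described above can be sketched in Python. This is a simplified illustration under stated assumptions, not Cato's actual implementation: the `Link` class and `select_link` function are hypothetical, thresholds are treated as inclusive upper limits, a link that misses a threshold is used only when no fully healthy link exists (the article only says such links get a lower priority), and the bandwidth ratio is approximated by sending each new flow to the link with the fewest flows per Mbps.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional

# Default thresholds quoted in the article (configurable per site).
MAX_PACKET_LOSS_PCT = 3.0
MAX_JITTER_MS = 30.0
MAX_LATENCY_MS = 600.0


@dataclass
class Link:
    name: str
    up_mbps: int      # configured upstream bandwidth, assumed > 0
    down_mbps: int    # configured downstream bandwidth, assumed > 0
    connected: bool = True
    packet_loss_pct: float = 0.0
    jitter_ms: float = 0.0
    latency_ms: float = 0.0
    flows: int = 0    # flows assigned to this link so far

    def meets_quality(self) -> bool:
        """True if the link is within all three thresholds (assumed inclusive)."""
        return (self.packet_loss_pct <= MAX_PACKET_LOSS_PCT
                and self.jitter_ms <= MAX_JITTER_MS
                and self.latency_ms <= MAX_LATENCY_MS)


def select_link(links: List[Link], preferred: Optional[str] = None,
                direction: str = "up") -> Optional[Link]:
    """Pick a link for a new flow: connectivity, then quality, then preference, then ratio."""
    # Links without internet connectivity are never selected.
    candidates = [link for link in links if link.connected]
    if not candidates:
        return None
    # Assumption: degraded links are used only if no link meets the thresholds.
    pool = [link for link in candidates if link.meets_quality()] or candidates
    # A preferred interface from the network rule wins if it is in the pool.
    chosen = next((link for link in pool if link.name == preferred), None)
    if chosen is None:
        # Otherwise follow the bandwidth ratio: least flows per configured Mbps.
        # The Socket would use "up", the PoP would use "down".
        def load(link: Link) -> Fraction:
            bandwidth = link.up_mbps if direction == "up" else link.down_mbps
            return Fraction(link.flows + 1, bandwidth)
        chosen = min(pool, key=load)
    chosen.flows += 1
    return chosen


# Usage with the article's example: WAN1 at 100 Mbps, WAN2 at 20 Mbps.
wan1 = Link("WAN1", up_mbps=100, down_mbps=100)
wan2 = Link("WAN2", up_mbps=20, down_mbps=20)
for _ in range(6):
    select_link([wan1, wan2])
print(wan1.flows, wan2.flows)  # 5 1
```

With no preferred interface and both links healthy, six new flows split 5:1 between WAN1 and WAN2, matching the 100:20 ratio in the example.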
Failover
If one of the WAN links goes down, packets belonging to flows that were assigned to that link will be sent to the WAN link that is still up. This ensures that any existing flows will not be discarded, although a slight, temporary disruption may be observed for connection-sensitive applications like video conferencing during the failover process.
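The failover behavior can be pictured as a reassignment in a flow table. The following is a minimal sketch assuming a two-link site and a plain dictionary that maps each flow's 5-tuple to a link name; the `fail_over` function and the table layout are hypothetical and only illustrate the idea that existing flows are moved rather than dropped.

```python
def fail_over(flow_table: dict, failed_link: str, surviving_link: str) -> int:
    """Move every flow assigned to failed_link onto surviving_link.

    Returns the number of flows that were moved. Only values are changed,
    so iterating over the dictionary while updating it is safe.
    """
    moved = 0
    for flow, link in flow_table.items():
        if link == failed_link:
            flow_table[flow] = surviving_link
            moved += 1
    return moved


# Two flows keyed by 5-tuple (source IP, source port, protocol, destination IP, destination port).
flows = {
    ("192.168.1.20", 34579, "TCP", "10.10.10.2", 25): "WAN1",
    ("192.168.1.21", 40112, "TCP", "10.10.10.2", 25): "WAN2",
}
fail_over(flows, failed_link="WAN2", surviving_link="WAN1")
```

After the call, both flows are assigned to WAN1, and the flow that was already on WAN1 is untouched.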
3 comments
Is my conclusion correct that traffic distribution does not take into account the latency, jitter, or packet loss of the links, but only the bandwidth usage?
If so, I think it is good to mention this.
Hi Gerwin,
This has been my experience so far. We are currently evaluating Cato and I've set up a test rig like this. Still experimenting but Cato doesn’t seem to prefer the lower latency link. Proportional allocation is quite a crude way of doing things. Hopefully we'll see some improvements in the future. E.g. keep track of the quality of the links and use the appropriate link based on the traffic type. Ideally there might be some tuneable parameters. E.g. prefer this link unless it experiences contention or latency increases to x.
I'm going to attend a Cato tech session, so I'll ask there and try to remember to report back here.
Ngā mihi
Rhys
This article describes things better and it looks like there are tuneable link quality options:
Part 1: The Socket Interfaces and Precedence – Cato Knowledge Base (catonetworks.com)
Importantly: "[In active/passive] despite the fact that only one link is active, the two links maintain live DTLS tunnels to the PoPs to provide a faster recovery time during failover. The passive link sends keepalive messages to maintain a live tunnel with the PoP." So it's not passive in the traditional failover sense. It's really active/active in connection but active/passive in usage. The secondary link is always up and at the ready.