Cato Networks Knowledge Base

How to Troubleshoot Packet Loss


Troubleshooting packet loss can be a daunting task. Packets can traverse a dozen or more routers over thousands of miles of cable before reaching the destination, and packet loss can occur at any point along the network path. In addition, there are numerous reasons that packets can be dropped along the way. A few common ones are:

  • Link congestion
  • Misconfiguration (bandwidth settings or NIC speed and duplex)
  • Hardware failures
  • High CPU on a network device

Determining the source of packet loss and why it is occurring is not always easy. Packets pass through multiple networks owned by different ISPs and organizations over the Internet, and you don’t have access to every router in the path to check things like the link state and CPU load.

Fortunately, Cato provides a number of tools in both the Socket and the PoPs that help you troubleshoot packet loss. This guide discusses the available tools and how to use them to troubleshoot packet loss whether it’s over the Internet, WAN, or on the local network itself.

Using Cato Analytics to Show Packet Loss 

A good place to start troubleshooting packet loss is the Analytics page in the Cato Management Application. The Packet Loss graphs on the Analytics > Sites pages show packet loss over time and let you focus on specific timeframes. These graphs help you identify whether packet loss is occurring now and when it occurred in the past. They also identify the type of packet loss: provider loss or Cato discarded.

1. Provider loss: packet loss between the Socket and the PoP. Although most provider packet loss is caused by network connectivity issues on the last mile, outside of Cato’s control, it doesn’t necessarily rule out a Cato-related problem.

1._Provider_Loss.png

How Cato Measures Provider Loss 

Provider loss is measured by comparing a count of how many packets are sent and how many packets are received over a given link on both the Socket and the PoP.

  • Downstream packet loss is the difference between the number of packets sent by the PoP and the number of packets received by the Socket expressed as a percentage.

Formula:

mceclip1.png

  • Upstream packet loss is the difference between the number of packets sent by the Socket and the number of packets received by the PoP expressed as a percentage.

Formula:

mceclip2.png
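The two definitions above can be sketched in a few lines of Python. This is an illustration only, not Cato's implementation: the function name and the counter values are hypothetical, and it assumes that the percentage is taken relative to the number of packets sent.

```python
def loss_percent(sent, received):
    """Return packet loss as a percentage of packets sent (assumed denominator)."""
    if sent <= 0:
        return 0.0  # nothing was sent, so no loss can be measured
    lost = max(sent - received, 0)  # guard against counters that are slightly out of step
    return 100.0 * lost / sent


# Hypothetical counters for one measurement interval.
downstream = loss_percent(sent=20000, received=19800)  # sent by PoP, received by Socket
upstream = loss_percent(sent=15000, received=15000)    # sent by Socket, received by PoP
print(f"downstream {downstream:.2f}%, upstream {upstream:.2f}%")
```

Because both counters are taken at the Socket and the PoP, the result covers everything between those two points, not only the ISP's own network.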

Because of the way that provider packet loss is calculated, you can’t put all the blame on the ISP right away, as easy as that may be. Equipment between the Socket and the ISP router might contribute to the packet loss, or there might be problems on the part of the network path closer to the PoP, which is beyond the control of the ISP.

2. Cato discarded: packet loss caused by Cato QoS. The QoS engine starts to discard low-priority packets when a link is congested. Congestion occurs when the total throughput over a link matches the configured bandwidth for the link. Cato also discards packets if a BW management priority is configured with a hard throughput limit and traffic matching that priority hits the limit. Cato discarded packet loss is expected behavior and not necessarily a sign of a problem.

Any issues related to Cato discarded packet loss are likely caused by a misconfiguration. Critical applications like VoIP should be given the highest BW management priority. If congestion occurs, low priority traffic is dropped by Cato, but high priority traffic isn’t dropped. Always make sure that appropriate BW management priorities are assigned to traffic.

Analytics provides a broad view of packet loss. However, unless you’re dealing with Cato discarded packet loss, Analytics alone can’t tell you what is causing the packet loss or where the packet loss is occurring. You also need to run a few additional tests that depend on the deployment type as well as the reported problem. The troubleshooting steps for specific packet loss scenarios are provided below, but first we’ll go over the tools at your disposal.

Tools for Troubleshooting Packet Loss

Anyone troubleshooting packet loss needs to be familiar with the following utilities:

  • Ping
  • Traceroute (tracert on Windows)

Ping

Ping is one of the most widely used and useful troubleshooting tools. It’s installed by default on every operating system and lets you send ICMP packets to a remote IP address at a set interval, which makes it ideal for packet loss troubleshooting. When some of the ping requests do not arrive at their destination, you are probably experiencing packet loss, and the lost requests are shown as request timeouts.
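When ping is run from an endpoint, the summary line it prints at the end can be turned into a loss figure. The sketch below assumes the summary format printed by ping on Linux and macOS ("N packets transmitted, M received"). Windows ping prints a different summary and is not handled here. The function name and the sample line are illustrative.

```python
import re

# Matches "10 packets transmitted, 8 received" (Linux) and
# "10 packets transmitted, 8 packets received" (macOS).
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def ping_loss_percent(output):
    """Return the loss percentage from ping output, or None if no summary is found."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        return None  # no requests were sent, so loss is undefined
    return 100.0 * (sent - received) / sent


sample = "10 packets transmitted, 8 received, 20% packet loss, time 9012ms"
print(ping_loss_percent(sample))
```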

You can run ping commands from the Socket UI. The ping utility is on the Tools page under the Network Tools section.

Ping on the Socket UI

The Socket UI allows you to ping by hostname or IP as well as select the interface that you want to send the ping over. The “Cato” interface sends the ping over the Cato DTLS tunnel. If you select any other interface the ping bypasses the PoP completely.

The UI only sends 10 ping requests, so if you need more pings you will need to click the Ping button again. 

Ping.png

Traceroute

Traceroute is used to identify the routers (hops) between a source and destination. Traceroute works by first sending a packet to the destination IP with a time-to-live (TTL) value of 1. When the packet hits the first router in the path, the router decrements the TTL by 1. Since the new TTL is 0, the router drops the packet and responds with an ICMP time-to-live exceeded message sourced from its own IP address. Traceroute now has the IP address of the first router in the path, so it sends out another packet with the TTL value of 2 (the original TTL incremented by 1), and the second router in the path responds with a TTL exceeded message. This process is repeated until traceroute reaches the destination IP address. Traceroute shows the %Loss as the packet loss percentage.
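The TTL mechanism described above can be illustrated with a small simulation. No packets are sent; the path is a hypothetical list of router addresses with the destination last, and the function names are made up for this sketch.

```python
def probe(path, ttl):
    """Simulate one probe. Return (responder, reached_destination)."""
    if not path:
        raise ValueError("path must not be empty")
    for index, node in enumerate(path):
        if index == len(path) - 1:
            return node, True   # the destination answers the probe itself
        ttl -= 1                # each router decrements the TTL by 1
        if ttl == 0:
            return node, False  # router drops the packet and reports TTL exceeded


def traceroute(path, max_hops=30):
    """Probe with TTL 1, 2, 3, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, done = probe(path, ttl)
        hops.append(responder)
        if done:
            break
    return hops


# Hypothetical route: three routers, then the destination.
route = ["10.0.0.1", "203.0.113.1", "198.51.100.7", "192.0.2.50"]
for number, address in enumerate(traceroute(route), start=1):
    print(number, address)
```

Each increase in TTL reveals exactly one more router, which is why traceroute needs one probe round per hop.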

Traceroute can be run from the Socket UI. The UI sends only one packet for each hop, but it still shows packet loss for each hop. Since there’s only one packet being sent, you’ll only see 0% or 100% loss.

Limitations with Troubleshooting Packet Loss Using traceroute 

Be aware that packet loss shown at any single hop is not necessarily a sign of a problem. A single hop could show 100% packet loss simply because ICMP is not enabled on the router. A hop can also show less than 100% packet loss without there being a problem due to ICMP rate limiting, a common security measure employed on routers to prevent DoS attacks on the internet. If you see some packet loss on one hop and 0% packet loss on the next hop, you’re likely witnessing ICMP rate limiting.

If there is an actual problem with packet loss, the loss starts at one hop and then continues for multiple hops, with each of those hops showing packet loss. It’s also possible that multiple routers on a path are contributing to packet loss, so the amount of packet loss can vary at each hop. For example, there are eight hops in the route and traceroute shows packet loss for hops 3-7.
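The rule of thumb from this section can be sketched as a small helper that groups consecutive lossy hops into runs. It is a rough heuristic, not a diagnosis: the function names and labels are made up, and the per-hop loss values are hypothetical numbers as a traceroute tool would report them.

```python
def lossy_runs(loss_by_hop):
    """Return (first_hop, last_hop) pairs for each run of consecutive lossy hops."""
    runs, start = [], None
    for hop, loss in enumerate(loss_by_hop, start=1):
        if loss > 0 and start is None:
            start = hop                      # a new run of lossy hops begins
        elif loss == 0 and start is not None:
            runs.append((start, hop - 1))    # a clean hop ends the run
            start = None
    if start is not None:
        runs.append((start, len(loss_by_hop)))
    return runs


def classify(loss_by_hop):
    """Label each run: isolated loss followed by a clean hop vs. sustained loss."""
    labels = {}
    for first, last in lossy_runs(loss_by_hop):
        isolated = first == last and last < len(loss_by_hop)
        labels[(first, last)] = (
            "likely ICMP rate limiting" if isolated else "possible real loss"
        )
    return labels


# Eight hops with loss on hops 3-7, as in the example above.
print(classify([0, 0, 10, 12, 9, 15, 11, 0]))
# One lossy hop followed by clean hops.
print(classify([0, 40, 0, 0]))
```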

Traceroute on the Socket UI

You access traceroute on the UI on the Tools page under the Network Tools section.

Traceroute.png

Putting the Tools to Use: How to Troubleshoot Packet Loss

1. Determining the Scope of Packet Loss

When you start, it’s really important to find out who or what is experiencing the packet loss. Is every user experiencing packet loss at a site, or is it isolated to a single endpoint? Does the packet loss occur over the Internet or over the WAN? Are multiple sites affected by packet loss, or just one? Is all traffic affected, or is it just a certain application? Is the packet loss constant, or does it only occur intermittently?

Knowing the answers to the questions above will save you time during the troubleshooting process. The more details you know ahead of time, the more focused your troubleshooting can be.

2. Ruling Out Site-Reconnects

Site-reconnects to the Cato Cloud are a source of packet loss. Check Analytics > Events to see if the packet loss correlates with reconnect events.
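If you have the event times and the times at which users reported loss, the correlation can be checked with a short script. This is a sketch with made-up timestamps and a hypothetical function name. The 60-second window is an arbitrary assumption, and the script does not read anything from the Cato Management Application.

```python
from datetime import datetime, timedelta


def loss_near_reconnects(loss_times, reconnect_times, window=timedelta(seconds=60)):
    """Split loss timestamps into those within `window` of a reconnect and the rest."""
    near, unexplained = [], []
    for loss_time in loss_times:
        if any(abs(loss_time - event) <= window for event in reconnect_times):
            near.append(loss_time)
        else:
            unexplained.append(loss_time)
    return near, unexplained


# Hypothetical data: two loss reports and one reconnect event.
base = datetime(2024, 1, 1, 12, 0, 0)
losses = [base, base + timedelta(minutes=10)]
reconnects = [base + timedelta(seconds=30)]

near, unexplained = loss_near_reconnects(losses, reconnects)
print("near a reconnect:", near)
print("not explained by reconnects:", unexplained)
```

Loss that is not close to any reconnect event needs the remaining steps in this guide.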

3. Bypassing Cato

For packet loss over the Internet, set up a source or destination bypass to quickly rule out an issue with the Cato Cloud. The easiest way to do this is to set up a source bypass for a single user’s IP address in the site configuration and see if the packet loss continues. If the packet loss continues, the problem might be on the LAN, the Socket, or the ISP, but the problem would not be on a PoP.

3._Bypassing_Cato_.png

4. Starting a Continuous Ping

Even if you won’t use it right away, request that a continuous ping be run between a source and destination IP address that is affected by the packet loss. It may take some time to get access to an endpoint on site to start a ping, so it’s best to ask for it early in the troubleshooting process. Ping traffic is also easier to trace than application traffic.

5. Checking Site Analytics

Is packet loss showing on the site’s Analytics packet loss graph? The recommendations differ depending on whether or not Analytics shows packet loss.

No Packet Loss

    • Socket site: It’s possible for packet loss to exist without any packet loss being displayed in Analytics. There could be an issue on the local network, or it could be a PoP-related issue. Check out the LAN packet loss troubleshooting section to start and if everything seems okay, go on to step 6.

5._No_packet_Loss.png

Packet Loss

5..png

If packet loss is shown on the graph, is it provider or Cato discarded packet loss? Both cases can be caused by a misconfiguration, and you can check the configured bandwidth as outlined in step 6.

For Cato discarded packet loss, you should also investigate the bandwidth priorities. Check the Priority Analyzer under the site’s Analytics page to see what priority is being dropped. You can expand the priority section to show the top applications in that priority. If packet loss is only affecting a certain application, you may need to raise the priority of that application in the Network Rules. Remember, Cato’s QoS is designed to drop low priority packets when congestion occurs, so Cato discarded packet loss is not always a problem.

Use_as_placeholder_for_now.png

The Priority Analyzer in Analytics shows packet loss in the upstream and downstream direction for each QoS priority.

6. Checking Bandwidth Configuration

Packet loss can be caused by link congestion, so it’s important that the bandwidth for each WAN link is configured correctly in the Cato Management Application. Make sure that the bandwidth in the site configuration matches what the ISP provides.

_6_panel.png

If the configured bandwidth is lower than what the ISP provides, Cato’s QoS engine can start dropping packets when the configured bandwidth limit is exceeded. If this is the case, there is a flatline across a site’s Analytics throughput graph equal to the site’s configured bandwidth.

You can see this same behavior if the bandwidth is configured correctly but the ISP link is congested. This behavior does not guarantee a problem, but in this situation it is a good practice to confirm that the bandwidth is configured correctly.

6._Replace_Downstream__delete_upstream_.png

If the configured bandwidth is higher than what the ISP provides, Cato’s QoS engine does not kick in when the ISP’s bandwidth limit is exceeded, and therefore the ISP may start dropping packets randomly. If this is the case, you see a flatline across the site Analytics throughput graph below the level of the configured bandwidth along with provider packet loss.
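The two flatline patterns in this step can be summarized as a rough decision helper. It is only a sketch: the function name, the return labels, and the 5% tolerance for "at the configured bandwidth" are assumptions, and the plateau value is something you would read off the throughput graph yourself.

```python
def bandwidth_hint(configured_mbps, plateau_mbps, provider_loss, tolerance=0.05):
    """Suggest what to check, given where the throughput graph flatlines."""
    at_limit = abs(plateau_mbps - configured_mbps) <= tolerance * configured_mbps
    if at_limit:
        # Flatline at the configured bandwidth: the configured value may be
        # lower than what the ISP provides, or the link is simply congested.
        return "configured_too_low_or_congested"
    if plateau_mbps < configured_mbps and provider_loss:
        # Flatline below the configured bandwidth with provider loss: the
        # configured value may be higher than what the ISP provides.
        return "configured_too_high"
    return "no_pattern"


print(bandwidth_hint(configured_mbps=100, plateau_mbps=99, provider_loss=False))
print(bandwidth_hint(configured_mbps=200, plateau_mbps=150, provider_loss=True))
```

In both cases the next action is the same: compare the configured bandwidth with the bandwidth in the ISP contract.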

7. Checking the Socket’s Link Status

One possible cause of provider loss is that a Socket link is running at half-duplex. This means that packets can only travel in one direction (outbound or inbound) at a time, which drastically reduces throughput and results in packet loss. All Socket links should always be at full-duplex, without exception.

Also make sure that both the WAN and LAN link speeds are equal to or above the bandwidth configured for a site. The link speed can be the limiting factor for throughput. For instance, if a site’s configured bandwidth is 200 Mbps but the LAN link has only negotiated to 100 Mbps full-duplex, a computer connected to the Socket can’t achieve higher than 100 Mbps throughput.
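The two checks in this step (duplex, and link speed against configured bandwidth) can be written down as a small helper. The function and link names are illustrative, and the values are ones you would read manually from the Socket UI.

```python
def link_problems(name, speed_mbps, full_duplex, configured_mbps):
    """Return a list of problems for one link, given the site's configured bandwidth."""
    problems = []
    if not full_duplex:
        problems.append(f"{name} is half-duplex")
    if speed_mbps < configured_mbps:
        problems.append(
            f"{name} negotiated {speed_mbps} Mbps, "
            f"below the configured {configured_mbps} Mbps"
        )
    return problems


def throughput_ceiling(configured_mbps, wan_link_mbps, lan_link_mbps):
    """The slowest of the three values limits what an endpoint can achieve."""
    return min(configured_mbps, wan_link_mbps, lan_link_mbps)


# The example above: 200 Mbps configured, LAN link negotiated at 100 Mbps full-duplex.
print(link_problems("LAN1", speed_mbps=100, full_duplex=True, configured_mbps=200))
print(throughput_ceiling(configured_mbps=200, wan_link_mbps=1000, lan_link_mbps=100))
```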

To check the link state, log in to the Socket UI and view the Link status in the Monitor page. The example below shows the WAN1 link at 100 Mbps half-duplex.

7.png

If you notice a link at half-duplex or set to the wrong speed, check the port settings on the device that the Socket’s link is connected to. Make sure it is set to auto-negotiate or that it matches the Socket’s speed settings. All Socket links default to auto-negotiate, but you can force the speed under the Network Settings page.

_7.png

If the port settings are correct on the other device, the ethernet cable could be damaged. Replace the cable with a known good one and see if the duplex or speed changes. If that doesn’t work, connect a laptop computer or other device to the Socket’s port and check the link status. Do the same on the other device. If the Socket’s link comes up at the expected speed and duplex but the other device’s link does not, you’ll know the problem is with the other device.
