Overview
Socket upgrade failures can occur at various stages: during initial deployment, during a scheduled maintenance window, or during a manual upgrade. Understanding and resolving these issues promptly is crucial for maintaining network integrity. This article gives an overview of the troubleshooting process for Socket upgrade failures.
Symptoms
- Failed Initial Upgrade: Occurs during Socket deployment.
- Maintenance Window Issues: Large numbers of Sockets are not upgraded during the scheduled maintenance window.
- Established tunnel after failed upgrade: The Socket upgrade failed, but the tunnel remains up.
- Inaccessibility Post-Upgrade: Sockets become inaccessible after an upgrade.
Possible Causes
- Connectivity Issues: Timeout due to slow internet or improper MTU settings.
- DNS Resolution Failures: Inability to resolve cc2.catonetworks.com.
- Firewall Restrictions: Firewalls with SSL inspection.
- Port Limitations: WAN1/Port1 restrictions.
Troubleshooting Socket Upgrade Failure
Note: Before you start troubleshooting, make sure you understand how Socket upgrades work at Cato. See Understanding Cato's Managed Socket Upgrade Service.
Socket upgrades take place during the maintenance window configured in CMA or during initial deployment. This section describes the steps for troubleshooting Socket upgrade failures. An upgrade failure has three main possible outcomes:
- The initial Socket Upgrade fails during Socket Deployment.
- The tunnel remains up and established despite the upgrade failure.
- The tunnel fails to come up and the Socket becomes inaccessible following the upgrade failure.
Initial Upgrade Failure
When a newly deployed or factory-reset Socket first connects to the Internet, it will continuously attempt to reach out to Cato via its WAN port, and it will attempt to upgrade its firmware version.
To troubleshoot Initial Upgrade failures, see Troubleshooting Failed Initial Firmware Upgrade.
Tunnel is Established After an Upgrade Failure
During a maintenance window, a Socket upgrade might fail, which prevents the remaining Sockets in the account from being upgraded. Identify the Sockets that failed and upgrade them before scheduling a new maintenance window.
Analyzing CMA Events
Review Socket upgrade-related events by filtering the Sub-type as Socket Upgrade and the Action as Not Succeeded.
Events with action Skipped may indicate that the Socket was offline during the maintenance window or that a different Socket failed to upgrade (No open tunnel after grace time), which led to all the remaining Sockets being skipped. The reason for the skip action can be seen in the Event Message. For example:
- Upgrade was skipped. Primary socket was offline during maintenance window.
- Upgrade was skipped. Skipped pending upgrade for this Socket, because a different Socket couldn't complete the upgrade.
Events with action Failed indicate that the Socket upgrade was attempted but the upgrade process itself failed. The reason for the failure can be seen in the Event Message.
If the Socket becomes inaccessible after this failure, go to Tunnel Fails to Establish after an Upgrade.
Continue the troubleshooting process by focusing on Sockets with action Failed.
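As a rough illustration of this triage, the sketch below groups upgrade events by action so that the Sockets with action Failed stand out. The event records and their field names (`socket`, `action`, `message`) are hypothetical and do not represent the CMA event schema; adapt them to however you export events.

```python
from collections import defaultdict

# Hypothetical event records; the field names are assumptions for illustration,
# not the actual CMA event schema. The messages are taken from this article.
events = [
    {"socket": "site-a", "action": "Failed",
     "message": "No open tunnel after grace time"},
    {"socket": "site-b", "action": "Skipped",
     "message": "Upgrade was skipped. Primary socket was offline during maintenance window."},
    {"socket": "site-c", "action": "Skipped",
     "message": "Upgrade was skipped. Skipped pending upgrade for this Socket, "
                "because a different Socket couldn't complete the upgrade."},
]


def group_by_action(records):
    """Return a dict mapping each action to the Socket names that reported it."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["action"]].append(record["socket"])
    return dict(grouped)


triage = group_by_action(events)
# Sockets with action Failed are the ones to troubleshoot first.
needs_troubleshooting = triage.get("Failed", [])
print(needs_troubleshooting)
```

In this sample data, only the first Socket needs troubleshooting; the two skipped Sockets can wait for a manual upgrade or a new maintenance window.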
Troubleshooting Failures During the Upgrade
During the upgrade process, the Socket will attempt to download the firmware image. Timeouts may occur due to the following reasons:
- Failure to resolve DNS properly for cc2.catonetworks.com
- Slow or unreliable internet connection prevents the firmware download.
- Improper MTU setting on WAN interfaces.
To rule out the above reasons, check the following:
- Use the Ping Tool from the WebUI to confirm that the Socket can resolve cc2.catonetworks.com via the tunnel. If the FQDN is not resolvable, check the DNS settings on the WAN port.
- In Network Analytics, check whether the tunnel experienced packet loss during the maintenance window. If so, check whether there is also Last-Mile packet loss and report the issue to the ISP.
- Cato Sockets run PMTUD (MTU discovery) with the PoP to determine the allowed MTU over the tunnel. However, manually setting the MTU on the WAN interface may lead to packet fragmentation and performance degradation. Check the configured MTU value in the WebUI.
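If a workstation behind the Socket is available, a quick name-resolution check can complement the Ping Tool. The following is a minimal sketch that uses only the Python standard library. It tests resolution from the host where it runs, not from the Socket itself, and the helper name `can_resolve` is illustrative.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return the sorted addresses for hostname, or an empty list if resolution fails.

    The resolver argument exists only so the lookup can be replaced in tests.
    """
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        return []
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({entry[4][0] for entry in results})
```

Calling `can_resolve("cc2.catonetworks.com")` returns a non-empty list when the name resolves. An empty list suggests checking the DNS settings, as described in the first item above.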
Troubleshooting Failures After the Upgrade
Once the firmware has been downloaded and installed on the Socket, the Socket enters a 10-minute grace period during which several checks are run to determine that the newly installed version is stable:
- The Socket process is running.
- Ping works to cc2.catonetworks.com, 8.8.8.8, and Facebook over the internet.
- The connection to the PoP is established for at least 5 minutes.
- There were at least ten successful syncs between the Socket and the PoP.
- cURL works to cc2.catonetworks.com via the tunnel.
If the checks aren't successful during the grace period, the Socket will roll back to the previous version, assuming that the new version is unstable. Ensure that the Socket keeps its internet access for 10 minutes after the upgrade is completed.
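The rollback decision can be pictured as requiring every check above to pass. The sketch below is only a model for reasoning about which check failed. It is not Cato's implementation, and the class, field, and function names are hypothetical. The thresholds (5 minutes of tunnel uptime, ten successful syncs) come from the list above.

```python
from dataclasses import dataclass

MIN_TUNNEL_UPTIME_MINUTES = 5   # from the checks listed above
MIN_SUCCESSFUL_SYNCS = 10       # from the checks listed above


@dataclass
class GraceChecks:
    """Hypothetical snapshot of the grace-period checks (names are illustrative)."""
    socket_process_running: bool
    ping_ok: bool                  # cc2.catonetworks.com, 8.8.8.8, and Facebook
    tunnel_uptime_minutes: float
    successful_syncs: int
    curl_ok: bool                  # cURL to cc2.catonetworks.com via the tunnel


def failed_checks(checks):
    """Return the names of the checks that did not pass."""
    results = {
        "socket process": checks.socket_process_running,
        "ping": checks.ping_ok,
        "tunnel uptime": checks.tunnel_uptime_minutes >= MIN_TUNNEL_UPTIME_MINUTES,
        "syncs": checks.successful_syncs >= MIN_SUCCESSFUL_SYNCS,
        "curl": checks.curl_ok,
    }
    return [name for name, passed in results.items() if not passed]


def would_roll_back(checks):
    """In this model, any failed check leads to a rollback."""
    return bool(failed_checks(checks))


# Example: the tunnel was up for only 3 minutes during the grace period.
sample = GraceChecks(True, True, tunnel_uptime_minutes=3, successful_syncs=12, curl_ok=True)
print(failed_checks(sample), would_roll_back(sample))
```

Listing the failed checks by name, rather than returning a single yes/no, makes it easier to decide whether to look at the internet connection, the tunnel, or the Socket itself.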
Performing a Socket Reboot
After some Fatal upgrade failures, rebooting the Socket before retrying the firmware upgrade may help. If the tunnel is still up after the upgrade failure, you can reboot the Socket remotely from the WebUI under the Administration tab.
If the Socket is inaccessible after the upgrade failure, go to Tunnel Fails to Establish after an Upgrade.
Manual Socket Upgrade and Rescheduling
Sockets with action Skipped during the maintenance window can be manually upgraded from CMA once the Socket is back online. For Sockets with action Failed, follow the above troubleshooting steps before attempting a manual upgrade. For information about manually upgrading in CMA, see CMA Manual Upgrade.
For large accounts, CMA manual upgrades may take a long time to complete. Instead of manually upgrading each Socket, it may only be necessary to troubleshoot and upgrade the Socket that failed (action Failed) during the first maintenance window and then schedule a new maintenance window. For information about rescheduling a maintenance window in CMA, see Rescheduling the Upgrade Process.
If the upgrade process continues to fail with the same or other Sockets, submit a Support ticket with the results of the above troubleshooting.
Tunnel Fails to Establish after an Upgrade
Analyzing CMA Events
Socket upgrade events with Action Failed and event message No open tunnel after grace time indicate that the Socket was reported offline after the Socket Upgrade period ended (17 minutes).
On-site personnel must follow the steps explained in Resolving Inaccessible Socket after an Upgrade.
Resolving Discovered Issues
CMA Manual Upgrade
An upgrade failure may have been caused by a momentary connectivity issue, and a second attempt could succeed. To attempt a new Socket upgrade, manually initiate the upgrade from Site Configuration > Socket > Actions > Upgrade. See Manually Upgrading a Socket.
It is recommended to select the latest available firmware version with the upgrade mechanism being "Cato Cloud Initiated". 17 minutes after the manual firmware upgrade starts, CMA will show an "upgraded successfully" notification indicating that the Socket reported a successful upgrade after the grace period.
Rescheduling the Upgrade Process
Once the previously failed Socket has been upgraded manually or with the help of Support, you can schedule a new maintenance window to upgrade the remaining Sockets by changing the Socket Maintenance Window date/time in CMA. See Configuring the Socket Upgrade Maintenance Window.
This action triggers a CMA notification, "Sockets version upgrade is available", with the number of Sockets that will be upgraded in the new maintenance window. Schedule the new maintenance window to start at least 48 hours after the time you configure it. If fewer than 48 hours remain before the site maintenance window, the site waits until the following week to initiate the Socket upgrade.
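The 48-hour rule is simple date arithmetic. The sketch below shows one way to check a planned window before configuring it. The function name and the sample dates are illustrative only, and the check is an approximation of the rule as stated here, not a CMA feature.

```python
from datetime import datetime, timedelta, timezone

MIN_LEAD_TIME = timedelta(hours=48)  # lead time stated in this article


def has_enough_lead_time(window_start, now=None):
    """Return True if window_start is at least 48 hours after now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return window_start - now >= MIN_LEAD_TIME


# Illustrative timestamps (UTC).
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
exactly_48h = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
only_47h = datetime(2024, 1, 3, 11, 0, tzinfo=timezone.utc)

print(has_enough_lead_time(exactly_48h, now))  # True: 48 hours ahead
print(has_enough_lead_time(only_47h, now))     # False: upgrade waits a week
```

Using timezone-aware timestamps avoids off-by-hours mistakes when the person scheduling the window and the site are in different time zones.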
Resolving Inaccessible Socket after an Upgrade
On-site personnel must follow these steps:
- Collect Console Logs. Connect a console cable to the Socket. Go to Device Manager > Ports, and note the COM port of the console cable. Open PuTTY or a similar terminal application and configure the serial connection on that COM port. Save the console output in a text file for future investigation.
  - On physical Sockets, this step must be done before rebooting the Socket, because Socket logs are lost after a reboot.
  - For Azure vSockets, console logs can be obtained from Azure under the VM > Help > Boot diagnostics > Serial log > Download serial log. These logs are collected for up to 6 boots.
- Reboot. If the tunnel fails to establish or the Socket remains inaccessible after the upgrade, reboot the Socket.
- Unassign and Re-Assign Socket to Site. If the reboot doesn't help bring up the tunnel/Socket, unassign the Socket in CMA. If the Socket is detected, it will appear in the CMA notification after a few minutes. Assign the Socket back to the same site.
- Flash the Socket. If there's no CMA notification, flash the Socket to its factory default state. To do that, either press and hold the F/D button for 30-35 seconds or perform a USB reset.
- Contact Support. If all the above steps have been performed and failed, submit the collected console logs to Support and request to initiate an RMA process for the Socket.
Raising cases to Cato Support
Submit a Support ticket with the results of the above troubleshooting steps. Please include the following information in the ticket:
- Details of the affected Sockets and overall impact.
- Related CMA events and notifications showing the Socket upgrade failure.
- Results of manual upgrades and maintenance window rescheduling.
- Collected console logs if the Socket becomes inaccessible.