Guide to Cato Data Lake Storage

This article discusses the details of event and Data Lake storage for your Cato account.

Overview

Cato maintains a Data Lake that contains the data recorded by Cato networking and security functions. Data such as Event information is added to the data lake in real time and stored for a period, as defined by the customer’s contract, before being discarded.

Cato stores events and data for up to three months, free of charge, as part of the service. Customers may choose to increase the storage and extend the retention period beyond three months. This requires the purchase of Data Lake storage. Customers may also forward their data to a SIEM, using Cato’s Event Integration APIs, or to AWS or Azure datastores using cloud storage integration.

This article applies to all Cato accounts as of January 1st, 2024(*).

Event Storage Approach

Events are stored in real time and can be tracked in the Cato Management Application in the Events screen (Monitoring > Events).

  • Cato stores a core set of key security and connectivity events for each customer

  • Customers can select, within policies, additional events to be recorded

  • Customer licenses define the maximum number of events that can be stored per hour

  • Events in excess of this number are discarded for the remainder of the hour

Event Storage Measurement and Discard

The primary unit of measurement for data lake storage is the number of events stored per hour.

For each customer the number of events that were stored in the last hour is tracked by a counter.

  • At the start of each hour, the counter is reset

  • When the number of events reaches a threshold set for the customer, further events are discarded for the remainder of that hour
    However, Cato continues to store system events that are related to Cato processes

  • Cato generally allows headroom above the threshold, to reduce the likelihood of discard
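The counter behaviour described above can be sketched as follows. This is an illustration only, not Cato's implementation: the class name, the `threshold` parameter, and the integer hour index are hypothetical. It also assumes that system events do not count toward the threshold, and it ignores the headroom that Cato generally allows.

```python
from dataclasses import dataclass


@dataclass
class HourlyEventCounter:
    """Illustrative per-customer counter (hypothetical, not Cato's code)."""

    threshold: int          # events allowed per hour (hypothetical parameter)
    current_hour: int = -1  # hour index the counter currently covers
    stored: int = 0         # events counted so far in the current hour

    def offer(self, hour: int, is_system_event: bool = False) -> bool:
        """Return True if the event is stored, False if it is discarded."""
        if hour != self.current_hour:
            # A new hour has started, so the counter is reset.
            self.current_hour = hour
            self.stored = 0
        if is_system_event:
            # System events related to Cato processes are still stored.
            # Assumption: they do not increment the counter.
            return True
        if self.stored >= self.threshold:
            # Threshold reached: discard for the remainder of the hour.
            return False
        self.stored += 1
        return True
```

With a threshold of 2, for example, the third ordinary event offered in the same hour is discarded, while the first event of the next hour is stored again.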

Event Rate Limiting

The details of the default Cato rate limiting for events are as follows:

  • Cato allows up to 2 million events per hour, free of charge

  • If more than 2 million are generated in any hour, the events in excess of 2 million are discarded

  • Customers have the option to purchase rate limiting for more than 2 million events per hour

Customers will generally find that the default event rate limiting is sufficient for their needs, unless they follow the best practice of logging all events.

Event Retention and Storage

For contracts and renewals starting from January 1st, 2024, the default retention period for events is 3 months.

  • After the retention period (i.e. after 3 months), event data is discarded

  • Customers may purchase additional data storage if they wish to store event data for more than three months

If a customer chooses to pay for storage, no allowance is made for the free storage that is provided by default: all event storage is chargeable.

  • For more about purchasing additional data storage, please contact your Cato representative.

Cato supports the following event storage options:

Data Lake Storage Units

Data Lake storage is purchased in units of 2 million events per hour. So, for example:

  • One unit of Data Lake Storage will allow up to 2 million events per hour

  • Two units will allow up to 4 million events per hour

Data Lake storage units define the peak number of events that can be stored per hour. A period when fewer events are stored per hour will have no bearing on the number that can be stored in future hours.

Data Lake storage units are available in three variants, according to the storage duration required:

  • A three-month unit

  • A six-month unit

  • A twelve-month unit

The chosen variant applies to all Data Lake storage units; it is not possible to mix variants.

Examples

The table below illustrates the use of Data Storage Units to cover customer event storage requirements.

Data_SKUs_examples.png

Estimating Storage Requirements Based on Event History

Customers with a stable history of event storage can inspect the event chart in the Cato Management Application to see how many events are being generated. They can use the peaks in this chart to estimate their storage requirements.

In the example chart below, the peaks reach a maximum of just over 400,000 events per hour. This would be covered by the free storage, if three months’ retention is sufficient.

Data_SKUs_Event_History_1.png

In the example chart below, the number of events per hour exceeds 2 million in every hour, and the highest peak approaches 3 million. This is more than can be covered by bundled storage. Purchasing 2 Data Lake storage units would cover these requirements, allowing up to 4 million events per hour to be stored.

Data_SKUs_Event_History_2.png

Note that the exact height of each bar can be inspected by hovering the cursor over the bar, as illustrated in the chart below.

Data_SKUs_Event_History_2_hover.png

Further points to note:

  • These examples cover a small period, for convenience. A longer analysis period would be prudent.

  • The time period represented by each bar will change according to the time period covered by the chart. Pay attention to the Time Series Granularity as you change the time period covered.
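As a rough sketch of this approach, the hourly peak read from the chart can be converted into a number of units. The function name and the sample hourly counts below are hypothetical; only the figure of 2 million events per hour per unit comes from this article.

```python
EVENTS_PER_UNIT = 2_000_000  # events per hour covered by one unit


def units_for_history(hourly_counts: list[int]) -> int:
    """Units needed to cover the highest hourly peak in the sample."""
    peak = max(hourly_counts)
    # Integer ceiling division: round up to a whole number of units.
    return -(-peak // EVENTS_PER_UNIT)


# Hypothetical hourly counts, loosely modelled on the second chart.
busy_hours = [2_100_000, 2_600_000, 2_950_000, 2_400_000]
units_needed = units_for_history(busy_hours)  # 2 units, i.e. up to 4 million/hour
```

A result of 1 means the peak fits within 2 million events per hour, which the default allowance covers when three months' retention is sufficient.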

Estimating Storage Requirements without an Event History

Event generation is correlated to both the total bandwidth in use across the network and the number of SDP users supported.

Therefore, customers without a history of event generation can estimate their likely storage requirements by considering first, the sum of the bandwidths in use at each site, and second, the number of their SDP users.

Tables are provided below to assist with estimating the peak events generated per hour. Follow this procedure to calculate requirements from the tables:

  1. Find the row in the Total Bandwidth table that corresponds to peak bandwidth bought for the network. Read off the estimated peak events per hour that will be generated

  2. Find the row in the SDP Clients table that corresponds to the number of SDP Clients in use. Read off the estimated peak events per hour that will be generated

  3. Add the two figures

  4. Divide the total events per hour by two million, and round up, to estimate the number of Data Lake Storage Units required.

Event Generation Tables

Use these tables to estimate the peak number of events per hour generated for a customer. They assume that the customer is logging all events.

Data_SKUs_Event_Generation_Tables.png

Example Estimation

In the table above:

  • A total of 3 Gbps bandwidth across all sites would generate an estimated peak of four million events per hour

  • A total of 2,000 SDP clients would generate an additional estimated peak of one million events per hour

  • Therefore, the customer could expect a peak of 4 + 1 = 5 million events per hour

  • This could be covered by buying three Data Lake Storage units of the appropriate duration.
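The same arithmetic can be written as a short sketch. The function and parameter names are hypothetical; the two input figures are the estimates read from the tables in the example above.

```python
EVENTS_PER_UNIT = 2_000_000  # events per hour covered by one unit


def estimate_units(bandwidth_events: int, sdp_events: int) -> int:
    """Add the two table estimates, divide by 2 million, and round up."""
    total = bandwidth_events + sdp_events
    return -(-total // EVENTS_PER_UNIT)  # integer ceiling division


# 3 Gbps -> 4 million events/hour; 2,000 SDP clients -> 1 million events/hour
example_units = estimate_units(4_000_000, 1_000_000)  # 5 million / 2 million, rounded up
```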

Estimating Actual Storage Required

The unit of measure for Data Lake storage is the number of events stored per hour. The volume of data involved is not used in the calculation or purchase of storage units, and it is not reported by the Cato Management Application.

However, customers may wish to estimate the storage implications if they plan to export data to external storage or a SIEM. Customers can make a rough estimate for the volume of data involved, by assuming that one unit of Data Lake storage (2 million events per hour) is very roughly equivalent to 150 GB per month, as illustrated in the table below.

Note that this is a very rough estimate. Data Lake Storage Units define the maximum number of events that can be stored in any hour. It is self-evident that a customer who buys storage units to cope with occasional large peaks in event storage will have a very different external storage requirement than a customer who buys the same number of Units to cope with a consistently high number of events stored.

Data_SKUs_Actual_Storage.png
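For a quick calculation, the rough figure of 150 GB per unit per month can be applied as below. This is a sketch only: the function name is hypothetical, and it assumes that volume scales linearly with the number of units and the number of months. A customer whose event rate is usually well below the purchased peak should expect a lower figure.

```python
GB_PER_UNIT_PER_MONTH = 150  # very rough figure quoted in this article


def rough_export_volume_gb(units: int, months: int) -> int:
    """Very rough external storage estimate, assuming linear scaling."""
    return units * GB_PER_UNIT_PER_MONTH * months


# Two units retained externally for three months
rough_gb = rough_export_volume_gb(2, 3)  # 2 * 150 * 3
```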

 

(*) Some contracts with Cato may include terms that differ from the information in this article.
