Guide to Cato Data Lake Storage

This article discusses the details of event and Data Lake storage for your Cato account.

Overview

Cato maintains a Data Lake that contains the data recorded by Cato networking and security functions. Data, such as event information, is added to the Data Lake in real time and stored for a period defined by the customer’s contract before being discarded.

Cato stores events and data for up to three months, free of charge, as part of the service. Customers may choose to increase the storage and extend the retention period beyond three months. This requires the purchase of Data Lake storage. Customers may also forward their data to a SIEM, using Cato’s Event Integration APIs, or to AWS or Azure datastores using cloud storage integration.

This article applies to all Cato accounts as of January 1st, 2024(*).

Event Storage Approach

Events are stored in real time and can be tracked on the Events page of the Cato Management Application (Monitoring > Events).

  • Cato stores a core set of key security and connectivity events for each customer

  • Customers can select, within policies, additional events to be recorded

  • Customer licenses define the maximum number of events that can be stored per hour

  • Events in excess of this number are discarded for the remainder of the hour

Event Storage Measurement and Discard

The primary unit of measurement for data lake storage is the number of events stored per hour.

For each customer, a counter tracks the number of events stored during the current hour.

  • At the start of each hour, the counter is reset

  • When the number of events reaches a threshold set for the customer, further events are discarded for the remainder of that hour

    However, Cato continues to store system events that are related to Cato's processes

  • Cato generally allows headroom above the threshold, to reduce the likelihood of discard
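
The counting behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not Cato's implementation: the class name `HourlyEventCounter`, the `offer` method and the integer `hour` index are hypothetical, the sketch assumes that system events do not count toward the threshold, and it ignores the headroom that Cato generally allows.

```python
from dataclasses import dataclass


@dataclass
class HourlyEventCounter:
    """Hypothetical model of the per-customer hourly event counter."""

    threshold: int          # events that may be stored per hour, e.g. 2_500_000
    current_hour: int = -1  # index of the hour currently being counted
    stored: int = 0         # events stored so far in current_hour

    def offer(self, hour: int, is_system_event: bool = False) -> bool:
        """Return True if the event is stored, False if it is discarded."""
        if hour != self.current_hour:
            # A new hour has started, so the counter is reset.
            self.current_hour = hour
            self.stored = 0
        if is_system_event:
            # System events related to Cato's processes are still stored.
            # Assumption: they are not counted against the threshold.
            return True
        if self.stored >= self.threshold:
            # Threshold reached: discard for the remainder of this hour.
            return False
        self.stored += 1
        return True


counter = HourlyEventCounter(threshold=3)
first_hour = [counter.offer(hour=0) for _ in range(5)]  # last two are discarded
next_hour = counter.offer(hour=1)                       # stored again after reset
```

With a threshold of 3, the fourth and fifth events offered in hour 0 are discarded, and storage resumes as soon as hour 1 begins.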

Event Rate Limiting

The details for the default Cato rate limiting for events are as follows:

  • Cato allows up to 2.5 million events per hour, free of charge

  • If more than 2.5 million are generated in an hour, the events in excess of 2.5 million are discarded

  • Customers have the option to purchase rate limiting for more than 2.5 million events per hour

Customers will generally find that the default event rate limit is sufficient for their needs, unless they choose to follow the best practice of logging all events.

Event Retention and Storage

For contracts and renewals starting from January 1st, 2024, the default retention period for events is 3 months.

  • After the retention period (i.e. after 3 months), event data is discarded

  • Customers may purchase additional data storage if they wish to store event data for more than three months

If a customer chooses to pay for storage, no allowance is made for the free storage that is provided by default: all event storage is chargeable.

  • For more about purchasing additional data storage, please contact your Cato representative.

Cato supports the following event storage options:

Data Lake Storage Units

Data Lake storage is purchased in units of 2.5 million events per hour. So, for example:

  • One unit of Data Lake Storage will allow up to 2.5 million events per hour

  • Two units will allow up to 5 million events per hour

Data Lake storage units define the peak number of events that can be stored per hour. A period when fewer events are stored per hour will have no bearing on the number that can be stored in future hours.

Data Lake storage units are available in three variants, according to the storage duration required:

  • A three-month unit

  • A six-month unit

  • A twelve-month unit

The chosen variant applies to all units; it is not possible to mix variants.

Examples

The table below illustrates the use of Data Storage Units to cover customer event storage requirements.

| Peak number of events per hour that the customer wishes to be able to store | Retention period required | Number of Data Storage Units required | Type of Data Storage Unit required |
| --- | --- | --- | --- |
| Up to 2.5 million | 3 months | None | None |
| Up to 2.5 million | 6 months | 1 | 6-month unit |
| Up to 5 million | 3 months | 2 | 3-month unit |
| Up to 7.5 million | 12 months | 3 | 12-month unit |
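
The sizing rule behind these examples can be expressed as a short Python sketch. The function name `units_required` and the constants are hypothetical, and the logic is only a reading of the rules stated in this article: the free allowance applies solely to 3-month retention at up to 2.5 million events per hour, and paid storage receives no credit for the free allowance.

```python
from typing import Optional, Tuple

EVENTS_PER_UNIT = 2_500_000  # events per hour covered by one storage unit
FREE_RETENTION_MONTHS = 3    # retention included in the service by default
VARIANTS = (3, 6, 12)        # unit durations described in this article, in months


def units_required(peak_events_per_hour: int,
                   retention_months: int) -> Tuple[int, Optional[int]]:
    """Return (number of units, unit variant in months).

    (0, None) means the free allowance is sufficient.
    """
    if retention_months not in VARIANTS:
        raise ValueError("retention must be 3, 6 or 12 months")
    if (peak_events_per_hour <= EVENTS_PER_UNIT
            and retention_months == FREE_RETENTION_MONTHS):
        return 0, None
    # Paid storage gets no credit for the free allowance, so every
    # 2.5 million events per hour (rounded up) needs its own unit.
    units = max(1, -(-peak_events_per_hour // EVENTS_PER_UNIT))
    return units, retention_months


print(units_required(5_000_000, 3))   # two 3-month units
print(units_required(7_500_000, 12))  # three 12-month units
```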

Estimating Storage Requirements Based on Event History

Customers with a stable history of event storage can inspect the event chart in the Cato Management Application to see how many events are being generated. They can use the peaks in this chart to consider their requirements for storage.

In the example chart below, the peaks reach a maximum of just over 400,000 events per hour. This would be covered by the free storage, if three months’ retention is sufficient.

Data_SKUs_Event_History_1.png

In the example chart below, the number of events per hour exceeds 2 million in every hour, and the highest peak approaches 3 million. This is more than the free storage can cover. Purchasing 2 storage units would cover these requirements, allowing up to 5 million events per hour to be stored.

Data_SKUs_Event_History_2.png

Note that the exact height of each bar can be inspected by hovering the cursor over the bar, as illustrated in the chart below.

Data_SKUs_Event_History_2_hover.png

Further points to note:

  • These examples cover a short period, for convenience. A longer analysis period would be prudent.

  • The time period represented by each bar will change according to the time period covered by the chart. Pay attention to the Time Series Granularity as you change the time period covered.
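
As a rough illustration of this approach, the sketch below takes a list of hourly event counts and reports the peak, whether it fits within the default rate of 2.5 million events per hour, and how many units would be needed if storage were purchased. The function name `summarize_history` and the sample figures are hypothetical. The counts are assumed to have been collected by the reader, for example by reading bar heights from the chart at hourly granularity; no Cato export API is implied.

```python
from typing import Dict, List, Union

EVENTS_PER_UNIT = 2_500_000  # events per hour covered by one storage unit


def summarize_history(hourly_counts: List[int]) -> Dict[str, Union[int, bool]]:
    """Summarize a history of events stored per hour."""
    if not hourly_counts:
        raise ValueError("at least one hourly count is required")
    peak = max(hourly_counts)
    return {
        "peak_events_per_hour": peak,
        # True if the peak fits the default rate; retention beyond
        # three months would still require paid units.
        "within_default_rate": peak <= EVENTS_PER_UNIT,
        # Units needed to cover the peak, rounded up, if storage is purchased.
        "units_if_purchased": max(1, -(-peak // EVENTS_PER_UNIT)),
    }


quiet_account = summarize_history([120_000, 410_000, 95_000])
busy_account = summarize_history([2_100_000, 2_950_000, 2_400_000])
print(quiet_account)
print(busy_account)
```

The first sample resembles the chart with peaks just over 400,000 events per hour; the second resembles the chart whose highest peak approaches 3 million, which needs 2 units.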

Estimating Storage Requirements without an Event History

Event generation is correlated with both the total bandwidth in use across the network and the number of SDP users supported.

Therefore, customers without a history of event generation can estimate their likely storage requirements by considering first, the sum of the bandwidths in use at each site, and second, the number of their SDP users.

Tables are provided below to assist with estimating the peak events generated per hour. Follow this procedure to calculate requirements from the tables:

  1. Find the row in the Total Bandwidth table that corresponds to the peak bandwidth bought for the network. Read off the estimated peak events per hour that will be generated

  2. Find the row in the SDP Clients table that corresponds to the number of SDP Clients in use. Read off the estimated peak events per hour that will be generated

  3. Add the two figures

  4. Divide the total events per hour by 2.5 million, and round up, to estimate the number of Data Lake Storage Units required.

Event Generation Tables

Use these tables to estimate the peak number of events per hour generated for a customer. They assume that the customer is logging all events.

| Total Bandwidth | Estimated peak events per hour | SDP Clients | Estimated peak events per hour |
| --- | --- | --- | --- |
| Up to 2.5Gbps | 1,000,000 | Up to 3K | 1,000,000 |
| 2.5-6Gbps | 5,000,000 | 3K-7K | 5,000,000 |
| 6-9Gbps | 7,500,000 | 7K-11K | 7,500,000 |
| 9-12Gbps | 10,000,000 | 11K-15K | 10,000,000 |
| 12-15Gbps | 12,500,000 | 15K-19K | 12,500,000 |
| 15-18Gbps | 15,000,000 | 19K-23K | 15,000,000 |
| 18-21Gbps | 17,500,000 | 23K-27K | 17,500,000 |
| 21-24Gbps | 20,000,000 | 27K-31K | 20,000,000 |
| 24-27Gbps | 22,500,000 | 31K-35K | 22,500,000 |
| 27-30Gbps | 25,000,000 | 35K-39K | 25,000,000 |
| 30-33Gbps | 27,500,000 | 39K-43K | 27,500,000 |
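
The four-step procedure can be sketched in Python as follows. The figures are copied from the tables above; the names `lookup` and `estimate` are hypothetical, and the sketch assumes that a value falling exactly on a boundary shared by two rows (for example, 6 Gbps) belongs to the lower row.

```python
from bisect import bisect_left

EVENTS_PER_UNIT = 2_500_000  # events per hour covered by one storage unit

# (upper bound of the row, estimated peak events per hour)
BANDWIDTH_TABLE = [  # upper bounds in Gbps
    (2.5, 1_000_000), (6, 5_000_000), (9, 7_500_000), (12, 10_000_000),
    (15, 12_500_000), (18, 15_000_000), (21, 17_500_000), (24, 20_000_000),
    (27, 22_500_000), (30, 25_000_000), (33, 27_500_000),
]
SDP_TABLE = [  # upper bounds in number of SDP clients
    (3_000, 1_000_000), (7_000, 5_000_000), (11_000, 7_500_000),
    (15_000, 10_000_000), (19_000, 12_500_000), (23_000, 15_000_000),
    (27_000, 17_500_000), (31_000, 20_000_000), (35_000, 22_500_000),
    (39_000, 25_000_000), (43_000, 27_500_000),
]


def lookup(table, value):
    """Return the estimate from the first row whose upper bound is >= value."""
    bounds = [upper for upper, _ in table]
    index = bisect_left(bounds, value)
    if index == len(table):
        raise ValueError("value is beyond the last row of the table")
    return table[index][1]


def estimate(total_gbps, sdp_clients):
    """Steps 1-4: look up both figures, add them, divide by 2.5 million, round up."""
    total = lookup(BANDWIDTH_TABLE, total_gbps) + lookup(SDP_TABLE, sdp_clients)
    units = -(-total // EVENTS_PER_UNIT)  # ceiling division
    return total, units


total_events, units = estimate(total_gbps=8, sdp_clients=2_000)
print(total_events, units)
```

For 8 Gbps and 2,000 SDP clients, the tables give 7,500,000 + 1,000,000 = 8,500,000 events per hour, which rounds up to 4 units. Inputs beyond the last table row raise an error rather than extrapolating.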

Example Estimation

Using the tables above:

  • A total of 3 Gbps bandwidth across all sites would generate an estimated peak of five million events per hour

  • A total of 5,000 SDP clients would generate an additional estimated peak of five million events per hour

  • Therefore, the customer could expect a peak of 5 + 5 = 10 million events per hour

  • This could be covered by buying four Data Lake Storage units of the appropriate duration.

Estimating Actual Storage Required

The unit of measure for Data Lake storage is the number of events stored per hour. The volume of data involved is not used in the calculation or purchase of storage units and it is not reported by the Cato Management Application.

However, customers may wish to estimate the storage implications if they plan to export data to external storage or a SIEM. Customers can make a rough estimate of the volume of data involved by assuming that one unit of Data Lake storage (2.5 million events per hour) is equivalent to approximately 180 GB per month, as illustrated in the table below.

Note that this is a very rough estimate. Data Lake Storage Units define the maximum number of events that can be stored in an hour. It is self-evident that a customer who buys storage units to cope with occasional large peaks in event storage will have a very different external storage requirement than a customer who buys the same number of Units to cope with a consistently high number of events stored.

The following table shows a very rough estimate of the total GB according to the retention period:

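| Events per hour | Storage Units | GB per month (estimated) | Total GB over 3 months | Total GB over 6 months | Total GB over 12 months |
| --- | --- | --- | --- | --- | --- |
| 2.5 million | 1 | 180 | 540 | 1080 | 2160 |
| 5 million | 2 | 360 | 1080 | 2160 | 4320 |
| 7.5 million | 3 | 540 | 1620 | 3240 | 6480 |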
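
The arithmetic behind this estimate is a simple multiplication, sketched below in Python. The function name `estimated_export_gb` is hypothetical, and the 180 GB figure is the very rough per-unit estimate given in this article, so the results are order-of-magnitude guides rather than measured volumes.

```python
GB_PER_UNIT_PER_MONTH = 180  # very rough estimate for one unit (2.5 million events per hour)


def estimated_export_gb(units: int, retention_months: int) -> int:
    """Very rough total GB for the given number of units over the retention period."""
    if units < 0 or retention_months < 0:
        raise ValueError("units and retention_months must not be negative")
    return units * GB_PER_UNIT_PER_MONTH * retention_months


print(estimated_export_gb(units=2, retention_months=6))  # 2 * 180 * 6
```

A customer whose event rate only occasionally reaches the purchased peak should expect an actual volume well below this figure.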

(*) Some contracts with Cato may include terms that differ from the information in this article
