Guide to Cato Data Lake Storage


Cato maintains a Data Lake that contains the data recorded by Cato networking and security functions. Data such as event information is added to the Data Lake in real time and stored for a period defined by the customer’s contract, after which it is discarded.

Cato stores events and data for up to three months, free of charge, as part of the service. Customers may choose to increase the storage and extend the retention period beyond three months. This requires the purchase of Data Lake storage. Customers may also forward their data to a SIEM, using Cato’s Event Integration APIs, or to AWS or Azure datastores using cloud storage integration.

This document applies only to accounts licensed with DPA 2023.

Event Storage Approach

Events are stored in real time and can be tracked in the Cato Management Application (CMA) in the Events screen (Monitoring > Events).

  • Customer licenses define the maximum number of events that can be stored in any hour
  • Events in excess of this number are discarded for the remainder of the hour

Event Storage Measurement and Discard

The primary unit of measurement for data lake storage is the number of events stored per hour.

For each customer, a counter tracks the number of events stored during the current hour.

  • At the start of each hour, the counter is reset

  • When the number of events reaches a threshold set for the customer, further events are discarded for the remainder of that hour

  • Cato sends an email notification and generates an event to inform admins when the event threshold is exceeded
  • Admins can use the Events page to determine the events processing rate on the data lake, and examine overall trends
  • Admins can check the data units license configured on the account in Administration > License > General
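
The counter behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not Cato's implementation: the class name HourlyEventCounter, its fields, and the use of timestamps in seconds are all assumptions made for the example, and the email notification is not modelled.

```python
from dataclasses import dataclass


@dataclass
class HourlyEventCounter:
    """Hypothetical model of the per-customer hourly event counter."""

    threshold: int          # maximum events stored per hour (from the license)
    current_hour: int = -1  # index of the hour currently being counted
    stored: int = 0         # events stored so far in the current hour
    discarded: int = 0      # events discarded so far in the current hour

    def offer(self, timestamp_s: float) -> bool:
        """Return True if the event is stored, False if it is discarded."""
        hour = int(timestamp_s // 3600)
        if hour != self.current_hour:
            # A new hour has started: reset the counter.
            self.current_hour = hour
            self.stored = 0
            self.discarded = 0
        if self.stored >= self.threshold:
            # Threshold reached: discard for the remainder of this hour.
            self.discarded += 1
            return False
        self.stored += 1
        return True


# Tiny threshold so the behavior is easy to follow.
counter = HourlyEventCounter(threshold=3)
results = [counter.offer(t) for t in (0, 10, 20, 30, 3599)]
print(results)                 # first three stored, last two discarded
print(counter.offer(3600))     # next hour: counter reset, event stored
```

With a real license the threshold would be a multiple of 2 million rather than 3; the small value is used here only to keep the example readable.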

[Screenshot: Data SKUs license page]

Bundled Storage

Cato bundles storage that is designed to be sufficient for most customers:

  • Cato stores up to 2 million events per hour, for three months, with no additional charge

  • If more than 2 million are generated in any hour, those in excess of 2 million are discarded

  • Stored events are retained for three months and then discarded

Customers will generally find that bundled storage is sufficient for their needs, unless they choose best-practice logging of all events or wish to store events for longer than three months.

Additional Storage

Customers may purchase additional data storage if they wish to store:

  • more than 2 million events in any hour

  • event information for more than three months

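If a customer chooses to pay for storage, no allowance is made for the bundled storage that is provided by default: the purchased units define the full hourly capacity rather than adding to the bundled 2 million events per hour.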

Cato also supports forwarding events to a SIEM or other external storage, using the Cato Event Integration APIs.

Data Lake Storage Units

Data Lake storage is purchased in units of 2 million events per hour. For example:

  • One unit of Data Lake Storage will allow up to 2 million events to be stored in any hour

  • Two units will allow up to 4 million events to be stored in any hour

Data Lake storage units define the peak number of events that can be stored per hour. A period when fewer events are stored per hour will have no bearing on the number that can be stored in future hours.
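
As a rough illustration of the point that each hour is capped independently, the sketch below applies the per-hour limit to a list of hourly event counts. The function name apply_hourly_cap and the sample counts are hypothetical; only the figure of 2 million events per hour per unit comes from this guide.

```python
EVENTS_PER_UNIT = 2_000_000  # events per hour covered by one storage unit


def apply_hourly_cap(hourly_events, units):
    """Split each hour's events into (stored, discarded) under a fixed cap.

    The cap applies to every hour separately, so capacity that goes unused
    in a quiet hour is not carried over to a busy one.
    """
    cap = units * EVENTS_PER_UNIT
    stored = [min(count, cap) for count in hourly_events]
    discarded = [count - kept for count, kept in zip(hourly_events, stored)]
    return stored, discarded


# Hypothetical hourly counts: a quiet hour, a busy hour, a moderate hour.
sample = [500_000, 3_000_000, 1_000_000]

print(apply_hourly_cap(sample, units=1))
# ([500000, 2000000, 1000000], [0, 1000000, 0])

print(apply_hourly_cap(sample, units=2))
# ([500000, 3000000, 1000000], [0, 0, 0])
```

With one unit, the quiet first hour does not prevent the second hour from losing 1 million events; with two units, the 4 million cap covers every hour in the sample.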

Data Lake storage units are available in three variants, according to the storage duration required:

  • A three-month unit

  • A six-month unit

  • A twelve-month unit

The chosen variant applies to all data units; it is not possible to mix units of different durations.


The table below illustrates the use of Data Storage Units to cover customer event storage requirements.


Estimating Storage Requirements Based on Event History

Customers with a stable history of event storage can inspect the event chart in CMA to see how many events are being generated. They can use the peaks in this chart to consider their requirements for storage.

In the example chart below, the peaks reach a maximum of just over 400,000 events per hour. This would be covered by the bundled storage, if three months’ retention is sufficient.


In the example chart below, the number of events per hour exceeds 2 million in every hour, and the highest peak approaches 3 million. This is more than can be covered by bundled storage. Two paid storage units would cover these requirements, allowing up to 4 million events per hour to be stored.


Note that the exact height of each bar can be inspected by hovering the cursor over the bar, as illustrated in the chart below.


Further points to note:

  • These examples cover a small period, for convenience. A longer analysis period would be prudent.

  • The time period represented by each bar will change according to the time period covered by the chart. Pay attention to the Time Series Granularity as you change the time period covered.

Estimating Storage Requirements without an Event History

Event generation is correlated to both the total bandwidth in use across the network and the number of SDP users supported.

Therefore, customers without a history of event generation can estimate their likely storage requirements by considering first, the sum of the bandwidths in use at each site, and second, the number of their SDP users.

Tables are provided below to assist with estimating the peak events generated per hour. Follow this procedure to calculate requirements from the tables:

  1. Find the row in the Total Bandwidth table that corresponds to peak bandwidth bought for the network. Read off the estimated peak events per hour that will be generated

  2. Find the row in the SDP Clients table that corresponds to the number of SDP Clients in use. Read off the estimated peak events per hour that will be generated

  3. Add the two figures

  4. Divide the total events per hour by two million, and round up, to estimate the number of Data Lake Storage Units required.

Event Generation Tables

Use these tables to estimate the peak number of events per hour generated for a customer. They assume that the customer is logging all events.


Example Estimation

In the table above:

  • A total of 3 Gbps bandwidth across all sites would generate an estimated peak of four million events per hour

  • A total of 2,000 SDP clients would generate an additional estimated peak of one million events per hour

  • Therefore, the customer could expect a peak of 4 + 1 = 5 million events per hour

  • This could be covered by buying three Data Lake Storage units of the appropriate duration.
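
The estimation procedure and the example above can be expressed as a short calculation. This is a minimal sketch: the function name units_required and its parameter names are hypothetical, and the two input figures must still be read from the Event Generation Tables.

```python
EVENTS_PER_UNIT = 2_000_000  # events per hour covered by one storage unit


def units_required(bandwidth_events_per_hour, sdp_events_per_hour):
    """Estimate the number of Data Lake Storage Units for a peak hour.

    Adds the two table estimates, divides by 2 million, and rounds up.
    Integer arithmetic is used so the rounding is exact.
    """
    total = bandwidth_events_per_hour + sdp_events_per_hour
    return (total + EVENTS_PER_UNIT - 1) // EVENTS_PER_UNIT


# Figures from the example: 3 Gbps -> 4 million, 2,000 SDP clients -> 1 million.
print(units_required(4_000_000, 1_000_000))  # 3
```

A total that is an exact multiple of 2 million is not rounded up further, so 4 million events per hour would need two units.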
