Cato maintains a Data Lake that contains the data recorded by Cato networking and security functions. Data such as Event information is added to the data lake in real time and stored for a period, as defined by the customer’s contract, before being discarded.
Cato stores events and data for up to three months, free of charge, as part of the service. Customers may choose to increase the storage and extend the retention period beyond three months. This requires the purchase of Data Lake storage. Customers may also forward their data to a SIEM, using Cato’s Event Integration APIs, or to AWS or Azure datastores using cloud storage integration.
This document applies only to accounts licensed with DPA 2023.
Events are stored in real time and can be tracked in the Cato Management Application (CMA) in the Events screen (Monitoring > Events).
- Customer licenses define the maximum number of events that can be stored in any hour. Events in excess of this number are discarded for the remainder of the hour.
- The primary unit of measurement for Data Lake storage is the number of events stored per hour.
- For each customer, a counter tracks the number of events stored in the current hour. The counter is reset at the start of each hour. When the count reaches the threshold set for the customer, further events are discarded for the remainder of that hour.
- Cato sends an email notification and generates a corresponding event when the threshold is exceeded.
- Admins can use the Events page to determine the rate at which events are processed into the Data Lake, and to examine overall trends.
- Admins can check the data units license configured on the account in Administration > License > General.
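The per-hour counting and discard behaviour described above can be sketched as follows. This is an illustrative model only, not Cato's implementation; the class name, structure, and threshold value are assumptions:

```python
from datetime import datetime, timezone

class HourlyEventCounter:
    """Illustrative model of the per-hour event counter: count events,
    reset at the top of each hour, discard once the threshold is reached."""

    def __init__(self, threshold):
        self.threshold = threshold  # e.g. 2_000_000 for bundled storage
        self.count = 0
        self.hour = None            # the hour currently being counted

    def record(self, timestamp):
        """Return True if the event is stored, False if it is discarded."""
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        if hour != self.hour:       # a new hour has started: reset the counter
            self.hour = hour
            self.count = 0
        if self.count >= self.threshold:
            return False            # threshold reached: discard for the rest of the hour
        self.count += 1
        return True
```

For example, with `HourlyEventCounter(threshold=2_000_000)`, the first 2 million events in an hour are stored and any further events in that hour are discarded; the counter starts again from zero in the next hour.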
Cato bundles storage that is designed to be sufficient for most customers:
- Cato stores up to 2 million events per hour, for three months, at no additional charge.
- If more than 2 million events are generated in any hour, those in excess of 2 million are discarded.
- Stored events are retained for three months and then discarded.
Customers will generally find that the bundled storage is sufficient for their needs, unless they follow the best practice of logging all events, or wish to retain events for longer than three months.
Customers may purchase additional Data Lake storage if they wish to store:
- more than 2 million events in any hour
- event information for more than three months
If a customer chooses to pay for storage, the purchased units determine the total capacity: no allowance is made for the bundled storage that is provided by default.
Cato also supports forwarding events to a SIEM or other external storage, using the Cato Event Integration APIs.
Data Lake storage is purchased in units of 2 million events per hour. For example:
- One unit of Data Lake storage allows up to 2 million events to be stored in any hour.
- Two units allow up to 4 million events to be stored in any hour.
Data Lake storage units define the peak number of events that can be stored per hour. Unused capacity does not carry over: a period in which fewer events are stored has no bearing on the number that can be stored in future hours.
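The relationship between purchased units and hourly capacity is straightforward multiplication. As a small illustration (the helper name is hypothetical, not part of any Cato API):

```python
EVENTS_PER_UNIT = 2_000_000  # each Data Lake storage unit covers 2 million events per hour

def hourly_capacity(units):
    """Peak number of events that can be stored in any single hour."""
    return units * EVENTS_PER_UNIT
```

So `hourly_capacity(1)` gives 2 million events per hour and `hourly_capacity(2)` gives 4 million, matching the examples above.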
Data Lake storage units are available in three variants, according to the storage duration required:
- A three-month unit
- A six-month unit
- A twelve-month unit
The chosen variant applies to all storage units on the account; it is not possible to mix variants.
Customers with a stable history of event generation can inspect the events chart in the CMA to see how many events are being generated, and use the peaks in that chart to assess their storage requirements.
In the example chart below, the peaks reach a maximum of just over 400,000 events per hour. This would be covered by the bundled storage, if three months’ retention is sufficient.
In the example chart below, the number of events per hour exceeds 2 million in every hour, and the highest peak approaches 3 million. This is more than can be covered by bundled storage. A paid storage of 2 units would cover these storage requirements, allowing up to 4 million events per hour to be stored.
Note that the exact height of each bar can be inspected by hovering the cursor over the bar, as illustrated in the chart below.
Further points to note:
- These examples cover a short period, for convenience. A longer analysis period would be prudent.
- The time period represented by each bar changes with the time period covered by the chart. Pay attention to the Time Series Granularity as you change the period covered.
Event generation is correlated to both the total bandwidth in use across the network and the number of SDP users supported.
Therefore, customers without a history of event generation can estimate their likely storage requirements from two figures: the sum of the bandwidths in use at each site, and the number of their SDP users.
Tables are provided below to assist with estimating the peak events generated per hour. Follow this procedure to calculate requirements from the tables:
1. Find the row in the Total Bandwidth table that corresponds to the peak bandwidth purchased for the network, and read off the estimated peak events generated per hour.
2. Find the row in the SDP Clients table that corresponds to the number of SDP Clients in use, and read off the estimated peak events generated per hour.
3. Add the two figures.
4. Divide the total events per hour by 2 million, and round up, to estimate the number of Data Lake storage units required.
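The arithmetic in steps 3 and 4 amounts to a ceiling division. A minimal sketch in Python (the function name is hypothetical; the example figures are the ones used in the worked example that follows the tables):

```python
import math

EVENTS_PER_UNIT = 2_000_000  # one Data Lake storage unit = 2 million events per hour

def units_required(bandwidth_peak_events, sdp_peak_events):
    """Steps 3 and 4: add the two table figures, then divide by
    2 million and round up to get the number of storage units."""
    total = bandwidth_peak_events + sdp_peak_events
    return math.ceil(total / EVENTS_PER_UNIT)

# Worked example: 3 Gbps total bandwidth -> ~4 million events/hour,
# 2,000 SDP clients -> ~1 million events/hour.
units_required(4_000_000, 1_000_000)  # → 3 units
```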
Use these tables to estimate the peak number of events per hour generated for a customer. They assume that the customer is logging all events.
In the tables above:
- A total of 3 Gbps bandwidth across all sites would generate an estimated peak of 4 million events per hour.
- A total of 2,000 SDP clients would generate an additional estimated peak of 1 million events per hour.
- The customer could therefore expect a peak of 4 + 1 = 5 million events per hour.
- This could be covered by buying three Data Lake storage units of the appropriate duration.