Integrating Cato Data with Splunk

Overview

The Splunk integration enables Cato to forward data directly to Splunk using a native connector and supports two data sources:

  • Events - Generated when specific activity occurs in the network or system, such as when a policy rule is matched, or a threat is detected. These records provide discrete, real-time insights into security and policy enforcement. The data is sent using Cato's event schema.
  • Flows - Originate as network flows (5-tuple) and are enriched with application-level information as it becomes available through the different Cato engines. In addition to application and user context, flows include aggregated session data such as bytes, packets, and duration, providing a complete view of network activity over time. The superset of flow fields is represented by the appStats schema.

    Some fields are available only for flows streamed through the native integration and are not part of appStats or Application Analytics. Examples include flow_id, flow duration, and aggregated metrics such as upstream and downstream packets and bytes. These fields are marked with the following comment:

    Only available for native flows data integration created in the CMA.

Note: For XOps events (event type Detection and Response), the raw_data field, which contains the incident (story) information, may be truncated when the event is sent to Splunk if the field exceeds 5 MB. This limit is the Splunk default and can be increased.

Use Cases

Events

A company uses Splunk for centralized security monitoring and response. As a Cato customer, the company has valuable data about network activity, threats, users, devices, and other aspects of the traffic traversing the Cato platform. With this integration, the company can send this data directly to Splunk and incorporate it into the existing workflows of its SOC and NOC teams.

Flows

A security analyst in Splunk identifies a suspicious event where a user accessed a high-risk application that may be associated with data exfiltration. Using Cato events alone, the analyst can see the policy decision, user identity, and application. However, the event does not show how much data was transferred or how long the session lasted.

With aggregated traffic flow data correlated to the event using the flow_id field, the analyst can view the full session context, including total bytes transferred, packet count, and session duration. This allows the analyst to determine whether the activity involved minimal interaction or a large data transfer that may indicate exfiltration.

By combining events and flow data, the analyst can quickly validate the severity of the incident and take appropriate action.
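The correlation described above can be sketched in a few lines of Python. This is only an illustration, assuming events and flows have already been retrieved as lists of dictionaries and that each flow_id appears in one flow record. Apart from flow_id, every field name here (total_bytes, packets, duration_ms, user) is a hypothetical placeholder and not taken from the Cato schemas.

```python
def index_flows_by_id(flows):
    """Map each flow_id to its flow record (assumes one record per flow_id)."""
    return {flow["flow_id"]: flow for flow in flows if "flow_id" in flow}


def add_flow_context(events, flows):
    """Attach the matching flow record (or None) to each event under 'flow'."""
    flows_by_id = index_flows_by_id(flows)
    enriched = []
    for event in events:
        # Events without a flow_id, or with no matching flow, get flow=None.
        flow = flows_by_id.get(event.get("flow_id"))
        enriched.append({**event, "flow": flow})
    return enriched


# Hypothetical sample data; only flow_id is a field named in this article.
events = [{"flow_id": "f-1", "user": "alice"}]
flows = [{"flow_id": "f-1", "total_bytes": 5000, "packets": 40, "duration_ms": 1200}]
for item in add_flow_context(events, flows):
    print(item["user"], item["flow"])
```

In Splunk itself, the same join would typically be expressed as a search over both data sources keyed on flow_id.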

Prerequisites

Creating the Splunk Integration

After creating an HEC token in Splunk, you define an integration in the CMA. You can use filters to limit which event data is included in the integration. After the integration is created, data flows to Splunk in the Index that you specified.

During configuration, you choose whether to integrate Events, Flows, or both. By default, only Events is selected. The Flows data source can generate a significantly higher volume of data than Events; the exact volume depends on your traffic. The CMA supports multiple integrations, so you can send different data sources to different destinations as needed.

The Splunk URL and port identify the HEC endpoint for your account. In general, this is the web URL you use to access Splunk with the prefix "http-inputs-" added to the beginning of the hostname. For example, if your account is https://mydomain.splunkcloud.com, you would use https://http-inputs-mydomain.splunkcloud.com/. For more details, see the Splunk documentation. The port is optional; if you do not specify one, port 443 is used, which is the default for Splunk Cloud.
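As an illustration of the naming rule above, the following Python sketch derives a Splunk Cloud HEC URL from a web URL. The function name splunk_cloud_hec_url is hypothetical, and the rule is only the general pattern described here; confirm the actual endpoint for your account in the Splunk documentation.

```python
from urllib.parse import urlsplit


def splunk_cloud_hec_url(web_url, port=443):
    """Build an HEC URL by prefixing the hostname with 'http-inputs-'."""
    host = urlsplit(web_url).hostname
    if host is None:
        raise ValueError("expected a full URL such as https://mydomain.splunkcloud.com")
    if not host.startswith("http-inputs-"):
        host = "http-inputs-" + host
    # Port 443 is the HTTPS default, so it is left out of the URL.
    netloc = host if port == 443 else f"{host}:{port}"
    return f"https://{netloc}/"


print(splunk_cloud_hec_url("https://mydomain.splunkcloud.com"))
```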

Deleting the integration in the CMA does not remove any resources created in Splunk.

Note:

  • For Splunk Enterprise (self-managed) integrations:

    • The Splunk HEC endpoint must be reachable over the Internet (i.e., exposed via a public IP or public DNS name). Private IPs or internal-only endpoints are not supported.
    • TLS inspection must be enabled, and the endpoint must present a valid X.509 certificate issued by a trusted public Certificate Authority. Self-signed certificates and certificates issued by a private CA are not supported, because connections are validated only against standard public CA trust chains.
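Some of the requirements above can be checked offline before you configure the integration. The following Python sketch is a rough pre-check, not Cato's validation logic: it flags a non-HTTPS URL and a literal IP address that is not publicly routable. The function name check_hec_endpoint is hypothetical, and the sketch does not resolve DNS names or verify the certificate chain.

```python
import ipaddress
from urllib.parse import urlsplit


def check_hec_endpoint(url):
    """Return a list of problems found in a self-managed HEC endpoint URL."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("endpoint must use https")
    host = parts.hostname
    if not host:
        problems.append("URL has no hostname")
        return problems
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: whether it resolves to a public address is not checked here.
        return problems
    if not address.is_global:
        problems.append("IP address is not publicly routable")
    return problems


print(check_hec_endpoint("https://10.0.0.5:8088"))
```

An empty list means only that these basic checks passed; the certificate must still be issued by a trusted public Certificate Authority.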

To create the Splunk integration:

  1. In your Splunk account, create a new token to use for this integration. For details, see the Splunk documentation. You can define a custom Index or use the default Index for the token.
  2. Copy the token value that is displayed. You will need this to configure the integration with Cato.
  3. In the CMA, from the navigation menu, click Resources > Integrations.
  4. On the Integrated Apps tab, click New. The New Integration panel opens.
  5. Select Splunk and configure the following fields:

    1. In the Auth dropdown, select API Key.
    2. Enter a Connector Name and, optionally, a Description for this integration.
    3. Enter the Ingestion URL and the API Key (the HEC token that you created in Splunk).
    4. Specify the Index that will receive the data from Cato. If you leave this blank, the default Index defined on the HEC token is used.
    5. Select whether to integrate Events, Flows, or both.
    6. Define a filter to limit which Cato events are sent to Splunk.

      Note: Filters only apply to event data.

    7. Specify whether to generate an event when errors occur with the integration.
  6. Click Save.
  7. In the CMA, after refreshing the Integrations page, you can view the status of the integration in the Integrated Apps tab.
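Before or after completing these steps, you may want to confirm that the HEC token and Index accept data independently of Cato. The following Python sketch builds a test request for Splunk's HEC event endpoint (/services/collector/event, with an Authorization header of the form "Splunk <token>"). The function name build_hec_request, the token, and the index name are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, index=None):
    """Build (but do not send) a POST request for the Splunk HEC event endpoint."""
    payload = {"event": event}
    if index:
        # When omitted, Splunk uses the default index defined on the token.
        payload["index"] = index
    return urllib.request.Request(
        base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace them with your own endpoint, token, and index.
request = build_hec_request(
    "https://http-inputs-mydomain.splunkcloud.com/",
    "00000000-0000-0000-0000-000000000000",
    {"message": "HEC connectivity test"},
    index="main",
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would send it; a successful response suggests that the token and Index are usable by the integration.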

Choosing Between the Native Turnkey and Custom GitHub Integration Methods

In addition to the native turnkey integration described in this article, you can also integrate Cato events with Splunk using the tools in the Cato GitHub account. Each approach offers distinct advantages depending on your goals and environment. You can also use both integrations if needed.

When to Use the Native Integration

Cato’s native integration offers a scalable and supportable solution with minimal configuration. Benefits of the native integration include:

  • The ability to handle large volumes of events efficiently, with no API-based limitations
  • Fully maintained and supported by Cato

When to Use the GitHub Integration

The GitHub integration provides flexibility for advanced use cases where custom data sources or processing logic are needed. You might want to use this integration in the following situations:

  • You want to send data from Cato's Audit Log to Splunk
  • You want to use our GitHub as an open source resource to customize the integration
