Last updated: December 17, 2025
Originally published: December 18, 2017
Amazon Data Firehose supports Splunk Enterprise and Splunk Cloud as a delivery destination. This native integration between Splunk Enterprise, Splunk Cloud, and Amazon Data Firehose is designed to make AWS data ingestion setup seamless, while offering a secure and fault-tolerant delivery mechanism. We want to enable customers to monitor and analyze machine data from any source and use it to deliver operational intelligence and optimize IT, security, and business performance.
With Amazon Data Firehose, customers can use a fully managed, reliable, and scalable data streaming solution to Splunk. In this post, we tell you a bit more about the Amazon Data Firehose and Splunk integration. We also show you how to ingest large amounts of data into Splunk using Amazon Data Firehose.
Push vs. Pull data ingestion
Presently, customers use a combination of two ingestion patterns, chosen according to data source and volume as well as existing company infrastructure and expertise:
- Pull-based approach: Using dedicated pollers running the popular Splunk Add-on for AWS to pull data from various AWS services such as Amazon CloudWatch or Amazon S3.
- Push-based approach: Streaming data directly from AWS to Splunk HTTP Event Collector (HEC) by using Amazon Data Firehose. Examples of applicable data sources include CloudWatch Logs and Amazon Kinesis Data Streams.
The pull-based approach offers data delivery guarantees such as retries and checkpointing out of the box. However, it requires more operational work to manage and orchestrate the dedicated pollers, which commonly run on Amazon EC2 instances. With this setup, you pay for the infrastructure even when it's idle.
On the other hand, the push-based approach offers a low-latency, scalable data pipeline made up of serverless resources like Amazon Data Firehose sending directly to Splunk indexers (by using Splunk HEC). This approach translates into lower operational complexity and cost. However, if you need guaranteed data delivery, you have to design your solution to handle issues such as a Splunk connection failure or a Lambda execution failure. To do so, you might use, for example, AWS Lambda Dead Letter Queues.
How about getting the best of both worlds?
Let's go over the new integration's end-to-end solution and examine how Amazon Data Firehose and Splunk together expand the push-based approach into a native AWS solution for applicable data sources.

By using a managed service like Amazon Data Firehose for data ingestion into Splunk, we provide out-of-the-box reliability and scalability. One of the pain points of the old approach was the overhead of managing the data collection nodes (Splunk heavy forwarders). With the new Amazon Data Firehose to Splunk integration, there are no forwarders to manage or set up. Data producers (1) are configured through the AWS Management Console to drop data into Amazon Data Firehose.
You can also create your own data producers. For example, you can drop data into a Firehose delivery stream by using Amazon Kinesis Agent, by using the Firehose API (PutRecord(), PutRecordBatch()), or by writing to a Kinesis Data Stream configured to be the data source of a Firehose delivery stream. For more details, refer to Sending Data to an Amazon Data Firehose Delivery Stream.
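As an illustration of the Firehose API route, here is a minimal sketch of a custom producer in Python. It assumes the boto3 SDK is installed and credentials are configured; the stream name and event shape are hypothetical. It groups events into batches of at most 500 records, the per-call record limit of PutRecordBatch, and does not handle the per-call size limit or retries of failed records.

```python
import json
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_record(event: dict) -> dict:
    """Serialize one event as a newline-terminated JSON record."""
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def batches(events: Iterable[dict],
            size: int = MAX_RECORDS_PER_BATCH) -> Iterator[List[dict]]:
    """Yield lists of Firehose records, each list no longer than `size`."""
    batch: List[dict] = []
    for event in events:
        batch.append(to_record(event))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send(events: Iterable[dict], stream_name: str) -> int:
    """Send events with PutRecordBatch; return the number of failed records."""
    import boto3  # imported lazily so the helpers above work without the SDK

    firehose = boto3.client("firehose")
    failed = 0
    for batch in batches(events):
        response = firehose.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response["FailedPutCount"]
    return failed
```

A call such as `send(my_events, "my-splunk-stream")` (both names hypothetical) returns the number of records Firehose rejected, which a real producer would resend.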
You might need to transform the data before it goes into Splunk for analysis. For example, you might want to enrich it, filter it, or anonymize sensitive data. You can do so using AWS Lambda and enabling data transformation in Amazon Data Firehose. In this scenario, Amazon Data Firehose is used to decompress the Amazon CloudWatch logs by enabling the decompression feature.
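To make the transformation step concrete, the following is a minimal sketch of a Lambda handler for Firehose data transformation. Firehose passes each record base64-encoded with a `recordId`, and expects each one back with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The `level` and `user` fields are hypothetical and only stand in for whatever your events contain.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose data-transformation handler: filter and anonymize records."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Not a JSON object: report the failure and return the record unchanged.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue

        if payload.get("level") == "DEBUG":  # hypothetical field: filter noisy events
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue

        if "user" in payload:  # hypothetical field: anonymize sensitive data
            payload["user"] = "REDACTED"

        data = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
        output.append({"recordId": record["recordId"],
                       "result": "Ok",
                       "data": data.decode("ascii")})
    return {"records": output}
```

Every incoming `recordId` appears exactly once in the response, which is what lets Firehose match transformed records to the originals.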
Systems fail all the time. Let's see how this integration handles external failures to guarantee data durability. In cases when Amazon Data Firehose can't deliver data to the Splunk cluster, data is automatically backed up to an S3 bucket. You can configure this feature while creating the Firehose delivery stream (2). You can choose to back up all data or only the data that failed during delivery to Splunk.
In addition to using S3 for data backup, this Firehose integration with Splunk supports Splunk Indexer Acknowledgments to guarantee event delivery. This feature is configured on Splunk's HTTP Event Collector (HEC) (3). It ensures that HEC returns an acknowledgment to Amazon Data Firehose only after data has been indexed and is available in the Splunk cluster (4).
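Firehose performs the acknowledgment exchange for you, but a short sketch may help show what it involves. With indexer acknowledgment enabled, HEC returns an `ackId` for each request sent on a channel, and the sender later queries the `/services/collector/ack` endpoint to learn which IDs have been indexed. The host name, token, and channel value below are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request
from typing import Dict, List


def build_ack_query(hec_url: str, token: str, channel: str,
                    ack_ids: List[int]) -> urllib.request.Request:
    """Build (but do not send) a status query for previously returned ackIds."""
    return urllib.request.Request(
        url=hec_url.rstrip("/") + "/services/collector/ack",
        data=json.dumps({"acks": ack_ids}).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "X-Splunk-Request-Channel": channel,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def unacknowledged(ack_response: bytes) -> List[int]:
    """Return the ackIds that Splunk has not yet confirmed as indexed."""
    statuses: Dict[str, bool] = json.loads(ack_response)["acks"]
    return sorted(int(ack_id) for ack_id, indexed in statuses.items() if not indexed)
```

Any ID still reported as `false` has not been confirmed as indexed, so a sender treats that data as undelivered until it is acknowledged or a timeout expires.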
Now let's look at a hands-on exercise that shows how to forward VPC flow logs to Splunk.
How-to guide
To process VPC flow logs, we implement the following architecture.

Amazon Virtual Private Cloud (Amazon VPC) delivers flow log files into an Amazon CloudWatch Logs group. Using a CloudWatch Logs subscription filter, we set up real-time delivery of CloudWatch Logs to an Amazon Data Firehose stream.
Data coming from CloudWatch Logs is compressed with gzip. To work with this compression, we enable decompression for the Firehose stream. Firehose then delivers the raw logs to the Splunk HTTP Event Collector (HEC).
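Firehose handles this step once decompression is enabled. Purely to illustrate what the compressed payload contains, here is a minimal sketch that unpacks a CloudWatch Logs subscription record by hand. It relies on the documented payload shape, a gzip-compressed JSON object with a `messageType` and a list of `logEvents`; the function name is an assumption.

```python
import gzip
import json
from typing import List


def extract_messages(compressed: bytes) -> List[str]:
    """Decompress a CloudWatch Logs subscription payload and return raw log lines."""
    payload = json.loads(gzip.decompress(compressed))
    if payload.get("messageType") != "DATA_MESSAGE":
        return []  # for example, CONTROL_MESSAGE records carry no log events
    return [event["message"] for event in payload["logEvents"]]
```

Each returned string is one raw log line, which is the form in which the events reach Splunk in this walkthrough.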
If delivery to the Splunk HEC fails, Firehose deposits the logs into an Amazon S3 bucket. You can then ingest the events from S3 using an alternate mechanism such as a Lambda function.
When data reaches Splunk (Enterprise or Cloud), Splunk parsing configurations (packaged in the Splunk Add-on for Amazon Data Firehose) extract and parse all fields. They make data ready for querying and visualization using Splunk Enterprise and Splunk Cloud.
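The add-on performs the field extraction inside Splunk. To show what there is to extract, the following minimal sketch splits one flow log record in the default format into its 14 space-separated fields. It is an illustration only, not the add-on's implementation, and the sample record uses made-up values.

```python
from typing import Dict

# Field order of the default VPC flow log record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_flow_log(line: str) -> Dict[str, str]:
    """Map one default-format flow log record to named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    # Values stay as strings because records without traffic use "-" placeholders.
    return dict(zip(FIELDS, values))


record = parse_flow_log(
    "2 123456789012 eni-0123456789abcdef0 10.0.1.5 10.0.2.9 "
    "49152 443 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
```

Flow logs created with a custom format have a different field order, so the `FIELDS` list would need to match that format.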
Walkthrough
Install the Splunk Add-on for Amazon Data Firehose
The Splunk Add-on for Amazon Data Firehose enables Splunk (be it Splunk Enterprise, Splunk App for AWS, or Splunk Enterprise Security) to use data ingested from Amazon Data Firehose. Install the Add-on on all the indexers with an HTTP Event Collector (HEC). The Add-on is available for download from Splunkbase. For troubleshooting assistance, refer to the Amazon Data Firehose troubleshooting documentation and Splunk's official troubleshooting guide.
HTTP Event Collector (HEC)
Before you can use Amazon Data Firehose to deliver data to Splunk, set up the Splunk HEC to receive the data. From Splunk Web, go to the Settings menu, choose Data Inputs, and choose HTTP Event Collector. Choose Global Settings, ensure All tokens is enabled, and then choose Save. Then choose New Token to create a new HEC endpoint and token. When you create a new token, make sure that Enable indexer acknowledgment is checked.

When prompted to select a source type, select aws:cloudwatchlogs:vpcflow.

Create an S3 backsplash bucket
To provide for situations in which Amazon Data Firehose can't deliver data to the Splunk cluster, we use an S3 bucket to back up the data. You can configure this feature to back up all data or only the data that failed during delivery to Splunk.
Note: S3 bucket names must be globally unique, so choose a name that is not already in use.
Create an Amazon Data Firehose delivery stream
On the AWS console, open the Amazon Data Firehose console, and choose Create Firehose Stream.
Select DirectPUT as the source and Splunk as the destination.

If you are using Firehose to send CloudWatch Logs and want to deliver decompressed data to your Firehose stream destination, use Firehose Data Format Conversion (Parquet, ORC) or Dynamic partitioning. You must enable decompression for your Firehose stream. For details, check out Send decompressed Amazon CloudWatch Logs to Amazon S3 and Splunk using Amazon Data Firehose.

Enter your Splunk HTTP Event Collector (HEC) information in the destination settings.

Note: Amazon Data Firehose requires the Splunk HTTP Event Collector (HEC) endpoint to be terminated with a valid CA-signed certificate matching the DNS hostname used to connect to your HEC endpoint. You receive delivery errors if you are using a self-signed certificate.
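Before pointing Firehose at an endpoint, you may want to check its certificate yourself. The following is a minimal sketch using Python's standard `ssl` module, whose default context requires a CA-signed certificate and verifies the hostname. The default port of 8088 is an assumption (your HEC endpoint may listen elsewhere), and the check uses your machine's trust store, which may not match the one Firehose uses.

```python
import socket
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """TLS context that requires a trusted CA chain and a matching hostname."""
    return ssl.create_default_context()


def hec_certificate_ok(host: str, port: int = 8088, timeout: float = 5.0) -> bool:
    """Return True if the endpoint's certificate passes verification.

    Network problems (DNS failure, refused connection) raise OSError and are
    left to the caller; only certificate verification failures return False.
    """
    context = strict_tls_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLCertVerificationError:
        return False
```

A self-signed certificate, or one issued for a different hostname, makes `hec_certificate_ok` return `False`.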
In this example, we only back up logs that fail during delivery.
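If you prefer to script the stream creation rather than use the console, the settings chosen so far map onto the parameters of the Firehose `CreateDeliveryStream` API. The sketch below only builds the request; every argument value is a placeholder, the `Raw` endpoint type is an assumption based on this walkthrough delivering raw logs, and decompression and error-logging settings are left out.

```python
def splunk_stream_request(stream_name: str, hec_endpoint: str, hec_token: str,
                          backup_bucket_arn: str, role_arn: str) -> dict:
    """Build CreateDeliveryStream parameters for a Direct PUT stream to Splunk."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",
        "SplunkDestinationConfiguration": {
            "HECEndpoint": hec_endpoint,
            "HECEndpointType": "Raw",            # assumption: raw endpoint
            "HECToken": hec_token,
            "S3BackupMode": "FailedEventsOnly",  # back up only failed deliveries
            "S3Configuration": {
                "RoleARN": role_arn,
                "BucketARN": backup_bucket_arn,
            },
        },
    }
```

With boto3 installed, the result can be passed as keyword arguments, for example `boto3.client("firehose").create_delivery_stream(**request)`. Setting `S3BackupMode` to `AllEvents` instead backs up every record.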

To monitor your Firehose delivery stream, enable error logging. Doing this means that you can monitor record delivery errors. Create an IAM role for the Firehose stream by choosing Create new, or choose an existing IAM role.

You now get a chance to review and adjust the Firehose stream settings. When you are satisfied, choose Create Firehose Stream.
Create a VPC Flow Log
To send events from Amazon VPC, you need to set up a VPC flow log. If you already have a VPC flow log you want to use, you can skip to the “Publish CloudWatch to Amazon Data Firehose” section.
On the AWS console, open the Amazon VPC service. Then choose VPC, and choose the VPC you want to send flow logs from. Choose Flow Logs, and then choose Create Flow Log. If you don't have an IAM role that allows your VPC to publish logs to CloudWatch, choose Create and use a new service role.

Once active, your VPC flow log should look like the following.

Publish CloudWatch to Amazon Data Firehose
When you generate traffic to or from your VPC, the log group is created in Amazon CloudWatch. We create an IAM role to allow CloudWatch to publish logs to the Amazon Data Firehose stream.
To allow CloudWatch to publish to your Firehose stream, you need to give it permissions.
Here is the content for TrustPolicyForCWLToFireHose.json.
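As a hedged illustration rather than the exact file, a trust policy of this kind lets the CloudWatch Logs service assume the role. The sketch below generates one in Python; the Region and account ID are placeholders, and the `aws:SourceArn` condition is an assumption that restricts the role to log groups in your own account.

```python
import json


def trust_policy(region: str, account_id: str) -> dict:
    """Trust policy letting CloudWatch Logs assume the role (illustrative sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": {
            "Effect": "Allow",
            "Principal": {"Service": "logs.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {
                # Assumption: limit use of the role to this account's log groups.
                "StringLike": {
                    "aws:SourceArn": f"arn:aws:logs:{region}:{account_id}:*"
                }
            },
        },
    }


policy_json = json.dumps(trust_policy("us-east-1", "123456789012"), indent=2)
```

Writing `policy_json` to TrustPolicyForCWLToFireHose.json gives a file you can supply when creating the role.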
Attach the policy to the newly created role.
Here is the content for PermissionPolicyForCWLToFireHose.json.
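Again as an illustration rather than the exact file, a permissions policy for this role needs to allow writing records to your Firehose stream. The sketch below scopes the policy to a single stream; the Region, account ID, and stream name are placeholders, and the choice of actions is a least-privilege assumption.

```python
import json


def permissions_policy(region: str, account_id: str, stream_name: str) -> dict:
    """Permissions policy allowing writes to one Firehose stream (illustrative sketch)."""
    stream_arn = (
        f"arn:aws:firehose:{region}:{account_id}:deliverystream/{stream_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: write-only access is enough for log delivery.
                "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
                "Resource": [stream_arn],
            }
        ],
    }


permissions_json = json.dumps(
    permissions_policy("us-east-1", "123456789012", "my-splunk-stream"), indent=2
)
```

Writing `permissions_json` to PermissionPolicyForCWLToFireHose.json gives a file you can attach to the role.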
The new log group has no subscription filter, so set up a subscription filter. Setting this up establishes a real-time data feed from the log group to your Firehose delivery stream. Select the VPC flow log and choose Actions. Then choose Subscription filters followed by Create Amazon Data Firehose subscription filter.
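The same subscription filter can be created programmatically through the CloudWatch Logs `PutSubscriptionFilter` API. The following is a minimal sketch, assuming boto3 is installed; the log group name, ARNs, and filter name are placeholders.

```python
def subscription_filter_request(log_group: str, stream_arn: str, role_arn: str,
                                filter_name: str = "FirehoseToSplunk") -> dict:
    """Build PutSubscriptionFilter parameters that forward every log event."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,   # hypothetical name
        "filterPattern": "",         # an empty pattern matches all log events
        "destinationArn": stream_arn,
        "roleArn": role_arn,
    }


def create_subscription_filter(request: dict) -> None:
    """Create the filter; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    boto3.client("logs").put_subscription_filter(**request)
```

The `roleArn` is the role that CloudWatch Logs assumes to write to the stream, so it must carry the Firehose write permissions described earlier.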


If you create the subscription filter from the AWS CLI instead, the command returns no acknowledgment. To validate that your CloudWatch log group is subscribed to your Firehose stream, check the CloudWatch console.
As soon as the subscription filter is created, the real-time log data from the log group goes into your Firehose delivery stream. Your stream then delivers it to your Splunk Enterprise or Splunk Cloud environment for querying and visualization. The following screenshot is from Splunk Enterprise.

In addition, you can monitor and view metrics associated with your delivery stream using the AWS console.
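These metrics are also available through the CloudWatch API. As a minimal sketch, the function below builds a `GetMetricStatistics` query for the Splunk delivery success metric that Firehose publishes in the `AWS/Firehose` namespace. The stream name, time window, and period are placeholders.

```python
from datetime import datetime, timedelta, timezone


def splunk_success_query(stream_name: str, end: datetime, hours: int = 1) -> dict:
    """Build GetMetricStatistics parameters for Splunk delivery success."""
    return {
        "Namespace": "AWS/Firehose",
        "MetricName": "DeliveryToSplunk.Success",
        "Dimensions": [{"Name": "DeliveryStreamName", "Value": stream_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # five-minute buckets
        "Statistics": ["Average"],
    }


query = splunk_success_query("my-splunk-stream", datetime.now(timezone.utc))
```

With boto3 installed, `boto3.client("cloudwatch").get_metric_statistics(**query)` returns the datapoints; an average below 1 over a period indicates that some deliveries to Splunk failed.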

Conclusion
Although our walkthrough uses VPC Flow Logs, the pattern can be used in many other scenarios. These include ingesting data from AWS IoT, other CloudWatch logs and events, Kinesis Streams, or other data sources using the Kinesis Agent or Kinesis Producer Library. You may use a Lambda blueprint or disable record transformation entirely depending on your use case. For an additional use case using Amazon Data Firehose, check out the This Is My Architecture video, which discusses how to securely centralize cross-account data analytics using Kinesis and Splunk.
If you found this post useful, be sure to check out Integrating Splunk with Amazon Kinesis Streams.
About the Authors
