
Amazon Kinesis Data Streams now supports 10x larger record sizes: Simplifying real-time data processing


Today, AWS announced that Amazon Kinesis Data Streams now supports record sizes of up to 10 MiB, a tenfold increase over the previous limit. With this launch, you can publish intermittent larger data payloads to your data streams while continuing to use the existing Kinesis Data Streams APIs in your applications without additional effort. This launch is accompanied by a 2x increase in the maximum PutRecords request size, from 5 MiB to 10 MiB, simplifying data pipelines and reducing operational overhead for IoT analytics, change data capture, and generative AI workloads.

In this post, we explore Amazon Kinesis Data Streams large record support, including key use cases, configuration of maximum record sizes, throttling considerations, and best practices for optimal performance.

Real-world use cases

As data volumes grow and use cases evolve, we have seen increasing demand for larger record sizes in streaming workloads. Previously, when you needed to process records larger than 1 MiB, you had two options:

  • Split large records into multiple smaller records in producer applications and reassemble them in consumer applications
  • Store large records in Amazon Simple Storage Service (Amazon S3) and send only metadata through Kinesis Data Streams

Both approaches work, but they add complexity to data pipelines: they require extra code, increase operational overhead, and complicate error handling and debugging, particularly when you need to stream large records only intermittently.

This enhancement improves ease of use and reduces operational overhead for customers handling intermittent large payloads across various industries and use cases. In the IoT analytics space, connected vehicles and industrial equipment produce growing volumes of sensor telemetry, and individual telemetry records sometimes exceed the previous 1 MiB limit in Kinesis. This forced customers to implement complex workarounds, such as splitting large records into multiple smaller ones, or storing the large records separately and sending only metadata through Kinesis. Similarly, database change data capture (CDC) pipelines can produce large transaction records, especially during bulk operations or schema changes. In the machine learning and generative AI space, workflows increasingly require ingesting larger payloads to support richer feature sets and multi-modal data types such as audio and images. The increase in the Kinesis record size limit from 1 MiB to 10 MiB removes the need for these kinds of complex workarounds, simplifying data pipelines and reducing operational overhead for customers in IoT, CDC, and advanced analytics use cases. Customers can now more easily ingest and process these intermittent large records using the same familiar Kinesis APIs.

How it works

To start processing larger records:

  1. Update your stream's maximum record size limit (maxRecordSize) through the AWS Management Console, AWS CLI, or AWS SDKs.
  2. Continue using the same PutRecord and PutRecords APIs for producers.
  3. Continue using the same GetRecords or SubscribeToShard APIs for consumers.

Your stream will be in Updating status for a few seconds before it is ready to ingest larger records.

Getting started

To start processing larger records with Kinesis Data Streams, update the maximum record size by using the AWS Management Console, CLI, or SDK.

On the AWS Management Console:

  1. Navigate to the Kinesis Data Streams console.
  2. Choose your stream and select the Configuration tab.
  3. Choose Edit (next to Maximum record size).
  4. Set your desired maximum record size (up to 10 MiB).
  5. Save your changes.

Note: This setting only adjusts the maximum record size for this Kinesis data stream. Before increasing the limit, verify that all downstream applications can handle larger records.

Most common consumers, such as the Kinesis Client Library (starting with version 2.x), Amazon Data Firehose delivery to Amazon S3, and AWS Lambda, support processing records larger than 1 MiB. To learn more, refer to the Amazon Kinesis Data Streams documentation for large records.
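Consumers need no new APIs to read larger records. As a minimal illustration, here is a boto3 polling sketch; the stream name my-stream is a placeholder, and a production consumer would more typically use the Kinesis Client Library:

import boto3

client = boto3.client('kinesis')

# Get an iterator for the first shard of the (hypothetical) stream.
shards = client.list_shards(StreamName='my-stream')['Shards']
iterator = client.get_shard_iterator(
    StreamName='my-stream',
    ShardId=shards[0]['ShardId'],
    ShardIteratorType='TRIM_HORIZON',
)['ShardIterator']

# The same GetRecords call returns records up to the stream's configured
# maximum size; only your payload handling needs to accommodate them.
while iterator:
    response = client.get_records(ShardIterator=iterator, Limit=100)
    for record in response['Records']:
        print(record['SequenceNumber'], len(record['Data']), 'bytes')
    if not response['Records']:
        break  # demo only; a real consumer keeps polling
    iterator = response.get('NextShardIterator')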

You can also update this setting using the AWS CLI:

aws kinesis update-max-record-size \
    --stream-arn <stream-arn> \
    --max-record-size-in-ki-b 5000

Or using the AWS SDK for Python (Boto3):

import boto3

client = boto3.client('kinesis')

# Raise the stream's maximum record size to 5000 KiB (about 5 MiB).
response = client.update_max_record_size(
    StreamARN='arn:aws:kinesis:us-west-2:123456789012:stream/my-stream',
    MaxRecordSizeInKiB=5000
)
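Because the stream briefly enters Updating status after this call, you can poll until it returns to Active before sending larger records. A short sketch, reusing the same example stream ARN:

import time

import boto3

client = boto3.client('kinesis')
stream_arn = 'arn:aws:kinesis:us-west-2:123456789012:stream/my-stream'

# Poll until the stream leaves UPDATING status; this typically takes seconds.
while True:
    summary = client.describe_stream_summary(StreamARN=stream_arn)
    if summary['StreamDescriptionSummary']['StreamStatus'] == 'ACTIVE':
        break
    time.sleep(2)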

Throttling and best practices for optimal performance

Individual shard throughput limits of 1 MiB/s for writes and 2 MiB/s for reads remain unchanged with support for larger record sizes. To work effectively with large records, it helps to understand how throttling works. Each shard in a stream has a write capacity of 1 MiB per second. To accommodate large records, a shard can temporarily burst up to 10 MiB/s, while still averaging out to 1 MiB per second. To visualize this behavior, think of each shard as having a capacity tank that refills at 1 MiB per second. After you send a large record (for example, a 10 MiB record), the tank starts refilling immediately, allowing you to send smaller records as capacity becomes available. How quickly capacity recovers depends on the size of the large records, the size of the baseline records, the overall traffic pattern, and your chosen partition key strategy. When you process large records, each shard continues to handle baseline traffic while using its burst capacity for the larger payloads.
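To make the tank analogy concrete, the following toy token-bucket model (an illustrative sketch, not the actual Kinesis throttling implementation) mirrors the behavior described above: capacity refills at 1 MiB per second up to a 10 MiB burst ceiling, and a write is throttled when the tank holds less than the record's size.

# Toy model of per-shard write capacity; illustrative only.
REFILL_RATE_MIB_PER_S = 1.0   # steady refill: 1 MiB per second per shard
TANK_CAPACITY_MIB = 10.0      # burst ceiling: room for one 10 MiB record

def simulate(writes):
    """writes: list of (timestamp_seconds, record_size_mib) tuples."""
    tank = TANK_CAPACITY_MIB
    last_time = 0.0
    for t, size in writes:
        # Refill for the elapsed time, capped at the ceiling.
        tank = min(TANK_CAPACITY_MIB, tank + (t - last_time) * REFILL_RATE_MIB_PER_S)
        last_time = t
        if size <= tank:
            tank -= size
            print(f"t={t:5.1f}s  accepted {size} MiB, {tank:.1f} MiB left")
        else:
            print(f"t={t:5.1f}s  throttled {size} MiB record, only {tank:.1f} MiB available")

# A 10 MiB record drains the tank; smaller writes succeed again as it refills.
simulate([(0.0, 10.0), (1.0, 0.5), (2.0, 2.0), (12.0, 2.0)])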

To illustrate how Kinesis Data Streams handles different proportions of large records, let's examine the results of a simple test. For our test configuration, we set up a producer that sends data to an on-demand stream (which defaults to four shards) at a rate of 50 records per second. The baseline records are 10 KiB in size, while large records are 2 MiB each. We ran multiple test cases, progressively increasing the proportion of large records from 1% to 5% of the total stream traffic, along with a baseline case containing no large records. To ensure consistent testing conditions, we distributed the large records uniformly over time; for example, in the 1% scenario, we sent one large record for every 100 baseline records. The following graph shows the results:

In the graph, horizontal annotations indicate throttling occurrence peaks. The baseline scenario, represented by the blue line, shows minimal throttling events. As the proportion of large records increases from 1% to 5%, the rate at which the stream throttles your data rises, with a notable acceleration in throttling events between the 2% and 5% scenarios. This test demonstrates how Kinesis Data Streams manages an increasing proportion of large records.

For optimal performance, we recommend keeping large records to 1-2% of your total record count. In production environments, actual stream behavior varies based on three key factors: the size of baseline records, the size of large records, and the frequency at which large records appear in the stream. We recommend testing with your own demand pattern to determine the exact behavior.

With on-demand streams, when incoming traffic exceeds 500 KB/s per shard, Kinesis splits the shard within 15 minutes. The parent shard's hash key values are redistributed evenly across the child shards. Kinesis automatically scales the stream by increasing the number of shards, enabling large records to be distributed across more shards, depending on the partition key strategy you employ.

For optimal performance with large records:

  1. Use a random partition key strategy to distribute large records evenly across shards.
  2. Implement backoff and retry logic in producer applications (see the producer sketch after this list).
  3. Monitor shard-level metrics to identify potential bottlenecks.
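The following minimal producer sketch combines the first two recommendations: a random UUID partition key spreads large records across shards, and exponential backoff handles throttled writes. The stream name and retry parameters are illustrative.

import time
import uuid

import boto3
from botocore.exceptions import ClientError

client = boto3.client('kinesis')

def put_with_backoff(stream_name, payload, max_attempts=5):
    """Send one record with a random partition key, retrying on throttling."""
    for attempt in range(max_attempts):
        try:
            return client.put_record(
                StreamName=stream_name,
                Data=payload,
                # A random key distributes large records evenly across shards.
                PartitionKey=str(uuid.uuid4()),
            )
        except ClientError as err:
            if err.response['Error']['Code'] != 'ProvisionedThroughputExceededException':
                raise
            # Exponential backoff before retrying the throttled write.
            time.sleep(0.1 * 2 ** attempt)
    raise RuntimeError('record still throttled after retries')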

If you still need to continuously stream large records, consider using Amazon S3 to store the payloads and send only metadata references through the stream. Refer to Processing large records with Amazon Kinesis Data Streams for more information.
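As a minimal sketch of that pattern (the bucket and stream names are placeholders), the payload is written to Amazon S3 and only a small JSON pointer travels through the stream:

import json
import uuid

import boto3

s3 = boto3.client('s3')
kinesis = boto3.client('kinesis')

def send_via_s3_reference(payload, bucket='my-payload-bucket', stream='my-stream'):
    """Store a large payload in S3 and stream only a metadata reference."""
    key = f'payloads/{uuid.uuid4()}'
    s3.put_object(Bucket=bucket, Key=key, Body=payload)
    pointer = json.dumps({'bucket': bucket, 'key': key, 'size': len(payload)})
    # Consumers read the pointer and fetch the payload from S3 themselves.
    kinesis.put_record(StreamName=stream, Data=pointer, PartitionKey=key)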

Conclusion

Amazon Kinesis Data Streams now supports record sizes of up to 10 MiB, a tenfold increase over the previous 1 MiB limit. This enhancement simplifies data pipelines for IoT analytics, change data capture, and AI/ML workloads by eliminating the need for complex workarounds. You can continue using the existing Kinesis Data Streams APIs without additional code changes and benefit from increased flexibility in handling intermittent large payloads.

  • For optimal performance, we recommend keeping large records to 1-2% of the total record count.
  • For best results with large records, implement a uniformly distributed partition key strategy to spread records evenly across shards, include backoff and retry logic in producer applications, and monitor shard-level metrics to identify potential bottlenecks.
  • Before increasing the maximum record size, verify that all downstream applications and consumers can handle larger records.

We’re excited to see how you’ll use this capability to build more powerful and efficient streaming applications. To learn more, visit the Amazon Kinesis Data Streams documentation.


About the authors

Sumant Nemmani

Sumant is a product manager for Amazon Kinesis Data Streams. He is passionate about learning from customers and enjoys helping them unlock value with AWS. Outside of work, he spends time making music with his band Project Mishram, exploring history and food while traveling, and listening to long-form podcasts on technology and history.

Umesh Chaudhari

Umesh is a Sr. Streaming Solutions Architect at AWS. He works with customers to design and build real-time data processing systems. He has extensive experience in software engineering, including architecting, designing, and developing data analytics systems. Outside of work, he enjoys traveling and following tech trends.

Pratik Patel

Pratik is a Sr. Technical Account Manager and streaming analytics specialist. He works with AWS customers, providing ongoing support and technical guidance to help them plan and build solutions using best practices and proactively keep their AWS environments operationally healthy.
