
How to consolidate cross-Region S3 data into OpenSearch


You may have data in Amazon Simple Storage Service (Amazon S3) buckets in different AWS Regions that you want available in a single Amazon OpenSearch Service domain or collection. Consolidating data across Regions provides unified analytics and search, reduces operational complexity, and streamlines your search infrastructure. We're happy to announce that Amazon OpenSearch Ingestion pipelines can now read from S3 buckets in different Regions to ingest and consolidate data into a single OpenSearch Service domain or collection.

To consolidate this data across AWS Regions, you previously had to build your own solution. Now Amazon OpenSearch Ingestion can help you accomplish this. In this post, I'll show you how to use the new cross-Region support to ingest data from S3 buckets across multiple AWS Regions into a single OpenSearch Service domain or collection.

Amazon OpenSearch Ingestion (OSI) is a feature-rich data ingestion pipeline that you can use for many different purposes: observability, analytics, and zero-ETL search. Many customers use OpenSearch Ingestion to ingest data from Amazon S3 into OpenSearch Service domains and Amazon OpenSearch Serverless collections. Until now, you could only ingest from a single AWS Region at a time. Now that you can use OpenSearch Ingestion for cross-Region S3 ingestion, I'll show you how to use it in two scenarios: batch processing using S3 scan, and streaming ingestion using Amazon Simple Queue Service (Amazon SQS) queues for AWS vended logs such as Amazon Virtual Private Cloud (Amazon VPC) Flow Logs and AWS CloudTrail.

Prerequisites

Complete the following prerequisite steps:

  1. Deploy an OpenSearch Service domain or OpenSearch Serverless collection in the Region where you want to perform your search or analytics.
  2. You need S3 buckets in at least two different Regions. You can use existing buckets or create new ones. You can use one in the same AWS Region as your OpenSearch Service domain or collection, or use two entirely different Regions.
  3. Upload objects with data into your S3 buckets. The data can be in JSON, ND-JSON, Parquet, CSV, or plaintext formats.
  4. Configure the AWS Identity and Access Management (IAM) permissions needed for OSI. For instructions, see Amazon S3 as a source.
  5. For cross-Region ingestion, you must now also include the s3:GetBucketLocation permission. This gives the pipeline the ability to determine which AWS Region the bucket is located in.
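As a sketch of step 4 and step 5 combined, the pipeline role's policy for reading two source buckets might look like the following. The bucket names are the placeholder names used later in this post; adjust the ARNs for your buckets and add any AWS KMS permissions your buckets' encryption requires. S3 bucket ARNs carry no Region, so one statement can cover buckets in multiple Regions.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadSourceBuckets",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket1",
        "arn:aws:s3:::amzn-s3-demo-bucket1/*",
        "arn:aws:s3:::amzn-s3-demo-bucket2",
        "arn:aws:s3:::amzn-s3-demo-bucket2/*"
      ]
    }
  ]
}
```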

After you complete these steps, you can set up your Amazon OpenSearch Ingestion pipelines for either batch or streaming scenarios. In the following sections, I'll give you guidance on when to choose each approach, and I'll outline the steps for creating your pipeline.

Batch scenarios

You can use the OpenSearch Ingestion S3 scan capability to read batch data from S3. You might find this approach useful when your data is written to S3 on a schedule. To perform a cross-Region S3 scan, you only need to specify the buckets that you're reading from when you create the OpenSearch Ingestion pipeline.

The following diagram shows the design for an OpenSearch Ingestion pipeline in us-west-2 reading from S3 buckets in us-east-1 and eu-west-1 and writing that data into an OpenSearch Service domain in us-west-2.

Next, you'll create an OpenSearch Ingestion pipeline. You must create this pipeline in the same Region as your OpenSearch Service domain or collection.

version: "2"
s3-scan-cross-region:
  source:
    s3:
      compression: automatic
      codec:
        json:
      scan:
        buckets:
          - bucket:
              name: amzn-s3-demo-bucket1
          - bucket:
              name: amzn-s3-demo-bucket2
      aws:
        region: us-west-2

  sink:
    - opensearch:
        hosts: [ "https://search-mydomain-abcdefghijklmn.us-west-2.es.amazonaws.com" ]
        index: s3_scan_cross_region
        aws:
          region: us-west-2

The previous pipeline configuration uses the JSON codec. You might want to configure a different codec if your data isn't a large JSON object.

You can now query your OpenSearch Service domain or collection to see the data that you ingested.
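To spot-check the ingested documents, you can run a simple search from the OpenSearch Dashboards Dev Tools console. This example assumes the index name s3_scan_cross_region from the pipeline configuration in this post; substitute your own index name.

```json
GET s3_scan_cross_region/_search
{
  "size": 5,
  "query": { "match_all": {} }
}
```

The response's `hits.total.value` tells you how many documents the pipeline has written so far.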

Streaming scenarios: AWS vended logs

Like many of our customers, you might want to ingest S3 data from different AWS Regions into OpenSearch Service. A common reason is to consolidate AWS vended logs, for example VPC Flow Logs, CloudTrail data, and load balancer logs. For these scenarios, you can configure OpenSearch Ingestion pipelines to read from an Amazon SQS queue to stream data into your OpenSearch Service domain or collection.

These AWS vended logs write to Amazon S3 in the same AWS Region as the service producing them. For example, VPC Flow Logs will be in the same AWS Region as your Amazon VPC. You can use OpenSearch Ingestion to consolidate these logs into one AWS Region. In the VPC Flow Logs example, you can consolidate your VPC Flow Logs from multiple AWS Regions into a single OpenSearch Service domain or collection to analyze network patterns across your different Amazon VPCs.

The following diagram outlines the overall setup. It shows an example of sending AWS vended logs from us-east-1 and eu-west-1 to an OpenSearch Service domain in us-west-2. You can change the AWS Regions depending on your specific needs.

  1. You must configure your vended logs to write log events to Amazon S3 buckets in their respective AWS Regions. Using VPC Flow Logs as our example, you would configure VPC Flow Logs on your VPCs.
  2. Create an Amazon SQS queue in the same AWS Region as your OpenSearch Service domain.
  3. Amazon S3 doesn't send notifications to cross-Region Amazon SQS queues, so you'll use intermediate Amazon Simple Notification Service (Amazon SNS) topics to consolidate the notifications from multiple Regions into one queue. For each S3 bucket, create an SNS topic.
  4. Configure S3 Event Notifications for SNS. You'll do this for each S3 bucket and each SNS topic.
  5. SNS can send cross-Region notifications to SQS. Create a subscription from each SNS topic that you created in step 3 to the single SQS queue you created in step 2.
  6. Configure your pipeline role to read from SQS and to read from the associated S3 buckets.
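For step 4 to work, each SNS topic needs an access policy that lets Amazon S3 publish to it. A minimal sketch follows, using a placeholder account ID, a hypothetical topic name, and the demo bucket name from this post; the conditions keep other buckets and accounts from publishing to your topic.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3Publish",
      "Effect": "Allow",
      "Principal": { "Service": "s3.amazonaws.com" },
      "Action": "SNS:Publish",
      "Resource": "arn:aws:sns:us-east-1:123456789012:amzn-s3-demo-bucket1-events",
      "Condition": {
        "ArnLike": { "aws:SourceArn": "arn:aws:s3:::amzn-s3-demo-bucket1" },
        "StringEquals": { "aws:SourceAccount": "123456789012" }
      }
    }
  ]
}
```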

Now create an OpenSearch Ingestion pipeline in the same AWS Region as your OpenSearch Service domain.

version: "2"
s3-sqs-cross-region:
  source:
    s3:
      notification_type: sqs
      codec:
        newline:
      sqs:
        queue_url: https://sqs.us-west-2.amazonaws.com/123456789012/amzn-s3-demo-all-regions
      aws:
        region: us-west-2

  sink:
    - opensearch:
        hosts: [ "https://search-mydomain-abcdefghijklmn.us-west-2.es.amazonaws.com" ]
        index: s3_sqs_cross_region
        aws:
          region: us-west-2

The previous pipeline configuration uses the newline codec, which treats each line as a separate event and suits line-delimited log data. You might want to configure a different codec depending on your data's format.

Next, upload objects with data into your S3 buckets. When you upload data, S3 sends notifications to SNS and then to the SQS queue.

You can now query your OpenSearch Service domain or collection to see the data that you ingested.

Here's what makes this possible and what's different. The SQS queue receives the event notifications for the buckets. Before the cross-Region feature of OpenSearch Ingestion, the pipeline could see these events but couldn't access the S3 bucket even when the permissions were granted. Now, the pipeline determines the AWS Region that the bucket is in and obtains an AWS Security Token Service (AWS STS) token for that bucket's Region. Using an STS token from the same Region as the S3 bucket allows the pipeline to read and access the data.
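To illustrate the Region-lookup half of that flow (this is a sketch of the general technique, not OpenSearch Ingestion's actual implementation, and the helper name is my own), a client resolving a bucket's Region from the GetBucketLocation response has to account for the API's legacy quirks: buckets in us-east-1 return a null LocationConstraint, and some long-lived eu-west-1 buckets return the legacy value "EU".

```python
def region_from_location_constraint(constraint):
    """Map an S3 GetBucketLocation response to a usable Region name.

    GetBucketLocation quirks: buckets in us-east-1 return a null (or
    empty) LocationConstraint, and some old eu-west-1 buckets return
    the legacy value "EU". Every other bucket returns its Region name.
    """
    if constraint in (None, ""):
        return "us-east-1"
    if constraint == "EU":
        return "eu-west-1"
    return constraint

# With boto3, the constraint would come from (requires credentials):
#   constraint = s3.get_bucket_location(Bucket=name)["LocationConstraint"]
print(region_from_location_constraint(None))   # us-east-1
print(region_from_location_constraint("EU"))   # eu-west-1
```

Once the Region is known, the caller can request STS credentials scoped to that Region and build a Region-specific S3 client.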

Using the AWS console

When you create the pipeline using the OpenSearch Ingestion console, you'll have the option to select a blueprint for your use case. These blueprints help you create pipelines for various vended log types just by selecting your SQS queue and OpenSearch domain. The blueprint handles the data type mappings for you by including appropriate processors. You can use these blueprints as a starting point and modify the processors for your specific requirements.

Clean up resources

When you're done testing, use the following steps to delete the resources that you created.

If you set up a batch pipeline:

  • Delete the OpenSearch Ingestion pipeline.

If you set up a streaming pipeline:

  • Delete the OpenSearch Ingestion pipeline.
  • Delete the SNS topics, the SQS queue, and the S3 Event Notification configurations that you created.

For both pipelines, also delete the common resources if you created them only for this walkthrough: the OpenSearch Service domain or collection and the S3 buckets.

Conclusion

In this post, I showed you how to use Amazon OpenSearch Ingestion to ingest data from Amazon S3 buckets in different AWS Regions. I showed that this works for both batch scan and streaming scenarios. This feature offers you a simple way to consolidate your data from other Regions into one OpenSearch Service domain or collection.

To get started with the cross-Region S3 source, refer to the OpenSearch Ingestion documentation or try creating a pipeline from one of our blueprints using the OpenSearch Ingestion console. You can read about the codecs that OpenSearch Ingestion offers for parsing your S3 objects. You can also learn about the various processors that OpenSearch Ingestion offers, so you can transform and enrich your data to meet your needs.

You can also use OpenSearch Ingestion for cross-Region and cross-account ingestion together. To do this, you must grant cross-account permissions on your S3 bucket. You must also make some modifications to your pipeline configuration. Combining what I showed you in this post with the existing cross-account features greatly expands your ingestion options.
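As a sketch of the cross-account side, the bucket owner would grant your pipeline role access with a bucket policy like the following (the account ID and role name are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPipelineRole",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/osi-pipeline-role" },
      "Action": [ "s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation" ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket2",
        "arn:aws:s3:::amzn-s3-demo-bucket2/*"
      ]
    }
  ]
}
```

On the pipeline side, the Data Prepper S3 source also lets you map buckets to their owning accounts (the `bucket_owners` and `default_bucket_owner` options) so the pipeline can verify bucket ownership; see the OpenSearch Ingestion documentation for the exact configuration.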

If you're ready to take your streaming ingestion analytics to the next level, you can explore how to generate metrics from logs and even how to send those derived metrics to Amazon Managed Service for Prometheus.

Have you tried out the cross-Region capabilities of OpenSearch Ingestion? Share your use cases and questions in the comments.


About the author

David is a senior software engineer working on observability in OpenSearch at Amazon Web Services. He is a maintainer on the Data Prepper project.
