New: Improve Apache Iceberg query performance in Amazon S3 with sort and z-order compaction

June 29, 2025

You can now use sort and z-order compaction to improve Apache Iceberg query performance in Amazon S3 Tables and general purpose S3 buckets.

You typically use Iceberg to manage large-scale analytical datasets in Amazon Simple Storage Service (Amazon S3) with AWS Glue Data Catalog or with S3 Tables. Iceberg tables support use cases such as concurrent streaming and batch ingestion, schema evolution, and time travel. When working with high-ingest or frequently updated datasets, data lakes can accumulate many small files that affect the cost and performance of your queries. You’ve shared that optimizing Iceberg data layout is operationally complex and often requires developing and maintaining custom pipelines. Although the default binpack strategy with managed compaction provides notable performance improvements, the new sort and z-order compaction options for both S3 and S3 Tables deliver even greater gains for queries that filter across multiple dimensions.

Two new compaction strategies: Sort and z-order
To help organize your data more efficiently, Amazon S3 now supports two new compaction strategies, sort and z-order, in addition to the default binpack compaction. These advanced strategies are available both for fully managed S3 Tables and for Iceberg tables in general purpose S3 buckets through AWS Glue Data Catalog optimizations.

Sort compaction organizes files based on a user-defined column order. When your tables have a defined sort order, S3 Tables compaction will now use it to cluster similar values together during the compaction process. This improves the efficiency of query execution by reducing the number of files scanned. For example, if your table is organized by sort compaction along state and zip_code, queries that filter on those columns will scan fewer files, improving latency and reducing query engine cost.

Z-order compaction goes a step further by enabling efficient file pruning across multiple dimensions. It interleaves the binary representation of values from multiple columns into a single scalar that can be sorted, making this strategy particularly useful for spatial or multidimensional queries. For example, if your workloads include queries that simultaneously filter by pickup_location, dropoff_location, and fare_amount, z-order compaction can reduce the total number of files scanned compared to traditional sort-based layouts.
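The bit interleaving described above can be sketched in a few lines of Python. This is only an illustration of the idea, assuming small non-negative integer columns. The function name `interleave_bits` and the sample grid are hypothetical, and the real Iceberg implementation first converts each column type into an order-preserving byte representation.

```python
def interleave_bits(values, bits_per_value=16):
    """Interleave the bits of several non-negative integers into one z-value.

    Illustrative only: assumes every value fits in `bits_per_value` bits.
    """
    z = 0
    n = len(values)
    for bit in range(bits_per_value):
        for i, v in enumerate(values):
            # Copy bit `bit` of column i to position bit * n + i of the result.
            z |= ((v >> bit) & 1) << (bit * n + i)
    return z


# Hypothetical 4 x 4 grid of (x, y) pairs, two bits per column.
rows = [(x, y) for x in range(4) for y in range(4)]

# Sorting by the z-value keeps rows that are close in BOTH columns close together.
ordered = sorted(rows, key=lambda r: interleave_bits(r, bits_per_value=2))

print(ordered[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Because neighbouring z-values cover a small square of the grid rather than a long strip along one column, the min/max statistics of each file stay narrow on every z-ordered column, which is what lets the engine skip files when a query filters on several of them at once.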

S3 Tables use your Iceberg table metadata to determine the current sort order. If a table has a defined sort order, no additional configuration is needed to activate sort compaction; it’s automatically applied during ongoing maintenance. To use z-order, you need to update the table maintenance configuration using the S3 Tables API and set the strategy to z-order. For Iceberg tables in general purpose S3 buckets, you can configure AWS Glue Data Catalog to use sort or z-order compaction during optimization by updating the compaction settings.
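If you prefer an SDK over the CLI, the same maintenance update could look roughly like the sketch below. It assumes the boto3 `s3tables` client exposes `put_table_maintenance_configuration` with parameters that mirror the CLI flags used later in this post; check the parameter names against the current SDK reference before relying on them. The helper names and the example ARN are hypothetical.

```python
ALLOWED_STRATEGIES = {"binpack", "sort", "z-order"}  # strategies named in this post


def build_compaction_request(table_bucket_arn, namespace, name, strategy="z-order"):
    """Build the request payload for an Iceberg compaction maintenance update."""
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"unsupported compaction strategy: {strategy!r}")
    return {
        "tableBucketARN": table_bucket_arn,  # assumed name, mirrors --table-bucket-arn
        "namespace": namespace,
        "name": name,
        "type": "icebergCompaction",
        "value": {
            "status": "enabled",
            "settings": {"icebergCompaction": {"strategy": strategy}},
        },
    }


def apply_compaction_strategy(request):
    """Send the request; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builder works without the SDK installed

    client = boto3.client("s3tables")
    return client.put_table_maintenance_configuration(**request)


request = build_compaction_request(
    "arn:aws:s3tables:us-east-1:111122223333:bucket/example-bucket",  # placeholder ARN
    "testnamespace",
    "testtable",
)
print(request["value"]["settings"])
```

Keeping the payload builder separate from the network call makes it easy to review or unit test the configuration before it touches a real table.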

Only new data written after enabling sort or z-order will be affected. Existing compacted files will remain unchanged unless you explicitly rewrite them by increasing the target file size in table maintenance settings or rewriting data using standard Iceberg tools. This behavior is designed to give you control over when and how much data is reorganized, balancing cost and performance.
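One of the "standard Iceberg tools" for rewriting existing files is the Spark `rewrite_data_files` procedure. The sketch below only builds the SQL statement. The helper name is hypothetical, and whether your catalog and permissions allow running the procedure against a given table is an assumption you should verify in your own environment.

```python
def rewrite_data_files_sql(catalog, table, sort_order):
    """Build a CALL statement for Iceberg's rewrite_data_files Spark procedure.

    `sort_order` is either a regular sort expression such as "name ASC"
    or a z-order expression such as "zorder(col_a, col_b)".
    """
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        "strategy => 'sort', "
        f"sort_order => '{sort_order}', "
        # 'rewrite-all' asks Iceberg to rewrite files even if they already look well sized.
        "options => map('rewrite-all', 'true'))"
    )


statement = rewrite_data_files_sql("ice_catalog", "testnamespace.testtable", "name ASC")
print(statement)
```

In a PySpark session you would pass the resulting string to `spark.sql(...)`. Rewriting everything can be expensive on large tables, so it is worth limiting the scope when you only need to reorganize part of the data.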

Let’s see it in action
I’ll walk you through a simplified example using Apache Spark and the AWS Command Line Interface (AWS CLI). I have a Spark cluster installed and an S3 table bucket. I have a table named testtable in a testnamespace. I temporarily disabled compaction to give myself time to add data to the table.

After adding data, I check the file structure of the table.

spark.sql("""
  SELECT 
    substring_index(file_path, '/', -1) as file_name,
    record_count,
    file_size_in_bytes,
    CAST(UNHEX(hex(lower_bounds[2])) AS STRING) as lower_bound_name,
    CAST(UNHEX(hex(upper_bounds[2])) AS STRING) as upper_bound_name
  FROM ice_catalog.testnamespace.testtable.files
  ORDER BY file_name
""").show(20, false)

+--------------------------------------------------------------+------------+------------------+----------------+----------------+
|file_name                                                     |record_count|file_size_in_bytes|lower_bound_name|upper_bound_name|
+--------------------------------------------------------------+------------+------------------+----------------+----------------+
|00000-0-66a9c843-5a5c-407f-8da4-4da91c7f6ae2-0-00001.parquet  |1           |837               |Quinn           |Quinn           |
|00000-1-b7fa2021-7f75-4aaf-9a24-9bdbb5dc08c9-0-00001.parquet  |1           |824               |Tom             |Tom             |
|00000-10-00a96923-a8f4-41ba-a683-576490518561-0-00001.parquet |1           |838               |Ilene           |Ilene           |
|00000-104-2db9509d-245c-44d6-9055-8e97d4e44b01-0-00001.parquet|1000000     |4031668           |Anjali          |Tom             |
|00000-11-27f76097-28b2-42bc-b746-4359df83d8a1-0-00001.parquet |1           |838               |Henry           |Henry           |
|00000-114-6ff661ca-ba93-4238-8eab-7c5259c9ca08-0-00001.parquet|1000000     |4031788           |Anjali          |Tom             |
|00000-12-fd6798c0-9b5b-424f-af70-11775bf2a452-0-00001.parquet |1           |852               |Georgie         |Georgie         |
|00000-124-76090ac6-ae6b-4f4e-9284-b8a09f849360-0-00001.parquet|1000000     |4031740           |Anjali          |Tom             |
|00000-13-cb0dd5d0-4e28-47f5-9cc3-b8d2a71f5292-0-00001.parquet |1           |845               |Olivia          |Olivia          |
|00000-134-bf6ea649-7a0b-4833-8448-60faa5ebfdcd-0-00001.parquet|1000000     |4031718           |Anjali          |Tom             |
|00000-14-c7a02039-fc93-42e3-87b4-2dd5676d5b09-0-00001.parquet |1           |838               |Sarah           |Sarah           |
|00000-144-9b6d00c0-d4cf-4835-8286-ebfe2401e47a-0-00001.parquet|1000000     |4031663           |Anjali          |Tom             |
|00000-15-8138298d-923b-44f7-9bd6-90d9c0e9e4ed-0-00001.parquet |1           |831               |Brad            |Brad            |
|00000-155-9dea2d4f-fc98-418d-a504-6226eb0a5135-0-00001.parquet|1000000     |4031676           |Anjali          |Tom             |
|00000-16-ed37cf2d-4306-4036-98de-727c1fe4e0f9-0-00001.parquet |1           |830               |Brad            |Brad            |
|00000-166-b67929dc-f9c1-4579-b955-0d6ef6c604b2-0-00001.parquet|1000000     |4031729           |Anjali          |Tom             |
|00000-17-1011820e-ee25-4f7a-bd73-2843fb1c3150-0-00001.parquet |1           |830               |Noah            |Noah            |
|00000-177-14a9db71-56bb-4325-93b6-737136f5118d-0-00001.parquet|1000000     |4031778           |Anjali          |Tom             |
|00000-18-89cbb849-876a-441a-9ab0-8535b05cd222-0-00001.parquet |1           |838               |David           |David           |
|00000-188-6dc3dcca-ddc0-405e-aa0f-7de8637f993b-0-00001.parquet|1000000     |4031727           |Anjali          |Tom             |
+--------------------------------------------------------------+------------+------------------+----------------+----------------+
only showing top 20 rows

I observe that the table is made of multiple small files and that the upper and lower bounds of the new files overlap, which means the data is unsorted.

I set the table sort order.

spark.sql("ALTER TABLE ice_catalog.testnamespace.testtable WRITE ORDERED BY name ASC")

I enable table compaction (it’s enabled by default; I disabled it at the start of this demo).

aws s3tables put-table-maintenance-configuration --table-bucket-arn ${S3TABLE_BUCKET_ARN} --namespace testnamespace --name testtable --type icebergCompaction --value "status=enabled,settings={icebergCompaction={strategy=sort}}"

Then, I wait for the next compaction job to trigger. These jobs run throughout the day, when there are enough small files. I can check the compaction status with the following command.

aws s3tables get-table-maintenance-job-status --table-bucket-arn ${S3TABLE_BUCKET_ARN} --namespace testnamespace --name testtable

When the compaction is done, I inspect the files that make up my table one more time. I see that the data was compacted into two files, and the upper and lower bounds show that the data was sorted across these two files.

spark.sql("""
  SELECT 
    substring_index(file_path, '/', -1) as file_name,
    record_count,
    file_size_in_bytes,
    CAST(UNHEX(hex(lower_bounds[2])) AS STRING) as lower_bound_name,
    CAST(UNHEX(hex(upper_bounds[2])) AS STRING) as upper_bound_name
  FROM ice_catalog.testnamespace.testtable.files
  ORDER BY file_name
""").show(20, false)

+------------------------------------------------------------+------------+------------------+----------------+----------------+
|file_name                                                   |record_count|file_size_in_bytes|lower_bound_name|upper_bound_name|
+------------------------------------------------------------+------------+------------------+----------------+----------------+
|00000-4-51c7a4a8-194b-45c5-a815-a8c0e16e2115-0-00001.parquet|13195713    |50034921          |Anjali          |Kelly           |
|00001-5-51c7a4a8-194b-45c5-a815-a8c0e16e2115-0-00001.parquet|10804307    |40964156          |Liza            |Tom             |
+------------------------------------------------------------+------------+------------------+----------------+----------------+

There are fewer files, they are larger, and the data is better clustered along the specified sort column.
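To make the effect on pruning concrete, here is a small Python sketch that replays the lower and upper bounds from the two listings above and counts how many files a filter such as `name = 'Noah'` would have to open. The file labels are hypothetical shorthand for the Parquet file names, only the 20 rows shown in the first listing are included, and the check is a simplification of what a query engine does with Iceberg column statistics.

```python
def files_to_scan(files, value):
    """Return the files whose [lower, upper] bounds could contain `value`."""
    return [label for label, lower, upper in files if lower <= value <= upper]


# Before compaction: 11 single-row files and 9 large unsorted files (first listing).
single_row_names = ["Quinn", "Tom", "Ilene", "Henry", "Georgie", "Olivia",
                    "Sarah", "Brad", "Brad", "Noah", "David"]
before = [(f"small-{i}", n, n) for i, n in enumerate(single_row_names)]
before += [(f"large-{i}", "Anjali", "Tom") for i in range(9)]

# After sort compaction: two files with disjoint bounds (second listing).
after = [
    ("compacted-0", "Anjali", "Kelly"),
    ("compacted-1", "Liza", "Tom"),
]

print(len(files_to_scan(before, "Noah")))  # 10
print(len(files_to_scan(after, "Noah")))   # 1
```

Before compaction, every large file spans Anjali to Tom, so none of them can be skipped. After sort compaction, the bounds no longer overlap and only one file needs to be read.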

To use z-order, I follow the same steps, but I set strategy=z-order in the maintenance configuration.

Regional availability
Sort and z-order compaction are now available in all AWS Regions where Amazon S3 Tables are supported and for general purpose S3 buckets where optimization with AWS Glue Data Catalog is available. There is no additional charge for S3 Tables beyond existing usage and maintenance fees. For Data Catalog optimizations, compute charges apply during compaction.

With these changes, queries that filter on the sort or z-order columns benefit from faster scan times and reduced engine costs. In my experience, depending on my data layout and query patterns, I observed performance improvements of threefold or more when switching from binpack to sort or z-order. Tell us how much you gain on your actual data.

To learn more, visit the Amazon S3 Tables product page or review the S3 Tables maintenance documentation. You can also start testing the new strategies on your own tables today using the S3 Tables API or AWS Glue optimizations.

— seb
