
Create and update Apache Iceberg tables with partitions in the AWS Glue Data Catalog using the AWS SDK and AWS CloudFormation


In recent years, we’ve seen a significant shift in how organizations manage and analyze their ever-growing data lakes. At the forefront of this transformation is Apache Iceberg, an open table format that is rapidly gaining traction among large-scale data users.

However, as organizations scale their data lake implementations, managing Iceberg tables at scale becomes challenging. Data teams often need to manage table schema evolution, partitioning, and snapshot versions. Automation streamlines these operations, provides consistency, reduces human error, and helps data teams focus on higher-value tasks.

The AWS Glue Data Catalog now supports Iceberg table management through the AWS Glue API, AWS SDKs, and AWS CloudFormation. Previously, users had to create Iceberg tables in the Data Catalog without partitions using CloudFormation or the SDKs, then add partitions later from Amazon Athena or another analytics engine. This prevented the table lineage from being tracked in a single place and added steps outside of automation in the continuous integration and delivery (CI/CD) pipeline for table maintenance operations. With this launch, AWS Glue customers can use their preferred automation or infrastructure as code (IaC) tools to automate Iceberg table creation with partitions, and use the same tools to manage schema updates and sort order.

In this post, we show how to create and update Iceberg tables with partitions in the Data Catalog using the AWS SDK and CloudFormation.

Solution overview

In the following sections, we illustrate using the Data Catalog APIs—CreateTable() and UpdateTable()—from the AWS SDK for Python (Boto3) and the AWS Command Line Interface (AWS CLI) for Amazon Simple Storage Service (Amazon S3) based Iceberg tables with partitions. We also provide CloudFormation templates to create and update an Iceberg table with partitions.

Prerequisites

The Data Catalog API changes are available in the following versions of the AWS CLI and SDK for Python (a quick version check follows the list):

  • AWS CLI version 2.27.58 or above
  • SDK for Python (Boto3) version 1.39.12 or above
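
You can confirm your local versions with the following commands (standard version checks, not specific to this launch):

aws --version
python3 -c "import boto3; print(boto3.__version__)"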

AWS CLI usage

Let’s create an Iceberg table with one partition, using CreateTable() in the AWS CLI:

aws glue create-table --cli-input-json file://createicebergtable.json

The createicebergtable.json file is as follows:

{
    "CatalogId": "123456789012",
    "DatabaseName": "bankdata_icebergdb",
    "Title": "transactiontable1",
    "OpenTableFormatInput": { 
      "IcebergInput": { 
         "MetadataOperation": "CREATE",
         "Model": "2",
         "CreateIcebergTableInput": { 
            "Location": "s3://sampledatabucket/bankdataiceberg/transactiontable1/",
            "Schema": {
                "SchemaId": 0,
                "Sort": "struct",
                "Fields": [ 
                    { 
                        "Id": 1,
                        "Name": "transaction_id",
                        "Required": true,
                        "Type": "string"
                    },
                    { 
                        "Id": 2,
                        "Name": "transaction_date",
                        "Required": true,
                        "Type": "date"
                    },
                    { 
                        "Id": 3,
                        "Name": "monthly_balance",
                        "Required": true,
                        "Type": "float"
                    }
                ]
            },
            "PartitionSpec": { 
                "Fields": [ 
                    { 
                        "Name": "by_year",
                        "SourceId": 2,
                        "Transform": "year"
                    }
                ],
                "SpecId": 0
            },
            "WriteOrder": { 
                "Fields": [ 
                    { 
                        "Direction": "asc",
                        "NullOrder": "nulls-last",
                        "SourceId": 1,
                        "Transform": "none"
                    }
                ],
                "OrderId": 1
            }  
        }
      }
   }
}

The preceding AWS CLI command creates the metadata folder for the Iceberg table in Amazon S3, as shown in the following screenshot.

Amazon S3 bucket interface showing metadata folder containing single JSON file dated November 6, 2025

You can populate the table with values as follows and verify the table schema using the Athena console:

SELECT * FROM "bankdata_icebergdb"."transactiontable1" limit 10;
insert into bankdata_icebergdb.transactiontable1 values
    ('AFTERCREATE1234', DATE '2024-08-23', 6789.99),
    ('AFTERCREATE5678', DATE '2023-10-23', 1234.99);
SELECT * FROM "bankdata_icebergdb"."transactiontable1";

The following screenshot shows the results.

Amazon Athena query editor showing SQL queries and results for bankdata_icebergdb database with transaction data

After populating the table with data, you can inspect the S3 prefix of the table, which now contains the data folder.

Amazon S3 bucket interface displaying data folder with two subfolders organized by year: 2023 and 2024

The data folders are partitioned according to our table definition, and the Parquet data files created by our INSERT command are available under each partitioned prefix.

Amazon S3 bucket interface showing by_year=2023 folder containing single Parquet file of 575 bytes

Next, we update the Iceberg table by adding a new partition, using UpdateTable():

aws glue update-table --cli-input-json file://updateicebergtable.json

The updateicebergtable.json file is as follows:

{
  "CatalogId": "123456789012",
  "DatabaseName": "bankdata_icebergdb",
  "Title": "transactiontable1",
  "UpdateOpenTableFormatInput": {
    "UpdateIcebergInput": {
      "UpdateIcebergTableInput": {
        "Updates": [
          {
            "Location": "s3://sampledatabucket/bankdataiceberg/transactiontable1/",
            "Schema": {
              "SchemaId": 1,
              "Type": "struct",
              "Fields": [
                {
                  "Id": 1,
                  "Name": "transaction_id",
                  "Required": true,
                  "Type": "string"
                },
                {
                  "Id": 2,
                  "Name": "transaction_date",
                  "Required": true,
                  "Type": "date"
                },
                {
                  "Id": 3,
                  "Name": "monthly_balance",
                  "Required": true,
                  "Type": "float"
                }
              ]
            },
            "PartitionSpec": {
              "Fields": [
                {
                  "Name": "by_year",
                  "SourceId": 2,
                  "Transform": "year"
                },
                {
                  "Name": "by_transactionid",
                  "SourceId": 1,
                  "Transform": "identity"
                }
              ],
              "SpecId": 1
            },
            "SortOrder": {
              "Fields": [
                {
                  "Direction": "asc",
                  "NullOrder": "nulls-last",
                  "SourceId": 1,
                  "Transform": "none"
                }
              ],
              "OrderId": 2
            }
          }
        ]
      }
    }
  }
}

UpdateTable() modifies the table schema by adding a metadata JSON file to the underlying metadata folder of the table in Amazon S3.

Amazon S3 bucket interface showing 5 metadata objects including JSON and Avro files with timestamps
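
You can also confirm this from the command line by listing the table’s metadata prefix (the bucket and prefix are the sample values used throughout this post):

aws s3 ls s3://sampledatabucket/bankdataiceberg/transactiontable1/metadata/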

We insert values into the table using Athena as follows:

insert into bankdata_icebergdb.transactiontable1 values
    ('AFTERUPDATE1234', DATE '2025-08-23', 4536.00),
    ('AFTERUPDATE5678', DATE '2022-10-23', 23489.00);
SELECT * FROM "bankdata_icebergdb"."transactiontable1";

The following screenshot shows the results.

Amazon Athena query editor with SQL statements and results after iceberg partition update and insert data

Check the corresponding changes to the data folder in the Amazon S3 location of the table.

Amazon S3 prefix showing new partitions for the Iceberg table

This example illustrated how to create and update Iceberg tables with partitions using AWS CLI commands.

SDK for Python usage

The following Python script illustrates using CreateTable() and UpdateTable() for an Iceberg table with partitions.
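
This is a minimal sketch that reuses the JSON request payloads from the preceding AWS CLI examples (createicebergtable.json and updateicebergtable.json) and assumes Boto3 at or above the version listed in the prerequisites:

import json

import boto3

# Create a Glue client; Region and credentials come from your AWS configuration.
glue = boto3.client("glue")

# CreateTable(): create the Iceberg table with a partition, reusing the
# createicebergtable.json request payload shown in the CLI example.
with open("createicebergtable.json") as f:
    create_request = json.load(f)
glue.create_table(**create_request)

# UpdateTable(): add the new partition field and sort order, reusing the
# updateicebergtable.json request payload shown in the CLI example.
with open("updateicebergtable.json") as f:
    update_request = json.load(f)
glue.update_table(**update_request)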

CloudFormation usage

Use the following CloudFormation templates for CreateTable() and UpdateTable(). After the CreateTable template completes, update the same stack with the UpdateTable template by creating a new change set for your stack and executing it.
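
The following is a minimal sketch of a create template, assuming the AWS::Glue::Table resource accepts an OpenTableFormatInput that mirrors the CreateTable() request shown earlier; verify the exact property names against the current CloudFormation resource reference:

AWSTemplateFormatVersion: '2010-09-09'
Description: Sample Iceberg table with a partition in the Glue Data Catalog (sketch)
Resources:
  IcebergTable:
    Type: AWS::Glue::Table
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: bankdata_icebergdb
      TableInput:
        Name: transactiontable1
      # Assumption: these properties mirror the createicebergtable.json payload.
      OpenTableFormatInput:
        IcebergInput:
          MetadataOperation: CREATE
          Version: '2'
          CreateIcebergTableInput:
            Location: s3://sampledatabucket/bankdataiceberg/transactiontable1/
            Schema:
              SchemaId: 0
              Type: struct
              Fields:
                - Id: 1
                  Name: transaction_id
                  Required: true
                  Type: string
                - Id: 2
                  Name: transaction_date
                  Required: true
                  Type: date
                - Id: 3
                  Name: monthly_balance
                  Required: true
                  Type: float
            PartitionSpec:
              SpecId: 0
              Fields:
                - Name: by_year
                  SourceId: 2
                  Transform: year
            WriteOrder:
              OrderId: 1
              Fields:
                - Direction: asc
                  NullOrder: nulls-last
                  SourceId: 1
                  Transform: none

For the update, you would modify the same resource so the schema, PartitionSpec, and sort order mirror updateicebergtable.json, then create and execute a change set on the stack.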

Clean up

To avoid incurring costs for the Iceberg tables created using the AWS CLI, delete the tables from the Data Catalog.
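
For example, using the AWS CLI with the sample names from this post (this removes only the Data Catalog entry; delete the underlying S3 objects separately if you no longer need them):

aws glue delete-table --database-name bankdata_icebergdb --name transactiontable1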

Conclusion

In this post, we illustrated how to use the AWS CLI to create and update Iceberg tables with partitions in the Data Catalog. We also provided sample SDK for Python code and CloudFormation templates. We hope this helps you automate the creation and management of your Iceberg tables with partitions in your CI/CD pipelines and production environments. Try it out for your own use case and share your feedback in the comments section.


About the authors

Acknowledgements: A special thanks to everyone who contributed to the development and launch of this feature – Purvaja Narayanaswamy, Sachet Saurabh, Akhil Yendluri, and Mohit Chandak.

Aarthi Srinivasan

Aarthi is a Senior Big Data Architect with AWS. She works with AWS customers and partners to architect data lakehouse solutions, enhance product features, and establish best practices for data governance.

Pratik Das

Pratik is a Senior Product Manager with AWS. He is passionate about all things data and works with customers to understand their requirements and build delightful experiences. He has a background in building data-driven solutions and machine learning systems in production.
