Modern applications increasingly require complex, long-running coordination between services, such as multi-step payment processing, AI agent orchestration, or approval processes awaiting human decisions. Building these has traditionally required significant effort to implement state management, handle failures, and integrate multiple infrastructure services.
Starting today, you can use AWS Lambda durable functions to build reliable multi-step applications directly within the familiar AWS Lambda experience. Durable functions are regular Lambda functions with the same event handler and integrations you already know. You write sequential code in your preferred programming language, and durable functions track progress, automatically retry on failures, and suspend execution for up to one year at defined points, without paying for idle compute during waits.
AWS Lambda durable functions use a checkpoint and replay mechanism, known as durable execution, to deliver these capabilities. After enabling a function for durable execution, you add the new open source durable execution SDK to your function code. You then use SDK primitives like "steps" to add automatic checkpointing and retries to your business logic, and "waits" to efficiently suspend execution without compute charges. When execution terminates unexpectedly, Lambda resumes from the last checkpoint, replaying your event handler from the beginning while skipping completed operations.
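To make the replay idea concrete, here is a toy, in-memory model of checkpoint and replay in plain Python. It is not the durable execution SDK. ToyDurableContext, Crash, and handler are hypothetical names, and a Python list stands in for the durable checkpoint store that Lambda manages. The first invocation crashes after one step. The second replays the handler from the top, returns the saved result for the finished step, and only runs the new one.

```python
# Toy model of checkpoint-and-replay; NOT the AWS durable execution SDK.
# ToyDurableContext, Crash, and handler are hypothetical names.
from typing import Any, Callable


class ToyDurableContext:
    """Replays a handler against a checkpoint log, skipping finished steps."""

    def __init__(self, checkpoints: list[Any]) -> None:
        self.checkpoints = checkpoints  # results of completed steps, in order
        self.position = 0               # index of the next step in this run
        self.executed: list[str] = []   # steps actually executed in this run

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if self.position < len(self.checkpoints):
            # Finished in an earlier invocation: reuse the saved result.
            result = self.checkpoints[self.position]
        else:
            # New step: run it, then persist the result before moving on.
            result = fn()
            self.checkpoints.append(result)
            self.executed.append(name)
        self.position += 1
        return result


class Crash(Exception):
    """Simulates the function terminating unexpectedly."""


def handler(ctx: ToyDurableContext, crash_after_validate: bool) -> str:
    order = ctx.step("validate", lambda: {"order_id": "o-1", "status": "validated"})
    if crash_after_validate:
        raise Crash("process terminated")
    return ctx.step("process", lambda: f"processed {order['order_id']}")


log: list[Any] = []                 # stands in for the durable checkpoint store
first = ToyDurableContext(log)
try:
    handler(first, crash_after_validate=True)
except Crash:
    pass                            # first invocation dies after one step

second = ToyDurableContext(log)     # replay the handler from the beginning
result = handler(second, crash_after_validate=False)
print(first.executed, second.executed, result)  # ['validate'] ['process'] processed o-1
```

The real service also has to checkpoint durably and handle concurrency. Those concerns are omitted here.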
Getting started with AWS Lambda durable functions
Let me walk you through how to use durable functions.
First, I create a new Lambda function in the console and select Author from scratch. In the Durable execution section, I select Enable. Note that the durable execution setting can only be configured during function creation and currently can't be changed for existing Lambda functions.

After I create my Lambda durable function, I can get started with the provided code.

Lambda durable functions introduce two core primitives that handle state management and recovery:
- Steps: The context.step() method adds automatic retries and checkpointing to your business logic. After a step completes, it is skipped during replay.
- Wait: The context.wait() method pauses execution for a specified duration. It terminates the function, then suspends and later resumes execution without compute charges.
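A wait can end the invocation instead of sleeping inside it. The toy model below sketches that idea: on first encounter, the wait checkpoints its deadline and raises a suspension signal. A runtime loop advances a fake clock and re-invokes the handler, which replays past the now-completed wait. Suspend, ToyContext, and run_durably are hypothetical names, not SDK classes.

```python
# Toy model of a durable wait; NOT the SDK. Suspend, ToyContext, and
# run_durably are hypothetical names; a float stands in for the clock.
from typing import Callable


class Suspend(Exception):
    """Ends the current invocation; the runtime re-invokes at resume_at."""

    def __init__(self, resume_at: float) -> None:
        super().__init__(resume_at)
        self.resume_at = resume_at


class ToyContext:
    def __init__(self, now: float) -> None:
        self.now = now
        self.checkpoints: dict[str, float] = {}  # wait name -> deadline
        self.invocations = 0

    def wait(self, name: str, seconds: float) -> None:
        if name not in self.checkpoints:
            # First time here: checkpoint the deadline.
            self.checkpoints[name] = self.now + seconds
        if self.now < self.checkpoints[name]:
            # Before the deadline: end this invocation instead of sleeping.
            raise Suspend(self.checkpoints[name])
        # Deadline reached on a later invocation: the wait is complete.


def run_durably(handler: Callable[[ToyContext], str], ctx: ToyContext) -> str:
    while True:
        ctx.invocations += 1
        try:
            return handler(ctx)
        except Suspend as suspended:
            # Nothing runs while suspended; just advance the clock.
            ctx.now = suspended.resume_at


def handler(ctx: ToyContext) -> str:
    ctx.wait("cool-down", seconds=3600)
    return f"resumed at t={ctx.now:.0f}"


ctx = ToyContext(now=0.0)
print(run_durably(handler, ctx), ctx.invocations)  # resumed at t=3600 2
```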
Additionally, Lambda durable functions provide other operations for more complex patterns. create_callback() creates a callback that you can use to await results from external events, such as API responses or human approvals. wait_for_condition() pauses until a specific condition is met, such as when polling a REST API for process completion. parallel() and map() support advanced concurrency use cases.
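Callbacks follow the same suspend-and-replay pattern as waits. The toy model below is a sketch only. Suspend, CallbackStore, and Callback are hypothetical, and CallbackStore.send_success merely stands in for the Lambda SendDurableExecutionCallbackSuccess API. While no result has been recorded, result() suspends the invocation. After the external system reports success, a replay finds the stored result and continues.

```python
# Toy model of a durable callback; NOT the SDK. Suspend, CallbackStore, and
# Callback are hypothetical; send_success stands in for the Lambda
# SendDurableExecutionCallbackSuccess API.
from typing import Any


class Suspend(Exception):
    """Ends the current invocation while the callback is still pending."""


class CallbackStore:
    """Service-side record of callback outcomes, keyed by callback id."""

    def __init__(self) -> None:
        self.results: dict[str, Any] = {}

    def send_success(self, callback_id: str, payload: Any) -> None:
        self.results[callback_id] = payload


class Callback:
    def __init__(self, callback_id: str, store: CallbackStore) -> None:
        self.callback_id = callback_id
        self.store = store

    def result(self) -> Any:
        if self.callback_id not in self.store.results:
            raise Suspend(self.callback_id)  # still pending: suspend
        return self.store.results[self.callback_id]


def handler(store: CallbackStore) -> str:
    # The id must be stable across replays so every run finds the same callback.
    approval = Callback("cb-awaiting-approval", store).result()
    return f"approved={approval['approved']}"


store = CallbackStore()
try:
    handler(store)
    suspended = False
except Suspend:
    suspended = True                    # first invocation ends here

store.send_success("cb-awaiting-approval", {"approved": True})
print(suspended, handler(store))        # True approved=True
```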
Building a production-ready order processing workflow
Now let's extend the default example to build a production-ready order processing workflow. It demonstrates how to use callbacks for external approvals, handle errors properly, and configure retry strategies. I keep the code intentionally concise to focus on these core concepts. In a full implementation, you could enhance the validation step with Amazon Bedrock to add AI-powered order analysis.
Here's how the order processing workflow works:
- First, validate_order() checks the order data to ensure all required fields are present.
- Next, send_for_approval() sends the order for external human approval. The handler then waits for a callback response, suspending execution without compute charges.
- Then, process_order() completes order processing.
- Throughout the workflow, try/except error handling distinguishes between two kinds of errors: terminal errors, which stop execution immediately, and recoverable errors inside steps, which trigger automatic retries.
Here's the complete order processing workflow with the step definitions and the main handler:
```python
import random

from aws_durable_execution_sdk_python import (
    DurableContext,
    StepContext,
    durable_execution,
    durable_step,
)
from aws_durable_execution_sdk_python.config import (
    Duration,
    StepConfig,
    CallbackConfig,
)
from aws_durable_execution_sdk_python.retries import (
    RetryStrategyConfig,
    create_retry_strategy,
)


@durable_step
def validate_order(step_context: StepContext, order_id: str) -> dict:
    """Validates order data using AI."""
    step_context.logger.info(f"Validating order: {order_id}")
    # In production: calls Amazon Bedrock to validate order completeness and accuracy
    return {"order_id": order_id, "status": "validated"}


@durable_step
def send_for_approval(step_context: StepContext, callback_id: str, order_id: str) -> dict:
    """Sends order for approval using the provided callback token."""
    step_context.logger.info(f"Sending order {order_id} for approval with callback_id: {callback_id}")
    # In production: send callback_id to an external approval system.
    # The external system calls the Lambda SendDurableExecutionCallbackSuccess or
    # SendDurableExecutionCallbackFailure API with this callback_id when approval is complete.
    return {
        "order_id": order_id,
        "callback_id": callback_id,
        "status": "sent_for_approval",
    }


@durable_step
def process_order(step_context: StepContext, order_id: str) -> dict:
    """Processes the order with retry logic for transient failures."""
    step_context.logger.info(f"Processing order: {order_id}")
    # Simulate a flaky API that sometimes fails
    if random.random() > 0.4:
        step_context.logger.info("Processing failed, will retry")
        raise Exception("Processing failed")
    return {
        "order_id": order_id,
        "status": "processed",
        "timestamp": "2025-11-27T10:00:00Z",
    }


@durable_execution
def lambda_handler(event: dict, context: DurableContext) -> dict:
    try:
        order_id = event.get("order_id")

        # Step 1: Validate the order
        validated = context.step(validate_order(order_id))
        if validated["status"] != "validated":
            raise Exception("Validation failed")  # Terminal error - stops execution
        context.logger.info(f"Order validated: {validated}")

        # Step 2: Create callback
        callback = context.create_callback(
            name="awaiting-approval",
            config=CallbackConfig(timeout=Duration.from_minutes(3)),
        )
        context.logger.info(f"Created callback with id: {callback.callback_id}")

        # Step 3: Send for approval with the callback_id
        approval_request = context.step(send_for_approval(callback.callback_id, order_id))
        context.logger.info(f"Approval request sent: {approval_request}")

        # Step 4: Wait for the callback result.
        # Execution suspends here until the external system calls
        # SendDurableExecutionCallbackSuccess or SendDurableExecutionCallbackFailure.
        approval_result = callback.result()
        context.logger.info(f"Approval received: {approval_result}")

        # Step 5: Process the order with a custom retry strategy
        retry_config = RetryStrategyConfig(max_attempts=3, backoff_rate=2.0)
        processed = context.step(
            process_order(order_id),
            config=StepConfig(retry_strategy=create_retry_strategy(retry_config)),
        )
        if processed["status"] != "processed":
            raise Exception("Processing failed")  # Terminal error

        context.logger.info(f"Order successfully processed: {processed}")
        return processed

    except Exception as error:
        context.logger.error(f"Error processing order: {error}")
        raise  # Re-raise to fail the execution
```
This code demonstrates several important concepts:
- Error handling: The try/except block handles terminal errors. When an unhandled exception is raised outside of a step, as in the validation check, it terminates the execution immediately. This is useful when there's no point in retrying, such as when the order data is invalid.
- Step retries: Inside steps, exceptions trigger automatic retries. These follow either the default retry strategy (as in step 1) or a configured RetryStrategy (as for process_order in step 5). This handles transient failures such as temporary API unavailability.
- Logging: I use context.logger in the main handler and step_context.logger inside steps. The context logger suppresses duplicate logs during replay.
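The retry behavior configured with RetryStrategyConfig(max_attempts=3, backoff_rate=2.0) can be approximated by the generic exponential-backoff loop below. This is a sketch, not the SDK's implementation. The function name run_step_with_retries, the initial_delay of 1 second, and the injectable sleep function are assumptions for illustration. The sketch also assumes that max_attempts counts the first attempt, which may not match the SDK's exact semantics.

```python
# Generic exponential-backoff retry loop; a sketch, NOT the SDK's retry code.
# run_step_with_retries, initial_delay, and the sleep hook are hypothetical.
import time
from typing import Any, Callable


def run_step_with_retries(
    fn: Callable[[], Any],
    max_attempts: int = 3,       # assumed to include the first attempt
    backoff_rate: float = 2.0,   # multiplier applied to the delay after each failure
    initial_delay: float = 1.0,  # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise             # out of attempts: the step fails
            sleep(delay)          # back off before the next attempt
            delay *= backoff_rate # 1s, 2s, 4s, ...


# Usage: a flaky call that fails twice, then succeeds.
outcomes = iter([RuntimeError("503"), RuntimeError("503"), "processed"])
delays: list[float] = []


def flaky() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


print(run_step_with_retries(flaky, sleep=delays.append), delays)  # processed [1.0, 2.0]
```

Injecting sleep keeps the sketch testable without real delays. In durable functions, the service schedules retries itself, so waiting between attempts does not hold compute.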
Now I create a test event with an order_id and invoke the function asynchronously to start the order workflow. I navigate to the Test tab and fill in the optional Durable execution name to identify this execution. Note that durable functions provide built-in idempotency. If I invoke the function twice with the same execution name, the second invocation returns the existing execution's result instead of creating a duplicate.
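Name-based idempotency amounts to keying executions by their name. The toy registry below is a hypothetical sketch and not Lambda's implementation. ToyExecutionRegistry and process are illustrative names, and the registry stores finished results, whereas a real execution might still be in progress when the duplicate request arrives.

```python
# Toy model of name-based idempotency; NOT Lambda's implementation.
# ToyExecutionRegistry and process are hypothetical names.
from typing import Any, Callable


class ToyExecutionRegistry:
    def __init__(self) -> None:
        self.executions: dict[str, Any] = {}  # execution name -> result
        self.started = 0                      # how many workflows actually ran

    def start(self, name: str, workflow: Callable[[dict], Any], event: dict) -> Any:
        if name in self.executions:
            return self.executions[name]      # duplicate name: return existing
        self.started += 1
        self.executions[name] = workflow(event)
        return self.executions[name]


def process(event: dict) -> dict:
    return {"order_id": event["order_id"], "status": "processed"}


registry = ToyExecutionRegistry()
first = registry.start("order-123", process, {"order_id": "123"})
second = registry.start("order-123", process, {"order_id": "123"})
print(first is second, registry.started)  # True 1
```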

I can monitor the execution by navigating to the Durable executions tab in the Lambda console:

Here I can see each step's status and timing. The execution shows CallbackStarted followed by InvocationCompleted. This indicates that the function has terminated and the execution is suspended, avoiding idle charges while it waits for the approval callback.

I can now complete the callback directly from the console by choosing Send success or Send failure, or programmatically by using the Lambda API.

I choose Send success.

After the callback completes, the execution resumes and processes the order. If the process_order step fails because of the simulated flaky API, it automatically retries according to the configured strategy. As soon as an attempt succeeds, the execution completes successfully.
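The simulated failure in process_order makes retries easy to observe, but retries do not guarantee success. random.random() returns a value in [0.0, 1.0), so the condition random.random() > 0.4 is true about 60% of the time. The quick calculation below assumes independent attempts and assumes that max_attempts=3 counts the first try. Under those assumptions, roughly one execution in five still fails after exhausting its retries.

```python
# Back-of-the-envelope odds for the simulated flaky API in process_order().
# Assumes independent attempts and that max_attempts=3 includes the first try.
p_fail = 1 - 0.4          # random.random() > 0.4 happens 60% of the time
max_attempts = 3

p_all_fail = p_fail ** max_attempts
print(f"execution fails: {p_all_fail:.1%}, succeeds: {1 - p_all_fail:.1%}")
```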

Monitoring executions with Amazon EventBridge
You can also monitor durable function executions using Amazon EventBridge. Lambda automatically sends execution status change events to the default event bus, allowing you to build downstream workflows, send notifications, or integrate with other AWS services.
To receive these events, create an EventBridge rule on the default event bus with this pattern:
```json
{
  "source": ["aws.lambda"],
  "detail-type": ["Durable Execution Status Change"]
}
```
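For simple patterns like this one, EventBridge matching works as follows: each key in the pattern must appear in the event, and the event's value must be one of the values listed for that key. The matcher below sketches only that exact-match subset. Real EventBridge patterns also support nested fields and operators such as prefix and numeric matching. Both sample events are hypothetical and show only the top-level fields that this pattern inspects.

```python
# Simplified exact-match subset of EventBridge pattern matching; a sketch only.
# Real EventBridge also supports nested fields and operators (prefix, numeric, ...).
import json

PATTERN = json.loads("""
{
  "source": ["aws.lambda"],
  "detail-type": ["Durable Execution Status Change"]
}
""")


def matches(pattern: dict, event: dict) -> bool:
    # Every pattern key must be present with one of the listed values.
    return all(event.get(key) in allowed for key, allowed in pattern.items())


# Hypothetical events; only the fields the pattern inspects are shown.
status_event = {"source": "aws.lambda", "detail-type": "Durable Execution Status Change"}
other_event = {"source": "aws.ec2", "detail-type": "EC2 Instance State-change Notification"}
print(matches(PATTERN, status_event), matches(PATTERN, other_event))  # True False
```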
Things to know
Here are key points to note:
- Availability: Lambda durable functions are now available in the US East (Ohio) AWS Region. For the latest Region availability, visit the AWS Capabilities by Region page.
- Programming language support: At launch, AWS Lambda durable functions support JavaScript/TypeScript (Node.js 22/24) and Python (3.13/3.14). We recommend bundling the durable execution SDK with your function code using your preferred package manager. The SDKs are evolving quickly, and bundling them lets you update dependencies as new features become available.
- Using Lambda versions: When deploying durable functions to production, use Lambda versions so that replay always runs on the same code version. If you update your function code while an execution is suspended, replay uses the version that started the execution, which prevents code changes from causing inconsistencies in long-running workflows.
- Testing your durable functions: You can test durable functions locally without AWS credentials by using the separate testing SDK with pytest integration. For more complex integration testing, use the AWS Serverless Application Model (AWS SAM) command line interface (CLI).
- Open source SDKs: The durable execution SDKs for JavaScript/TypeScript and Python are open source. You can review the source code, contribute improvements, and stay up to date with the latest features.
- Pricing: To learn more about AWS Lambda durable functions pricing, refer to the AWS Lambda pricing page.
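The advice above on Lambda versions follows from how replay works. Replay assumes the handler reaches the same operations in the same order as before. The toy model below is hypothetical and does not reflect the SDK's actual replay checks. It records (step name, result) pairs, then shows that replaying with a newer handler that inserts a step produces a mismatch against the checkpoint log. ReplayContext, NonDeterminismError, v1, and v2 are illustrative names.

```python
# Toy illustration of why replay needs the same code version; NOT the SDK.
# ReplayContext, NonDeterminismError, v1, and v2 are hypothetical names.
from typing import Any, Callable, Optional


class NonDeterminismError(Exception):
    pass


class ReplayContext:
    def __init__(self, log: list[tuple[str, Any]]) -> None:
        self.log = log        # (step name, result) pairs from earlier runs
        self.position = 0

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if self.position < len(self.log):
            logged_name, result = self.log[self.position]
            if logged_name != name:
                # The code reached a different step than the log recorded.
                raise NonDeterminismError(
                    f"expected step {logged_name!r}, code reached {name!r}"
                )
        else:
            result = fn()
            self.log.append((name, result))
        self.position += 1
        return result


def v1(ctx: ReplayContext) -> None:
    ctx.step("validate", lambda: "ok")
    ctx.step("send_for_approval", lambda: "sent")


def v2(ctx: ReplayContext) -> None:
    # A later version inserts a new step before approval.
    ctx.step("validate", lambda: "ok")
    ctx.step("fraud_check", lambda: "clean")
    ctx.step("send_for_approval", lambda: "sent")


log: list[tuple[str, Any]] = []
v1(ReplayContext(log))      # started on version 1, ran two steps, then suspended
v1(ReplayContext(log))      # replay on the same version: log matches
mismatch: Optional[str] = None
try:
    v2(ReplayContext(log))  # replay on a newer version: log mismatch
except NonDeterminismError as err:
    mismatch = str(err)
print(mismatch)  # expected step 'send_for_approval', code reached 'fraud_check'
```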
Get started with AWS Lambda durable functions by visiting the AWS Lambda console. To learn more, refer to the AWS Lambda durable functions documentation page.
Happy building!
– Donnie

