Performance issues in analytics environments often stay invisible until they disrupt dashboards, delay ETL jobs, or affect business decisions. For teams running Amazon Redshift Serverless, unmonitored query queues, long-running queries, or unexpected spikes in compute capacity can degrade performance and increase costs if left undetected.
Amazon Redshift Serverless simplifies running analytics at scale by removing the need to provision or manage infrastructure. However, even in a serverless environment, visibility into performance and usage is essential for efficient operation and predictable costs. Amazon Redshift Serverless provides built-in dashboards for monitoring performance metrics, but delivering notifications directly to platforms like Slack adds another level of agility. Real-time alerts in the team’s workflow enable faster responses and better-informed decisions without requiring constant dashboard monitoring.
In this post, we show you how to build a serverless, low-cost monitoring solution for Amazon Redshift Serverless. The solution proactively detects performance anomalies and sends actionable alerts directly to your chosen Slack channels. This approach helps your analytics team identify and address issues early, often before your users notice a problem.
The solution uses AWS services to collect key performance metrics from Amazon Redshift Serverless, evaluate them against thresholds that you can configure, and notify you when anomalies are detected.
The workflow operates as follows:
- Scheduled execution – An Amazon EventBridge rule triggers an AWS Lambda function on a configurable schedule (by default, every 15 minutes during business hours).
- Metric collection – The Lambda function uses Amazon CloudWatch and the Amazon Redshift Data API to gather metrics. These include queued queries, running queries, compute capacity (RPUs), data storage usage, table count, database connections, and slow-running queries.
- Threshold evaluation – The collected metrics are compared against predefined thresholds that reflect acceptable performance and usage limits.
- Alerting – When a threshold is exceeded, the Lambda function publishes a notification to an Amazon SNS topic.
- Slack notification – Amazon Q Developer in chat applications (formerly AWS Chatbot) delivers the alert to your designated Slack channel.
- Observability – Lambda execution logs are stored in Amazon CloudWatch Logs for troubleshooting and auditing.
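To make the metric-collection step more concrete, the following minimal Python sketch shows how a Lambda function might build an Amazon CloudWatch `GetMetricData` request for the `AWS/Redshift-Serverless` namespace. It is not the template's actual code. The metric names, the single dimension chosen for each metric, the five-minute period, and the helper names `build_metric_queries` and `fetch_latest_metrics` are all assumptions. CloudWatch matches dimension sets exactly, so a metric published with more dimensions than the sketch specifies returns no data. Confirm the metric and dimension names for your workgroup in the CloudWatch console before relying on them.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping of metric name -> (dimension name, statistic).
# The dimension assigned to each metric is an assumption; verify it in the CloudWatch console.
METRICS = {
    "QueriesQueued": ("Workgroup", "Maximum"),
    "QueriesRunning": ("Workgroup", "Maximum"),
    "ComputeCapacity": ("Workgroup", "Maximum"),
    "DatabaseConnections": ("Workgroup", "Maximum"),
    "DataStorage": ("Namespace", "Maximum"),
    "TotalTableCount": ("Namespace", "Maximum"),
}


def build_metric_queries(workgroup: str, namespace: str, period_seconds: int = 300) -> list[dict]:
    """Build the MetricDataQueries list for cloudwatch.get_metric_data()."""
    dimension_values = {"Workgroup": workgroup, "Namespace": namespace}
    queries = []
    for metric_name, (dimension, stat) in METRICS.items():
        queries.append({
            # Query Ids must start with a lowercase letter.
            "Id": metric_name.lower(),
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Redshift-Serverless",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": dimension, "Value": dimension_values[dimension]}],
                },
                "Period": period_seconds,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


def fetch_latest_metrics(workgroup: str, namespace: str) -> dict[str, float]:
    """Return the newest datapoint per metric (requires boto3 and AWS credentials)."""
    import boto3  # imported here so build_metric_queries can be used without AWS access

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=build_metric_queries(workgroup, namespace),
        StartTime=end - timedelta(minutes=15),
        EndTime=end,
        ScanBy="TimestampDescending",  # newest datapoint first in each result
    )
    return {r["Id"]: r["Values"][0] for r in response["MetricDataResults"] if r["Values"]}
```

Keeping the query builder separate from the API call lets you inspect or unit-test the request without AWS credentials.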
This architecture is fully serverless and requires no changes to your existing Amazon Redshift Serverless workloads. To simplify deployment, we provide an AWS CloudFormation template that provisions all required resources.
Prerequisites
Before deploying this solution, gather information about the existing Amazon Redshift Serverless workgroup and namespace that you want to monitor. To identify your Amazon Redshift Serverless resources:
- Open the Amazon Redshift console.
- In the navigation pane, choose Serverless dashboard.
- Note your workgroup and namespace names, along with their IDs. You’ll use these values when launching the AWS CloudFormation template for this post.
Deploy the solution
You can launch the CloudFormation stack and deploy the solution through the provided link.
When launching the CloudFormation stack, complete the following steps in the AWS CloudFormation console:
- For Stack name, enter a descriptive name such as redshift-serverless-monitoring.
- Review and modify the parameters as needed for your environment.
- Acknowledge that AWS CloudFormation might create IAM resources with custom names.
- Choose Submit.
CloudFormation parameters
Amazon Redshift Serverless workgroup configuration
Provide details about your existing Amazon Redshift Serverless environment. These values connect the monitoring solution to your Redshift environment. Some parameters have default values that you can replace with your actual configuration.
| Parameter | Default value | Description |
| Amazon Redshift Workgroup Name | | Your Amazon Redshift Serverless workgroup name. |
| Amazon Redshift Namespace Name | | Your Amazon Redshift Serverless namespace name. |
| Amazon Redshift Workgroup ID | | Workgroup ID (UUID) of the Amazon Redshift Serverless workgroup to monitor. Must follow the UUID format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (lowercase hexadecimal with hyphens). |
| Amazon Redshift Namespace ID | | Namespace ID (UUID) of the Amazon Redshift Serverless namespace. Must follow the UUID format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (lowercase hexadecimal with hyphens). |
| Database Name | dev | Target Amazon Redshift database for SQL-based diagnostic and monitoring queries. |
Monitoring schedule
The default schedule runs diagnostic SQL queries every 15 minutes during business hours, balancing responsiveness and cost efficiency. Running more often might increase costs, whereas less frequent monitoring could delay detection of performance issues. You can adjust this schedule to fit your needs.
| Parameter | Default value | Description |
| Schedule Expression | cron(0/15 8-17 ? * MON-FRI *) | EventBridge schedule expression for Lambda function execution. The default runs every 15 minutes, Monday through Friday, from 8:00 AM to 5:45 PM UTC (hours 8 through 17 inclusive). |
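The hour field of the default expression, `8-17`, is inclusive, so the last run on each weekday fires at 17:45 UTC. The following minimal Python sketch expands the minute and hour fields of the default expression to list one weekday's run times. The helper `expand_field` is hypothetical. It handles only the `*`, `a`, `a-b`, `a/step`, and comma-list forms, not the full EventBridge cron syntax.

```python
def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand a simple cron field (*, a, a-b, a/step, a-b/step, comma lists) into sorted values."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # "a/step" means "from a up to the field maximum, every step"; a bare "a" is one value.
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return sorted(values)


def daily_run_times(minutes_field: str, hours_field: str) -> list[str]:
    """Return the HH:MM run times for one matching day."""
    minutes = expand_field(minutes_field, 0, 59)
    hours = expand_field(hours_field, 0, 23)
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


# Minute and hour fields of cron(0/15 8-17 ? * MON-FRI *)
runs = daily_run_times("0/15", "8-17")
print(len(runs), runs[0], runs[-1])  # 40 08:00 17:45
```

If you want exactly eight hours of coverage, an hour field such as `8-15` (last run at 15:45) would give 32 runs per weekday.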
Threshold configuration
Tune thresholds based on your workload characteristics.
| Parameter | Default value | Description |
| Queries Queued Threshold | 20 | Alert threshold for queued queries. |
| Queries Running Threshold | 20 | Alert threshold for running queries. |
| Compute Capacity Threshold (RPUs) | 64 | Alert threshold for compute capacity (RPUs). |
| Data Storage Threshold (MB) | 5242880 | Alert threshold for data storage in MB (default 5 TB). |
| Table Count Threshold | 1000 | Alert threshold for total table count. |
| Database Connections Threshold | 50 | Alert threshold for database connections. |
| Slow Query Threshold (seconds) | 10 | Threshold in seconds for slow query detection. |
| Query Timeout (seconds) | 30 | Timeout for SQL diagnostic queries. |
Tip: Start with conservative thresholds and refine them after observing baseline behavior for one to two weeks.
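Threshold evaluation amounts to comparing each collected value against its configured limit. The following minimal sketch shows one way to do that. The metric keys, the `Breach` type, and the `evaluate_thresholds` helper are hypothetical names chosen for illustration, and the threshold values mirror the defaults in the table above.

```python
from dataclasses import dataclass

# Default thresholds from the parameter table (dictionary keys are hypothetical names).
DEFAULT_THRESHOLDS = {
    "queries_queued": 20,
    "queries_running": 20,
    "compute_capacity_rpu": 64,
    "data_storage_mb": 5_242_880,
    "table_count": 1000,
    "database_connections": 50,
}


@dataclass(frozen=True)
class Breach:
    metric: str
    value: float
    threshold: float


def evaluate_thresholds(metrics: dict[str, float], thresholds: dict[str, float]) -> list[Breach]:
    """Return one Breach per metric whose value is strictly greater than its threshold."""
    breaches = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        # Metrics not collected in this run are skipped rather than treated as zero.
        if value is not None and value > limit:
            breaches.append(Breach(name, value, limit))
    return breaches


sample = {"queries_queued": 35, "queries_running": 4, "compute_capacity_rpu": 64}
for b in evaluate_thresholds(sample, DEFAULT_THRESHOLDS):
    print(f"{b.metric}: {b.value} exceeds threshold {b.threshold}")
```

This sketch treats "exceeds" as strictly greater than, so a compute capacity of exactly 64 RPUs does not alert. If you want to alert at the limit itself, switch the comparison to `>=`.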
Lambda configuration
Configure the AWS Lambda function settings. The default values suit most monitoring scenarios, and you typically need to change them only when troubleshooting.
| Parameter | Default value | Description |
| Lambda Memory Size (MB) | 256 | Lambda function memory size in MB. |
| Lambda Timeout (seconds) | 240 | Lambda function timeout in seconds. |
Security configuration – Amazon Virtual Private Cloud (Amazon VPC)
If your organization has network isolation requirements, you can optionally enable VPC deployment for the Lambda function. When enabled, the Lambda function runs inside your specified VPC subnets, providing network isolation and allowing access to VPC-only resources.
| Parameter | Default value | Description |
| VPC ID | | VPC ID for Lambda deployment (required if EnableVPC is true). The Lambda function is deployed in this VPC. Make sure that the VPC has appropriate routing (NAT gateway or VPC endpoints) so that Lambda can reach AWS services such as CloudWatch, Amazon Redshift, and Amazon SNS. |
| VPC Subnet IDs | | Comma-separated list of subnet IDs for Lambda deployment (required if EnableVPC is true). |
| Security Group IDs | | Comma-separated list of security group IDs for Lambda (optional). If not provided and EnableVPC is true, a default security group with outbound HTTPS access is created. Custom security groups must allow outbound HTTPS (port 443) to AWS service endpoints. |
Note that VPC deployment might increase cold start times and requires a NAT gateway or VPC endpoints for AWS service access. We recommend provisioning interface VPC endpoints (through AWS PrivateLink) for the five services that the Lambda function calls: CloudWatch, CloudWatch Logs, the Amazon Redshift Data API, Amazon Redshift Serverless, and Amazon SNS. This keeps all traffic private without the recurring cost of a NAT gateway.
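If you use VPC endpoints, each of the five services needs its own interface endpoint. The following minimal sketch lists the endpoint service names to request for a given Region. The names follow the `com.amazonaws.<region>.<service>` convention. Confirm them against the AWS PrivateLink documentation, or by running `aws ec2 describe-vpc-endpoint-services` in your Region, before creating the endpoints. The helper name is hypothetical.

```python
# Short service names for the five services the Lambda function calls.
# They follow the com.amazonaws.<region>.<service> convention; verify them for your Region.
ENDPOINT_SERVICES = {
    "Amazon CloudWatch (metrics)": "monitoring",
    "Amazon CloudWatch Logs": "logs",
    "Amazon Redshift Data API": "redshift-data",
    "Amazon Redshift Serverless": "redshift-serverless",
    "Amazon SNS": "sns",
}


def endpoint_service_names(region: str) -> dict[str, str]:
    """Map each service to its interface VPC endpoint service name in the given Region."""
    return {label: f"com.amazonaws.{region}.{short}" for label, short in ENDPOINT_SERVICES.items()}


for label, name in endpoint_service_names("us-east-1").items():
    print(f"{label}: {name}")
```

When you create the interface endpoints, enable private DNS. The AWS SDK inside the function can then keep using its default endpoint URLs without code changes.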
Security configuration – Encryption
If your organization requires encryption of data at rest, you can optionally enable AWS Key Management Service (AWS KMS) encryption for the Lambda function’s environment variables, CloudWatch Logs, and the SNS topic. When enabled, the template encrypts each resource with the AWS KMS keys that you provide. You can supply either a single shared key for all three services or individual keys for granular key management and audit separation.
| Parameter | Default value | Description |
| Shared KMS Key ARN | | AWS KMS key ARN to use for all encryption (Lambda, Logs, and SNS) unless service-specific keys are provided. Using a single key for all services simplifies key management. The key policy must grant encrypt/decrypt permissions to Lambda, CloudWatch Logs, and SNS. |
| Lambda KMS Key ARN | | AWS KMS key ARN for Lambda environment variable encryption (optional, overrides SharedKMSKeyArn). Use this for separate key management per service. The key policy must grant decrypt permissions to the Lambda execution role. If not provided, SharedKMSKeyArn is used when EnableKMSEncryption is true. |
| CloudWatch Logs KMS Key ARN | | AWS KMS key ARN for CloudWatch Logs encryption (optional, overrides SharedKMSKeyArn). Use this for separate key management per service. The key policy must grant encrypt/decrypt permissions to the CloudWatch Logs service. If not provided, SharedKMSKeyArn is used when EnableKMSEncryption is true. |
| SNS Topic KMS Key ARN | | AWS KMS key ARN for SNS topic encryption (optional, overrides SharedKMSKeyArn). Use this for separate key management per service. The key policy must grant encrypt/decrypt permissions to the SNS service and the Lambda execution role. If not provided, SharedKMSKeyArn is used when EnableKMSEncryption is true. |
| Enable Dead Letter Queue | False | Optionally enable a dead-letter queue (DLQ) for failed Lambda invocations to improve reliability and security monitoring. When enabled, events that still fail after all retry attempts are sent to an Amazon SQS queue for investigation and potential replay. This helps prevent data loss, provides visibility into failures, and creates an audit trail for monitoring anomalies. The DLQ retains messages for 14 days. |
Note that AWS KMS encryption requires the key policy to grant appropriate permissions to each consuming service (Lambda, CloudWatch Logs, and SNS).
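The table describes a simple precedence rule. A service-specific key, if provided, overrides the shared key, and neither key applies unless EnableKMSEncryption is true. The following minimal sketch restates that rule in Python so that you can reason about which key each resource ends up with. `resolve_kms_key` is a hypothetical helper, not the template's implementation. Empty strings stand for parameters left blank, and the ARNs are placeholders.

```python
from typing import Optional


def resolve_kms_key(enable_kms: bool, shared_key_arn: str, service_key_arn: str) -> Optional[str]:
    """Return the key ARN a service should use, or None when encryption is disabled or no key is set."""
    if not enable_kms:
        return None
    # A non-empty service-specific key overrides the shared key.
    return service_key_arn or shared_key_arn or None


# Placeholder ARNs for illustration only.
shared = "arn:aws:kms:us-east-1:111122223333:key/shared-example"
logs_only = "arn:aws:kms:us-east-1:111122223333:key/logs-example"

for service, override in {"Lambda": "", "CloudWatch Logs": logs_only, "SNS": ""}.items():
    print(service, "->", resolve_kms_key(True, shared, override))
```

In this example, Lambda and SNS fall back to the shared key, and CloudWatch Logs uses its dedicated key.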
- On the review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
- Choose Submit.
Resources created
The CloudFormation stack creates the following resources:
- EventBridge rule for scheduled execution
- AWS Lambda function (Python 3.12 runtime)
- Amazon SNS topic for alerts
- IAM role with permissions for CloudWatch, the Amazon Redshift Data API, and SNS
- CloudWatch log group for Lambda logs
Note: CloudFormation deployment typically takes 10–15 minutes. You can monitor progress in real time on the Events tab of your CloudFormation stack.
Post-deployment configuration
After the CloudFormation stack is created successfully, complete the following steps.
Step 1: Record CloudFormation outputs
- Navigate to the AWS CloudFormation console.
- Select your stack and choose the Outputs tab.
- Note the values for LambdaRoleArn and SNSTopicArn. You need them in the next steps.
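If you prefer to script this step, the following minimal sketch reads the same outputs with the AWS SDK for Python (Boto3). It can optionally wait for stack creation to finish first. It assumes the stack name redshift-serverless-monitoring suggested earlier and the output keys LambdaRoleArn and SNSTopicArn. The helper names are hypothetical.

```python
def outputs_as_dict(describe_stacks_response: dict) -> dict[str, str]:
    """Flatten the Outputs list of the first stack in a DescribeStacks response."""
    stack = describe_stacks_response["Stacks"][0]
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


def get_stack_outputs(stack_name: str, wait_for_create: bool = False) -> dict[str, str]:
    """Fetch a stack's outputs (requires boto3 and AWS credentials)."""
    import boto3

    cloudformation = boto3.client("cloudformation")
    if wait_for_create:
        # Polls until CREATE_COMPLETE; raises a WaiterError if creation fails or rolls back.
        cloudformation.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return outputs_as_dict(cloudformation.describe_stacks(StackName=stack_name))
```

For example, `get_stack_outputs("redshift-serverless-monitoring", wait_for_create=True)["LambdaRoleArn"]` returns the role ARN needed in Step 2.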
Step 2: Grant Amazon Redshift permissions
The Lambda function needs permission to query Amazon Redshift system tables for monitoring data. Complete the following steps to grant the necessary access:
- Navigate to the Amazon Redshift console.
- In the left navigation pane, choose Query Editor V2.
- Connect to your Amazon Redshift Serverless workgroup.
- Run the following SQL commands, replacing the role placeholder with the LambdaRoleArn value from your CloudFormation outputs:
These commands create an Amazon Redshift user associated with the Lambda IAM role and grant it the sys:monitor Amazon Redshift role. This role provides read-only access to catalog and system tables without granting permissions on user data tables.
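The exact SQL depends on how the Lambda function authenticates to the Data API. The following hypothetical helper is a minimal sketch that assumes the function calls the Data API with its IAM role's temporary credentials, so that Amazon Redshift maps it to a database user named `IAMR:<role-name>`. Under that assumption, it derives the role name from the LambdaRoleArn output and prints statements you could adapt. Treat the statements as an assumption to check against your own setup, not as this solution's commands. The role ARN in the example is a placeholder.

```python
def role_name_from_arn(role_arn: str) -> str:
    """Extract the role name from an IAM role ARN such as arn:aws:iam::111122223333:role/path/MyRole."""
    resource = role_arn.split(":", 5)[5]  # e.g. "role/path/MyRole"
    if not resource.startswith("role/"):
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    # Role ARNs can include a path; the role name is always the last segment.
    return resource.rsplit("/", 1)[1]


def monitoring_grant_sql(role_arn: str) -> list[str]:
    """Build statements that create the role-mapped user and grant sys:monitor (assumes IAMR: naming)."""
    # Double quotes are required because the user name contains a colon.
    db_user = f'"IAMR:{role_name_from_arn(role_arn)}"'
    return [
        f"CREATE USER {db_user} PASSWORD DISABLE;",
        f"GRANT ROLE sys:monitor TO {db_user};",
    ]


for statement in monitoring_grant_sql("arn:aws:iam::111122223333:role/redshift-monitoring-lambda-role"):
    print(statement)
```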
Step 3: Configure Slack notifications
Amazon Q Developer in chat applications provides native AWS integration and managed authentication, which removes the need for custom webhook code and reduces setup complexity. To receive alerts in Slack, configure Amazon Q Developer in chat applications to connect your SNS topic to your preferred Slack channel:
- Navigate to Amazon Q Developer in chat applications (formerly AWS Chatbot) in the AWS console.
- Follow the instructions in the Slack integration documentation to authorize AWS access to your Slack workspace.
- When configuring the Slack channel, make sure that you select the AWS Region where you deployed the CloudFormation stack.
- In the Notifications section, select the SNS topic created by your CloudFormation stack (refer to the SNSTopicArn output value).
- Keep the default read-only IAM permissions for the channel configuration.
After it’s configured, alerts automatically appear in Slack whenever thresholds are exceeded.
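To confirm that the channel is connected before a real threshold breach, you can publish a test message to the SNS topic. Amazon Q Developer in chat applications renders SNS messages that use its custom notification format. The following minimal sketch builds such a payload and publishes it with Boto3. The title and description text and the helper names are illustrative assumptions. The sketch does not claim to reproduce the message format that the solution's Lambda function sends; check the Amazon Q Developer in chat applications documentation for the full custom notification schema.

```python
import json


def build_custom_notification(title: str, description: str) -> str:
    """Return an SNS message body in the custom notification format (version 1.0)."""
    payload = {
        "version": "1.0",
        "source": "custom",
        "content": {
            "textType": "client-markdown",
            "title": title,
            "description": description,
        },
    }
    return json.dumps(payload)


def publish_test_alert(topic_arn: str) -> str:
    """Publish a test message and return its SNS MessageId (requires boto3 and AWS credentials)."""
    import boto3

    sns = boto3.client("sns")
    message = build_custom_notification(
        title="Redshift Serverless monitoring test",
        description="Test alert: if you can read this in Slack, the SNS topic is connected.",
    )
    return sns.publish(TopicArn=topic_arn, Message=message)["MessageId"]
```

Call `publish_test_alert` with the SNSTopicArn value recorded in Step 1. The message should appear in the configured Slack channel within a few seconds.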
Cost considerations
With the default configuration, this solution incurs minimal ongoing costs. The Lambda function runs approximately 693 times per month (every 15 minutes during an 8-hour business day, Monday through Friday), for a monthly cost of approximately $0.33 USD. This includes Lambda compute costs ($0.26) and CloudWatch GetMetricData API calls ($0.07). All other services (EventBridge, SNS, CloudWatch Logs, and the Amazon Redshift Data API) contribute negligible cost at this volume. The Amazon Redshift Data API has no additional charges beyond the minimal Amazon Redshift Serverless RPU consumption needed to run the system table queries. You can reduce costs by monitoring less often (for example, every 30 minutes) or improve responsiveness by monitoring more often (for example, every 5 minutes), with a proportional cost increase.
All costs are estimates and might vary based on your environment. Variations often occur because queries that scan system tables might take longer or require additional resources, depending on system complexity.
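The invocation count behind this estimate is simple arithmetic: runs per hour × hours per day × weekdays per month. The following minimal sketch reproduces the 693 figure, assuming an average of 52 × 5 / 12 ≈ 21.67 weekdays per month. It then scales the quoted $0.33 linearly with invocation count, which is the proportional relationship the cost paragraph describes. The default cron expression's inclusive `8-17` hour range covers 10 hourly windows rather than 8, so by the same arithmetic it yields about 867 runs per month. The helper names are hypothetical.

```python
# Average weekdays (Monday-Friday) per month: 52 weeks x 5 days / 12 months, about 21.67.
AVG_WEEKDAYS_PER_MONTH = 52 * 5 / 12

# Baseline figures from the cost estimate in this post.
BASELINE_INVOCATIONS = 693
BASELINE_MONTHLY_COST_USD = 0.33


def monthly_invocations(runs_per_hour: int, hours_per_day: int) -> int:
    """Estimated Lambda invocations per month for a weekday-only schedule."""
    return round(runs_per_hour * hours_per_day * AVG_WEEKDAYS_PER_MONTH)


def scaled_monthly_cost(invocations: int) -> float:
    """Scale the baseline cost linearly with invocation count (a simplifying assumption)."""
    return BASELINE_MONTHLY_COST_USD * invocations / BASELINE_INVOCATIONS


scenarios = {
    "every 15 min, 8-hour day": (4, 8),
    "every 15 min, hours 8-17 (default cron)": (4, 10),
    "every 30 min, 8-hour day": (2, 8),
    "every 5 min, 8-hour day": (12, 8),
}
for label, (per_hour, hours) in scenarios.items():
    n = monthly_invocations(per_hour, hours)
    print(f"{label}: ~{n} invocations, ~${scaled_monthly_cost(n):.2f}/month")
```

Linear scaling ignores the free tier and any change in per-run duration, so treat the output as a rough guide rather than a bill.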
Security best practices
This solution implements the following security controls:
- IAM policies scoped to specific resource ARNs for the Amazon Redshift workgroup, namespace, SNS topic, and log group.
- Data API statement access restricted to the Lambda function’s own IAM user ID.
- Read-only sys:monitor database role for operational metadata access, granted only to the database user for the IAM role created by the CloudFormation template.
- Reserved concurrent executions capped at 5.
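The statement-ownership control in the second bullet is typically expressed with the `redshift-data:statement-owner-iam-userid` condition key. This key limits operations such as `DescribeStatement` and `GetStatementResult` to statements that the caller submitted. The following minimal sketch builds such a policy statement as a Python dictionary. It illustrates the technique under that assumption and is not the policy that the CloudFormation template creates.

```python
import json


def statement_owner_policy_statement() -> dict:
    """IAM policy statement that limits Data API statement access to the caller's own statements."""
    return {
        "Sid": "OwnDataApiStatementsOnly",
        "Effect": "Allow",
        "Action": [
            "redshift-data:DescribeStatement",
            "redshift-data:GetStatementResult",
            "redshift-data:CancelStatement",
        ],
        "Resource": "*",
        "Condition": {
            # ${aws:userid} is resolved by IAM at evaluation time; it is not a Python placeholder.
            "StringEquals": {"redshift-data:statement-owner-iam-userid": "${aws:userid}"}
        },
    }


policy = {"Version": "2012-10-17", "Statement": [statement_owner_policy_statement()]}
print(json.dumps(policy, indent=2))
```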
To further strengthen your security posture, consider the following enhancements:
- Set EnableKMSEncryption to true to encrypt environment variables, logs, and SNS messages at rest.
- Set EnableVPC to true to deploy the function inside a VPC for network isolation.
- Audit access through AWS CloudTrail.
Important: This is sample code for non-production use. Before deployment, work with your security and legal teams to meet your organization’s security, regulatory, and compliance requirements. This solution demonstrates monitoring capabilities but requires additional security hardening for production environments, including encryption configuration, IAM policy scoping, VPC deployment, and comprehensive testing.
Clean up
If you no longer want to use the solution, complete the following steps to remove all resources and avoid ongoing charges:
- Delete the CloudFormation stack.
- Remove the Slack integration from Amazon Q Developer in chat applications.
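If you prefer to delete the stack from a script, the following minimal Boto3 sketch deletes it and waits for completion. The helper name is hypothetical, and the optional client argument exists only so that the function can be exercised without AWS access. The Slack channel configuration is managed separately in Amazon Q Developer in chat applications, so this code does not remove it.

```python
def delete_stack(stack_name: str, cloudformation=None) -> None:
    """Delete the stack and block until deletion completes (requires AWS credentials by default)."""
    if cloudformation is None:
        import boto3

        cloudformation = boto3.client("cloudformation")
    cloudformation.delete_stack(StackName=stack_name)
    # The waiter polls until the stack reaches DELETE_COMPLETE and raises if deletion fails.
    cloudformation.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```

Deleting the stack does not remove objects created with SQL, such as the database user from Step 2. Drop that user separately if you no longer need it.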
Troubleshooting
- If no metrics or incomplete SQL diagnostics are returned, verify that the Amazon Redshift Serverless workgroup is active and has recent query activity. Also confirm in the query editor that the database user has the sys:monitor role (granted with GRANT ROLE sys:monitor TO followed by the user name). Without this role, queries run successfully but return only the data visible to that user rather than the full workgroup activity.
- For VPC-deployed functions that fail to reach AWS services, confirm that VPC endpoints or a NAT gateway are configured for CloudWatch, the Amazon Redshift Data API, Amazon Redshift Serverless, SNS, and CloudWatch Logs.
- If the Lambda function times out, increase the LambdaTimeout and QueryTimeoutSeconds parameters. The default timeout of 240 seconds accommodates most workloads, but workgroups with many active queries might need more time for SQL diagnostics to complete.
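The interaction between the two timeout parameters is easier to see in code. The following minimal sketch runs one diagnostic query through the Amazon Redshift Data API and polls `DescribeStatement` until the statement reaches a terminal state or the query timeout elapses. On timeout, it cancels the statement. The SQL text, the polling interval, and the helper name `run_with_timeout` are assumptions, not the solution's actual diagnostic logic. The key point is that the query timeout should stay well below the Lambda timeout, so the function can cancel a stuck statement and still report.

```python
import time

TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}

# Illustrative diagnostic query; elapsed_time in SYS_QUERY_HISTORY is in microseconds.
SLOW_QUERY_SQL = (
    "SELECT query_id, status, elapsed_time / 1000000.0 AS elapsed_seconds "
    "FROM sys_query_history WHERE elapsed_time > :threshold_us "
    "ORDER BY elapsed_time DESC LIMIT 20"
)


def run_with_timeout(client, workgroup, database, sql, parameters=None,
                     query_timeout_s=30.0, poll_interval_s=1.0):
    """Run a Data API statement and return its final description; cancel it if it exceeds the timeout."""
    kwargs = {"WorkgroupName": workgroup, "Database": database, "Sql": sql}
    if parameters:
        kwargs["Parameters"] = parameters
    statement_id = client.execute_statement(**kwargs)["Id"]
    deadline = time.monotonic() + query_timeout_s
    while True:
        description = client.describe_statement(Id=statement_id)
        if description["Status"] in TERMINAL_STATES:
            return description
        if time.monotonic() >= deadline:
            # Cancel instead of leaving the diagnostic query running after the function gives up.
            client.cancel_statement(Id=statement_id)
            raise TimeoutError(f"statement {statement_id} exceeded {query_timeout_s}s")
        time.sleep(poll_interval_s)
```

In a Lambda function, you would pass `boto3.client("redshift-data")` as the client. For a 10-second slow-query threshold, pass parameters such as `[{"name": "threshold_us", "value": str(10 * 1_000_000)}]`. When the returned status is `FINISHED`, read the rows with `get_statement_result(Id=...)`.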
Conclusion
In this post, we showed how to build a proactive monitoring solution for Amazon Redshift Serverless using AWS Lambda, Amazon CloudWatch, and Amazon SNS with Slack integration. The solution automatically collects metrics, evaluates them against thresholds, and delivers alerts in near real time to Slack or your preferred collaboration platform, which helps you detect performance and cost issues early. Because the solution itself is serverless, it aligns with the operational simplicity goals of Amazon Redshift Serverless: it scales automatically, requires minimal maintenance, and delivers high value at low cost. You can extend this foundation with additional metrics, diagnostic logic, or other notification channels to meet your organization’s needs.
To learn more, see the Amazon Redshift documentation on monitoring and performance optimization.