This CDK example project shows how to create an ECS Managed Instance Cluster that runs a service consuming messages from an SQS queue.
This project demonstrates:
- Setting up an ECS cluster with GPU-enabled managed EC2 instances
- Configuring a containerized service to process SQS messages
- Infrastructure as Code using AWS CDK with TypeScript
- Cost-optimized autoscaling with scale-to-zero capability
This template includes built-in autoscaling that optimizes costs while maintaining responsiveness:
- Scale-to-Zero: When the SQS queue is empty, the service scales down to 0 tasks, eliminating costs for idle GPU instances
- Queue-Based Step Scaling: Uses step scaling policies for predictable, deterministic scaling behavior based on queue depth
- 0 messages → 0 tasks (scale down by 4, removing all tasks)
- 1-10 messages → 1 task
- 11-20 messages → 2 tasks
- 21-30 messages → 3 tasks
- 31+ messages → 4 tasks (max capacity)
- Each step adds exactly 1 task as queue depth increases
- Cooldown Period: 60 seconds between scaling actions to prevent rapid fluctuations
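The thresholds above can be made explicit as a pure function. This is only an illustration (the function name is hypothetical); in the deployed stack the mapping is enforced by Application Auto Scaling step-scaling policies, not by application code:

```typescript
// Hypothetical helper mirroring the step-scaling thresholds above.
// Input: ApproximateNumberOfMessagesVisible on the queue.
// Output: the task count the step policies converge on.
function desiredTaskCount(visibleMessages: number): number {
  if (visibleMessages <= 0) return 0;  // scale to zero when idle
  if (visibleMessages <= 10) return 1;
  if (visibleMessages <= 20) return 2;
  if (visibleMessages <= 30) return 3;
  return 4;                            // max capacity cap
}

console.log(desiredTaskCount(0));   // 0 tasks
console.log(desiredTaskCount(15));  // 2 tasks
console.log(desiredTaskCount(50));  // 4 tasks (capped)
```

Because each band maps to a fixed task count, scaling is deterministic: the same queue depth always yields the same capacity, unlike target tracking, which converges gradually.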
The container worker (example-image/src/container_worker.py) includes intelligent auto-shutdown:
- Polls SQS in batches of 10 messages
- Automatically exits after 3 consecutive empty polls (~3 seconds of idle time)
- Gracefully waits for in-progress jobs to complete before shutdown
- Triggers ECS to terminate the underlying GPU instance, saving costs
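The shutdown behavior above can be sketched as a simple loop. The real worker is the Python script in example-image/src/container_worker.py; this is a TypeScript sketch of the same logic, with hypothetical `poll` and `process` callbacks standing in for the SQS receive and job-handling code:

```typescript
// Sketch of the auto-shutdown loop (names are hypothetical; the real
// implementation lives in example-image/src/container_worker.py).
const MAX_EMPTY_POLLS = 3;

async function workerLoop(
  poll: () => Promise<string[]>,          // returns up to 10 SQS messages
  process: (msg: string) => Promise<void> // handles one job
): Promise<void> {
  let emptyPolls = 0;
  while (emptyPolls < MAX_EMPTY_POLLS) {
    const messages = await poll();
    if (messages.length === 0) {
      emptyPolls++;                       // roughly 1s of idle per empty poll
      continue;
    }
    emptyPolls = 0;                       // work arrived; reset the idle counter
    // Await every in-flight job before polling again, so leaving the
    // loop never abandons in-progress work.
    await Promise.all(messages.map(process));
  }
  // Returning ends the container process; ECS then stops the task and
  // can scale in the underlying GPU instance.
}
```

Exiting the process (rather than sleeping) is what allows the capacity provider to reclaim the instance.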
This configuration minimizes costs by:
- Scaling to zero when no work is available
- Batch processing up to 10 messages per poll cycle
- Auto-terminating idle workers after a few seconds
- Using Spot instances (configured in the capacity provider, for savings of up to 90% versus On-Demand)
- Capping max capacity to prevent runaway costs
When messages arrive in the queue, ECS automatically provisions new GPU instances and starts tasks within a few minutes.
Useful commands:
- `pnpm run build` - compile TypeScript to JS
- `pnpm run watch` - watch for changes and compile
- `pnpm run test` - run the Jest unit tests
- `pnpm cdk deploy` - deploy this stack to your default AWS account/region
- `pnpm cdk diff` - compare the deployed stack with the current state
- `pnpm cdk synth` - emit the synthesized CloudFormation template
When deploying or scaling up, you may encounter this error:
```
ResourceInitializationError: Unable to launch instance(s) for capacity provider.
VcpuLimitExceeded: You have requested more vCPU capacity than your current vCPU limit allows...
```
Cause: AWS accounts have default service quotas for GPU instances. The default limit is often 8 vCPUs for G-family instances (g4dn.*), which allows only 1-2 GPU instances. This stack's max capacity of 4 tasks requires up to 32 vCPUs.
Solution: Request a quota increase for spot GPU instances:
```sh
# Replace us-east-1 with your deployment region
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code L-3819A6DF \
  --desired-value 32 \
  --region us-east-1
```

Or request via the AWS Service Quotas Console:
- Navigate to EC2 service quotas
- Search for "All G and VT Spot Instance Requests"
- Request an increase to 32 vCPUs (or higher)
Quota increases are typically approved within 24-48 hours, often instantly for reasonable requests.