This CDK example project shows how to create an ECS Managed Instance Cluster that runs a service consuming messages from an SQS queue.
This project demonstrates:
- Setting up an ECS cluster with GPU-enabled managed EC2 instances
- Configuring a containerized service to process SQS messages
- Infrastructure as Code using AWS CDK with TypeScript
- Cost-optimized autoscaling with scale-to-zero capability
This template includes built-in autoscaling that optimizes costs while maintaining responsiveness:
- Scale-to-Zero: When the SQS queue is empty, the service scales down to 0 tasks, eliminating costs for idle GPU instances
- Queue-Based Step Scaling: Uses step scaling policies for predictable, deterministic scaling behavior based on queue depth
- 0 messages → 0 tasks (scale down by 4, removing all tasks)
- 1-10 messages → 1 task
- 11-20 messages → 2 tasks
- 21-30 messages → 3 tasks
- 31+ messages → 4 tasks (max capacity)
- Each step adds exactly 1 task as queue depth increases
- Cooldown Period: 60 seconds between scaling actions to prevent rapid fluctuations
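The thresholds above can be made explicit as a pure function. This is only an illustration (the function name is hypothetical); in the deployed stack the mapping is enforced by Application Auto Scaling step-scaling policies, not by application code:

```typescript
// Hypothetical helper mirroring the step-scaling thresholds above.
// Input: ApproximateNumberOfMessagesVisible on the queue.
// Output: the task count the step policies converge on.
function desiredTaskCount(visibleMessages: number): number {
  if (visibleMessages <= 0) return 0;  // scale to zero when idle
  if (visibleMessages <= 10) return 1;
  if (visibleMessages <= 20) return 2;
  if (visibleMessages <= 30) return 3;
  return 4;                            // max capacity cap
}

console.log(desiredTaskCount(0));   // 0 tasks
console.log(desiredTaskCount(15));  // 2 tasks
console.log(desiredTaskCount(50));  // 4 tasks (capped)
```

Because each band maps to a fixed task count, scaling is deterministic: the same queue depth always yields the same capacity, unlike target tracking, which converges gradually.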
The container worker (example-image/src/container_worker.py) includes intelligent auto-shutdown:
- Polls SQS in batches of 10 messages
- Automatically exits after 3 consecutive empty polls (~3 seconds of idle time)
- Gracefully waits for in-progress jobs to complete before shutdown
- Triggers ECS to terminate the underlying GPU instance, saving costs
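The shutdown behavior above can be sketched as a simple loop. The real worker is the Python script in example-image/src/container_worker.py; this is a TypeScript sketch of the same logic, with hypothetical `poll` and `process` callbacks standing in for the SQS receive and job-handling code:

```typescript
// Sketch of the auto-shutdown loop (names are hypothetical; the real
// implementation lives in example-image/src/container_worker.py).
const MAX_EMPTY_POLLS = 3;

async function workerLoop(
  poll: () => Promise<string[]>,          // returns up to 10 SQS messages
  process: (msg: string) => Promise<void> // handles one job
): Promise<void> {
  let emptyPolls = 0;
  while (emptyPolls < MAX_EMPTY_POLLS) {
    const messages = await poll();
    if (messages.length === 0) {
      emptyPolls++;                       // roughly 1s of idle per empty poll
      continue;
    }
    emptyPolls = 0;                       // work arrived; reset the idle counter
    // Await every in-flight job before polling again, so leaving the
    // loop never abandons in-progress work.
    await Promise.all(messages.map(process));
  }
  // Returning ends the container process; ECS then stops the task and
  // can scale in the underlying GPU instance.
}
```

Exiting the process (rather than sleeping) is what allows the capacity provider to reclaim the instance.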
This configuration minimizes costs by:
- Scaling to zero when no work is available
- Batch processing up to 10 messages per poll cycle
- Auto-terminating idle workers after a few seconds
- Using Spot instances (configured in the capacity provider, for savings of up to 90% versus On-Demand)
- Capping max capacity to prevent runaway costs
When messages arrive in the queue, ECS automatically provisions new GPU instances and starts tasks within a few minutes.
Useful commands:
- `pnpm run build` - compile TypeScript to JS
- `pnpm run watch` - watch for changes and compile
- `pnpm run test` - run the Jest unit tests
- `pnpm cdk deploy` - deploy this stack to your default AWS account/region
- `pnpm cdk diff` - compare the deployed stack with the current state
- `pnpm cdk synth` - emit the synthesized CloudFormation template
When deploying or scaling up, you may encounter this error:
```
ResourceInitializationError: Unable to launch instance(s) for capacity provider.
VcpuLimitExceeded: You have requested more vCPU capacity than your current vCPU limit allows...
```
Cause: AWS accounts have default service quotas for GPU instances. The default limit is often 8 vCPUs for G-family instances (g4dn.*), which allows only 1-2 GPU instances. This stack's max capacity of 4 tasks requires up to 32 vCPUs.
Solution: Request a quota increase for spot GPU instances:
```sh
# Replace us-east-1 with your deployment region
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code L-3819A6DF \
  --desired-value 32 \
  --region us-east-1
```

Or request via the AWS Service Quotas Console:
- Navigate to EC2 service quotas
- Search for "All G and VT Spot Instance Requests"
- Request an increase to 32 vCPUs (or higher)
Quota increases are typically approved within 24-48 hours, often instantly for reasonable requests.