fourTheorem/cdk-ecs-mi-template

ECS Managed Instance Template

This is an example AWS CDK project showing how to create an ECS cluster backed by Managed Instances, running a service that consumes messages from an SQS queue.

Overview

This project demonstrates:

  • Setting up an ECS cluster with GPU-enabled managed EC2 instances
  • Configuring a containerized service to process SQS messages
  • Infrastructure as Code using AWS CDK with TypeScript
  • Cost-optimized autoscaling with scale-to-zero capability

Auto-Scaling Configuration

This template includes built-in autoscaling that optimizes costs while maintaining responsiveness:

Scaling Behavior

  • Scale-to-Zero: When the SQS queue is empty, the service scales down to 0 tasks, eliminating costs for idle GPU instances
  • Queue-Based Step Scaling: Uses step scaling policies for predictable, deterministic scaling behavior based on queue depth
    • 0 messages → 0 tasks (scale down by 4, removing all tasks)
    • 1-10 messages → 1 task
    • 11-20 messages → 2 tasks
    • 21-30 messages → 3 tasks
    • 31+ messages → 4 tasks (max capacity)
    • Each step adds exactly 1 task as queue depth increases
  • Cooldown Period: 60 seconds between scaling actions to prevent rapid fluctuations
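The step table above can be sketched as a pure function (illustrative only; in the deployed stack these steps are enforced by CloudWatch alarms and ECS step scaling policies, not by application code):

```typescript
// Maximum task count, matching the stack's max capacity of 4.
const MAX_TASKS = 4;

// Map SQS queue depth to the desired task count per the step table:
// 0 → 0, 1-10 → 1, 11-20 → 2, 21-30 → 3, 31+ → 4 (capped).
function desiredTasks(queueDepth: number): number {
  if (queueDepth <= 0) return 0; // empty queue: scale to zero
  return Math.min(Math.ceil(queueDepth / 10), MAX_TASKS);
}
```

Each band of 10 messages adds one task until the cap is reached, which keeps scaling deterministic as queue depth grows.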

Worker Auto-Shutdown

The container worker (example-image/src/container_worker.py) includes intelligent auto-shutdown:

  • Polls SQS in batches of 10 messages
  • Automatically exits after 3 consecutive empty polls (~3 seconds of idle time)
  • Gracefully waits for in-progress jobs to complete before shutdown
  • Triggers ECS to terminate the underlying GPU instance, saving costs
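A minimal sketch of the idle-counter logic described above, assuming the counter resets whenever a poll returns messages (the real implementation lives in example-image/src/container_worker.py; the IdleTracker name here is hypothetical):

```typescript
// Exit after this many consecutive empty SQS polls (~3 seconds of idle time).
const MAX_EMPTY_POLLS = 3;

// Tracks consecutive empty polls and signals when the worker should exit.
class IdleTracker {
  private emptyPolls = 0;

  // Record the result of one poll; returns true when the worker should shut down.
  recordPoll(messagesReceived: number): boolean {
    if (messagesReceived === 0) {
      this.emptyPolls += 1;
    } else {
      this.emptyPolls = 0; // work arrived, so reset the idle counter
    }
    return this.emptyPolls >= MAX_EMPTY_POLLS;
  }
}
```

When the tracker signals shutdown, the worker finishes any in-progress jobs and exits, letting ECS reclaim the GPU instance.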

Cost Optimization

This configuration minimizes costs by:

  1. Scaling to zero when no work is available
  2. Batch processing up to 10 messages per poll cycle
  3. Auto-terminating idle workers after a few seconds
  4. Using spot instances (configured in capacity provider for up to 90% savings)
  5. Capping max capacity to prevent runaway costs

When messages arrive in the queue, ECS automatically provisions new GPU instances and starts tasks within a few minutes.

Useful commands

  • pnpm run build: compile TypeScript to JavaScript
  • pnpm run watch: watch for changes and compile
  • pnpm run test: run the Jest unit tests
  • pnpm cdk deploy: deploy this stack to your default AWS account/region
  • pnpm cdk diff: compare the deployed stack with the current state
  • pnpm cdk synth: emit the synthesized CloudFormation template

Troubleshooting

GPU Instance vCPU Quota Errors

When deploying or scaling up, you may encounter this error:

ResourceInitializationError: Unable to launch instance(s) for capacity provider.
VcpuLimitExceeded: You have requested more vCPU capacity than your current vCPU limit allows...

Cause: AWS accounts have default service quotas for GPU instances. The default limit is often 8 vCPUs for G-family instances (g4dn.*), which allows only 1-2 GPU instances. This stack's max capacity of 4 tasks requires up to 32 vCPUs.

Solution: Request a quota increase for spot GPU instances:

# Replace us-east-1 with your deployment region
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code L-3819A6DF \
  --desired-value 32 \
  --region us-east-1

Or request via the AWS Service Quotas Console:

  1. Navigate to EC2 service quotas
  2. Search for "All G and VT Spot Instance Requests"
  3. Request an increase to 32 vCPUs (or higher)

Quota increases are typically approved within 24-48 hours; modest requests are often granted almost immediately.
