jicowan/ecs-xds-controller

ECS xDS Controller

An xDS control plane for Envoy that dynamically discovers ECS services and provides hostname-based routing configuration.

Overview

This controller replaces Traefik's ECS provider functionality by:

  1. Discovering ECS services and their running tasks via the AWS ECS API
  2. Building Envoy configuration (listeners, routes, clusters, endpoints)
  3. Serving configuration to Envoy proxies via the xDS gRPC protocol
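
The three steps above can be sketched as a small pipeline. This is a minimal, stdlib-only illustration and not the controller's actual code: Service, Route, Snapshot, buildSnapshot, and snapshotVersion are hypothetical names, plain structs stand in for real Envoy resources, and skipping services without running tasks is an assumption. The content-derived version shows one way to make a snapshot change only when the discovered state changes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Service is a hypothetical view of one discovered ECS service.
type Service struct {
	Name      string
	Hostname  string
	Endpoints []string // "ip:port" of each running task
}

// Route maps one hostname to the cluster (and endpoints) that serve it.
type Route struct {
	Hostname  string
	Cluster   string
	Endpoints []string
}

// Snapshot is a simplified stand-in for a versioned set of xDS resources.
type Snapshot struct {
	Version string
	Routes  []Route
}

// buildSnapshot turns discovered services into a deterministic snapshot.
// Assumption: services with no hostname or no running tasks are skipped.
func buildSnapshot(services []Service) Snapshot {
	routes := make([]Route, 0, len(services))
	for _, s := range services {
		if s.Hostname == "" || len(s.Endpoints) == 0 {
			continue
		}
		eps := append([]string(nil), s.Endpoints...) // copy before sorting
		sort.Strings(eps)
		routes = append(routes, Route{Hostname: s.Hostname, Cluster: s.Name, Endpoints: eps})
	}
	sort.Slice(routes, func(i, j int) bool { return routes[i].Hostname < routes[j].Hostname })
	return Snapshot{Version: snapshotVersion(routes), Routes: routes}
}

// snapshotVersion hashes the sorted routes, so the version changes only
// when the routing content changes.
func snapshotVersion(routes []Route) string {
	h := sha256.New()
	for _, r := range routes {
		fmt.Fprintf(h, "%s|%s|%s\n", r.Hostname, r.Cluster, strings.Join(r.Endpoints, ","))
	}
	return hex.EncodeToString(h.Sum(nil))[:12]
}

func main() {
	snap := buildSnapshot([]Service{
		{Name: "tenant-2", Hostname: "tenant-2.test.local", Endpoints: []string{"10.0.1.8:8080"}},
		{Name: "tenant-1", Hostname: "tenant-1.test.local", Endpoints: []string{"10.0.1.7:8080", "10.0.1.5:8080"}},
		{Name: "idle", Hostname: "idle.test.local"}, // no running tasks: skipped
	})
	fmt.Println("version:", snap.Version)
	for _, r := range snap.Routes {
		fmt.Printf("%s -> %s %v\n", r.Hostname, r.Cluster, r.Endpoints)
	}
}
```

Sorting routes and endpoints before hashing keeps the version stable when the ECS API returns the same services in a different order.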

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                    xDS Control Plane                             │
│                                                                  │
│  ┌─────────────┐     ┌──────────────┐     ┌─────────────────┐    │
│  │ ECS         │────▶│ Snapshot     │────▶│ xDS gRPC        │    │
│  │ Discovery   │     │ Builder      │     │ Server          │    │
│  └─────────────┘     └──────────────┘     └────────┬────────┘    │
│        │                                           │             │
│        ▼                                           ▼             │
│  ┌─────────────┐                          ┌─────────────────┐    │
│  │ ECS API     │                          │ Envoy Proxies   │    │
│  │ (AWS SDK)   │                          │ (gRPC clients)  │    │
│  └─────────────┘                          └─────────────────┘    │
└──────────────────────────────────────────────────────────────────┘

Quick Start

Local Testing (Docker Compose)

# Build and run locally with mock services
docker-compose up --build

# Test routing (in another terminal)
curl -H 'Host: tenant-1.test.local' http://localhost:10000/
curl -H 'Host: tenant-2.test.local' http://localhost:10000/
curl -H 'Host: tenant-3.test.local' http://localhost:10000/

# View Envoy admin (open is macOS-specific; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:9901

Deploy to AWS

# Prerequisites: AWS CLI, Terraform, Docker
./scripts/deploy.sh deploy

# Or step by step:
./scripts/deploy.sh build    # Build and push images
./scripts/deploy.sh infra    # Deploy infrastructure
./scripts/deploy.sh test     # Run smoke tests
./scripts/deploy.sh info     # Get deployment info

# Destroy when done
./scripts/deploy.sh destroy

CI/CD Pipeline

The project includes a GitHub Actions workflow (.github/workflows/ci-cd.yml) that runs four stages:

  1. Build & Test - Compiles Go code and runs tests
  2. Docker Build - Builds and pushes images to ECR
  3. Deploy - Deploys infrastructure using Terraform
  4. Smoke Test - Validates routing works correctly

Required GitHub Secrets

Secret                  Description
AWS_ACCESS_KEY_ID       AWS access key with ECR/ECS permissions
AWS_SECRET_ACCESS_KEY   AWS secret key

Manual Triggers

  • Deploy: Workflow dispatch with deploy: true
  • Destroy: Workflow dispatch with destroy: true

Configuration

Environment Variables

Variable                 Description                                 Default
ECS_CLUSTER_NAME         ECS cluster to discover services from       Required
ECS_HOSTNAME_TAG_KEY     Tag key containing tenant hostname          hostname
ECS_HOSTNAME_SUFFIX      Suffix appended to service name if no tag   -
ECS_SERVICE_PREFIX       Filter services by prefix                   -
ECS_CONTAINER_PORT       Default container port                      8080
XDS_PORT                 gRPC server port                            18000
XDS_REFRESH_INTERVAL     Discovery refresh interval                  30s
ENVOY_LISTENER_ADDRESS   Address Envoy should listen on              0.0.0.0
ENVOY_LISTENER_PORT      Port Envoy should listen on                 10000
ENVOY_NODE_ID            Node ID for snapshot cache                  envoy-proxy
LOG_LEVEL                Log level (debug, info, warn, error)        info
LOG_FORMAT               Log format (json or console)                console
MOCK_MODE                Enable mock discovery for local testing     false

Hostname Resolution

Tenants are mapped to hostnames in two ways:

  1. ECS Service Tags (preferred): Tag your ECS service with hostname=tenant.example.com
  2. Naming Convention: If ECS_HOSTNAME_SUFFIX is set, the hostname is derived as {service-name}{suffix}

Example with tags:

ECS Service: tenant-123-webapp
Tags:
  - hostname: tenant123.example.com
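
The precedence described above (tag first, naming convention second) can be sketched as follows. resolveHostname is a hypothetical helper, not the controller's actual function, and returning "not found" when neither source yields a hostname is an assumption about how such services are handled. Because the derived hostname is the plain concatenation {service-name}{suffix}, the suffix in this sketch has to carry its own leading dot.

```go
package main

import "fmt"

// resolveHostname returns the hostname for a service and whether one was found.
// tagKey corresponds to ECS_HOSTNAME_TAG_KEY, suffix to ECS_HOSTNAME_SUFFIX.
func resolveHostname(serviceName string, tags map[string]string, tagKey, suffix string) (string, bool) {
	// 1. Preferred: an explicit hostname tag on the ECS service.
	if h := tags[tagKey]; h != "" {
		return h, true
	}
	// 2. Fallback: derive the hostname as {service-name}{suffix}.
	if suffix != "" {
		return serviceName + suffix, true
	}
	// Assumption: with no tag and no suffix the service gets no route.
	return "", false
}

func main() {
	tagged := map[string]string{"hostname": "tenant123.example.com"}

	h, ok := resolveHostname("tenant-123-webapp", tagged, "hostname", ".example.com")
	fmt.Println(h, ok) // tenant123.example.com true

	h, ok = resolveHostname("tenant-123-webapp", nil, "hostname", ".example.com")
	fmt.Println(h, ok) // tenant-123-webapp.example.com true

	h, ok = resolveHostname("tenant-123-webapp", nil, "hostname", "")
	fmt.Printf("%q %v\n", h, ok) // "" false
}
```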

Project Structure

ecs-xds-controller/
├── cmd/controller/          # Main entry point
├── internal/
│   ├── config/              # Configuration loading
│   ├── ecs/                 # ECS discovery (real + mock)
│   └── xds/                 # xDS server and snapshot builder
├── pkg/logging/             # Structured logging
├── mock-service/            # Mock tenant service for testing
├── infrastructure/          # Terraform for AWS deployment
├── deployments/
│   └── docker/              # Dockerfile
├── envoy/                   # Envoy bootstrap configs
├── scripts/                 # Deployment and test scripts
├── .github/workflows/       # CI/CD pipeline
├── docker-compose.yml       # Local testing setup
└── Makefile                 # Build commands

Infrastructure

The Terraform code in infrastructure/ creates:

  • VPC with public/private subnets
  • ECR repositories for images
  • ECS Cluster (Fargate)
  • ALB for ingress
  • Service Discovery (Cloud Map) for internal DNS
  • IAM roles with least-privilege permissions
  • Security groups for network isolation

Architecture Diagram

Internet
    │
    ▼
┌─────────┐
│   ALB   │
└────┬────┘
     │
     ▼
┌─────────┐     ┌───────────────┐
│  Envoy  │────▶│ xDS Controller│
└────┬────┘     └───────┬───────┘
     │                  │
     │                  ▼
     │          ┌───────────────┐
     │          │   ECS API     │
     │          └───────────────┘
     ▼
┌─────────────────────────────────┐
│     Mock Tenant Services        │
│  ┌─────┐  ┌─────┐  ┌─────┐      │
│  │ T-1 │  │ T-2 │  │ T-3 │      │
│  └─────┘  └─────┘  └─────┘      │
└─────────────────────────────────┘

Testing

Unit Tests

make test
make test-coverage

Integration Tests

# After deployment
./scripts/test.sh

Manual Testing

# Get ALB URL
cd infrastructure && terraform output alb_url

# Test each tenant
curl -H 'Host: tenant-1.test.local' http://<alb-url>/
curl -H 'Host: tenant-2.test.local' http://<alb-url>/
curl -H 'Host: tenant-3.test.local' http://<alb-url>/

# Test unknown host (should return 404)
curl -H 'Host: unknown.test.local' http://<alb-url>/

IAM Permissions

The xDS controller requires these ECS permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:ListServices",
        "ecs:DescribeServices",
        "ecs:ListTasks",
        "ecs:DescribeTasks",
        "ecs:DescribeContainerInstances",
        "ecs:ListTagsForResource"
      ],
      "Resource": "*"
    }
  ]
}

Troubleshooting

View Logs

# xDS Controller logs
aws logs tail /ecs/xds-controller-dev/xds-controller --follow

# Envoy logs
aws logs tail /ecs/xds-controller-dev/envoy --follow

# Mock service logs
aws logs tail /ecs/xds-controller-dev/mock-service --follow

Check Envoy Configuration

# Get Envoy task IP (from ECS console or CLI)
# Then from a bastion or VPC-connected machine:

# View clusters
curl http://<envoy-ip>:9901/clusters

# View config dump
curl http://<envoy-ip>:9901/config_dump

# View routes (quote the URL so the shell does not interpret the ?)
curl 'http://<envoy-ip>:9901/config_dump?resource=dynamic_route_configs'

Common Issues

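  1. Envoy not receiving config: Check the xDS controller logs and confirm that Envoy can reach the controller through service discovery on XDS_PORT (default 18000)
  2. 404 for all requests: Verify that the mock services carry the hostname tag (ECS_HOSTNAME_TAG_KEY, default hostname) or that ECS_HOSTNAME_SUFFIX is set
  3. Connection refused: Check that the security groups allow traffic from the ALB to Envoy, from Envoy to the xDS controller, and from Envoy to the tenant services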

License

MIT
