An xDS control plane for Envoy that dynamically discovers ECS services and provides hostname-based routing configuration.
This controller replaces Traefik's ECS provider functionality by:
- Discovering ECS services and their running tasks via the AWS ECS API
- Building Envoy configuration (listeners, routes, clusters, endpoints)
- Serving configuration to Envoy proxies via the xDS gRPC protocol
┌──────────────────────────────────────────────────────────────────┐
│ xDS Control Plane │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ ECS │────▶│ Snapshot │────▶│ xDS gRPC │ │
│ │ Discovery │ │ Builder │ │ Server │ │
│ └─────────────┘ └──────────────┘ └────────┬────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │ ECS API │ │ Envoy Proxies │ │
│ │ (AWS SDK) │ │ (gRPC clients) │ │
│ └─────────────┘ └─────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
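The flow in the diagram (discover, build a snapshot, serve it) can be sketched as a single refresh pass that is run once per `XDS_REFRESH_INTERVAL`. This is a minimal, standard-library-only illustration: the `Service` and `Refresher` types, the `Version` helper, and the idea of skipping a publish when nothing changed are assumptions made for the example, not the controller's actual code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Service is a hypothetical, simplified view of one discovered ECS service.
type Service struct {
	Hostname  string   // hostname Envoy should match on
	Endpoints []string // "ip:port" of each running task
}

// Version derives a deterministic version string from the discovered state,
// so the same set of services always maps to the same version.
func Version(services []Service) string {
	lines := make([]string, 0, len(services))
	for _, s := range services {
		eps := append([]string(nil), s.Endpoints...) // copy before sorting
		sort.Strings(eps)
		lines = append(lines, s.Hostname+"="+strings.Join(eps, ","))
	}
	sort.Strings(lines)
	sum := sha256.Sum256([]byte(strings.Join(lines, "\n")))
	return hex.EncodeToString(sum[:8])
}

// Refresher ties discovery to publishing. Discover stands in for the ECS
// lookup and Publish for handing a snapshot to the xDS server.
type Refresher struct {
	Discover func() ([]Service, error)
	Publish  func(version string, services []Service)
	last     string // version of the most recently published state
}

// RefreshOnce runs one discovery pass and reports whether it published.
func (r *Refresher) RefreshOnce() (bool, error) {
	services, err := r.Discover()
	if err != nil {
		return false, err // nothing is published; the previous state stays in place
	}
	v := Version(services)
	if v == r.last {
		return false, nil // unchanged, skip the publish
	}
	r.Publish(v, services)
	r.last = v
	return true, nil
}

func main() {
	r := &Refresher{
		Discover: func() ([]Service, error) {
			return []Service{{Hostname: "tenant-1.test.local", Endpoints: []string{"10.0.1.5:8080"}}}, nil
		},
		Publish: func(v string, s []Service) {
			fmt.Printf("published %s (%d services)\n", v, len(s))
		},
	}
	for i := 0; i < 2; i++ {
		changed, err := r.RefreshOnce()
		fmt.Println(changed, err)
	}
}
```

In this sketch the second pass publishes nothing because the discovered state is identical. A real loop would call `RefreshOnce` from a `time.Ticker` and log discovery errors rather than stop.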
# Build and run locally with mock services
docker-compose up --build
# Test routing (in another terminal)
curl -H 'Host: tenant-1.test.local' http://localhost:10000/
curl -H 'Host: tenant-2.test.local' http://localhost:10000/
curl -H 'Host: tenant-3.test.local' http://localhost:10000/
# View Envoy admin
open http://localhost:9901

# Prerequisites: AWS CLI, Terraform, Docker
./scripts/deploy.sh deploy
# Or step by step:
./scripts/deploy.sh build # Build and push images
./scripts/deploy.sh infra # Deploy infrastructure
./scripts/deploy.sh test # Run smoke tests
./scripts/deploy.sh info # Get deployment info
# Destroy when done
./scripts/deploy.sh destroy

The project includes a GitHub Actions workflow (.github/workflows/ci-cd.yml) with four stages:
- Build & Test: compiles the Go code and runs tests
- Docker Build: builds and pushes images to ECR
- Deploy: deploys infrastructure using Terraform
- Smoke Test: validates that routing works correctly
| Secret | Description |
|---|---|
| `AWS_ACCESS_KEY_ID` | AWS access key with ECR/ECS permissions |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key |
- Deploy: Workflow dispatch with `deploy: true`
- Destroy: Workflow dispatch with `destroy: true`
| Variable | Description | Default |
|---|---|---|
| `ECS_CLUSTER_NAME` | ECS cluster to discover services from | Required |
| `ECS_HOSTNAME_TAG_KEY` | Tag key containing tenant hostname | hostname |
| `ECS_HOSTNAME_SUFFIX` | Suffix appended to service name if no tag | - |
| `ECS_SERVICE_PREFIX` | Filter services by prefix | - |
| `ECS_CONTAINER_PORT` | Default container port | 8080 |
| `XDS_PORT` | gRPC server port | 18000 |
| `XDS_REFRESH_INTERVAL` | Discovery refresh interval | 30s |
| `ENVOY_LISTENER_ADDRESS` | Address Envoy should listen on | 0.0.0.0 |
| `ENVOY_LISTENER_PORT` | Port Envoy should listen on | 10000 |
| `ENVOY_NODE_ID` | Node ID for snapshot cache | envoy-proxy |
| `LOG_LEVEL` | Log level (debug, info, warn, error) | info |
| `LOG_FORMAT` | Log format (json or console) | console |
| `MOCK_MODE` | Enable mock discovery for local testing | false |
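As a rough illustration of how the variables in the table above could be read with their defaults, here is a small loader that uses only the standard library. The `Config` struct, `LoadConfig`, and `getOr` are hypothetical names chosen for this sketch, it covers only a subset of the variables, and treating an empty value as unset is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// Config holds a subset of the documented settings (illustrative names).
type Config struct {
	ClusterName     string
	HostnameTagKey  string
	ContainerPort   int
	XDSPort         int
	RefreshInterval time.Duration
	MockMode        bool
}

// lookupFunc has the same shape as os.LookupEnv, which keeps the loader testable.
type lookupFunc func(key string) (string, bool)

// getOr returns the value for key, or def when it is unset or empty
// (assumption: an empty value falls back to the default).
func getOr(lookup lookupFunc, key, def string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return def
}

// LoadConfig applies the documented defaults and rejects values that do not parse.
func LoadConfig(lookup lookupFunc) (Config, error) {
	cfg := Config{
		ClusterName:    getOr(lookup, "ECS_CLUSTER_NAME", ""),
		HostnameTagKey: getOr(lookup, "ECS_HOSTNAME_TAG_KEY", "hostname"),
	}
	if cfg.ClusterName == "" {
		return Config{}, fmt.Errorf("ECS_CLUSTER_NAME is required")
	}
	var err error
	if cfg.ContainerPort, err = strconv.Atoi(getOr(lookup, "ECS_CONTAINER_PORT", "8080")); err != nil {
		return Config{}, fmt.Errorf("ECS_CONTAINER_PORT: %w", err)
	}
	if cfg.XDSPort, err = strconv.Atoi(getOr(lookup, "XDS_PORT", "18000")); err != nil {
		return Config{}, fmt.Errorf("XDS_PORT: %w", err)
	}
	if cfg.RefreshInterval, err = time.ParseDuration(getOr(lookup, "XDS_REFRESH_INTERVAL", "30s")); err != nil {
		return Config{}, fmt.Errorf("XDS_REFRESH_INTERVAL: %w", err)
	}
	if cfg.MockMode, err = strconv.ParseBool(getOr(lookup, "MOCK_MODE", "false")); err != nil {
		return Config{}, fmt.Errorf("MOCK_MODE: %w", err)
	}
	return cfg, nil
}

func main() {
	// Example environment; a real program would pass os.LookupEnv instead.
	env := map[string]string{"ECS_CLUSTER_NAME": "demo-cluster", "XDS_REFRESH_INTERVAL": "10s"}
	cfg, err := LoadConfig(func(k string) (string, bool) { v, ok := env[k]; return v, ok })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, lets tests supply a map without touching the process environment.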
Tenants are mapped to hostnames in two ways:
- ECS Service Tags (preferred): Tag your ECS service with `hostname=tenant.example.com`
- Naming Convention: If `ECS_HOSTNAME_SUFFIX` is set, the hostname is derived as `{service-name}{suffix}`
Example with tags:
ECS Service: tenant-123-webapp
Tags:
- hostname: tenant123.example.com
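The two mapping rules above can be expressed as one small function. `ResolveHostname` is a hypothetical helper written for this sketch, not a function from the project. Returning `false` when neither rule applies (so the caller can skip the service) is an assumption, since the behavior in that case is not described here.

```go
package main

import "fmt"

// ResolveHostname returns the hostname for a service and whether one could
// be derived. tagKey corresponds to ECS_HOSTNAME_TAG_KEY and suffix to
// ECS_HOSTNAME_SUFFIX.
func ResolveHostname(serviceName string, tags map[string]string, tagKey, suffix string) (string, bool) {
	// Preferred: an explicit tag on the ECS service.
	if h := tags[tagKey]; h != "" {
		return h, true
	}
	// Fallback: naming convention, only when a suffix is configured.
	if suffix != "" {
		return serviceName + suffix, true
	}
	// Assumption: with no tag and no suffix the service has no hostname.
	return "", false
}

func main() {
	tags := map[string]string{"hostname": "tenant123.example.com"}

	h, _ := ResolveHostname("tenant-123-webapp", tags, "hostname", ".example.com")
	fmt.Println(h) // tenant123.example.com (the tag wins over the suffix)

	h, _ = ResolveHostname("tenant-456-webapp", nil, "hostname", ".example.com")
	fmt.Println(h) // tenant-456-webapp.example.com

	_, ok := ResolveHostname("tenant-789-webapp", nil, "hostname", "")
	fmt.Println(ok) // false
}
```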
ecs-xds-controller/
├── cmd/controller/ # Main entry point
├── internal/
│ ├── config/ # Configuration loading
│ ├── ecs/ # ECS discovery (real + mock)
│ └── xds/ # xDS server and snapshot builder
├── pkg/logging/ # Structured logging
├── mock-service/ # Mock tenant service for testing
├── infrastructure/ # Terraform for AWS deployment
├── deployments/
│ └── docker/ # Dockerfile
├── envoy/ # Envoy bootstrap configs
├── scripts/ # Deployment and test scripts
├── .github/workflows/ # CI/CD pipeline
├── docker-compose.yml # Local testing setup
└── Makefile # Build commands
The Terraform code in infrastructure/ creates:
- VPC with public/private subnets
- ECR repositories for images
- ECS Cluster (Fargate)
- ALB for ingress
- Service Discovery (Cloud Map) for internal DNS
- IAM roles with least-privilege permissions
- Security groups for network isolation
Internet
│
▼
┌─────────┐
│ ALB │
└────┬────┘
│
▼
┌─────────┐ ┌───────────────┐
│ Envoy │────▶│ xDS Controller│
└────┬────┘ └───────┬───────┘
│ │
│ ▼
│ ┌───────────────┐
│ │ ECS API │
│ └───────────────┘
▼
┌─────────────────────────────────┐
│ Mock Tenant Services │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │ T-1 │ │ T-2 │ │ T-3 │ │
│ └─────┘ └─────┘ └─────┘ │
└─────────────────────────────────┘
make test
make test-coverage

# After deployment
./scripts/test.sh

# Get ALB URL
cd infrastructure && terraform output alb_url
# Test each tenant
curl -H 'Host: tenant-1.test.local' http://<alb-url>/
curl -H 'Host: tenant-2.test.local' http://<alb-url>/
curl -H 'Host: tenant-3.test.local' http://<alb-url>/
# Test unknown host (should return 404)
curl -H 'Host: unknown.test.local' http://<alb-url>/

The xDS controller requires these ECS permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecs:ListServices",
"ecs:DescribeServices",
"ecs:ListTasks",
"ecs:DescribeTasks",
"ecs:DescribeContainerInstances",
"ecs:ListTagsForResource"
],
"Resource": "*"
}
]
}

# xDS Controller logs
aws logs tail /ecs/xds-controller-dev/xds-controller --follow
# Envoy logs
aws logs tail /ecs/xds-controller-dev/envoy --follow
# Mock service logs
aws logs tail /ecs/xds-controller-dev/mock-service --follow

# Get Envoy task IP (from ECS console or CLI)
# Then from a bastion or VPC-connected machine:
# View clusters
curl http://<envoy-ip>:9901/clusters
# View config dump
curl http://<envoy-ip>:9901/config_dump
# View routes
curl 'http://<envoy-ip>:9901/config_dump?resource=dynamic_route_configs'

- Envoy not receiving config: Check the xDS controller logs and ensure Envoy can reach the controller via service discovery
- 404 for all requests: Verify that the mock services carry the `hostname` tag
- Connection refused: Check that the security groups allow the required traffic
MIT