crew.aws.batch for province-wide habitat pipeline
Vision
Replace local parallel execution with crew.aws.batch workers hitting an RDS PostgreSQL instance. 350 WSGs × 7 species = 2,450 tasks; spread across 50 workers that is ~2 minutes for the entire province (vs ~9 hours running sequentially on a local machine).
Architecture
```
Controller (laptop or EC2)
└── crew.aws.batch launcher
    ├── Worker 1 (Batch) ──→ RDS PostgreSQL (fwapg)
    ├── Worker 2 (Batch) ──→ ...
    └── Worker 50 (Batch) ──→ RDS PostgreSQL (fwapg)
```
What fresh needs
- `frs_habitat()` already supports a `workers` param — swap mirai for a crew controller
- Workers need `frs_db_conn()` params passed via crew's `data` argument
- Docker image: current `docker/Dockerfile` + R + fresh installed → push to ECR
- `break_sources` tables live in RDS; all workers read them
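The mirai → crew swap above could look roughly like this. This is a sketch only: the exact `crew_controller_aws_batch()` arguments, the job definition/queue names, and the way `frs_habitat()` consumes worker data are all assumptions — check the crew.aws.batch docs and the fresh API before relying on it.

```r
# Sketch: argument names and the frs_habitat() wiring are assumptions.
library(crew.aws.batch)

controller <- crew_controller_aws_batch(
  name = "fresh-habitat",
  workers = 50,
  # Job definition/queue are provisioned by awshak; names are placeholders.
  # The job definition points at the fresh worker image in ECR.
  options_aws_batch = crew_options_aws_batch(
    job_definition = "fresh-worker",
    job_queue      = "fresh-queue"
  )
)
controller$start()

# frs_db_conn() params go to every worker via crew's `data` argument so
# each task opens its own connection to the RDS fwapg database.
db_params <- list(host = "my-rds-endpoint", dbname = "fwapg")  # placeholder

# One task per WSG × species combination (350 × 7 = 2,450 tasks).
tasks <- expand.grid(wsg = wsg_codes, species = species_codes)  # hypothetical vectors
for (i in seq_len(nrow(tasks))) {
  controller$push(
    command = fresh::frs_habitat(wsg, species, db = db_params),
    data    = list(
      wsg       = tasks$wsg[i],
      species   = tasks$species[i],
      db_params = db_params
    )
  )
}
controller$wait()
controller$terminate()
```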
What awshak needs
- RDS/Aurora PostgreSQL with fwapg loaded
- Batch compute environment (Fargate or EC2)
- ECR repository for the worker Docker image
- VPC, security groups, IAM roles
- See NewGraphEnvironment/awshak#64
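The ECR piece can be sketched as a standard build-and-push from the repo's `docker/Dockerfile`. Account ID, region, and repository name below are placeholders, and the ECR repository is assumed to already exist:

```shell
# Placeholders: substitute your own account, region, and repo name.
ACCOUNT=123456789012
REGION=us-west-2
REPO=fresh-worker
IMAGE="$ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"

# Authenticate Docker to ECR, build the worker image, and push it.
aws ecr get-login-password --region "$REGION" |
  docker login --username AWS --password-stdin "$ACCOUNT.dkr.ecr.$REGION.amazonaws.com"
docker build -t "$IMAGE" -f docker/Dockerfile .
docker push "$IMAGE"
```

The Batch job definition (see the controller sketch in "What fresh needs") then references `$IMAGE` as its container image.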
Cost estimate
- 50 workers × 2 min = 100 vCPU-minutes (assuming 1 vCPU per worker) ≈ $0.50/run (Batch)
- RDS db.r6g.xlarge (~$0.40/hr) — shared infrastructure
Prerequisites
- mirai integration in fresh (this comes first)
- awshak infrastructure (RDS, Batch, ECR)
Blocked by
- awshak infrastructure issue (to be filed)