feat: add kubernetes execution engine #378

Open
clementblaise wants to merge 16 commits into main from k8s-execution-engine

Conversation

clementblaise commented May 14, 2026

Collaborator

Summary

Introduces a Kubernetes-native execution engine as an alternative to the existing Docker Compose
backend. Instead of building and running task containers locally, the screener and validator now
submit Pods to a Kubernetes cluster. The changes are:

  1. On-demand image builds via Kaniko — Task archives are fetched from S3 at Pod start time.
    A Kaniko init container builds the image from the archive's Dockerfile and pushes it to an
    in-cluster pull-through registry (Zot). No shared PVCs, no pre-building pipeline.

  2. Proxy sidecar with iptables redirection — Each task Pod includes an init-iptables
    container (NET_ADMIN) that transparently redirects port-443 traffic to the proxy's SNI router
    on port 15443 (UID 1337 is exempt so the proxy itself can reach the internet). The proxy
    enforces cost budgets, model restrictions, and OpenRouter workspace safety policies.

  3. NetworkPolicy-based egress isolation — During the agent phase, only traffic to the proxy
    is allowed. During verification, egress is unlocked so test harnesses can reach external
    services. Phase transitions are controlled via Pod labels.

  4. Commit hash resolution in containers — The API container previously reported
    COMMIT_HASH=unknown (from the Dockerfile ARG default), causing every local screener
    registration to fail with a hash mismatch. A three-tier resolution strategy now tries:
    GIT_COMMIT env var → git rev-parse HEAD → pure-Python .git/HEAD reader. The local
    docker-compose file mounts .git as a read-only volume so the container can read the actual
    SHA without needing the git binary installed.

  5. KEDA autoscaling support — A /pending-work endpoint on the API returns the count of
    queued evaluations so KEDA can scale screener StatefulSets to match demand.

  6. Kind cluster manifests — Full local development environment with Kind config, Zot
    registry, NetworkPolicies, and screener StatefulSets for testing the K8s backend without
    cloud infrastructure.
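
For reviewers, the three-tier commit hash resolution from item 4 might look roughly like the sketch below. This is an illustration, not the code in this PR: the function names `read_git_head` and `resolve_commit_hash`, the `repo_root` parameter, the 5-second timeout, and the final "unknown" fallback are all assumptions. The packed-refs lookup is likewise an assumed extra for repositories whose branch refs have been packed.

```python
import os
import subprocess
from pathlib import Path
from typing import Optional


def read_git_head(git_dir: Path) -> Optional[str]:
    """Tier 3: resolve HEAD by reading files under .git, without the git binary."""
    try:
        head = (git_dir / "HEAD").read_text().strip()
    except OSError:
        return None
    if not head.startswith("ref:"):
        # Detached HEAD: the file holds the SHA itself.
        return head or None
    ref = head[len("ref:"):].strip()  # e.g. "refs/heads/main"
    ref_file = git_dir / ref
    if ref_file.is_file():
        return ref_file.read_text().strip() or None
    # Assumed extra: the ref may only exist in packed-refs ("<sha> <ref>" lines).
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            parts = line.split()
            if len(parts) == 2 and parts[1] == ref:
                return parts[0]
    return None


def resolve_commit_hash(repo_root: Path = Path(".")) -> str:
    """Try GIT_COMMIT, then `git rev-parse HEAD`, then the pure-Python reader."""
    # Tier 1: explicit environment variable.
    env_hash = os.environ.get("GIT_COMMIT", "").strip()
    if env_hash:
        return env_hash
    # Tier 2: ask git, if the binary is installed and the directory is a repository.
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_root,
            capture_output=True,
            text=True,
            check=True,
            timeout=5,
        )
        if result.stdout.strip():
            return result.stdout.strip()
    except (OSError, subprocess.SubprocessError):
        pass  # git missing, not a repository, or timed out
    # Tier 3: read .git directly (works with a read-only .git mount).
    return read_git_head(repo_root / ".git") or "unknown"
```

The third tier is what lets a read-only `.git` mount work in a container image that does not ship git.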

Test plan

  • Local Kind cluster: make k8s-setup → run screener against a simple task → verify proxy logs show successful OpenRouter forwarding
  • Docker Compose (regression): docker-compose up → run local screener → verify commit hash matches and registration succeeds
  • Verify /pending-work endpoint returns correct counts with queued evaluations
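
To make the last check concrete, here is a minimal sketch of the counting logic behind a /pending-work response. It assumes evaluations are dicts with a `status` field and that queued items carry the value `"queued"`. Those names, the `pending` response key, and the helper `pending_work_payload` are hypothetical; the real endpoint's schema is defined in the PR diff, not here.

```python
import json
from typing import Iterable, Mapping


def pending_work_payload(evaluations: Iterable[Mapping[str, str]]) -> dict:
    """Build a JSON-serialisable body holding the number of queued evaluations."""
    # "status" and "queued" are hypothetical names used for illustration only.
    pending = sum(1 for ev in evaluations if ev.get("status") == "queued")
    return {"pending": pending}


if __name__ == "__main__":
    sample = [
        {"id": "eval-1", "status": "queued"},
        {"id": "eval-2", "status": "running"},
        {"id": "eval-3", "status": "queued"},
    ]
    print(json.dumps(pending_work_payload(sample)))
```

A flat numeric field like this is easy for an HTTP-polling scaler to consume. KEDA's metrics-api scaler, for example, reads a value from a JSON response at a configured `valueLocation`. The PR description does not say which scaler is used.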

clementblaise marked this pull request as ready for review May 14, 2026 10:16
clementblaise force-pushed the k8s-execution-engine branch from daec9db to e93278a May 15, 2026 08:44
1 participant