feat: add kubernetes execution engine #378
Open
clementblaise wants to merge 16 commits into
Summary
Introduces a Kubernetes-native execution engine as an alternative to the existing Docker Compose
backend. Instead of building and running task containers locally, the screener and validator now
submit Pods to a Kubernetes cluster. The main changes:
On-demand image builds via Kaniko — Task archives are fetched from S3 at Pod start time.
A Kaniko init container builds the image from the archive's Dockerfile and pushes it to an
in-cluster pull-through registry (Zot). No shared PVCs, no pre-building pipeline.
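To make the build step concrete, here is a minimal sketch of what the Kaniko init container spec could look like when generated from Python. The container name, bucket, registry host, and image naming scheme are all hypothetical placeholders, not the names used in this PR. The `--context`, `--dockerfile`, and `--destination` flags are standard Kaniko executor flags, and Kaniko accepts an `s3://` tarball as build context (S3 credentials would still need to be provided to the container, which is omitted here).

```python
def kaniko_init_container(task_id: str, bucket: str, registry: str) -> dict:
    """Build an init-container spec that runs the Kaniko executor.

    All names here (container name, bucket layout, image path) are
    illustrative assumptions, not taken from the PR.
    """
    return {
        "name": "build-image",  # hypothetical container name
        "image": "gcr.io/kaniko-project/executor:latest",
        "args": [
            # Task archive in S3 used directly as the build context.
            f"--context=s3://{bucket}/{task_id}.tar.gz",
            # Dockerfile path relative to the root of the archive.
            "--dockerfile=Dockerfile",
            # Push the built image to the in-cluster registry.
            f"--destination={registry}/tasks/{task_id}:latest",
        ],
    }


spec = kaniko_init_container("task-1", "my-bucket", "zot.registry.svc:5000")
```

The task container in the same Pod would then reference the `--destination` image, so no shared volume is needed between the build and the run.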
Proxy sidecar with iptables redirection — Each task Pod includes an
init-iptables container (NET_ADMIN) that transparently redirects port-443 traffic to the proxy's SNI router
on port 15443 (UID 1337 is exempt so the proxy itself can reach the internet). The proxy
enforces cost budgets, model restrictions, and OpenRouter workspace safety policies.
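For reviewers less familiar with the redirection trick, the rule the init container installs probably looks something like the one below. This is a sketch under assumptions: the exact chain and rule set in the PR may differ, and the helper name is made up. The port numbers and UID come from the description above.

```python
import shlex

PROXY_UID = 1337         # proxy process UID, exempt from redirection
SNI_ROUTER_PORT = 15443  # port the proxy's SNI router listens on


def redirect_rule(proxy_uid: int = PROXY_UID, to_port: int = SNI_ROUTER_PORT) -> list:
    """Return an iptables command (as argv) that redirects outbound
    TCP/443 to the local SNI router, except for traffic owned by the proxy UID.
    """
    return [
        "iptables", "-t", "nat", "-A", "OUTPUT",
        "-p", "tcp", "--dport", "443",
        # Skip packets from the proxy itself so it can reach the internet.
        "-m", "owner", "!", "--uid-owner", str(proxy_uid),
        "-j", "REDIRECT", "--to-ports", str(to_port),
    ]


command_line = shlex.join(redirect_rule())
```

Without the `--uid-owner` exemption, the proxy's own upstream connections would be redirected back to itself and loop.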
NetworkPolicy-based egress isolation — During the agent phase, only traffic to the proxy
is allowed. During verification, egress is unlocked so test harnesses can reach external
services. Phase transitions are controlled via Pod labels.
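As a rough illustration of label-driven phases, the sketch below builds a NetworkPolicy manifest that applies only to Pods carrying an agent-phase label, plus the label patch that would move a Pod out of that policy's selector. The label keys and values (`phase`, `app: proxy`) and the policy name are assumptions for illustration. The description does not say which labels the PR uses, or how the egress peer for the proxy is expressed.

```python
def agent_phase_policy(namespace: str) -> dict:
    """NetworkPolicy restricting egress for Pods labelled phase=agent.

    Label names and the proxy selector are hypothetical.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "agent-phase-egress", "namespace": namespace},
        "spec": {
            # Only Pods currently in the agent phase are selected.
            "podSelector": {"matchLabels": {"phase": "agent"}},
            "policyTypes": ["Egress"],
            # Egress is allowed only to Pods labelled as the proxy.
            "egress": [{"to": [{"podSelector": {"matchLabels": {"app": "proxy"}}}]}],
        },
    }


def phase_patch(new_phase: str) -> dict:
    """Merge patch that relabels a Pod, e.g. to leave the agent policy."""
    return {"metadata": {"labels": {"phase": new_phase}}}


policy = agent_phase_policy("tasks")
unlock = phase_patch("verification")
```

Because the policy selects on the label, relabelling the Pod is enough to change which rules apply. No policy object has to be created or deleted per task.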
Commit hash resolution in containers — The API container previously reported
COMMIT_HASH=unknown (from the Dockerfile ARG default), causing every local screener
registration to fail with a hash mismatch. A three-tier resolution strategy now tries, in order:
GIT_COMMIT env var → git rev-parse HEAD → pure-Python .git/HEAD reader. The local docker-compose
setup mounts .git as a read-only volume so the container can read the actual SHA without needing
the git binary installed.
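The three tiers can be sketched roughly as follows. This is not the code from the PR: function names are hypothetical, and the handling of detached HEADs and `packed-refs` is my assumption about what a pure-Python reader needs to cover.

```python
import os
import subprocess
from pathlib import Path
from typing import Optional


def read_git_head(repo_root: Path) -> Optional[str]:
    """Resolve HEAD by reading files under .git, without the git binary."""
    git_dir = repo_root / ".git"
    head_file = git_dir / "HEAD"
    if not head_file.is_file():
        return None
    head = head_file.read_text().strip()
    if not head.startswith("ref: "):
        return head or None  # detached HEAD: the file holds the SHA itself
    ref = head[len("ref: "):]
    ref_file = git_dir / ref
    if ref_file.is_file():
        return ref_file.read_text().strip()
    # Refs may have been packed into a single file by git gc.
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if line.startswith(("#", "^")):
                continue
            sha, _, name = line.partition(" ")
            if name == ref:
                return sha
    return None


def resolve_commit_hash(repo_root: Path = Path(".")) -> str:
    """Try GIT_COMMIT, then `git rev-parse HEAD`, then the .git reader."""
    env = os.environ.get("GIT_COMMIT", "").strip()
    if env and env != "unknown":
        return env
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"], cwd=repo_root,
            capture_output=True, text=True, check=True, timeout=5,
        )
        if out.stdout.strip():
            return out.stdout.strip()
    except (OSError, subprocess.SubprocessError):
        pass  # git missing, not a repo, or timed out: fall through
    return read_git_head(repo_root) or "unknown"
```

The last tier is what makes the read-only `.git` mount sufficient: the container image does not need git installed for the hash to resolve.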
KEDA autoscaling support — A
/pending-work endpoint on the API returns the count of queued evaluations so KEDA can scale screener StatefulSets to match demand.
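The core of such an endpoint is small. A sketch of the counting logic, with a hypothetical `Evaluation` record, status value, and response field name (the PR's actual schema is not shown in this description):

```python
import json
from dataclasses import dataclass


@dataclass
class Evaluation:
    id: str
    status: str  # "queued" is an assumed status value


def pending_work(evaluations) -> dict:
    """Return the payload a /pending-work handler could serialize.

    The "pending" field name is an assumption. A KEDA scaler that reads
    JSON would need to be pointed at whatever field the API really uses.
    """
    count = sum(1 for e in evaluations if e.status == "queued")
    return {"pending": count}


body = json.dumps(pending_work([
    Evaluation("e1", "queued"),
    Evaluation("e2", "running"),
    Evaluation("e3", "queued"),
]))
```

Returning a single integer keeps the scaler configuration simple, since KEDA only needs one number to compare against its target value.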
Kind cluster manifests — Full local development environment with Kind config, Zot
registry, NetworkPolicies, and screener StatefulSets for testing the K8s backend without
cloud infrastructure.
Test plan
make k8s-setup → run screener against a simple task → verify proxy logs show successful OpenRouter forwarding
docker-compose up → run local screener → verify commit hash matches and registration succeeds
/pending-work endpoint returns correct counts with queued evaluations