feat(devnet): run Kurtosis Ethereum devnets via the server #213
Open
qu0b wants to merge 14 commits into
Add a `panda devnet up/ls/inspect/down` command group for spinning up multi-client Ethereum devnets as Kurtosis enclaves on Kubernetes.

Follows panda's thin-CLI/server split: the CLI dispatches devnet.* operations to the local server, which holds the Kurtosis engine connection and drives the ethpandaops ethereum-package via the Kurtosis Go SDK. The caller's identity is derived server-side (authOwnerID), never client-sent, so a future cloud-proxy rail can gate enclave creation on it.

Configuration uses one cluster at a time via a top-level `cluster:` block (name + kubeconfig_context), switchable between a local and a cloud Kubernetes cluster by editing that block. Storage class / enclave size are intentionally NOT in panda config: they are engine-level Kurtosis settings fixed at engine start, so they live in kurtosis-config.yml. An optional docker_cache (e.g. docker.ethquokkaops.io) routes all package images through a pull-through cache, avoiding Docker Hub rate limits on multi-node clusters.

Validated against a local k3s cluster: full up/ls/inspect/down lifecycle (~160s up, ~47s down, zero stale namespaces/PVs/PVCs), plus unit tests and a live integration test for the server handler path.

Deferred (Phase 2): a cloud rail behind the proxy, which holds the cloud kubeconfig and gates enclave creation on GitHub-org membership.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
TestDevnetLifecycle_Live drives a real up -> ls -> inspect -> down through the server handlers against a live Kurtosis engine on the configured cluster. It is heavyweight (spins up a devnet) so it is skipped unless PANDA_DEVNET_LIVE=1. Verified green against the local bruno k3s cluster (215s, zero stale namespaces/PVs/PVCs after teardown). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Let the panda server boot for local development against features that don't need the production infrastructure (e.g. `panda devnet`). Previously the server hard-required a code-execution sandbox and a reachable credential proxy at startup, so a developer couldn't run it locally without that whole stack.

Now four production-only dependencies degrade gracefully:

- sandbox.backend: none -> no-op sandbox; code execution disabled (the sandbox is for running Python, not devnets). sandbox.image no longer required.
- proxy.optional: true -> an unreachable proxy at startup is a warning, not fatal; datasource features wait for background refresh.
- cartographoor startup failure is non-fatal (best-effort network metadata).
- semantic search degrades to disabled when the proxy embedding is unavailable (the search service already guards nil indices).

Production behaviour is unchanged: without proxy.optional the proxy and search remain fatal, and the default sandbox backend is still docker.

Adds config.dev.yaml, a lean local-dev config (no sandbox, proxy optional, bruno cluster) for running `panda server` + `panda devnet` directly.

Verified: the server boots with no sandbox/proxy and a full CLI up/ls/inspect/down against the local bruno k3s cluster works end to end with zero stale resources.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
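The degrade-gracefully rule for the sandbox and the proxy can be sketched as follows. This is a minimal illustration, not the code in this PR: `Config`, `Features`, `boot` and `errProxyDown` are hypothetical names, and only the `sandbox.backend: none` and `proxy.optional: true` behaviour described above is modelled.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Config holds only the two knobs discussed above. The field names are
// hypothetical stand-ins for sandbox.backend and proxy.optional.
type Config struct {
	SandboxBackend string // "docker" (default) or "none"
	ProxyOptional  bool
}

// Features reports what is usable once startup has finished.
type Features struct {
	CodeExecution    bool
	DatasourcesReady bool // false means "wait for background refresh"
}

// errProxyDown simulates an unreachable credential proxy.
var errProxyDown = errors.New("connection refused")

// boot applies the startup rules: a "none" sandbox disables code execution,
// and a proxy error is fatal unless the proxy was marked optional.
func boot(cfg Config, proxyErr error) (Features, error) {
	f := Features{
		CodeExecution:    cfg.SandboxBackend != "none",
		DatasourcesReady: proxyErr == nil,
	}
	if proxyErr != nil {
		if !cfg.ProxyOptional {
			return Features{}, fmt.Errorf("proxy unreachable: %w", proxyErr)
		}
		log.Printf("warning: proxy unreachable, continuing without it: %v", proxyErr)
	}
	return f, nil
}

func main() {
	// Local-dev style config: boots, with both features degraded.
	dev, err := boot(Config{SandboxBackend: "none", ProxyOptional: true}, errProxyDown)
	fmt.Println(dev, err)

	// Production style config: the same proxy failure stays fatal.
	_, err = boot(Config{SandboxBackend: "docker"}, errProxyDown)
	fmt.Println(err)
}
```

Keeping the fatal path as the default means a production config needs no change to retain the old behaviour.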
Adds 'panda devnet services <enclave>' (list running services) and 'panda devnet logs <enclave> [service...] [--tail N]' so devnet logs can be read through the panda server without shelling out to the kurtosis CLI or a local gateway — and thus through the cloud proxy when remote. Logs are read straight from the service pods via the Kubernetes API (the server already holds the kubeconfig). This fork ships container logs to OTel/ClickHouse, which leaves the engine's file-based log API empty, so the SDK GetServiceLogs path returns nothing; raw pod logs are always available and need no aggregator. The enclave namespace is resolved by the kurtosistech.com/enclave-id label (pool-claimed enclaves keep the idle enclave's kt-idle-enclave-<uuid> namespace). Non-following by design so it rides the plain request/response operation path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds '-f/--follow' to stream service logs live until Ctrl-C. Because an open-ended log stream doesn't fit the JSON operation envelope, it uses a dedicated chunked-text endpoint (GET /api/v1/devnet/logs) under the same auth group; the non-follow path keeps using the request/response operation. Server streams pod logs with Follow=true, one prefixed line at a time, flushing each so a remote viewer (through the cloud proxy) sees logs live; multiple services are followed concurrently with serialized writes. The stream stops when the client disconnects (request context cancels the upstream pod streams). Client adds a signal-cancellable streaming GET (serverStreamGet) that copies the stream to stdout until interrupted. Validated end-to-end on bruno: single- and all-service follow show tail + live lines (geth/lighthouse/vc) interleaved. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
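A rough sketch of the streaming shape described above, using only the Go standard library. It assumes each service's log lines arrive on a channel (a stand-in for the upstream pod log stream); `lineWriter`, `follow`, `logsHandler` and `demo` are hypothetical names, and the `[service]` prefix format is an assumption rather than the format this PR emits.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// lineWriter serializes writes from concurrent followers and flushes after
// every line so a remote viewer sees output as it is produced.
type lineWriter struct {
	mu    sync.Mutex
	w     io.Writer
	flush func()
}

func (lw *lineWriter) writeLine(service, line string) {
	lw.mu.Lock()
	defer lw.mu.Unlock()
	fmt.Fprintf(lw.w, "[%s] %s\n", service, line)
	if lw.flush != nil {
		lw.flush()
	}
}

// follow copies lines until the source closes or the request is cancelled,
// which is what happens when the client disconnects.
func follow(ctx context.Context, lw *lineWriter, service string, src <-chan string) {
	for {
		select {
		case <-ctx.Done():
			return
		case line, ok := <-src:
			if !ok {
				return
			}
			lw.writeLine(service, line)
		}
	}
}

// logsHandler follows every service concurrently over one chunked response.
func logsHandler(sources map[string]<-chan string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		lw := &lineWriter{w: w}
		if f, ok := w.(http.Flusher); ok {
			lw.flush = f.Flush
		}
		var wg sync.WaitGroup
		for name, src := range sources {
			wg.Add(1)
			go func(name string, src <-chan string) {
				defer wg.Done()
				follow(r.Context(), lw, name, src)
			}(name, src)
		}
		wg.Wait()
	}
}

// demo runs the handler against two finite fake streams and returns the
// emitted lines, sorted because interleaving order is not deterministic.
func demo() []string {
	el := make(chan string, 2)
	el <- "block 1"
	el <- "block 2"
	close(el)
	cl := make(chan string, 1)
	cl <- "slot 1"
	close(cl)

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/v1/devnet/logs", nil)
	logsHandler(map[string]<-chan string{"el": el, "cl": cl})(rec, req)

	lines := strings.Split(strings.TrimSpace(rec.Body.String()), "\n")
	sort.Strings(lines)
	return lines
}

func main() {
	for _, l := range demo() {
		fmt.Println(l)
	}
}
```

The mutex is what keeps lines from different services from interleaving mid-line; the per-line flush trades some throughput for liveness.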
'panda devnet services' now lists each service's in-cluster ports (rpc:8545, engine-rpc:8551, http:4000, ...) and private IP, so you can see a devnet's topology and reach client APIs immediately after 'up' without digging through kurtosis inspect. Switches Services() from GetServices to GetServiceContexts to pull port specs; adds a Port type and a Service.Endpoint(portName) helper, both exposed in --json for scripting. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
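For illustration, a `Port` type and an `Endpoint(portName)` helper could look roughly like this. The field names, JSON tags and the `(string, bool)` return shape are assumptions, not the definitions added in this PR; the sample IP is made up.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Port describes one in-cluster port of a service (hypothetical shape).
type Port struct {
	Name   string `json:"name"`
	Number uint16 `json:"number"`
}

// Service is a running devnet service with its private IP and ports.
type Service struct {
	Name      string `json:"name"`
	PrivateIP string `json:"private_ip"`
	Ports     []Port `json:"ports"`
}

// Endpoint returns "ip:port" for the named port. The boolean is false when
// the service has no port with that name.
func (s Service) Endpoint(portName string) (string, bool) {
	for _, p := range s.Ports {
		if p.Name == portName {
			return net.JoinHostPort(s.PrivateIP, strconv.Itoa(int(p.Number))), true
		}
	}
	return "", false
}

// sampleService mirrors the port names quoted above; the IP is invented.
var sampleService = Service{
	Name:      "el-1-geth-lighthouse",
	PrivateIP: "10.42.0.7",
	Ports:     []Port{{Name: "rpc", Number: 8545}, {Name: "engine-rpc", Number: 8551}},
}

func main() {
	if ep, ok := sampleService.Endpoint("rpc"); ok {
		fmt.Println("rpc endpoint:", ep)
	}
	if _, ok := sampleService.Endpoint("ws"); !ok {
		fmt.Println("no ws port on this service")
	}
}
```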
Adds unit tests asserting devnet.services, devnet.logs and the streaming logs endpoint reject missing-enclave / bad-tail args before any engine call, and extends the live lifecycle test (PANDA_DEVNET_LIVE=1) to verify services returns the EL with an rpc endpoint and logs returns non-empty output. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ress

'panda devnet up' (when devnet.ingress.enabled) creates a Traefik Ingress per HTTP/WS service port so each is reachable at a stable, GitHub-user-scoped hostname, and 'panda devnet endpoints' lists the URLs.

Hostnames are clean dotted labels:

  <service>.<enclave>.<owner>.<base>          dora.bal3.qu0b.k3s.bruno (primary port)
  <port>-<service>.<enclave>.<owner>.<base>   ws-el-1-geth-lighthouse.bal3.qu0b.k3s.bruno

The left-most label is the only variable part below <enclave>.<owner>.<base>, so a per-enclave wildcard cert *.<enclave>.<owner>.<base> covers every host. This matches the existing ethpandaops devnet DNS/cert pattern: the zone is served by a self-hosted authoritative DNS (bruno: dnsmasq *.k3s.bruno resolves any depth; prod: NS-delegate to ethpandaops.general.dns_server) and certs come from ZeroSSL DNS-01 via cert-manager (no Let's Encrypt rate limits), with no Cloudflare upgrade.

<owner> is server-derived (authOwnerID, else config local_owner), never client-supplied, making it the multi-tenant boundary (per-user/enclave wildcard cert + forward-auth enforcing authenticated-user==owner).

Exposes http/ws ports (rpc/ws/http/api/metrics); skips engine-rpc (JWT) and p2p. Reconcile is non-fatal to 'up'; ingresses live in the enclave namespace and are GC'd on 'down'. bruno->prod is config-only (base_domain/entrypoint/tls_secret/auth_middleware); no code or scheme change.

Validated live on bruno: dora UI (200), EL JSON-RPC, EL WS (101 upgrade), CL beacon API served through the ingress at clean qu0b-scoped dotted hostnames.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The owner's default devnet gets short, enclave-less hostnames <service>.<owner>.<base> (e.g. dora.qu0b.k3s.bruno) in addition to the enclave-qualified ones. The newest 'panda devnet up' becomes the default; 'panda devnet use <enclave>' switches it back. Implemented as dedicated 'panda-alias-<service>' Ingresses labeled panda.devnet/alias=true; SetDefaultAlias atomically clears the owner's alias ingresses across all namespaces and recreates them in the chosen enclave, so the alias only ever resolves to one devnet. Alias hosts hang one label higher than the canonical hosts, so they use a separate per-owner wildcard cert (alias_tls_secret); empty on bruno (plain http). Owner stays server-derived. Validated live on bruno: up alpha -> dora.qu0b routes to alpha; up beta -> alias moves to beta (newest wins); use alpha -> alias moves back; exactly one alias ingress at all times. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds docs/devnet-production.md: how to take the devnet remote-access feature to prod on the ethpandaops platform, reusing platform building blocks (ArgoCD GitOps, Traefik, cert-manager DNS-01/ZeroSSL, Cloudflare/external-dns, Dex/OIDC, the hosted panda-proxy). Covers the in-cluster panda-server + GitOps Kurtosis engine, the per-host vs per-owner-wildcard DNS/TLS choice, owner enforcement via Traefik forward-auth, the two required code deltas (owner=GitHub login, cert_cluster_issuer), and rollout/validation/rollback. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…, down validation)

Post-rebase adaptation onto current master:

- thread *cobra.Command into runServerOperation(cmd, ...) calls (signature changed upstream); fetchEnclaves/completeEnclaveNames take cmd too
- handleDevnetDown validates enclave/all BEFORE connecting to the engine, so a missing target is a 400 not a 502 (surfaced by the kurtosis SDK bump)
- drop a stray gotcha.md accidentally staged during conflict resolution

(serverStreamGet moved into serverclient.go where master relocated the server helpers; go.mod/go.sum reconciled via go mod tidy.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
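The validate-before-connect ordering can be illustrated with a small sketch. It is not the real `handleDevnetDown`: `downArgs`, `handleDown` and the two fake engine connectors are hypothetical, and the handler is reduced to the status code it would return.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// downArgs is a hypothetical stand-in for the arguments of a "down" request.
type downArgs struct {
	Enclave string
	All     bool
}

var errEngineUnreachable = errors.New("kurtosis engine unreachable")

// Fake engine connectors used to exercise both outcomes.
func unreachableEngine() error { return errEngineUnreachable }
func healthyEngine() error     { return nil }

// handleDown checks the request first and only then touches the engine, so a
// malformed request is reported as the caller's fault (400) even when the
// engine happens to be down (which would otherwise surface as 502).
func handleDown(args downArgs, connect func() error) int {
	if args.Enclave == "" && !args.All {
		return http.StatusBadRequest
	}
	if err := connect(); err != nil {
		return http.StatusBadGateway
	}
	return http.StatusOK
}

func main() {
	fmt.Println(handleDown(downArgs{}, unreachableEngine))                // missing target
	fmt.Println(handleDown(downArgs{Enclave: "bal3"}, unreachableEngine)) // engine failure
	fmt.Println(handleDown(downArgs{All: true}, healthyEngine))           // success
}
```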
🐼 Smoke eval —
| question | result | tokens | tools |
|---|---|---|---|
| forky_node_coverage | ✅ | 11,876 | 2 |
| tracoor_node_coverage | ✅ | 14,271 | 7 |
| mainnet_block_arrival_p50 | ✅ | 16,900 | 12 |
| list_datasources | ✅ | 11,530 | 2 |
| block_count_24h | ✅ | 15,954 | 9 |
| missed_slots_24h | ✅ | 13,874 | 4 |
🔭 Langfuse traces (6 runs; ⚠️ = failed)
The report walks this branch's commits against the master baseline and the most recent release. A self-contained copy is in the run's eval-smoke-* artifact.
Replace the Traefik-specific Entrypoint/AuthMiddleware ingress config with a generic, verbatim 'annotations' map plus a 'tls' toggle, so panda works with any ingress controller (bruno Traefik, prod ingress-nginx) and any cert/auth layer:

- annotations: applied to every Ingress (controller routing, cert-manager cluster-issuer, edge auth)
- tls: emit a TLS section; with tls_secret empty, derive a per-Ingress secret name so cert-manager issues per-host certs from the issuer annotation
- repin kurtosis SDK to v1.18.3 (rebase tidy had bumped it to v1.19.0, which mismatches the 1.18 engine's major.minor check)

Validated on bruno: ingress carries the configured traefik entrypoint annotation and dora routes 200. Docs + config.dev.yaml updated to the annotations form.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Prep for running panda-server in-cluster (platform devnet deployment): - newK8sClient() tries rest.InClusterConfig() first (prod pod ServiceAccount), falling back to a kubeconfig for local/bruno use. - resolveOwner prefers the GitHub login (authOwnerLogin) over the numeric ID so owner-scoped hostnames read dora.qu0b.<base> not dora.583231.<base>; still server-derived, never client-supplied; falls back to local_owner. Validated on bruno (local kubeconfig path unaffected: devnet up, ingress owner=qu0b, dora routes 200). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
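The owner precedence described above (GitHub login, then numeric ID, then `local_owner`) might be sketched like this. The `identity` struct and the function signature are assumptions for illustration; only the ordering comes from the commit message.

```go
package main

import (
	"fmt"
	"strconv"
)

// identity is a hypothetical view of the server-derived caller identity.
type identity struct {
	Login string // GitHub login, e.g. "qu0b"
	ID    int64  // numeric GitHub user ID
}

// resolveOwner picks the owner label used in hostnames. Nothing here comes
// from the client request: auth is what the server derived, and localOwner
// is the configured fallback for unauthenticated local use.
func resolveOwner(auth *identity, localOwner string) string {
	if auth != nil {
		if auth.Login != "" {
			return auth.Login
		}
		if auth.ID != 0 {
			return strconv.FormatInt(auth.ID, 10)
		}
	}
	return localOwner
}

func main() {
	fmt.Println(resolveOwner(&identity{Login: "qu0b", ID: 583231}, "dev")) // login preferred
	fmt.Println(resolveOwner(&identity{ID: 583231}, "dev"))                // numeric ID
	fmt.Println(resolveOwner(nil, "dev"))                                  // local fallback
}
```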
Adds an ingress `host_style` knob (dotted default, flat opt-in) so prod can fold <service>/<enclave>/<owner> into a single DNS label (dora--bal3--qu0b.ethpandaops.io). One label is exactly what the platform's existing *.ethpandaops.io Cloudflare tunnel + edge cert + ingress-nginx-devnets already cover, so prod needs zero new DNS/cert/tunnel components — dropping the self-hosted DNS + ZeroSSL + RFC2136 stack in favour of maintainability. Dotted stays the default for dev (bruno dnsmasq resolves any depth). Refactors the host builders into serviceHost/aliasHostname that branch on style; rewrites docs/devnet*.md for the flat prod path (edge TLS, no cert-manager issuer, auth gated at the create path + optional Cloudflare Access at the edge). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
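A sketch of host builders that branch on the style, under stated assumptions: the function names match the ones mentioned above, but the signatures are invented. The dotted service and alias forms and the flat primary-port form (`dora--bal3--qu0b.ethpandaops.io`) come from the commit messages in this PR; the flat alias form and the flat handling of a non-primary port are guesses.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	styleDotted = "dotted" // one DNS label per component (default)
	styleFlat   = "flat"   // components folded into a single DNS label
)

// leftLabel prefixes the port name for non-primary ports; an empty port
// means the primary port.
func leftLabel(port, service string) string {
	if port == "" {
		return service
	}
	return port + "-" + service
}

// serviceHost builds the enclave-qualified hostname for a service port.
func serviceHost(style, port, service, enclave, owner, base string) string {
	parts := []string{leftLabel(port, service), enclave, owner}
	if style == styleFlat {
		return strings.Join(parts, "--") + "." + base
	}
	return strings.Join(parts, ".") + "." + base
}

// aliasHostname builds the short, enclave-less hostname of the owner's
// default devnet. The flat form here is an assumption.
func aliasHostname(style, service, owner, base string) string {
	sep := "."
	if style == styleFlat {
		sep = "--"
	}
	return service + sep + owner + "." + base
}

func main() {
	fmt.Println(serviceHost(styleDotted, "", "dora", "bal3", "qu0b", "k3s.bruno"))
	fmt.Println(serviceHost(styleDotted, "ws", "el-1-geth-lighthouse", "bal3", "qu0b", "k3s.bruno"))
	fmt.Println(serviceHost(styleFlat, "", "dora", "bal3", "qu0b", "ethpandaops.io"))
	fmt.Println(aliasHostname(styleDotted, "dora", "qu0b", "k3s.bruno"))
}
```

With the flat style every host sits exactly one label below the base domain, which is why a single `*.<base>` wildcard can cover all of them.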
What

Adds a `panda devnet up/ls/inspect/down` command group for spinning up multi-client Ethereum devnets as Kurtosis enclaves on Kubernetes.

Opening this as a checkpoint to see how far the feature is: the local rail works end-to-end; the cloud rail is scaffolded and deferred (see below).

Architecture (thin CLI → local server → Kurtosis)

- The CLI dispatches `devnet.*` operations (`runServerOperation`), like `execute`/`build`.
- The local server holds the Kurtosis engine connection (`pkg/server/operations_devnet.go`, registered in `operations_dispatch.go`) and drives the ethereum-package via the Kurtosis Go SDK.
- The caller's identity is derived server-side (`authOwnerID`), never client-sent, so a future cloud-proxy rail can gate on it without a client/CLI change.

Configuration: one cluster, easy switch

Switch local↔cloud by editing the `cluster:` block (or pointing panda at another config). On each op the server selects the kube context (`EnsureKubeContext`) and activates the Kurtosis cluster (`EnsureCluster`), then connects.

No `storage_class` in panda config: it's an engine-level Kurtosis setting (fixed at `kurtosis engine start`, not overridable per-run via the SDK), so it lives in `kurtosis-config.yml`, not panda.

Validation

- Unit tests (… `EnsureCluster`, hermetic `EnsureKubeContext`) + a live integration test for the server handler path.
- `go build`/`vet`/`gofmt` clean.

Deferred (Phase 2)

Cloud rail behind the proxy: because the local server runs on a user's own host, it can't be the authorization boundary for a shared cloud cluster, so the cloud kubeconfig + GitHub-org-membership gating + owner-stamped/filtered enclaves move to the cloud proxy. The operation contract already carries server-derived identity, so the change is additive. Details in `docs/devnet.md`.

🤖 Generated with Claude Code