feat(devnet): run Kurtosis Ethereum devnets via the server#213

Open
qu0b wants to merge 14 commits into master from qu0b/devnet
qu0b commented Jun 12, 2026

Member

What

Adds a panda devnet up/ls/inspect/down command group for spinning up multi-client Ethereum devnets as Kurtosis enclaves on Kubernetes.

Opening this as a checkpoint to show how far along the feature is. The local rail works end-to-end; the cloud rail is scaffolded and deferred (see below).

Architecture (thin CLI → local server → Kurtosis)

panda CLI  ──HTTP op──▶  panda server (local)  ──Kurtosis SDK──▶  Kurtosis engine ──▶ k8s
  • The CLI is thin: dispatches devnet.* operations (runServerOperation), like execute/build.
  • The server holds the Kurtosis connection (pkg/server/operations_devnet.go, registered in operations_dispatch.go) and drives the ethereum-package via the Kurtosis Go SDK.
  • The caller's identity is derived server-side (authOwnerID), never client-sent — so a future cloud-proxy rail can gate on it without a client/CLI change.
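A minimal sketch of the server-derived identity rule, assuming the auth layer stores the caller in the request context. The names `ownerKey`, `newAuthedContext`, `upRequest` and `handleUp` are hypothetical and are not taken from this PR; only the idea (the owner comes from the server's auth state, never from the payload) is.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
)

// ownerKey is the context key under which a (hypothetical) auth middleware
// stores the verified caller identity.
type ownerKey struct{}

// newAuthedContext stands in for that middleware: it returns a context that
// carries the verified owner.
func newAuthedContext(owner string) context.Context {
	return context.WithValue(context.Background(), ownerKey{}, owner)
}

// upRequest is the client-sent payload. It has no owner field on purpose.
type upRequest struct {
	Enclave string `json:"enclave"`
}

// handleUp decodes the payload and reads the owner from ctx only.
func handleUp(ctx context.Context, body []byte) (string, error) {
	var req upRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", fmt.Errorf("decode request: %w", err)
	}
	owner, _ := ctx.Value(ownerKey{}).(string)
	if owner == "" {
		return "", errors.New("no authenticated owner")
	}
	return owner + "/" + req.Enclave, nil
}

func main() {
	// An "owner" key smuggled into the body is dropped by the decoder,
	// because upRequest has no matching field.
	out, err := handleUp(newAuthedContext("qu0b"), []byte(`{"enclave":"bal3","owner":"mallory"}`))
	fmt.Println(out, err)
}
```

Because the payload type has nowhere to put an owner, a later proxy-side gate can rely on the context value without any change to the CLI.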

Configuration — one cluster, easy switch

cluster:                      # top-level (shared resource); define only the one in use
  name: bruno                 # Kurtosis cluster to activate
  kubeconfig_context: bruno   # kube context Kurtosis connects through
devnet:
  package: github.com/ethpandaops/ethereum-package
  docker_cache: docker.ethquokkaops.io   # pull-through cache; avoids Docker Hub rate limits

Switch local↔cloud by editing the cluster: block (or pointing panda at another config). On each op the server selects the kube context (EnsureKubeContext) and activates the Kurtosis cluster (EnsureCluster), then connects.

No storage_class in panda config — it's an engine-level Kurtosis setting (fixed at kurtosis engine start, not overridable per-run via the SDK), so it lives in kurtosis-config.yml, not panda.

Validation

  • Local k3s lifecycle: full up/ls/inspect/down — ~160s up, ~47s down, zero stale namespaces/PVs/PVCs across repeated cycles.
  • Real finalizing devnet (geth + lighthouse) confirmed.
  • Unit tests (dispatch guards, docker_cache injection, EnsureCluster, hermetic EnsureKubeContext) + a live integration test for the server handler path. go build/vet/gofmt clean.

Deferred (Phase 2)

Cloud rail behind the proxy: because the local server runs on a user's own host it can't be the authorization boundary for a shared cloud cluster, so the cloud kubeconfig + GitHub-org-membership gating + owner-stamped/filtered enclaves move to the cloud proxy. The operation contract already carries server-derived identity, so it's additive. Details in docs/devnet.md.

🤖 Generated with Claude Code

qu0b and others added 11 commits June 15, 2026 13:21
Add a `panda devnet up/ls/inspect/down` command group for spinning up
multi-client Ethereum devnets as Kurtosis enclaves on Kubernetes.

Follows panda's thin-CLI/server split: the CLI dispatches devnet.* operations
to the local server, which holds the Kurtosis engine connection and drives the
ethpandaops ethereum-package via the Kurtosis Go SDK. The caller's identity is
derived server-side (authOwnerID), never client-sent, so a future cloud-proxy
rail can gate enclave creation on it.

Configuration uses one cluster at a time via a top-level `cluster:` block
(name + kubeconfig_context), switchable between a local and a cloud Kubernetes
cluster by editing that block. Storage class / enclave size are intentionally
NOT in panda config — they are engine-level Kurtosis settings fixed at engine
start, so they live in kurtosis-config.yml.

An optional docker_cache (e.g. docker.ethquokkaops.io) routes all package images
through a pull-through cache, avoiding Docker Hub rate limits on multi-node
clusters.
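One way the docker_cache setting could be injected into the package arguments is sketched below. It assumes the ethereum-package accepts a `docker_cache_params` object with `enabled` and `url` keys, and that a value the user already set should win; the function name `injectDockerCache` is hypothetical and the PR's actual injection logic may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// injectDockerCache returns a copy of params with a docker_cache_params entry
// pointing at cacheURL. An empty cacheURL leaves the copy unchanged, and an
// entry already present in params is kept as-is (assumed precedence).
func injectDockerCache(params map[string]any, cacheURL string) map[string]any {
	out := make(map[string]any, len(params)+1)
	for k, v := range params {
		out[k] = v
	}
	if cacheURL == "" {
		return out
	}
	if _, exists := out["docker_cache_params"]; exists {
		return out
	}
	out["docker_cache_params"] = map[string]any{"enabled": true, "url": cacheURL}
	return out
}

func main() {
	params := map[string]any{
		"participants": []any{
			map[string]any{"el_type": "geth", "cl_type": "lighthouse"},
		},
	}
	b, _ := json.Marshal(injectDockerCache(params, "docker.ethquokkaops.io"))
	fmt.Println(string(b))
}
```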

Validated against a local k3s cluster: full up/ls/inspect/down lifecycle
(~160s up, ~47s down, zero stale namespaces/PVs/PVCs), plus unit tests and a
live integration test for the server handler path.

Deferred (Phase 2): a cloud rail behind the proxy, which holds the cloud
kubeconfig and gates enclave creation on GitHub-org membership.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

TestDevnetLifecycle_Live drives a real up -> ls -> inspect -> down through the
server handlers against a live Kurtosis engine on the configured cluster. It is
heavyweight (spins up a devnet) so it is skipped unless PANDA_DEVNET_LIVE=1.

Verified green against the local bruno k3s cluster (215s, zero stale
namespaces/PVs/PVCs after teardown).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Let the panda server boot for local development against features that don't
need the production infrastructure (e.g. `panda devnet`). Previously the server
hard-required a code-execution sandbox and a reachable credential proxy at
startup, so a developer couldn't run it locally without that whole stack.

Now four production-only dependencies degrade gracefully:

- sandbox.backend: none  -> no-op sandbox; code execution disabled (the sandbox
  is for running Python, not devnets). sandbox.image no longer required.
- proxy.optional: true   -> an unreachable proxy at startup is a warning, not
  fatal; datasource features wait for background refresh.
- cartographoor startup failure is non-fatal (best-effort network metadata).
- semantic search degrades to disabled when the proxy embedding is unavailable
  (the search service already guards nil indices).

Production behaviour is unchanged: without proxy.optional the proxy and search
remain fatal, and the default sandbox backend is still docker.
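The proxy.optional behaviour can be sketched as follows, assuming a startup step that dials the proxy once. `proxyConfig`, `connectProxy` and `unreachable` are hypothetical names used for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// proxyConfig mirrors the proxy.optional switch described above.
type proxyConfig struct {
	Optional bool
}

// connectProxy dials the proxy once at startup. With Optional set, a dial
// failure is logged and startup continues with ready=false; otherwise the
// failure is returned and startup aborts (the production default).
func connectProxy(cfg proxyConfig, dial func() error) (ready bool, err error) {
	if dialErr := dial(); dialErr != nil {
		if cfg.Optional {
			log.Printf("proxy unreachable at startup, continuing without it: %v", dialErr)
			return false, nil
		}
		return false, fmt.Errorf("connect proxy: %w", dialErr)
	}
	return true, nil
}

// unreachable simulates a proxy that cannot be dialed.
func unreachable() error { return errors.New("connection refused") }

func main() {
	ready, err := connectProxy(proxyConfig{Optional: true}, unreachable)
	fmt.Println("dev:", ready, err)

	ready, err = connectProxy(proxyConfig{}, unreachable)
	fmt.Println("prod:", ready, err)
}
```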

Adds config.dev.yaml — a lean local-dev config (no sandbox, proxy optional,
bruno cluster) for running `panda server` + `panda devnet` directly. Verified:
the server boots with no sandbox/proxy and a full CLI up/ls/inspect/down against
the local bruno k3s cluster works end to end with zero stale resources.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds 'panda devnet services <enclave>' (list running services) and
'panda devnet logs <enclave> [service...] [--tail N]' so devnet logs can be
read through the panda server without shelling out to the kurtosis CLI or a
local gateway — and thus through the cloud proxy when remote.

Logs are read straight from the service pods via the Kubernetes API (the server
already holds the kubeconfig). This fork ships container logs to OTel/ClickHouse,
which leaves the engine's file-based log API empty, so the SDK GetServiceLogs
path returns nothing; raw pod logs are always available and need no aggregator.
The enclave namespace is resolved by the kurtosistech.com/enclave-id label
(pool-claimed enclaves keep the idle enclave's kt-idle-enclave-<uuid> namespace).
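The label-based namespace lookup can be illustrated without a cluster. The sketch below works on plain structs rather than the Kubernetes client, so `namespace` and `resolveNamespace` are hypothetical stand-ins; only the label key comes from the commit message, and the namespace names in `main` are made up.

```go
package main

import "fmt"

// enclaveIDLabel is the label the commit message says identifies an enclave's namespace.
const enclaveIDLabel = "kurtosistech.com/enclave-id"

// namespace is a stand-in for the fields of a Kubernetes Namespace used here.
type namespace struct {
	Name   string
	Labels map[string]string
}

// resolveNamespace returns the namespace whose enclave-id label matches,
// regardless of what the namespace itself is called.
func resolveNamespace(all []namespace, enclaveID string) (string, error) {
	for _, ns := range all {
		if ns.Labels[enclaveIDLabel] == enclaveID {
			return ns.Name, nil
		}
	}
	return "", fmt.Errorf("no namespace labeled %s=%s", enclaveIDLabel, enclaveID)
}

func main() {
	all := []namespace{
		{Name: "kube-system"},
		// A pool-claimed enclave keeps its idle-pool name; the label still matches.
		{Name: "kt-idle-enclave-example", Labels: map[string]string{enclaveIDLabel: "abc123"}},
	}
	fmt.Println(resolveNamespace(all, "abc123"))
}
```

Matching on the label rather than on a name pattern is what lets pool-claimed enclaves resolve correctly.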

Non-following by design so it rides the plain request/response operation path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds '-f/--follow' to stream service logs live until Ctrl-C. Because an
open-ended log stream doesn't fit the JSON operation envelope, it uses a
dedicated chunked-text endpoint (GET /api/v1/devnet/logs) under the same auth
group; the non-follow path keeps using the request/response operation.

Server streams pod logs with Follow=true, one prefixed line at a time, flushing
each so a remote viewer (through the cloud proxy) sees logs live; multiple
services are followed concurrently with serialized writes. The stream stops when
the client disconnects (request context cancels the upstream pod streams).

Client adds a signal-cancellable streaming GET (serverStreamGet) that copies the
stream to stdout until interrupted.
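The streaming shape described above (concurrent sources, one prefixed line at a time, a flush per line, serialized writes) might look roughly like this. It is a sketch using in-memory readers in place of pod log streams; `lineWriter`, `followHandler` and `collect` are hypothetical names, and the real handler's wiring to the Kubernetes API is not shown.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// lineWriter serializes writes from several goroutines and flushes after
// every line so a remote viewer sees output as it arrives.
type lineWriter struct {
	mu sync.Mutex
	w  io.Writer
	f  http.Flusher
}

func (lw *lineWriter) writeLine(service, line string) {
	lw.mu.Lock()
	defer lw.mu.Unlock()
	fmt.Fprintf(lw.w, "[%s] %s\n", service, line)
	if lw.f != nil {
		lw.f.Flush()
	}
}

// followHandler streams every source concurrently until it ends or the
// request context is cancelled (client disconnect). With real pod streams the
// same context would also be passed upstream so a blocked read is interrupted.
func followHandler(sources map[string]io.Reader) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		f, _ := w.(http.Flusher)
		lw := &lineWriter{w: w, f: f}
		var wg sync.WaitGroup
		for name, src := range sources {
			wg.Add(1)
			go func(name string, src io.Reader) {
				defer wg.Done()
				sc := bufio.NewScanner(src)
				for sc.Scan() {
					if r.Context().Err() != nil {
						return
					}
					lw.writeLine(name, sc.Text())
				}
			}(name, src)
		}
		wg.Wait()
	}
}

// collect runs the handler in-process and returns the emitted lines.
func collect(sources map[string]string) []string {
	readers := make(map[string]io.Reader, len(sources))
	for name, text := range sources {
		readers[name] = strings.NewReader(text)
	}
	rec := httptest.NewRecorder()
	followHandler(readers)(rec, httptest.NewRequest(http.MethodGet, "/api/v1/devnet/logs", nil))
	return strings.Split(strings.TrimSpace(rec.Body.String()), "\n")
}

func main() {
	// Line order across services is not deterministic; lines are never torn.
	for _, l := range collect(map[string]string{"geth": "a\nb\n", "lighthouse": "c\n"}) {
		fmt.Println(l)
	}
}
```

The mutex is what keeps two services from interleaving inside a single line; ordering between services is left to arrival time.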

Validated end-to-end on bruno: single- and all-service follow show tail + live
lines (geth/lighthouse/vc) interleaved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

'panda devnet services' now lists each service's in-cluster ports (rpc:8545,
engine-rpc:8551, http:4000, ...) and private IP, so you can see a devnet's
topology and reach client APIs immediately after 'up' without digging through
kurtosis inspect. Switches Services() from GetServices to GetServiceContexts to
pull port specs; adds a Port type and a Service.Endpoint(portName) helper, both
exposed in --json for scripting.
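For orientation, a Port type and Endpoint helper of the kind described could look like the sketch below. The field names, JSON tags and the `(string, bool)` return shape are assumptions; the commit only names the `Port` type and `Service.Endpoint(portName)`.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Port describes one in-cluster port of a service (assumed fields).
type Port struct {
	Name     string `json:"name"`
	Number   uint16 `json:"number"`
	Protocol string `json:"protocol"`
}

// Service is a running devnet service with its private IP and ports.
type Service struct {
	Name      string `json:"name"`
	PrivateIP string `json:"private_ip"`
	Ports     []Port `json:"ports"`
}

// Endpoint returns "ip:port" for the named port, or false if the service
// does not expose a port with that name.
func (s Service) Endpoint(portName string) (string, bool) {
	for _, p := range s.Ports {
		if p.Name == portName {
			return net.JoinHostPort(s.PrivateIP, strconv.Itoa(int(p.Number))), true
		}
	}
	return "", false
}

func main() {
	el := Service{
		Name:      "el-1-geth-lighthouse",
		PrivateIP: "10.42.0.17", // example address, not from the PR
		Ports: []Port{
			{Name: "rpc", Number: 8545, Protocol: "TCP"},
			{Name: "engine-rpc", Number: 8551, Protocol: "TCP"},
		},
	}
	fmt.Println(el.Endpoint("rpc"))
}
```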

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds unit tests asserting devnet.services, devnet.logs and the streaming
logs endpoint reject missing-enclave / bad-tail args before any engine call,
and extends the live lifecycle test (PANDA_DEVNET_LIVE=1) to verify services
returns the EL with an rpc endpoint and logs returns non-empty output.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ress

'panda devnet up' (when devnet.ingress.enabled) creates a Traefik Ingress per
HTTP/WS service port so each is reachable at a stable, GitHub-user-scoped
hostname, and 'panda devnet endpoints' lists the URLs.

Hostnames are clean dotted labels:
  <service>.<enclave>.<owner>.<base>           dora.bal3.qu0b.k3s.bruno   (primary port)
  <port>-<service>.<enclave>.<owner>.<base>    ws-el-1-geth-lighthouse.bal3.qu0b.k3s.bruno
The left-most label is the only variable part below <enclave>.<owner>.<base>, so
a per-enclave wildcard cert *.<enclave>.<owner>.<base> covers every host. This
matches the existing ethpandaops devnet DNS/cert pattern: the zone is served by a
self-hosted authoritative DNS (bruno: dnsmasq *.k3s.bruno resolves any depth;
prod: NS-delegate to ethpandaops.general.dns_server) and certs come from ZeroSSL
DNS-01 via cert-manager (no Let's Encrypt rate limits) — no Cloudflare upgrade.
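The dotted scheme and its one-wildcard-per-enclave property can be checked with a small sketch. `dottedHost`, `wildcardFor` and `covered` are hypothetical helpers; the expected hostnames are the two examples given above.

```go
package main

import (
	"fmt"
	"strings"
)

// dottedHost builds <service>.<enclave>.<owner>.<base> for the primary port
// and <port>-<service>.<enclave>.<owner>.<base> for any other port.
func dottedHost(service, port, enclave, owner, base string, primary bool) string {
	left := service
	if !primary {
		left = port + "-" + service
	}
	return strings.Join([]string{left, enclave, owner, base}, ".")
}

// wildcardFor returns the per-enclave wildcard name.
func wildcardFor(enclave, owner, base string) string {
	return "*." + enclave + "." + owner + "." + base
}

// covered reports whether host matches wildcard with exactly one DNS label
// in place of "*", which is how wildcard certificates match.
func covered(host, wildcard string) bool {
	suffix := strings.TrimPrefix(wildcard, "*")
	if !strings.HasSuffix(host, suffix) {
		return false
	}
	label := strings.TrimSuffix(host, suffix)
	return label != "" && !strings.Contains(label, ".")
}

func main() {
	wc := wildcardFor("bal3", "qu0b", "k3s.bruno")
	for _, h := range []string{
		dottedHost("dora", "http", "bal3", "qu0b", "k3s.bruno", true),
		dottedHost("el-1-geth-lighthouse", "ws", "bal3", "qu0b", "k3s.bruno", false),
	} {
		fmt.Println(h, covered(h, wc))
	}
}
```

Folding the port name into the left-most label with a hyphen, rather than adding another dot, is what keeps every host exactly one label below the wildcard.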

<owner> is server-derived (authOwnerID, else config local_owner) — never
client-supplied — making it the multi-tenant boundary (per-user/enclave wildcard
cert + forward-auth enforcing authenticated-user==owner). Exposes http/ws ports
(rpc/ws/http/api/metrics); skips engine-rpc (JWT) and p2p. Reconcile is non-fatal
to 'up'; ingresses live in the enclave namespace and are GC'd on 'down'.

bruno->prod is config-only (base_domain/entrypoint/tls_secret/auth_middleware);
no code or scheme change.

Validated live on bruno: dora UI (200), EL JSON-RPC, EL WS (101 upgrade), CL
beacon API served through the ingress at clean qu0b-scoped dotted hostnames.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The owner's default devnet gets short, enclave-less hostnames
<service>.<owner>.<base> (e.g. dora.qu0b.k3s.bruno) in addition to the
enclave-qualified ones. The newest 'panda devnet up' becomes the default;
'panda devnet use <enclave>' points the default at a different enclave.

Implemented as dedicated 'panda-alias-<service>' Ingresses labeled
panda.devnet/alias=true; SetDefaultAlias atomically clears the owner's alias
ingresses across all namespaces and recreates them in the chosen enclave, so the
alias only ever resolves to one devnet. Alias hosts hang one label higher than
the canonical hosts, so they use a separate per-owner wildcard cert
(alias_tls_secret); empty on bruno (plain http). Owner stays server-derived.

Validated live on bruno: up alpha -> dora.qu0b routes to alpha; up beta -> alias
moves to beta (newest wins); use alpha -> alias moves back; exactly one alias
ingress at all times.
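The clear-then-recreate behaviour can be modelled in memory as below. This is not the Kubernetes implementation: `cluster`, `ingress`, `setDefaultAlias` and `aliasNamespaces` are hypothetical, and a mutex stands in for whatever makes the real operation atomic. Only the `panda-alias-<service>` naming and the `<service>.<owner>.<base>` host form come from the commit message.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// ingress is a stand-in for the fields of an Ingress that matter here.
type ingress struct {
	Name, Owner, Host string
	Alias             bool
}

// cluster holds ingresses per namespace.
type cluster struct {
	mu          sync.Mutex
	byNamespace map[string][]ingress
}

func newCluster() *cluster {
	return &cluster{byNamespace: map[string][]ingress{}}
}

// setDefaultAlias removes the owner's alias ingresses from every namespace,
// then recreates them in the chosen one, so an alias host never resolves to
// two devnets.
func (c *cluster) setDefaultAlias(owner, namespace, base string, services []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for ns, list := range c.byNamespace {
		kept := list[:0]
		for _, ing := range list {
			if !(ing.Alias && ing.Owner == owner) {
				kept = append(kept, ing)
			}
		}
		c.byNamespace[ns] = kept
	}
	for _, svc := range services {
		c.byNamespace[namespace] = append(c.byNamespace[namespace], ingress{
			Name:  "panda-alias-" + svc,
			Owner: owner,
			Host:  svc + "." + owner + "." + base,
			Alias: true,
		})
	}
}

// aliasNamespaces lists, sorted, the namespaces holding an alias for owner.
func (c *cluster) aliasNamespaces(owner string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	var out []string
	for ns, list := range c.byNamespace {
		for _, ing := range list {
			if ing.Alias && ing.Owner == owner {
				out = append(out, ns)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	c := newCluster()
	c.setDefaultAlias("qu0b", "alpha", "k3s.bruno", []string{"dora"})
	c.setDefaultAlias("qu0b", "beta", "k3s.bruno", []string{"dora"})
	fmt.Println(c.aliasNamespaces("qu0b"))
}
```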

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds docs/devnet-production.md: how to take the devnet remote-access feature to
prod on the ethpandaops platform, reusing platform building blocks (ArgoCD
GitOps, Traefik, cert-manager DNS-01/ZeroSSL, Cloudflare/external-dns, Dex/OIDC,
the hosted panda-proxy). Covers the in-cluster panda-server + GitOps Kurtosis
engine, the per-host vs per-owner-wildcard DNS/TLS choice, owner enforcement via
Traefik forward-auth, the two required code deltas (owner=GitHub login,
cert_cluster_issuer), and rollout/validation/rollback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…, down validation)

Post-rebase adaptation onto current master:
- thread *cobra.Command into runServerOperation(cmd, ...) calls (signature changed
  upstream); fetchEnclaves/completeEnclaveNames take cmd too
- handleDevnetDown validates enclave/all BEFORE connecting to the engine, so a
  missing target is a 400 not a 502 (surfaced by the kurtosis SDK bump)
- drop a stray gotcha.md accidentally staged during conflict resolution

(serverStreamGet moved into serverclient.go where master relocated the server
helpers; go.mod/go.sum reconciled via go mod tidy.)
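The validate-before-connect ordering from the list above is sketched here as a plain net/http handler. `downArgs`, `handleDown`, `statusFor` and the two connect stubs are hypothetical; the real handler's signature and error envelope are not shown in this PR text.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// downArgs is the assumed payload for a down operation.
type downArgs struct {
	Enclave string `json:"enclave"`
	All     bool   `json:"all"`
}

// handleDown rejects a request with no target before touching the engine, so
// a caller mistake is reported as 400 even when the engine is unreachable.
func handleDown(connect func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var args downArgs
		if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
			http.Error(w, "invalid request body", http.StatusBadRequest)
			return
		}
		if args.Enclave == "" && !args.All {
			http.Error(w, "enclave or all is required", http.StatusBadRequest)
			return
		}
		if err := connect(); err != nil {
			http.Error(w, "engine unavailable: "+err.Error(), http.StatusBadGateway)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

// engineDown and engineUp simulate the engine connection.
func engineDown() error { return errors.New("dial engine: connection refused") }
func engineUp() error   { return nil }

// statusFor runs the handler in-process and returns the HTTP status.
func statusFor(body string, connect func() error) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader(body))
	handleDown(connect).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(`{}`, engineDown))
	fmt.Println(statusFor(`{"enclave":"bal3"}`, engineDown))
	fmt.Println(statusFor(`{"enclave":"bal3"}`, engineUp))
}
```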

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
github-actions Bot commented Jun 15, 2026

Contributor

🐼 Smoke eval — 479758e: ✅ 6/6 pass

📊 Interactive report — tokens p50 14,072 · tokens/solve 14,068.

Reference points: qu0b/devnet@f95248f 100% · qu0b/devnet@9e3fb2f 100% · qu0b/devnet@2587236 100%.

question                   result  tokens  tools
forky_node_coverage        ✅      11,876  2
tracoor_node_coverage      ✅      14,271  7
mainnet_block_arrival_p50  ✅      16,900  12
list_datasources           ✅      11,530  2
block_count_24h            ✅      15,954  9
missed_slots_24h           ✅      13,874  4
🔭 Langfuse traces (6 runs; ⚠️ = failed)

The report walks this branch's commits against the master baseline and the most recent release. A self-contained copy is in the run's eval-smoke-* artifact.

qu0b and others added 3 commits June 15, 2026 14:04
Replace the Traefik-specific Entrypoint/AuthMiddleware ingress config with a
generic, verbatim 'annotations' map plus a 'tls' toggle, so panda works with any
ingress controller (bruno Traefik, prod ingress-nginx) and any cert/auth layer:
- annotations: applied to every Ingress (controller routing, cert-manager
  cluster-issuer, edge auth)
- tls: emit a TLS section; with tls_secret empty, derive a per-Ingress secret
  name so cert-manager issues per-host certs from the issuer annotation
- repin kurtosis SDK to v1.18.3 (rebase tidy had bumped it to v1.19.0, which
  mismatches the 1.18 engine's major.minor check)

Validated on bruno: ingress carries the configured traefik entrypoint annotation
and dora routes 200. Docs + config.dev.yaml updated to the annotations form.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Prep for running panda-server in-cluster (platform devnet deployment):
- newK8sClient() tries rest.InClusterConfig() first (prod pod ServiceAccount),
  falling back to a kubeconfig for local/bruno use.
- resolveOwner prefers the GitHub login (authOwnerLogin) over the numeric ID so
  owner-scoped hostnames read dora.qu0b.<base> not dora.583231.<base>; still
  server-derived, never client-supplied; falls back to local_owner.
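The owner precedence in the second item can be written down as a small pure function. The signature is an assumption (the real resolveOwner presumably reads these values from the request's auth state and config); the order login, then numeric ID, then local_owner follows this commit message together with the earlier "authOwnerID, else config local_owner" rule.

```go
package main

import (
	"fmt"
	"strconv"
)

// resolveOwner picks the hostname owner label: the GitHub login when the auth
// layer provides one, else the numeric ID, else the configured local owner.
// All three inputs are server-side values; none comes from the client payload.
func resolveOwner(login string, id int64, localOwner string) string {
	if login != "" {
		return login
	}
	if id != 0 {
		return strconv.FormatInt(id, 10)
	}
	return localOwner
}

func main() {
	fmt.Println("dora." + resolveOwner("qu0b", 583231, "dev") + ".<base>")
	fmt.Println("dora." + resolveOwner("", 583231, "dev") + ".<base>")
	fmt.Println("dora." + resolveOwner("", 0, "dev") + ".<base>")
}
```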

Validated on bruno (local kubeconfig path unaffected: devnet up, ingress
owner=qu0b, dora routes 200).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds an ingress `host_style` knob (dotted default, flat opt-in) so prod can
fold <service>/<enclave>/<owner> into a single DNS label
(dora--bal3--qu0b.ethpandaops.io). One label is exactly what the platform's
existing *.ethpandaops.io Cloudflare tunnel + edge cert + ingress-nginx-devnets
already cover, so prod needs zero new DNS/cert/tunnel components — dropping the
self-hosted DNS + ZeroSSL + RFC2136 stack in favour of maintainability.

Dotted stays the default for dev (bruno dnsmasq resolves any depth). Refactors
the host builders into serviceHost/aliasHostname that branch on style; rewrites
docs/devnet*.md for the flat prod path (edge TLS, no cert-manager issuer, auth
gated at the create path + optional Cloudflare Access at the edge).
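A sketch of a host builder that branches on style, covering only the primary-port form. The `--` separator is taken from the example hostname in this commit message; the function signature and the style strings "dotted" and "flat" as Go values are assumptions, and the flat form for non-primary ports and aliases is not specified here, so it is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceHost builds the primary-port hostname for a service.
// "flat" folds service, enclave and owner into a single DNS label so that one
// wildcard one level below base covers it; any other value yields the dotted form.
func serviceHost(style, service, enclave, owner, base string) string {
	if style == "flat" {
		return strings.Join([]string{service, enclave, owner}, "--") + "." + base
	}
	return strings.Join([]string{service, enclave, owner, base}, ".")
}

func main() {
	fmt.Println(serviceHost("dotted", "dora", "bal3", "qu0b", "k3s.bruno"))
	fmt.Println(serviceHost("flat", "dora", "bal3", "qu0b", "ethpandaops.io"))
}
```

The trade-off is that the flat form loses the per-enclave and per-owner DNS hierarchy in exchange for fitting under an existing single-level wildcard.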

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>