feat(rtmg): accept register_user_lora WS message for user-trained packs #297

Open
hthillman wants to merge 1 commit into main from claude/style-pack-user-loras

Summary

Closes the loop between the pipelines registry (live in livepeer/pipelines#2693) and the rtmg pod: a connected client sends a register_user_lora WS message carrying presigned Tigris URLs + the expected Ed25519 signing key id and sha256 digest. The pod downloads, verifies, and registers the pack so the next enable_lora picks it up.

The rtmg-vst follow-up PR will send these messages from the JUCE plugin. This PR just lands the receive side so the wire contract is reviewable and the pod is ready to accept the new traffic.

Trust model

Matches what the orchestrator writes at training time (see pipelines#2693 + demon-public-demo#407):

  • Canonical-JSON manifest { v:1, jobId, style, trigger, sha256, createdAt } signed with Ed25519.
  • Sidecar <style>.signature.json carries { manifest_b64, sig_b64, kid }.
  • Pod fetches its trusted-keys roster from https://app.daydream.live/api/loras/signing-public-key at module init (5s timeout), with a LORA_SIGNING_PUBLIC_KEYS_PEM env fallback for dev / offline boots. Cached for the process lifetime — rotation requires a pod restart.
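
The manifest and sidecar shapes above could be produced roughly as follows. This is a minimal sketch under stated assumptions: the PR does not define "canonical JSON", so sorted keys with compact separators is a guess, as is standard (not URL-safe) base64. `canonical_manifest_bytes` and `build_sidecar` are hypothetical names, the manifest values are made up, and the signer is a stand-in callable where the orchestrator would use a real Ed25519 private key.

```python
import base64
import hashlib
import json
from typing import Callable


def canonical_manifest_bytes(manifest: dict) -> bytes:
    # Assumption: canonical form = sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(
        manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def build_sidecar(manifest: dict, kid: str, sign: Callable[[bytes], bytes]) -> dict:
    # The signature covers the exact bytes that are shipped in manifest_b64,
    # so the verifier never has to re-canonicalize anything.
    manifest_bytes = canonical_manifest_bytes(manifest)
    signature = sign(manifest_bytes)
    return {
        "manifest_b64": base64.b64encode(manifest_bytes).decode("ascii"),
        "sig_b64": base64.b64encode(signature).decode("ascii"),
        "kid": kid,
    }


# Illustrative values only; none of these come from the PR.
weights = b"fake safetensors bytes"
manifest = {
    "v": 1,
    "jobId": "job-123",
    "style": "my-style",
    "trigger": "mystyle",
    "sha256": hashlib.sha256(weights).hexdigest(),
    "createdAt": "2025-01-01T00:00:00Z",
}
# Stand-in signer returning a 64-byte dummy; a real signer would be
# something like Ed25519PrivateKey.sign from the cryptography package.
sidecar = build_sidecar(manifest, "dev-key-1", lambda data: b"\x00" * 64)
```

Shipping the signed bytes verbatim in `manifest_b64` means the pod verifies exactly what was signed, regardless of how either side serializes JSON.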

Verify chain inside verify_pack:

  1. kid in trusted set
  2. Recomputed sha256 of safetensors bytes matches manifest.sha256 and the expected_sha256 from the WS body
  3. Ed25519PublicKey.verify(signature, manifest_bytes) succeeds

Any failure → no catalog mutation, {type:"error", code:"register_user_lora_*", message:...} to the calling client.
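
A sketch of the three checks in the order listed above, not the actual `verify_pack` from this PR. Assumptions: trusted keys are passed in as a mapping from kid to any object exposing `verify(signature, data)` that raises on failure (the shape of `Ed25519PublicKey.verify` in the `cryptography` package); `PackVerifyError`, `error_frame`, and the single `verify_failed` code (borrowed from the test plan) are illustrative and may not match the real error codes.

```python
import base64
import hashlib
import json
from typing import Mapping, Protocol


class VerifyKey(Protocol):
    def verify(self, signature: bytes, data: bytes) -> None: ...


class PackVerifyError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def verify_pack(weights: bytes, sidecar: dict, expected_sha256: str,
                trusted_keys: Mapping[str, VerifyKey]) -> dict:
    # 1. The signing key id must be in the trusted set.
    key = trusted_keys.get(str(sidecar.get("kid", "")))
    if key is None:
        raise PackVerifyError("verify_failed", "unknown signing key id")
    try:
        manifest_bytes = base64.b64decode(sidecar["manifest_b64"], validate=True)
        signature = base64.b64decode(sidecar["sig_b64"], validate=True)
        manifest = json.loads(manifest_bytes)
    except (KeyError, ValueError) as exc:
        raise PackVerifyError("verify_failed", "malformed sidecar") from exc
    if not isinstance(manifest, dict):
        raise PackVerifyError("verify_failed", "manifest is not a JSON object")
    # 2. The recomputed digest must match both the manifest and the WS body.
    digest = hashlib.sha256(weights).hexdigest()
    if digest != str(manifest.get("sha256", "")).lower():
        raise PackVerifyError("verify_failed", "sha256 does not match manifest")
    if digest != expected_sha256.lower():
        raise PackVerifyError("verify_failed", "sha256 does not match expected_sha256")
    # 3. The signature must verify over the exact manifest bytes.
    try:
        key.verify(signature, manifest_bytes)
    except Exception as exc:
        raise PackVerifyError("verify_failed", "signature check failed") from exc
    return manifest


def error_frame(exc: PackVerifyError) -> dict:
    # Frame shape taken from the PR text; the code strings are assumptions.
    return {"type": "error", "code": exc.code, "message": str(exc)}
```

Because the function either returns the parsed manifest or raises, a caller can keep all catalog mutation strictly after a successful return.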

Files

| File | Change |
| --- | --- |
| demos/realtime_motion_graph_web/user_loras.py | New file. Pure helpers: download_pack, verify_pack, materialize_pack. Trusted-keys cache. |
| demos/realtime_motion_graph_web/ws_adapter.py | New dispatch elif next to enable_lora. Heavy I/O fires on a small ThreadPoolExecutor so the WS receive loop keeps consuming frames during the ~200 MB download. |
| demos/realtime_motion_graph_web/protocol.py | register_user_lora CommandSpec so the command name passes the COMMAND_NAMES gate and the wire contract is in the published surface. |
| acestep/streaming/session.py | Session.register_user_lora(path, name) calls engine_obj.register_lora (idempotent on filename stem) and publishes LoraCatalogUpdate to refresh every WS subscriber. |
| acestep/paths.py | user_loras_dir() reads ACESTEP_USER_LORAS_DIR, defaults to models_dir()/user_loras (separate from the read-only baked catalog). |
| acestep/lora_metadata.py | LoraMetadata.source field. None for stock, "user_pack" for runtime-registered. Surfaced into the metadata block on each lora_catalog entry so UIs can section the dropdown. |
| pyproject.toml | Explicit cryptography>=42 dep (likely transitive via huggingface_hub today; declaring it so a clean resolver run doesn't surprise us). |
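
The ws_adapter.py change describes pushing blocking work onto a small ThreadPoolExecutor so the receive loop stays responsive. A minimal sketch of that pattern follows, assuming an asyncio-based receive loop. All names here are hypothetical: `fetch_verify_register` is a stand-in for the blocking download/verify/register sequence, and the `name` field, the `lora_registered` and `ack` reply types, and the `register_user_lora_failed` code are invented for illustration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Small dedicated pool so pack downloads cannot starve other executor users.
_pack_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="user-lora")


def fetch_verify_register(name: str) -> str:
    # Stand-in for the blocking download + verify + register sequence.
    time.sleep(0.05)
    return name


async def _register(msg: dict, outbox: list) -> None:
    loop = asyncio.get_running_loop()
    try:
        name = await loop.run_in_executor(_pack_pool, fetch_verify_register, msg["name"])
        outbox.append({"type": "lora_registered", "name": name})
    except Exception as exc:
        outbox.append(
            {"type": "error", "code": "register_user_lora_failed", "message": str(exc)}
        )


async def receive_loop(frames: list, outbox: list) -> None:
    pending = set()
    for msg in frames:  # stand-in for `async for msg in websocket`
        if msg.get("type") == "register_user_lora":
            # Fire and continue: the loop does not await the heavy work inline.
            task = asyncio.create_task(_register(msg, outbox))
            pending.add(task)  # keep a reference so the task is not collected
            task.add_done_callback(pending.discard)
        else:
            outbox.append({"type": "ack", "of": msg.get("type")})
        await asyncio.sleep(0)  # yield so spawned tasks can start
    await asyncio.gather(*pending)
```

In this sketch a frame that arrives after a `register_user_lora` is acknowledged before the registration finishes, which is the behavior the table row describes.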

Env required on the pod

| Variable | Purpose |
| --- | --- |
| LORA_PUBLIC_KEY_REGISTRY_URL | Override the default app.daydream.live/api/loras/signing-public-key endpoint. Optional. |
| LORA_SIGNING_PUBLIC_KEYS_PEM | Newline-separated kid<TAB>PEM pairs, blocks separated by blank lines. Fallback when the registry fetch fails. |
| ACESTEP_USER_LORAS_DIR | Writable dir for materialized user packs. Defaults to $ACESTEP_MODELS_DIR/user_loras. Mount on persistent storage if you want packs to survive pod restarts. |
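
One plausible reading of the LORA_SIGNING_PUBLIC_KEYS_PEM format and the ACESTEP_USER_LORAS_DIR default is sketched below. The parsing rule is an interpretation, not taken from the code: each blank-line-separated block is assumed to start with `kid<TAB>` followed by a multi-line PEM. `parse_signing_keys_env` is a hypothetical name, and this `user_loras_dir` takes the models directory as a parameter instead of calling the real `models_dir()`.

```python
import os
import re
from pathlib import Path
from typing import Mapping


def parse_signing_keys_env(raw: str) -> dict:
    """Parse blank-line-separated `kid<TAB>PEM` blocks into {kid: pem_text}."""
    keys = {}
    for block in re.split(r"\n\s*\n", raw.strip()):
        if not block.strip():
            continue
        # Split on the first tab only; the PEM body itself spans several lines.
        kid, sep, pem = block.partition("\t")
        if not sep or not kid.strip() or "BEGIN PUBLIC KEY" not in pem:
            raise ValueError(f"malformed key block starting with {block[:20]!r}")
        # The PEM text could then be loaded with
        # cryptography.hazmat.primitives.serialization.load_pem_public_key.
        keys[kid.strip()] = pem.strip() + "\n"
    return keys


def user_loras_dir(models_dir: Path, environ: Mapping[str, str] = os.environ) -> Path:
    # The env override wins; otherwise fall back to <models_dir>/user_loras.
    override = environ.get("ACESTEP_USER_LORAS_DIR")
    return Path(override) if override else models_dir / "user_loras"
```

Failing loudly on a malformed block seems preferable to silently skipping it, since a skipped key would later surface as a confusing "unknown kid" rejection.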

Test plan

  • Boot pod with no LORA_SIGNING_PUBLIC_KEYS_PEM env and the registry URL reachable → register_user_lora accepts a valid pack.
  • Boot pod with the registry URL unreachable + env populated → fallback path works; same pack accepted.
  • Tamper test: flip a byte in the safetensors before the pod's download (impossible end-to-end given Tigris immutability; simulate by pointing the URL at a different file). Pod rejects with code:"verify_failed".
  • Forge test: sign a manifest with an unrelated Ed25519 key, post as sidecar. Pod's kid lookup misses → code:"verify_failed".
  • Idempotency: send the same register_user_lora twice → second is a no-op, catalog event still fires.
  • Catalog event after register includes the new entry with metadata.source == "user_pack".

Out of scope

  • VST-side register_user_lora sender — separate PR against rtmg-vst.
  • User-pack deletion. No WS verb yet; an operator can rm from user_loras_dir and restart for now.
  • Cold-start cap: today the pod accepts an arbitrary number of register_user_lora messages per session. A future PR may rate-limit or queue them.

🤖 Generated with Claude Code

Closes the loop between the pipelines registry (live in
livepeer/pipelines#2693) and the rtmg pod: a connected client (the
rtmg-vst plugin in a follow-up PR) sends a register_user_lora frame
carrying presigned Tigris URLs + the expected Ed25519 signing key id
and sha256 digest, and the pod downloads, verifies, and registers the
pack so the next enable_lora picks it up.

The trust chain matches what the orchestrator writes at training time:
canonical-JSON manifest signed with Ed25519, sidecar carries
{ manifest_b64, sig_b64, kid }. The pod fetches its trusted-keys roster
from app.daydream.live/api/loras/signing-public-key at module init,
with a LORA_SIGNING_PUBLIC_KEYS_PEM env fallback for dev / offline
boots.

Files:
- demos/realtime_motion_graph_web/user_loras.py (new) — pure helpers:
  download_pack, verify_pack (Ed25519 + sha256 + kid trust check),
  materialize_pack (writes safetensors + metadata.json + trigger.txt
  into user_loras_dir, atomic rename).
- demos/realtime_motion_graph_web/ws_adapter.py — new dispatch elif
  next to enable_lora. Heavy I/O fires on a small ThreadPoolExecutor
  so the WS receive loop keeps consuming frames during the ~200 MB
  download. Errors surface as {type:"error", code:"register_user_lora_*"}.
- demos/realtime_motion_graph_web/protocol.py — register_user_lora
  CommandSpec so the command name passes the COMMAND_NAMES gate and
  the wire contract is in the published surface.
- acestep/streaming/session.py — Session.register_user_lora(path, name)
  calls engine_obj.register_lora (idempotent on stem) and publishes
  LoraCatalogUpdate to refresh every WS subscriber.
- acestep/paths.py — user_loras_dir() reads ACESTEP_USER_LORAS_DIR env,
  defaults to models_dir()/user_loras (separate from the read-only
  baked catalog so operators can mount persistent storage there).
- acestep/lora_metadata.py — LoraMetadata.source field (None for
  stock, "user_pack" for runtime-registered). user_loras writes
  source="user_pack" into the sidecar so the catalog event carries it
  through to UI clients that want to render "My Styles" vs "Stock
  Styles" sections.
- pyproject.toml — explicit cryptography>=42 dep (likely transitive
  via huggingface_hub today; declaring it so a clean resolver run
  doesn't surprise us).
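
The atomic-rename step mentioned for materialize_pack could look roughly like the sketch below. It is not the implementation in this commit. Assumptions: one directory per pack named after the style, a `<style>.safetensors` filename, and staging inside the destination root so the final rename stays on one filesystem. The signature and the traversal check are illustrative.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path


def materialize_pack(dest_root: Path, style: str, weights: bytes,
                     metadata: dict, trigger: str) -> Path:
    # Reject names that could escape dest_root (e.g. "../x" or "a/b").
    if style in {"", ".", ".."} or Path(style).name != style:
        raise ValueError(f"unsafe style name: {style!r}")
    dest_root.mkdir(parents=True, exist_ok=True)
    final_dir = dest_root / style
    if final_dir.exists():
        return final_dir  # already materialized; treat as a no-op
    # Stage next to the destination so the rename never crosses filesystems.
    staging = Path(tempfile.mkdtemp(prefix=f".{style}.", dir=dest_root))
    try:
        (staging / f"{style}.safetensors").write_bytes(weights)
        (staging / "metadata.json").write_text(
            json.dumps({**metadata, "source": "user_pack"}, indent=2),
            encoding="utf-8",
        )
        (staging / "trigger.txt").write_text(trigger, encoding="utf-8")
        # On POSIX, renaming onto a path that does not exist is atomic, so
        # readers see either no pack or a complete one.
        os.rename(staging, final_dir)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return final_dir
```

Writing everything into a hidden staging directory first means a crash mid-download never leaves a half-written pack where the catalog scan would find it.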

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>