AgentDeck

Stop Chatting. Start Steering.

AgentDeck is a physical control surface for AI coding agents. It started with an Elgato Stream Deck+ and now runs on 13 display surfaces simultaneously — tablets, e-ink readers, phones, ESP32 modules, LED matrices, HID decks, and terminals.

One bridge. 13 surfaces. Steer your AI — without leaving your keyboard flow.

Independent project. Not affiliated with Anthropic, OpenAI, Google, Elgato, DIVOOM, or other third parties referenced. All trademarks are property of their respective owners. See ATTRIBUTION.md for full notices.

Watch Demo on YouTube

	Requirement
Platform	macOS 14+ (Sonoma) — Windows/Linux not supported
Hardware	Elgato Stream Deck+ (8 keys, 4 encoders, LCD touch strip)
Terminal	iTerm2 (required for session management and voice paste)
Android	(Optional) Android 10+ tablet or e-ink reader for remote dashboard
Apple	(Optional) iOS 17+ / iPadOS 17+ / macOS 14+ for SwiftUI dashboard
TUI	(Optional) Any terminal with truecolor support for `agentdeck dashboard`

What is AgentDeck?
Prerequisites
Quick Start
Manual Build & Install
Usage
- CLI Reference
Stream Deck+ Layout (v4)
Ulanzi D200H Deck Dock
Android Dashboard
Apple Dashboard
TUI Dashboard
Agent Performance Evaluation & Model Orchestration
ESP32 Display
Ulanzi TC001 LED Matrix
Pixoo64 LED Matrix
Configuration
Troubleshooting
Uninstall
Development
Next Milestones — Current Focus
Roadmap

Documentation

Architecture — monorepo layout, BridgeCore, PtyAdapter, AgentAdapter, Gateway protocol
Daemon — daemon hub, singleton guard, mDNS recovery, usage relay, multi-surface monitoring
Plugin Conventions — encoder LCD, wide canvas, OC timeline pipeline, D200H HID, sleep/wake
v4 Layout — v4 Session-Per-Button keypad + encoder mapping, v3→v4 changes
Stream Deck+ Layout Reference — per-state layouts, encoder details, button colors
TUI Dashboard — terrarium, sprites, adaptive layouts
Android Reference — device support, build/signing, creature behavior
Android UI/UX Vision — e-ink + tablet layouts, creatures, refresh zones
ESP32 Reference — firmware boards, flash safety, WiFi provisioning, disconnect recovery
Device Reference — dashboard device types, transport protocols, broadcast architecture
Protocol — state machine, WebSocket messages, project structure
Voice Setup — microphone + speech-recognition permissions (Apple on-device SFSpeech — no install needed)
Wake Word Detection — Porcupine (Mac) + microWakeWord (ESP32) setup
Testing Guide — test structure, coverage, CI pipeline, writing tests
Why APME — 감에서 데이터로 — motivation, category-aware evaluation strategy, composite score design
APME Pipeline (8-layer deep dive) — ingestion (hook/timeline/PTY), collector→store, classifier, runner, tuner, HTTP/WS, device rendering with file:line anchors
Agent Performance Evaluation Reference — session dataset, category-specific rubrics, turn-level mid-session eval, daemon HTTP API, settings
Creature Simulator Demo — live creature rendering playground (GitHub Pages)

What is AgentDeck?

A control surface — like an audio mixing console, but for AI coding agents. It reads your agent's state in real-time and dynamically reconfigures buttons and encoders to match what's happening right now.

Respond instantly — YES / NO / ALWAYS buttons appear with semantic colors for permission prompts
Interrupt — STOP button sends Ctrl+C to a runaway agent
Switch modes — cycle Plan / Accept Edits / Default
Navigate options — encoder scrolls and selects multi-choice prompts on a wide-canvas LCD
Voice input — push-to-talk → Apple SFSpeech (on-device) → auto-send (offline, zero setup)
Voice assistant — wake word detection → Apple SFSpeech STT → LLM → AVSpeech TTS response (fully offline)
Display sync — macOS sleep dims all connected surfaces; wake restores instantly
Terminal postit — agent state shown in iTerm2 tab titles and badges
Monitor usage — animated water-gauge dashboard with rate limit countdowns
Quick actions — GO ON / REVIEW / COMMIT / CLEAR; encoder cycles custom prompts
System utilities — volume, mic, media, timer from the Utility encoder
Terminal sessions — iTerm dial switches sessions, auto-attaches tmux
Multiple coding agents — Claude Code, Codex CLI, OpenCode, and OpenClaw in one multi-agent daemon view
Works from anywhere — all 13 surfaces can monitor the agent; interactive surfaces (Stream Deck, D200H, Android, Apple) can also control it

The bridge is transparent: if it's off, Claude Code works exactly as before.

Supported Agents

Agent	Status
Claude Code	Supported (primary)
Codex CLI	Supported — PTY parser, model detection, dashboard integration
OpenCode	Supported — PTY + SSE hybrid bridge, timeline integration
OpenClaw	Experimental — Gateway WebSocket, timeline panel, log stream

Supported Surfaces — 13 Types

#	Surface	Description
1	Stream Deck+	Primary — 8 keys, 4 encoders, LCD touch strip (v4 session-per-button)
2	Ulanzi D200H Deck Dock	14-key HID controller + 960×540 LCD — multi-session agent controller, usage monitor, premium CoreGraphics widgets
3	Android Tablet	Color terrarium + HUD overlay (60fps)
4	E-ink Reader	B&W 16-level grayscale + Color E-ink (Kaleido 3, 4096 colors) + partial refresh
5	iPhone	SwiftUI app — mobile agent monitoring
6	iPad	SwiftUI app — terrarium second screen
7	macOS	SwiftUI app — desktop monitoring window + in-process Swift daemon
8	ESP32 Round AMOLED	1.8" circular 466×466 — compact WiFi display
9	ESP32 IPS LCD	3.5" rectangular 480×320
10	ESP32 B86 Box	4" wall-mount touch panel 480×480
11	Ulanzi TC001	8×32 RGB LED matrix — compact HUD pages and creature sprites
12	Pixoo64 LED	64×64 RGB LED pixel art terrarium
13	TUI Terminal	Unicode braille terrarium + ANSI dashboard — SSH/remote

Left: iPad + iPhone (SwiftUI) | Right: ESP32 3 types + Pixoo64 LED

Left: Dual E-ink — B&W (Crema S) + Color (Pantone 6) + ESP32 AMOLED | Right: TUI terminal dashboard

Architecture

                              ┌── Daemon (port 9120, sole hub) ──┐
Stream Deck Plugin ◄── WS ──►│                                   │
D200H Deck Dock    ◄ USB HID►│                                   │
Android Dashboard  ◄── WS ──►│  WS Server + mDNS + Device Mods   │
Apple Dashboard    ◄── WS ──►│  Gateway Proxy + Usage Relay      │
TUI Dashboard      ◄── WS ──►│  Pixoo + ESP32 Serial + SSE       │
ESP32 Display      ◄ Serial ►│                                   │
Pixoo64 LED        ◄ HTTP ──►└───────────────┬───────────────────┘
                                             │ aggregates
                              ┌── Session Bridge (port 9121+) ──┐
User's Terminal ◄─ stdio ───►│  PTY Manager → claude CLI         │
Claude Code Hooks ─ HTTP ───►│  Output Parser → State Machine    │
                              │  Hook Server + Voice (SFSpeech)   │
                              └──────────────────────────────────┘

The daemon is the sole hub for all dashboard clients. Session bridges handle PTY + hooks only. The daemon aggregates state from all sessions and broadcasts to all 13 surfaces. Local clients are auto-trusted; LAN clients authenticate with a token stored in the AgentDeck data directory (~/.agentdeck/auth-token for Node CLI / unsigned dev builds, ~/Library/Group Containers/group.bound.serendipity.agentdeck.dashboard/auth-token for the Mac App Store build — routed through AgentDeckPaths.swift). Interactive surfaces (Stream Deck, D200H, Android, Apple) can control the agent; monitoring surfaces (Pixoo, TUI, ESP32) display state.

On macOS, the AgentDeck Dashboard SwiftUI app ships with a full in-process Swift daemon (47 files, ~20,500 LOC) that re-implements the Node.js bridge — mDNS, device modules (ADB/Serial/Pixoo/D200H), Gateway proxy, and WebSocket server. Installing the macOS app gives you the full bridge without Node.js. The agentdeck CLI remains the canonical path for Claude Code / Codex / OpenCode PTY sessions.

Prerequisites

Item	Required	Install
macOS 14+ (Sonoma)	Yes	Windows/Linux not supported
Xcode Command Line Tools	Yes	`xcode-select --install` (node-pty native build)
Node.js >= 22	Yes	`brew install node`
pnpm >= 9	Yes	`npm install -g pnpm`
Python 3	Yes	`brew install python` (display sleep detection)
Elgato Stream Deck app >= 6.7	Yes	Elgato Downloads
Stream Deck+ hardware	Yes	8 keys + 4 encoders + LCD touch strip
iTerm2	Yes	Terminal management, voice paste, session switching
Claude Code CLI	Yes	`npm install -g @anthropic-ai/claude-code`
JDK 17+	For Android	`brew install openjdk@17`
Stream Deck CLI	Auto	Installed by `pnpm setup` if missing
Microphone + Speech Recognition	For voice	Grant on first use (macOS Settings → Privacy). No sox, whisper, or model download — Apple SFSpeech on-device

Quick Start

# Option A: npm install (no clone needed)
npx @agentdeck/setup

# Option B: from source
git clone https://github.com/puritysb/AgentDeck.git && cd AgentDeck && pnpm setup

The pnpm setup command:

Checks required dependencies (Node.js 22+, pnpm, Claude CLI, Stream Deck app)
Installs @elgato/cli if missing
Runs pnpm install + pnpm build
Generates icon assets (16 PNGs)
Installs Claude Code hooks
Links the Stream Deck plugin
Links the agentdeck CLI globally
(Voice is built-in via Apple SFSpeech on-device — no extra install)

After setup, restart the Stream Deck app, then run:

agentdeck claude

You're steering.

Manual Build & Install

Build

cd AgentDeck
pnpm install
pnpm build            # shared → bridge, plugin, hooks
pnpm generate-icons   # SVG → PNG (required on first build)

1. Install Claude Code Hooks

Node CLI install (dev + Homebrew distribution):

node hooks/dist/install.js

Registers 7 hooks in ~/.claude/settings.local.json: SessionStart, SessionEnd, PreToolUse, PostToolUse, Stop, Notification, UserPromptSubmit. Each hook POSTs JSON to the bridge. Remove with node hooks/dist/install.js uninstall.

Mac App Store install: hooks are opt-in — the app shows a Settings → Claude Code Hooks pane with an "Enable Claude Code Hooks…" button that presents an NSAlert explaining what will be written, then an NSOpenPanel so the user explicitly selects ~/.claude/settings.local.json. Only after that consent does AgentDeck write the hook entries (via a security-scoped bookmark). "Remove" in the same pane cleanly unregisters and revokes the bookmark. No command line required.

2. Link Stream Deck Plugin

cd plugin && streamdeck link .sdPlugin

Creates a symlink in ~/Library/Application Support/com.elgato.StreamDeck/Plugins/. Restart the Stream Deck app to load.

3. Link `agentdeck` CLI

cd bridge && pnpm link --global

4. Voice Setup (Zero install)

Voice input uses Apple's on-device SFSpeechRecognizer (Speech framework). No sox, no whisper.cpp, no model downloads — the OS manages the dictation model via Settings → General → Keyboard → Dictation, which AgentDeck piggybacks on. The only user action is granting Microphone + Speech Recognition permission the first time the voice button is pressed (macOS shows the standard TCC prompts backed by NSMicrophoneUsageDescription and NSSpeechRecognitionUsageDescription).

All audio stays on-device (requiresOnDeviceRecognition = true), so the captured WAV — which may contain project/code names — never leaves the machine. See Voice Setup Guide for permission troubleshooting and wake-word details.

Usage

Start

agentdeck claude   # or: agentdeck codex

This spawns Claude Code or Codex CLI inside a PTY and starts a session bridge on a dynamic port (HTTP + hooks). Your terminal works exactly as before — the Stream Deck adds a parallel control channel. The daemon (port 9120, 0.0.0.0) aggregates all sessions for external clients.

Security: The daemon binds to 0.0.0.0 for LAN access (multi-surface monitoring). Local connections bypass authentication. Remote connections require the auth token from the AgentDeck data directory (~/.agentdeck/auth-token on Node CLI builds, ~/Library/Group Containers/group.bound.serendipity.agentdeck.dashboard/auth-token on Mac App Store).

CLI Reference

The CLI command is agentdeck.

Sessions

Command	Description
`agentdeck claude`	Start Claude Code session (PTY + bridge)
`agentdeck codex`	Start Codex CLI session (PTY + bridge)
`agentdeck opencode`	Start OpenCode session (PTY + SSE bridge)
`agentdeck monitor`	Hook-only bridge (no PTY — run `claude` separately)

Flags: -p <port>, -c <command>, -d (debug), --no-update-check Module flags: --local (all off), --no-mdns, --no-adb, --no-serial, --no-pixoo

Daemon

Command	Description
`agentdeck daemon start`	Start monitoring daemon
`agentdeck daemon stop`	Stop daemon
`agentdeck daemon restart`	Restart daemon
`agentdeck daemon status`	Show daemon status
`agentdeck daemon install`	Register macOS LaunchAgent (auto-start)
`agentdeck daemon uninstall`	Remove LaunchAgent

Session Management

Command	Description
`agentdeck status`	All sessions + daemon status
`agentdeck stop`	Stop a session (`-a` for all, `-p` for specific port)

Monitoring

Command	Description
`agentdeck dashboard`	TUI monitoring dashboard (alias: `dash`)
`agentdeck devices`	Connected devices (WS, ESP32, Pixoo, ADB)
`agentdeck qr`	Pairing QR code + URL
`agentdeck diag`	Diagnostic dump (`-a` for AI analysis)

Evaluation (APME)

Command	Description
`agentdeck apme runs`	List recent runs (filter by `--agent`, `--model`, `--limit`)
`agentdeck apme run <id>`	Detailed run view — steps, turns, per-turn evals, vibe
`agentdeck apme judge`	Evaluate pending runs manually (no daemon required)
`agentdeck apme scorecard`	Model scorecard by category and overall
`agentdeck apme tune`	Trigger rubric auto-tuner (OPRO loop)
`agentdeck apme vibe <runId> <verdict>`	Label a run (`approve`/`reject`/`neutral`)
`agentdeck apme tag <runId> <category>`	Manually set task category
`agentdeck apme reclassify`	Re-run classifier on unclassified runs
`agentdeck apme rubric`	Inspect current rubrics
`agentdeck apme export`	Export dataset to JSON

Device Setup

Command	Description
`agentdeck pixoo scan`	Discover Pixoo devices on LAN
`agentdeck pixoo add <ip>`	Add a Pixoo device
`agentdeck pixoo list`	List configured devices
`agentdeck pixoo remove <ip>`	Remove a device
`agentdeck pixoo test [ip]`	Send test pattern
`agentdeck wifi-setup`	ESP32 WiFi provisioning (serial)

Stream Deck+ Layout (v4)

v4 is session-per-button: all 8 keypad slots use a single session-slot action. List view shows one session per button; pressing a session enters a detail view with options/presets/ESC.

Keypad — List View (8 sessions)

┌─────────┬─────────┬─────────┬─────────┐
│ SESS 1  │ SESS 2  │ SESS 3  │ SESS 4  │
├─────────┼─────────┼─────────┼─────────┤
│ SESS 5  │ SESS 6  │ SESS 7  │  NEXT   │
└─────────┴─────────┴─────────┴─────────┘

OpenClaw sits first, then coding agents (Claude / Codex / OpenCode) ordered by port. Slot 7 paginates when there are 8+ sessions. When no daemon is running, slot 0 becomes ▶ START and launches the macOS AgentDeck Dashboard app.

Keypad — Detail View (press a session)

Slot	Content
0	BACK — return to list
1	Session Info — project + model + state + agent watermark
2-3	Content — permission options, prompt presets, OpenClaw presets (STATUS / MODEL / GATEWAY)
4	ESC / STOP — always visible; bright when active, dimmed when idle
5-6	Content — more options/presets
7	NEXT — paginate when 5+ options

Encoders — 4 Slots

Encoder	Action	Rotate	Push	Touch
E1	Utility	Adjust value (volume, mic, timer)	Toggle / Action	Switch mode
E2	Action	Scroll options / cycle prompts	Send prompt / Confirm	Same as push
E3	Usage	Cycle pages (overview / 5h / 7d / session / extra)	Refresh usage data	Next page
E4	Voice	Scroll transcription text	Hold = record, tap (<500ms) = cancel, VT push = send/paste	—

Left: Voice transcription (Korean) on wide-canvas LCD | Right: Model selection with encoder option list

Dynamic Button States

Detail-view content slots (2-3, 5-6) reconfigure based on agent state — permission prompts get semantic colors (green=approve, red=deny, blue=permanent), options get teal/green, and 5+ options collapse into encoder wide-canvas mode or paginate via NEXT.

See v4 Layout for the full v4 session-per-button model, OpenClaw presets, and v3→v4 migration notes, or Stream Deck+ Layout Reference for per-state ASCII diagrams, color tables, encoder details, and button label intelligence.

Ulanzi D200H Deck Dock

A 14-key USB HID controller with a 960×540 LCD — a second hardware surface that complements the Stream Deck+ with richer session visuals and a dedicated usage monitor.

Left to right: Stream Deck+, Ulanzi D200H Deck Dock (lit session slots), Ulanzi TC001 LED matrix

Layout

13 session buttons — one per session (same OpenClaw-first ordering as Stream Deck+), with agent-colored watermarks, project name, and state indicator baked into each key image
1 big merged usage button (col3+col4, row2) — dedicated usage monitor with live 5h/7d gauges
Press feedback — a bright press-flash on every keydown for tactile confirmation even without force touch

D200H session keys lit in a full desk view — 13 sessions rendered side-by-side with the Stream Deck+ keypad

Transport

HID over USB — VID 0x2207, PID 0x0019. The Node.js daemon and the macOS in-process Swift daemon both drive the device; whichever is running claims it. Sandboxed macOS builds require the USB device entitlement and Input Monitoring permission.
Heartbeat resilience — the HID session auto-recovers from hub resets and sleep/wake cycles.
Shared renderers — session slot imagery is generated from shared SVG renderers in @agentdeck/shared, so Stream Deck+, D200H, Android, and Apple surfaces stay visually consistent.

The D200H is not required — it layers on top of a running daemon. Plug it in and it shows up in agentdeck devices; nothing else to configure.

Android Dashboard

Monitor and control your AI agents from any Android device — no Stream Deck required.

Tablet mode — color terrarium with multi-session octopi, crayfish, neon tetra, and HUD overlay

E-ink mode — Crema S grayscale dashboard and Pantone 6 Kaleido 3 color dashboard

The Android app connects to the same bridge server over your local network, giving you a second screen for agent monitoring and a full mirror of the Stream Deck controls.

Two Display Modes

E-ink mode (Crema S, Onyx, Kobo)

Aquarium-centered B&W dashboard — pixel art creatures in a 16-level grayscale terrarium
Partial refresh zones: A2 (200ms) for fast UI, DU for status, FULL (500ms) for the aquarium
Left panel (22%): agent list with state indicators
Right panel (78%): aquarium + rate limits/models + event timeline

Tablet mode (Lenovo, general Android tablets)

Full-color terrarium background with 60fps creature animation
Semi-transparent HUD panels overlay agent status, rate limits, timeline
Identical information to e-ink, expressed through color and motion

Three-Tab Navigation

Tab	Content
Dashboard	Terrarium background + HUD overlay panels. Connection overlay when disconnected (mDNS discovery, QR pairing)
Deck	Full Stream Deck+ mirror — 4 encoder panels (swipe/tap/long-press gestures) + 2x4 button grid with context area
Settings	Bridge connection, display preferences

Connect to Bridge

The app finds your bridge automatically:

mDNS — the bridge advertises _agentdeck._tcp on your local network; the app discovers it within seconds
QR pairing — run agentdeck qr on your Mac, scan with the app's camera (CameraX + ML Kit)
Manual — enter the bridge IP and port in Settings

Once connected, the app receives real-time state updates over WebSocket and can send commands back to the bridge.

See Android Reference for device support, build/signing instructions, and creature behavior details.

Apple Dashboard

Monitor and control your AI agents from iPhone, iPad, or Mac — a native SwiftUI experience.

The Apple app is a SwiftUI multiplatform app that connects to the bridge on iOS/iPadOS, and on macOS ships with a full in-process Swift daemon (47 files, ~20,500 LOC) — mDNS, device modules (ADB/Serial/Pixoo/D200H), Gateway proxy, and WebSocket server — so the macOS build works standalone without Node.js. You can still use the agentdeck CLI alongside it for Claude Code / Codex / OpenCode PTY sessions; the app's daemon auto-detects and defers to a running CLI daemon on the same port.

Three-Tab Navigation

Tab	Content
Monitor	Terrarium background + HUD overlay — agent status, rate limits, timeline
Deck	Stream Deck+ mirror — encoder panels + button grid with touch gestures
Settings	Bridge connection, display preferences

macOS Menu Bar Popup

On macOS the app always lives in the menu bar — one click reveals the full topology without taking over a window.

Sessions — live session list with agent, state, and model; per-row Jump To buttons open the Pixoo view, the Stream Deck Code view, the full Dashboard window, or reveal the session data folder in Finder
Topology — UPSTREAM (Claude Code hooks, OpenClaw Gateway, MLX, Ollama) and DOWNSTREAM (D200H, Pixoo, ESP32, Android) with LED dots driven by the shared ProviderRailEvaluator, so Settings / Dashboard / menu bar can never disagree about "is Claude connected?"
Tabs — Launch · Dashboard · Evaluation switch the popup body without opening a window; a Start at Login toggle + Quit sit in the footer

Connect to Bridge

mDNS — automatic discovery of _agentdeck._tcp services on your local network
QR pairing — scan with the in-app camera (agentdeck qr on your Mac)
Manual — enter bridge IP and port

iOS Foreground Recovery

The app handles iOS background/foreground transitions gracefully — WebSocket reconnects immediately on foregrounding, state syncs within milliseconds, and the terrarium animation resumes without flicker.

App Store Distribution

The macOS and iOS builds are ready for App Store submission. The macOS build ships as a self-contained Swift daemon gated by the AGENTDECK_APP_STORE compile flag — no bundled Node.js, no bundled adb, no subprocess spawn, no AppleScript. User data lives in the App Group container (~/Library/Group Containers/group.bound.serendipity.agentdeck.dashboard/, not ~/.agentdeck/) per Apple Review Guideline 2.5.2. USB entitlements for D200H HID and Input Monitoring permission handling are wired; the first-launch onboarding asks for Claude Code hook access via explicit NSOpenPanel consent. OpenClaw integration uses the Gateway-native WebSocket path (see §OpenClaw Gateway below), not a file-based identity. TestFlight builds available now; App Store review is the active milestone.

OpenClaw Gateway (Gateway-native pairing)

AgentDeck connects to the OpenClaw Gateway over WebSocket (ws://127.0.0.1:18789) as an operator client. Pairing uses two handshake formats so both distributions work:

Mac App Store build (v3 self-generated identity): on first launch AgentDeck generates an Ed25519 keypair and stores the private key in the macOS Keychain (accessibleAfterFirstUnlockThisDeviceOnly). deviceId = sha256(raw 32-byte public key).hex. The Gateway returns a deviceToken in hello-ok.auth.deviceToken; it's persisted and reused on reconnect. No file read of ~/.openclaw/identity/ — sandbox-safe per Apple 2.5.2. Default scopes: operator.read, operator.write, operator.approvals. To pair, run openclaw devices approve <requestId> (or use OpenClaw's Web UI) once; the dashboard flips from Pairing required → Connected.
CLI / Homebrew build (v2 file-based identity): reads ~/.openclaw/identity/device.json created by openclaw pair, for historical parity with the Node bridge.

Both builds share the same RPC surface: connect, health, models.list, logs.tail, sessions.list/subscribe, chat.send/abort, exec.approval.resolve, system-presence. Events: connect.challenge, health, sessions.changed, session.message/tool, chat, exec.approval.requested/resolved, presence, tick, shutdown. All subprocess-based fallbacks (openclaw doctor, openclaw logs --follow, openclaw models list) are compile-out in the App Store build and replaced with Gateway RPCs. See docs/gateway-protocol.md for wire format + parity fixtures.

Auth failures (missing_token, pairing_required, token_mismatch, device_auth_invalid) do not auto-loop; the Settings pairing-state UI surfaces each case so the user can resolve deliberately. Remote / TLS-pinned Gateway support is deferred to v2; v1 officially targets loopback.

TUI Dashboard

Monitor your agents directly in the terminal — no additional hardware or apps required.

agentdeck dashboard     # or: agentdeck dash

The TUI connects to a running Bridge or Daemon over WebSocket and renders a real-time monitoring interface using raw ANSI escape codes. Zero additional dependencies.

Braille terrarium — octopus, crayfish, neon tetra schools in a truecolor water gradient. State-driven: idle = floor, processing = swimming with starburst, awaiting = "?" bubble
Adaptive layout — wide (120+ cols), standard (80-119), narrow (60-79). Terrarium hides when too small
Status + Timeline — rate limit gauges, OAuth/Ollama models, tool calls, activity density bar
Auto-discovery — finds Daemon or active session automatically; 3s auto-reconnect
SSH friendly — works over any SSH connection with truecolor support; pipes output JSON

Agent Performance Evaluation & Model Orchestration (APME)

The problem: I route 6+ LLMs (Claude Opus/Sonnet, Codex/GPT-5.4, Gemini Antigravity, GLM-5.1, Qwen3.5-30B MLX) across my daily work by gut feeling — "this task to Codex, summaries to local Qwen, important stuff to Opus." I have no idea if that's efficient. Generic benchmarks don't answer it either — they measure average users on standard tasks, not me on my codebase.

APME is the personalized evaluation system that fixes this. Every agent session → local SQLite dataset → category-aware auto-evaluation → eventually data-driven model routing.

Category-aware Evaluation Strategy

The key architectural decision: evaluation method differs per task category. A single "overall score" across all sessions is meaningless — a model that's great at debugging may be verbose in conversation, and vice versa.

Categories	Timing	Layers
`coding` · `refactoring` · `debugging`	Run-level (after session ends)	Deterministic (lint/build/test) + LLM judge with category rubric
`conversation` · `planning` · `research` · `review`	Turn-level / Task-level (immediately after each turn, or after a task boundary like `/clear`)	LLM judge only — no git diff needed
`ops` · `multi_agent` · `unknown`	Run-level	Deterministic + general rubric fallback

Seven dedicated rubrics — conversation scores accuracy · helpfulness · conciseness, debugging scores diagnosis · fix_quality · verification, and so on. Same model can be great at debugging and average at conversation — that's normal, and the scorecard surfaces it.

Three Ingestion Paths → One Schema

Each agent emits events in a different shape. APME converges three paths onto a single ApmeCollector API:

  Claude Code   ──▶  hook HTTP POST + PTY ⏺ tail parser
  OpenClaw/OC   ──▶  adapter timeline events
  Codex CLI     ──▶  PTY parser (spinner_stop + tail)
                │
                ▼
           ApmeCollector  →  {data dir}/apme.sqlite   # ~/.agentdeck/ (CLI)  |  ~/Library/Group Containers/group.bound.serendipity.agentdeck.dashboard/ (App Store)

Claude Code's Stop hook fires only ~18% of the time in v2.1.104, so the primary response-capture path is PTY ring buffer parsing at the ⏺ marker, with a 3-path fallback for the spinner_stop vs UserPromptSubmit race.

Composite Score (4-dimensional weighted sum)

A single judge score is unreliable. APME combines four independent signals so one bad signal can't poison the run:

composite = 0.40 × outcome      ← did it actually finish? (git diff / response captured)
          + 0.40 × judge         ← LLM quality score
          + 0.15 × efficiency    ← tokens/cost/time per change
          + 0.05 × vibe          ← user approve/reject

Coding outcomes are git-based (committed → 1.0, iterated → 0.6, exploratory → 0.5, abandoned → 0.2, …). Non-coding outcomes are response-based — a research session that changes no files isn't "abandoned" if the answer was delivered.

Vibe-first, Judge Second

Counterintuitively, APME doesn't try to make the LLM judge great upfront. Instead:

Collect everything (Stage 1 ✅)
Classify into 10 categories (Stage 2 ✅)
Human labels run-by-run via dashboard 👍/👎 — this is ground truth (Stage 3 🔧)
Auto-tune the judge rubric against human labels via OPRO (Stage 4 🧪)

Without vibe data there's no way to verify the judge is right. So humans label first, and the judge evolves toward the human baseline. The tuner picks up disagreement samples (tests_pass=1 ∧ judge<0.5, vibe=reject ∧ judge>0.8, etc.), proposes new rubrics, shadow-scores them, and only accepts rubrics that improve the vibe correlation.

Local-only Judge Backends (zero cost, full coverage)

Judge runs on local backends only so sampleRate: 1.0 (evaluate everything) is the default — no cost anxiety, no guilt-driven sampling cuts.

Backend	Endpoint	Role
`mlx`	`127.0.0.1:8800` (Qwen3.5-30B)	Primary (Node CLI/Brew)
`foundationModels`	in-process Swift daemon	Primary (App Store macOS build via Apple Intelligence)
`openclaw`	`127.0.0.1:18789` (Gateway)	Secondary

Self-healing Daemon Loop + Device Broadcast

Evaluation runs in the daemon, decoupled from session lifetime. Every 30s: enqueue unevaluated runs, compute outcomes on runs closed >10s ago, re-classify orphans, tag crashed sessions. Even if a session process dies mid-evaluation, the daemon loop eventually completes it.

When an evaluation finishes, a ★ eval_result timeline entry broadcasts to every device simultaneously — Stream Deck (amber ★), Apple (ledAmber EVAL row), Android (LEDAmber tag), ESP32 (@ prefix in TLToolReq), TUI dashboard. The result lands in your peripheral vision seconds after the run ends — which is the UX hook that actually gets you to press 👍/👎.

Dashboard & CLI

Daemon auto-polls every 30s; or trigger manually:

agentdeck apme judge    # evaluate all pending runs manually (no daemon required)

Browse the local web dashboard:

http://localhost:9120/apme

Run table (session · model · task · composite score · cost · git delta), category scorecard (v_category_scorecard — which model wins per category), per-run vibe controls, turn-level mid-session eval cards for non-coding sessions.

Cost policy: Judge runs on local backends only (MLX + OpenClaw Gateway). sampleRate: 1.0 is the default — evaluate every run without worrying about API bills. All session data stays on-device.

Deep dives:

Why APME — the motivation, category-aware strategy, design decisions
APME Pipeline — 8-layer pipeline with file:line anchors
APME Reference — schema, HTTP API, settings, test coverage

ESP32 Display

Compact WiFi-connected displays for always-on agent monitoring.

Left: Round AMOLED close-up (1.8" circular) | Right: Round AMOLED + IPS LCD side by side

Supported Boards

Board	Screen	Resolution
Round AMOLED	1.8" circular AMOLED	466×466
IPS LCD	3.5" rectangular IPS	480×320
B86 Box	4" wall-mount touch panel	480×480
Ulanzi TC001	8×32 WS2812B RGB LED matrix	256 pixels

Setup

Run agentdeck wifi-setup to provision WiFi over serial (see CLI Reference). Once provisioned, the ESP32 connects to the bridge over WiFi WebSocket and displays a compact terrarium with agent status. PlatformIO firmware in esp32/.

Ulanzi TC001 LED Matrix

Compact 8×32 RGB LED matrix for always-on status-at-a-glance monitoring.

The TC001 is a minimal always-on display — separate from the larger ESP32 touch panels. It connects to the bridge via WiFi WebSocket and displays:

Agent state animation — idle, processing, awaiting option states reflected in creature movement
Rate limit gauges — visual countdown for token limits and API quotas
Usage HUD — compact numeric readout (cost, tokens, time)
Multi-agent support — pixel-based creature sprites for Claude Code, Codex, OpenCode, OpenClaw

Simple desk or shelf mounting; low power consumption via USB. Firmware in esp32/ with board support for TC001 (8×32 WS2812B addressable RGB LEDs).

Pixoo64 LED Matrix

64×64 RGB LED pixel art terrarium on a Divoom Pixoo64.

The Pixoo module renders dot-art creatures, water zone colors reflecting agent state, and a compact usage HUD — all pushed over HTTP to the device's local API.

Manage devices with agentdeck pixoo {scan|add|list|remove|test} — see CLI Reference.

Note: The Pixoo's built-in HTTP server can crash under frequent requests. AgentDeck throttles updates automatically. Use --no-pixoo to disable if needed.

Project Structure

AgentDeck/
├── shared/        # Shared TypeScript types + SVG renderers (protocol, states, session slots)
├── bridge/        # Node.js bridge server (PTY, hooks, WS, voice, adapters, D200H HID, evaluation system)
├── plugin/        # Stream Deck SDK v2 plugin (v4 session-slot action, encoders, renderers)
├── hooks/         # Claude Code hook installer
├── setup/         # npm setup package (@agentdeck/setup)
├── android/       # Jetpack Compose dashboard (e-ink + tablet, terrarium, deck mirror)
├── apple/         # SwiftUI multiplatform app (iOS/iPad/macOS) + in-process Swift daemon
├── esp32/         # ESP32 firmware (Round AMOLED / IPS LCD / B86 Box / TC001, PlatformIO)
├── tools/         # Creature simulator (GitHub Pages /demo/), dev tooling
├── config/        # Prompt templates + default settings
├── scripts/       # Install, uninstall, package, icon generation, demo build
└── docs/          # Documentation (architecture, daemon, v4-layout, android, esp32, …)

See Protocol & Architecture for the full file tree, state machine diagram, and WebSocket message reference.

Configuration

Quick Action Buttons

The four Quick Action buttons (slots 3-6) are configurable via the Stream Deck Property Inspector. Defaults:

Slot	Label	Action
3	GO ON	`continue` (sends prompt to continue)
4	REVIEW	`/review`
5	COMMIT	`/commit`
6	CLEAR	`/clear`

Slot 3 also shows START when disconnected (spawns a new agentdeck claude session).

Prompt Templates

Edit config/prompt-templates.json to customize the prompts cycled by the Action encoder (E2) rotate:

{
  "templates": [
    { "label": "Fix Bug", "prompt": "Please fix the bug described above" },
    { "label": "Test", "prompt": "Write tests for the changes made" },
    { "label": "Review", "prompt": "Review the code for issues and suggest improvements" },
    { "label": "Explain", "prompt": "Explain how this code works step by step" }
  ]
}

Troubleshooting

Symptom	Cause	Fix
Plugin shows DISCONNECTED	Bridge not running	Run `agentdeck claude`
Plugin reconnects every 3s	Bridge crashed	Restart `agentdeck claude`
Bridge enters disconnected state	Claude process exited	Restart `agentdeck claude`
State tracking not working	Hook server unreachable	Verify `agentdeck` is running
Stream Deck buttons inactive	Hardware not connected	Reconnect + restart app
Stuck in PROCESSING > 5 min	Agent stalled	STOP button or Ctrl+C in terminal
Voice transcription returns empty	Speech recognition permission denied, or OS dictation model still downloading	macOS Settings → Privacy & Security → Speech Recognition → enable AgentDeck. First-time recognition may wait ~30s while the OS finishes the on-device model download
Plugin not in Stream Deck app	Plugin not linked	Restart Stream Deck app, then `cd plugin && streamdeck link .sdPlugin`
Hooks not firing	Hooks not installed or stale	`node hooks/dist/install.js` (re-installs all 7 hooks)
Need to remove hooks	Uninstalling AgentDeck	`node hooks/dist/install.js uninstall`
Plugin loads but buttons blank	Plugin needs rebuild	`pnpm build && pnpm generate-icons`, restart Stream Deck app
Android app can't find bridge	mDNS blocked on network	Use QR pairing (`agentdeck qr`) or enter IP manually in Settings
Android shows "Not Connected"	Bridge not reachable	Verify same LAN; for USB: `adb reverse tcp:9120 tcp:9120` then connect to 127.0.0.1:9120
E-ink ghosting on Crema	Missing full GC16 refresh	State transitions trigger full refresh automatically; force refresh by toggling bridge connection
`posix_spawnp failed`	Prebuilt node-pty binary incompatible with Node version	`cd $(npm root -g)/@agentdeck/bridge/node_modules/node-pty && npx node-gyp rebuild`

tmux -CC Compatibility

When using iTerm2's tmux -CC (control mode): run agentdeck claude inside a tmux window. The bridge manages its own PTY, so there's no conflict.

Signal chain: tmux → iTerm2 → agentdeck → bridge PTY → claude

Uninstall

bash scripts/uninstall.sh

Removes Claude Code hooks, unlinks agentdeck CLI, and removes the Stream Deck plugin symlink. Restart the Stream Deck app afterward.

Development

pnpm -r --parallel dev    # Watch mode for all packages
cd plugin && pnpm build   # Rebuild plugin only
cd bridge && pnpm build   # Rebuild bridge only
pnpm -r typecheck         # Type check without building

Testing

pnpm test                        # Run root Vitest suite (bridge, plugin, shared, hooks)
pnpm test -- --watch             # Watch mode
pnpm vitest run --coverage       # Coverage report (v8) + threshold check
pnpm test:report                 # Unified report script (Vitest + Android + Apple + Robot)
pnpm test:android                # Android suite via unified report script

The repository currently uses 4 test frameworks. The default pnpm test path only runs the root Vitest suite; Android, Apple, and ESP32 suites are executed through scripts/test-report.sh.

GitHub Pages publishes the current test dashboard at https://puritysb.github.io/AgentDeck/reports/ with suite status, scenario coverage mapping, and coverage trends.

Framework	Scope	Current inventory	Notes
Vitest	`bridge`, `plugin`, `shared`, `hooks`	42 `.test.ts` files	Root `pnpm test` and CI path
JUnit + Robolectric	Android unit tests	4 Kotlin test files	Run via `./gradlew testDebugUnitTest` or `pnpm test:report`
XCTest	Apple app tests	10 Swift test files	Run via `xcodebuild test` or `pnpm test:report`
Robot Framework	ESP32 validation	4 Robot suites	Hardware-oriented, run via report script

Coverage thresholds currently enforced by vitest.config.ts are lines ≥17%, functions ≥15%, branches ≥14%, statements ≥16%.

The current GitHub Actions CI workflow runs on every push/PR to master and executes build → typecheck → vitest → vitest coverage on ubuntu-latest with Node 20. Android, Apple, and ESP32 suites are not part of the default CI job.

See Testing Guide for full details on coverage, writing tests, and running the unified report.

Quick smoke test after changes:

pnpm build && pnpm test && agentdeck status

Packaging

Build a distributable .streamDeckPlugin file:

pnpm package    # → dist/bound.serendipity.agentdeck.streamDeckPlugin

Recipients double-click to install. The bridge (agentdeck) and Claude Code CLI must be installed separately.

Published npm packages: @agentdeck/shared, @agentdeck/bridge, @agentdeck/setup.

Debugging

Bridge logs print to the agentdeck terminal:

[agentdeck] Starting AgentDeck bridge on port 9120...
[agentdeck] Hook server listening on port 9120
[agentdeck] WebSocket server ready on port 9120
[agentdeck] Spawned: claude
[WsServer] Plugin connected
[StateMachine] DISCONNECTED -> idle (trigger: session_start, source: hook)

Stream Deck plugin logs: Stream Deck app → Settings → Logs.

Next Milestones — Current Focus

AgentDeck is actively working on two critical areas to prepare for production release:

1. App Store Distribution (macOS + iOS)

The SwiftUI dashboard is ready for App Store submission. The macOS app ships a full in-process Swift daemon (47 files, ~20,500 LOC) — mDNS discovery, device modules (ADB/Serial/Pixoo/D200H HID), OpenClaw Gateway WebSocket client, HTTP + WebSocket server. App Store compliance is gated by the AGENTDECK_APP_STORE compile flag: no bundled Node.js / adb / D200H helper, no subprocess spawn, no AppleScript (per Apple Review Guideline 2.5.2). User data lives in ~/Library/Group Containers/group.bound.serendipity.agentdeck.dashboard/ (routed through AgentDeckPaths.swift; never hand-write the path). USB device entitlements cover D200H HID; Input Monitoring is handled with the standard TCC prompt. OpenClaw integration uses Gateway-native pairing (self-generated Ed25519 identity in Keychain + Gateway-issued device token) — no file read of ~/.openclaw/identity/. apple/scripts/verify-appstore-archive.sh is wired into CI and asserts these invariants on every archive.

2. Personalized Agent Evaluation System (APME)

Building a data-driven answer to "which of my 6+ LLMs should I route this task to?" — replacing gut-feel model selection with measurement on my actual work. All three ingestion paths (Claude Code hooks + PTY, OpenClaw/OpenCode timeline events, Codex PTY parser) converge on a unified ApmeCollector → local SQLite. Category-aware evaluation:

Coding (coding/refactoring/debugging) — run-level eval after session ends, deterministic layer (lint/build/test) + LLM judge with category-specific rubrics
Non-coding (conversation/planning/research/review) — turn-level mid-session eval, fires immediately after each turn completes, no git diff needed
Composite score — 4-dimensional weighted sum (0.40 outcome + 0.40 judge + 0.15 efficiency + 0.05 vibe) so a single noisy signal can't poison the run

Judge is local-only (MLX Qwen3.5-30B primary, OpenClaw Gateway secondary) so sampleRate: 1.0 is the default — every session evaluated, zero cost. Auto-tuning via OPRO loop picks up disagreement between human vibe labels and judge scores, proposes new rubrics, and shadow-scores them before accepting. The Model Recommender reads v_category_scorecard to suggest the best model per category + budget.

Eval results broadcast to every device simultaneously (Stream Deck/Apple/Android/ESP32/TUI) via the ★ eval_result timeline entry — pulling labeling into peripheral vision instead of burying it in a dashboard nobody opens.

Current bottleneck: not the infrastructure (complete), but accumulating enough vibe-labeled data to unlock Stage 4 auto-tuning.

Roadmap

Achieved

In Progress

App Store distribution (macOS + iOS — sandbox hardening, TestFlight available)
APME vibe-labeling accumulation → Stage 4 OPRO rubric auto-tuner activation (needs ≥30 disagreement samples)

Planned

Play Store distribution (Android app)
Stream Deck Marketplace registration (plugin distribution)

AgentDeck — Physical Control Surface for AI Coding Agents

Name		Name	Last commit message	Last commit date
Latest commit History 636 Commits
.agents/workflows		.agents/workflows
.github/workflows		.github/workflows
android		android
apple		apple
assets		assets
bridge		bridge
config		config
docs		docs
esp32		esp32
generated/protocol		generated/protocol
hooks		hooks
plugin		plugin
scripts		scripts
setup		setup
shared		shared
tests/parity/gateway-frames		tests/parity/gateway-frames
tools/creature-simulator		tools/creature-simulator
zkswe		zkswe
.gitignore		.gitignore
.pnpm-approvedBuilds.json		.pnpm-approvedBuilds.json
.prettierignore		.prettierignore
.prettierrc		.prettierrc
AGENTS.md		AGENTS.md
ATTRIBUTION.md		ATTRIBUTION.md
CLAUDE.md		CLAUDE.md
DEVELOPMENT_LOG.md		DEVELOPMENT_LOG.md
LICENSE		LICENSE
README.md		README.md
eslint.config.js		eslint.config.js
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml
test-d200h.ts		test-d200h.ts
tsconfig.base.json		tsconfig.base.json
vitest.config.ts		vitest.config.ts

Folders and files

Latest commit

History

Repository files navigation

AgentDeck

Table of Contents

Documentation

What is AgentDeck?

Supported Agents

Supported Surfaces — 13 Types

Architecture

Prerequisites

Quick Start

Manual Build & Install

Build

1. Install Claude Code Hooks

2. Link Stream Deck Plugin

3. Link agentdeck CLI

4. Voice Setup (Zero install)

Usage

Start

CLI Reference

Sessions

Daemon

Session Management

Monitoring

Evaluation (APME)

Device Setup

Stream Deck+ Layout (v4)

Keypad — List View (8 sessions)

Keypad — Detail View (press a session)

Encoders — 4 Slots

Dynamic Button States

Ulanzi D200H Deck Dock

Layout

Transport

Android Dashboard

Two Display Modes

Three-Tab Navigation

Connect to Bridge

Apple Dashboard

Three-Tab Navigation

macOS Menu Bar Popup

Connect to Bridge

iOS Foreground Recovery

App Store Distribution

OpenClaw Gateway (Gateway-native pairing)

TUI Dashboard

Agent Performance Evaluation & Model Orchestration (APME)

Category-aware Evaluation Strategy

Three Ingestion Paths → One Schema

Composite Score (4-dimensional weighted sum)

Vibe-first, Judge Second

Local-only Judge Backends (zero cost, full coverage)

Self-healing Daemon Loop + Device Broadcast

Dashboard & CLI

ESP32 Display

Supported Boards

Setup

Ulanzi TC001 LED Matrix

Pixoo64 LED Matrix

Project Structure

Configuration

Quick Action Buttons

Prompt Templates

Troubleshooting

tmux -CC Compatibility

Uninstall

Development

Testing

Packaging

Debugging

Next Milestones — Current Focus

1. App Store Distribution (macOS + iOS)

2. Personalized Agent Evaluation System (APME)

Roadmap

Achieved

In Progress

Planned

About

3. Link `agentdeck` CLI

Packages