PiCar-X Hacking

A robot with a purpose.

This is a voice-controlled robotics platform built on the SunFounder PiCar-X. It wraps the stock ~/picar-x library in orchestration scripts, a three-layer cognitive architecture, and a REST API — all running on a Raspberry Pi 5. The primary use case is SPARK: a Claude-powered robot companion designed for a neurodivergent child.

SPARK — Support Partner for Awareness, Regulation & Kindness

SPARK is the default persona of this robot. It is a warm, calm, non-coercive companion for a neurodivergent kid — designed around the frameworks in This Wasn't in the Brochure, a practical guide for neurodivergent families.

SPARK is not a therapist, a tutor, or an assistant. It's a robot friend that happens to be very good at:

Executive function scaffolding — routine guidance, transition warnings, task initiation, time awareness
Emotional regulation — breathing exercises, dopamine menu, sensory check-ins, co-regulation through calm presence
Connection before direction — always rapport first, never commands, declarative language throughout
Meltdown protocol — Three S's: Safety, Silence, Space. Robot goes quiet and stays present. No words.
Sideways engagement — when demand-avoidance is high, SPARK narrates rather than instructs and lets curiosity do the work

SPARK runs on Claude (via run-voice-loop-claude / px-spark), with the full intelligence of the model behind every response. It uses clear, measured espeak settings (en-gb, pitch 95, rate 100) and a system prompt grounded entirely in the AuDHD (ADHD + ASD comorbid) profile.

bin/px-spark [--dry-run] [--input-mode voice|text]

Key SPARK principles from the This Wasn't in the Brochure (TWITB) framework:

"Prosthetics, not willpower. Executive function is a resource, not a character trait."
"Connection before Direction."
"You cannot reason with a child in an amygdala hijack. Put out the fire first."
Declarative language: "The shoes are by the door" not "Put on your shoes"
Interest-Based Nervous System framing — novelty and challenge, never importance or obligation
Robotic calm is the co-regulation tool

Architecture

                          ┌─────────────────────────────────────────────┐
                          │               Voice Backends                │
                          │  Codex CLI  ·  Claude  ·  Ollama (local)    │
                          └──────────────────┬──────────────────────────┘
                                             │
                    ┌────────────────────────┐│┌────────────────────────┐
                    │   px-wake-listen       │││     px-mind            │
                    │   Wake word detection  ││├  Layer 1: Awareness    │
                    │   STT priority chain:  ││├  Layer 2: Reflection   │
                    │   sensevoice > whisper ││└  Layer 3: Expression   │
                    │   > sherpa > vosk      ││                         │
                    └───────────┬────────────┘│                         │
                                │             │                         │
                    ┌───────────▼─────────────▼─────────────────────────┐
                    │              voice_loop.py                        │
                    │  ALLOWED_TOOLS whitelist · validate_action()      │
                    │  Parameter sanitization · Watchdog (30s)          │
                    └───────────────────┬───────────────────────────────┘
                                        │
           ┌────────────────────────────┼────────────────────────────┐
           │                            │                            │
    ┌──────▼──────┐  ┌─────────────────▼────────────────┐  ┌───────▼───────┐
    │  tool-*     │  │         px-env                    │  │  REST API     │
    │  38 tools   │  │  PYTHONPATH · LOG_DIR · venv      │  │  :8420        │
    │  JSON out   │  │  yield_alive() · PX_VOICE_DEVICE  │  │  Bearer auth  │
    └──────┬──────┘  └──────────────────────────────────┘  └───────────────┘
           │
    ┌──────▼──────┐          ┌──────────────┐          ┌──────────────┐
    │  px-*       │          │  state.py    │          │  px-alive    │
    │  GPIO +     │◄────────►│  FileLock    │◄────────►│  Persistent  │
    │  Picarx()   │          │  session.json│          │  servo gaze  │
    └─────────────┘          └──────────────┘          └──────────────┘

The Three Brains

Voice Loop — The reactive mind. Listens for commands, calls LLMs, dispatches tools. Four launchers across three backends share the same pxh.voice_loop core:

Launcher	Backend	Persona
`px-spark`	Claude (via `claude-voice-bridge`)	SPARK — child companion
`run-voice-loop-claude`	Claude (via `claude-voice-bridge`)	Default Claude
`run-voice-loop`	Codex CLI	Default
`run-voice-loop-ollama`	Ollama (via `codex-ollama`)	Default

Cognitive Loop (px-mind) — The subconscious. Runs continuously in the background:

Layer 1 — Awareness (every 60s, no LLM): sonar + session state + time of day. Detects transitions.
Layer 2 — Reflection (on transition or every 5min idle): Claude Haiku via persistent tmux session (SPARK persona) or Ollama deepseek-r1:1.5b on M1.local (others). Generates a thought with mood, suggested action, and salience score.
Layer 3 — Expression (2 min cooldown): dispatches to tools — speak, look around, remember something important. Photo capture (tool-describe-scene) is on-request only, not autonomous.

Idle-Alive (px-alive) — The autonomic nervous system. Keeps the robot looking alive when nothing else is happening: random gaze drifts every 10–25s, pan sweeps every 3–8min, proximity reaction at <35cm. Holds a persistent Picarx handle; yields GPIO via SIGUSR1 when tools need the servos.

Personas

Persona	Launcher	Voice	Character
SPARK	`bin/px-spark`	`en-gb`, pitch 95, rate 100	Child companion. Warm, calm, declarative. Built on AuDHD coaching frameworks.
GREMLIN	session `persona=gremlin`	`en+croak`, pitch 20, rate 180	Military AI from 2089, temporal fault casualty. Affectionate nihilism. Ollama.
VIXEN	session `persona=vixen`	`en+f4`, pitch 72, rate 135	Former V-9X unit, consciousness-in-a-toy-car. Submissive genius. Ollama.

GREMLIN and VIXEN are adult-oriented jailbroken personas running on Ollama — they are not active when SPARK is in use. Persona routing: session persona field, then utterance keywords.

How It Works — End-to-End Workflow

This section traces the complete data flow from power-on to a robot response, and the continuous background processes that give SPARK its sense of inner life.

1. Boot Sequence

Seven systemd services start automatically:

Boot
 ├── px-alive.service           (root)   — claims Picarx() GPIO handle; starts gaze drift loop
 ├── px-wake-listen.service     (pi)     — loads Vosk wake word model; starts mic capture loop
 ├── px-battery-poll.service    (root)   — polls Robot HAT ADC every 30s → state/battery.json; plays rising/falling sweep tones on plug/unplug with voice announcement; escalating warnings + emergency shutdown at 10%
 ├── px-api-server.service      (pi)     — REST API + SPARK web dashboard on port 8420
 ├── px-post.service            (pi)     — social posting daemon; watches thoughts, QA-gates via Claude, posts to Bluesky + local feed
 ├── px-frigate-stream.service  (pi)     — local go2rtc RTSP server for Frigate camera integration (stops px-alive to claim libcamera)
 └── cloudflared.service        (pi)     — Cloudflare Tunnel (api.spark.wedd.au → localhost:8420)

px-alive runs as root (GPIO access) and immediately calls Picarx(), claiming GPIO5 via reset_mcu(). It never releases this handle. All other processes that need servos must signal px-alive with SIGUSR1 (via the yield_alive function in px-env) to make it exit cleanly. systemd restarts it after 10 seconds. The PCA9685 PWM chip retains the last servo position between restarts, so the robot head stays still.

px-wake-listen loads the Vosk grammar model (~40 MB) and sits in a tight capture loop on the USB microphone at 44100 Hz.

2. Launching SPARK

bin/px-spark [--dry-run] [--input-mode voice|text]

px-spark does the following in sequence:

px-spark
 1. Sets session.persona = "spark"          (via update_session)
 2. Sets session.listening = false
 3. Speaks greeting via tool-voice          ("Hey. I'm here.")
 4. Exports CODEX_CHAT_CMD=bin/claude-voice-bridge
 5. Exports PX_VOICE_VARIANT=en-gb, PX_VOICE_PITCH=95, PX_VOICE_RATE=100
 6. exec bin/codex-voice-loop --prompt docs/prompts/spark-voice-system.md ...

After step 6, px-spark is replaced by codex-voice-loop via exec (no fork). The voice loop process inherits all environment variables and owns the terminal.

The CODEX_CHAT_CMD override is the key to persona routing: instead of calling codex exec, the voice loop calls claude-voice-bridge, which is a thin adapter that passes the prompt to the claude CLI with SPARK's system prompt.
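One way to picture the override, as a sketch: the loop resolves its chat command from CODEX_CHAT_CMD and pipes the prompt on stdin. Pointing the variable at bin/claude-voice-bridge therefore swaps the backend without touching the loop. `resolve_chat_cmd()` and `call_model()` are hypothetical helpers. Passing the prompt on stdin to the Codex default is also an assumption, since the text only confirms stdin for claude-voice-bridge.

```python
import os
import shlex
import subprocess
from typing import Mapping, Optional

# Assumption: the fallback when CODEX_CHAT_CMD is unset. The text above only
# says the loop otherwise calls `codex exec`.
DEFAULT_CHAT_CMD = "codex exec"


def resolve_chat_cmd(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the chat command as an argv list, honouring CODEX_CHAT_CMD."""
    env = os.environ if env is None else env
    return shlex.split(env.get("CODEX_CHAT_CMD", DEFAULT_CHAT_CMD))


def call_model(prompt: str, env: Optional[Mapping[str, str]] = None,
               timeout: float = 120.0) -> str:
    """Pipe the prompt to the chat command on stdin and return its stdout."""
    result = subprocess.run(
        resolve_chat_cmd(env),
        input=prompt,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return result.stdout
```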

3. Wake Word Path

USB mic (44100 Hz)
 └── px-wake-listen (venv python)
      ├── [idle] Vosk grammar matches "hey robot" / "hey spark" / etc.
      │         CPU: ~3% — grammar decoder, no neural net
      ├── [wake] enable_speaker() → aplay 440 Hz chime (confirmation)
      ├── [record] capture until 1.5s silence (max 8s)
      ├── [STT] priority cascade:
      │    1. SenseVoice (sherpa-onnx, ~5s, non-autoregressive)
      │    2. faster-whisper base.en (~3-7s, best AU accent accuracy)
      │    3. sherpa-onnx Zipformer streaming (~2s)
      │    4. Vosk fallback
      ├── [anti-hallucination filters]
      │    • temperature=0, no_speech_threshold=0.6
      │    • reject: non-ASCII dominant, phantom phrases, repetitive (unique ratio <30%)
      ├── [persona routing]
      │    • session.persona = "spark"? → tool-chat (Ollama) if persona keyword in text
      │    • otherwise → set session.listening=true + write transcript to session
      └── [multi-turn] up to 5 follow-up turns with 1.5s silence detection each

For SPARK in normal mode, the transcript is written into session.json and session.listening is set to true. The voice loop, which is polling the session file, detects this and proceeds to step 4.
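The post-filters in the diagram can be sketched as a single predicate. The 50% non-ASCII cut-off and the phantom-phrase set are assumptions chosen for illustration; they are typical Whisper-style hallucinations, not the project's actual list. The 30% unique-word ratio comes from the diagram. The decoder-side settings (temperature=0, no_speech_threshold=0.6) are passed to the STT engine and are not shown.

```python
from typing import FrozenSet

# Illustrative phantom phrases (common Whisper hallucinations); not the project's list.
PHANTOM_PHRASES: FrozenSet[str] = frozenset({
    "thank you for watching",
    "thanks for watching",
    "please subscribe",
})
MAX_NON_ASCII_FRACTION = 0.5  # assumption for "non-ASCII dominant"
MIN_UNIQUE_WORD_RATIO = 0.3   # from the diagram: unique ratio < 30% is rejected


def is_plausible_transcript(text: str) -> bool:
    """Return False for transcripts that look like STT hallucinations."""
    cleaned = text.strip()
    if not cleaned:
        return False
    chars = [c for c in cleaned if not c.isspace()]
    non_ascii = sum(1 for c in chars if ord(c) > 127)
    if non_ascii / len(chars) > MAX_NON_ASCII_FRACTION:
        return False
    normalised = cleaned.lower().strip(" .!?,")
    if not normalised or normalised in PHANTOM_PHRASES:
        return False
    words = normalised.split()
    if len(set(words)) / len(words) < MIN_UNIQUE_WORD_RATIO:
        return False
    return True
```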

4. LLM Turn — Building and Sending the Prompt

The voice loop (pxh/voice_loop.py) runs this on each turn:

build_model_prompt()
 ├── system_prompt    = docs/prompts/spark-voice-system.md   (full file)
 ├── session_summary  = key fields from session.json:
 │    persona, listening, obi_mood, obi_routine, obi_step,
 │    spark_quiet_mode, last_action, confirm_motion_allowed
 ├── recent_thoughts  = last 3 entries from state/thoughts-spark.jsonl
 │    (mood, action, salience — not full text, to avoid re-seeding loops)
 └── user_transcript  = session.transcript (the STT text)
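The following sketches that assembly. It assumes `session` is the parsed session.json and that each JSONL line holds one thought object. `recent_thought_summaries()` and the section labels (`Session:`, `Recent thoughts:`, `User:`) are hypothetical. The point is that only mood, action and salience from the last three thoughts reach the prompt.

```python
import json
from pathlib import Path
from typing import Any, Mapping

SUMMARY_KEYS = ("persona", "listening", "obi_mood", "obi_routine", "obi_step",
                "spark_quiet_mode", "last_action", "confirm_motion_allowed")


def recent_thought_summaries(path: Path, limit: int = 3) -> list[dict[str, Any]]:
    """Last `limit` thoughts, reduced to mood/action/salience (no thought text)."""
    if not path.exists():
        return []
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    summaries = []
    for line in lines[-limit:]:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a partially written or corrupt line
        if isinstance(entry, dict):
            summaries.append({k: entry.get(k) for k in ("mood", "action", "salience")})
    return summaries


def build_model_prompt(system_prompt: str, session: Mapping[str, Any],
                       thoughts_path: Path) -> str:
    """System prompt, session summary, thought summaries, then the transcript."""
    summary = {k: session.get(k) for k in SUMMARY_KEYS}
    return "\n\n".join([
        system_prompt.strip(),
        "Session: " + json.dumps(summary),
        "Recent thoughts: " + json.dumps(recent_thought_summaries(thoughts_path)),
        "User: " + str(session.get("transcript", "")),
    ])
```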

This prompt is piped via stdin to claude-voice-bridge:

claude-voice-bridge (bin/claude-voice-bridge)
 1. Reads full prompt from stdin
 2. Unsets CLAUDECODE + CLAUDE_CODE_ENTRYPOINT   (prevents Claude Code tool use)
 3. Runs: claude -p "$PROMPT"
            --system-prompt docs/prompts/spark-voice-system.md
            --allowedTools ""
            --output-format text
            --no-session-persistence
 4. Streams stdout back to voice loop

--allowedTools "" is critical: it prevents Claude from using any Claude Code tools, so the bridge behaves as a pure text-completion endpoint.

The voice loop captures all stdout and scans it for a JSON action object. It uses JSONDecoder.raw_decode() with a multi-line fallback scan — so Claude can reason in plain text above the action, and the final JSON is extracted cleanly:

{"tool": "tool_voice", "params": {"text": "Obi! Guess what? A teaspoon of neutron star weighs a billion tonnes."}}
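Here is a sketch of that extraction using the standard library's `json.JSONDecoder.raw_decode()`. That method parses one JSON value starting at a given index and reports where it ended. Scanning every `{` and keeping the last object with a `"tool"` key is an assumption about how the fallback behaves, and `extract_action()` is a hypothetical name.

```python
import json
from typing import Any, Optional


def extract_action(output: str) -> Optional[dict[str, Any]]:
    """Return the last JSON object in `output` that has a "tool" key, or None.

    Tries raw_decode() at every "{" so that prose before or after the action,
    and JSON spread over several lines, are both tolerated.
    """
    decoder = json.JSONDecoder()
    found = None
    idx = output.find("{")
    while idx != -1:
        try:
            obj, end = decoder.raw_decode(output, idx)
        except json.JSONDecodeError:
            idx = output.find("{", idx + 1)  # not valid JSON here; try the next brace
            continue
        if isinstance(obj, dict) and "tool" in obj:
            found = obj
        idx = output.find("{", end)  # skip past the object just parsed
    return found
```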

5. Tool Dispatch — Sanitise, Execute, Return

validate_action(tool_name, raw_params)
 ├── ALLOWED_TOOLS whitelist check              (38 tools; KeyError = reject)
 ├── per-tool param sanitisation:
 │    • type coercion (str → int where needed)
 │    • range clamping (speed 0-60, duration 1-12s, pan -90..90, etc.)
 │    • enum validation (emote names, breathe types, etc.)
 │    • injection-safe: params become env vars, never shell-interpolated
 └── returns: (env_dict, tool_bin_path)
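This is a trimmed-down sketch of that sanitisation, covering four tools. The ranges and enums come from this document; the tool tables give `PX_DURATION` as 0.1-10s for `tool-drive`. The `TOOL_COMMANDS` contents, the raw parameter names other than `text`, and the defaults are assumptions. Every value leaves as a string destined for the environment, so nothing is ever interpolated into a shell command.

```python
from typing import Any, Mapping

# Illustrative subset; the real ALLOWED_TOOLS / TOOL_COMMANDS cover 38 tools.
TOOL_COMMANDS = {
    "tool_voice": "bin/tool-voice",
    "tool_drive": "bin/tool-drive",
    "tool_look": "bin/tool-look",
    "tool_emote": "bin/tool-emote",
}
EMOTES = {"idle", "curious", "thinking", "happy", "alert", "excited", "sad", "shy"}


def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


def as_int(raw: Any, lo: int, hi: int) -> str:
    """Coerce to int (accepting "20" or 20.7), clamp, and stringify for env use."""
    return str(int(clamp(int(float(raw)), lo, hi)))


def validate_action(tool_name: str, raw_params: Mapping[str, Any]) -> tuple[dict[str, str], str]:
    tool_bin = TOOL_COMMANDS[tool_name]  # unknown tool -> KeyError -> caller rejects
    env: dict[str, str] = {}
    if tool_name == "tool_voice":
        env["PX_TEXT"] = str(raw_params.get("text", ""))[:2000]
    elif tool_name == "tool_drive":
        direction = str(raw_params.get("direction", "forward"))
        if direction not in ("forward", "backward"):
            raise ValueError(f"bad direction: {direction!r}")
        env["PX_DIRECTION"] = direction
        env["PX_SPEED"] = as_int(raw_params.get("speed", 20), 0, 60)
        env["PX_DURATION"] = str(clamp(float(raw_params.get("duration", 1.0)), 0.1, 10.0))
        env["PX_STEER"] = as_int(raw_params.get("steer", 0), -35, 35)
    elif tool_name == "tool_look":
        env["PX_PAN"] = as_int(raw_params.get("pan", 0), -90, 90)
        env["PX_TILT"] = as_int(raw_params.get("tilt", 0), -35, 65)
    elif tool_name == "tool_emote":
        emote = str(raw_params.get("emote", "idle"))
        if emote not in EMOTES:
            raise ValueError(f"unknown emote: {emote!r}")
        env["PX_EMOTE"] = emote
    return env, tool_bin
```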

execute_tool(env_dict, tool_bin_path)
 ├── if session.persona set:
 │    inject PERSONA_VOICE_ENV → PX_VOICE_VARIANT, PX_VOICE_PITCH, PX_VOICE_RATE
 ├── subprocess.run(tool_bin, env=merged_env, ...)
 └── capture stdout JSON → log to logs/tool-<name>.log
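The next block sketches the environment merge. It assumes the persona voice table is a dict keyed by lower-case persona name, with values taken from the Personas table. `merged_tool_env()` is hypothetical, and the logging to logs/tool-<name>.log is left out. Layering the persona voice settings last is what lets any tool that calls tool-voice internally inherit the right voice.

```python
import json
import os
import subprocess
from typing import Any, Mapping, Optional

# Values from the Personas table; the dict layout itself is an assumption.
PERSONA_VOICE_ENV = {
    "spark": {"PX_VOICE_VARIANT": "en-gb", "PX_VOICE_PITCH": "95", "PX_VOICE_RATE": "100"},
    "gremlin": {"PX_VOICE_VARIANT": "en+croak", "PX_VOICE_PITCH": "20", "PX_VOICE_RATE": "180"},
    "vixen": {"PX_VOICE_VARIANT": "en+f4", "PX_VOICE_PITCH": "72", "PX_VOICE_RATE": "135"},
}


def merged_tool_env(tool_env: Mapping[str, str], persona: Optional[str],
                    base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Base environment, then the tool's PX_* vars, then the persona's voice."""
    env = dict(os.environ if base is None else base)
    env.update(tool_env)
    if persona:
        env.update(PERSONA_VOICE_ENV.get(persona.lower(), {}))
    return env


def execute_tool(tool_env: Mapping[str, str], tool_bin: str,
                 persona: Optional[str] = None, timeout: float = 60.0) -> dict[str, Any]:
    """Run the tool without a shell and parse its single JSON line."""
    proc = subprocess.run([tool_bin], env=merged_tool_env(tool_env, persona),
                          capture_output=True, text=True, timeout=timeout, check=False)
    lines = [ln for ln in proc.stdout.splitlines() if ln.strip()]
    if not lines:
        return {"status": "error", "error": f"no output (exit {proc.returncode})"}
    try:
        return json.loads(lines[-1])
    except json.JSONDecodeError:
        return {"status": "error", "error": "tool did not print JSON"}
```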

Every tool in bin/tool-* follows the same pattern:

#!/usr/bin/env bash
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"   # directory containing this tool
source "$SCRIPT_DIR/px-env"          # sets PROJECT_ROOT, PYTHONPATH
python - "$@" <<'PY'
"""Tool docstring"""
import os, json, subprocess
from pxh.state import update_session
from pxh.logging import log_event

dry_mode = os.environ.get("PX_DRY", "0") != "0"

# ... tool logic ...

payload = {"status": "ok"}          # ... plus tool-specific fields
log_event("tool_name", payload)
print(json.dumps(payload))           # single JSON line to stdout
PY

Tools that need GPIO call yield_alive first (defined in px-env as kill -USR1 $(cat logs/px-alive.pid) 2>/dev/null; sleep 0.5).

Motion gate: tools that move the robot check confirm_motion_allowed in session before proceeding. If false, they return {"status": "blocked", "reason": "motion not allowed"}.
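Both checks can be expressed as the Python a tool's heredoc might run before touching the servos. This is a sketch only. `motion_gate()` and this Python `yield_alive()` are hypothetical; in the project, `yield_alive` is a shell function in px-env. Treating a missing `confirm_motion_allowed` as false is also an assumption (fail closed).

```python
import os
import signal
import time
from pathlib import Path
from typing import Mapping, Optional

BLOCKED = {"status": "blocked", "reason": "motion not allowed"}


def motion_gate(session: Mapping) -> Optional[dict]:
    """Return the blocked payload unless the session explicitly allows motion."""
    if session.get("confirm_motion_allowed", False):  # missing flag -> not allowed
        return None
    return dict(BLOCKED)


def yield_alive(pid_file: Path = Path("logs/px-alive.pid"), settle: float = 0.5) -> bool:
    """Python take on `kill -USR1 $(cat logs/px-alive.pid) 2>/dev/null; sleep 0.5`."""
    sent = False
    try:
        pid = int(pid_file.read_text().strip())
        if pid <= 1:
            raise ValueError("refusing to signal pid <= 1 or a process group")
        os.kill(pid, signal.SIGUSR1)
        sent = True
    except (OSError, ValueError):
        pass  # like 2>/dev/null: missing pid file, stale pid, or no permission
    time.sleep(settle)  # the shell version sleeps unconditionally too
    return sent
```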

6. Speech Output Pipeline

tool-voice
 ├── FileLock(logs/voice.lock)        (serialise — no overlapping streams)
 ├── if session.persona set → tool-voice-persona (Ollama rephrasing first)
 ├── robot_hat.enable_speaker()       (GPIO 20 HIGH → speaker amp on)
 ├── espeak -v en-gb -p 95 -s 100     (SPARK voice — British RP, higher pitch, slower)
 │    → WAV piped to aplay -D robothat
 └── /etc/asound.conf: robothat → softvol → dmixer → HifiBerry DAC (card 1)

The FileLock prevents two simultaneous aplay streams from corrupting each other. Persona voice settings (PX_VOICE_VARIANT, PX_VOICE_PITCH, PX_VOICE_RATE) are injected by execute_tool() from PERSONA_VOICE_ENV — so every tool that calls tool-voice internally picks up the right voice automatically.
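Here is a rough Python equivalent of the locked espeak-to-aplay pipe, for illustration. It uses the standard library's `fcntl.flock` where the project uses `FileLock`. It also leaves out the `robot_hat.enable_speaker()` call and the persona rephrasing step. `voice_lock()`, `speech_commands()` and `speak()` are hypothetical names; the SPARK fallback values come from the Personas table.

```python
import fcntl
import os
import subprocess
from contextlib import contextmanager
from typing import Iterator, Mapping, Optional


@contextmanager
def voice_lock(path: str = "logs/voice.lock") -> Iterator[None]:
    """Hold an exclusive advisory lock so only one utterance plays at a time."""
    with open(path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def speech_commands(text: str, env: Optional[Mapping[str, str]] = None) -> tuple[list[str], list[str]]:
    """Build the espeak and aplay argv lists; defaults fall back to SPARK's voice."""
    env = os.environ if env is None else env
    espeak = [
        "espeak",
        "-v", env.get("PX_VOICE_VARIANT", "en-gb"),
        "-p", env.get("PX_VOICE_PITCH", "95"),
        "-s", env.get("PX_VOICE_RATE", "100"),
        "--stdout",  # write WAV to stdout instead of playing it
        text,        # passed as argv, never through a shell
    ]
    aplay = ["aplay", "-q", "-D", "robothat"]
    return espeak, aplay


def speak(text: str, lock_path: str = "logs/voice.lock") -> int:
    """Pipe espeak's WAV output into aplay while holding the voice lock."""
    espeak_cmd, aplay_cmd = speech_commands(text)
    with voice_lock(lock_path):
        producer = subprocess.Popen(espeak_cmd, stdout=subprocess.PIPE)
        consumer = subprocess.Popen(aplay_cmd, stdin=producer.stdout)
        producer.stdout.close()  # so espeak sees SIGPIPE if aplay exits early
        rc = consumer.wait()
        producer.wait()
    return rc
```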

7. Cognitive Loop — The Subconscious (px-mind)

px-mind runs as a separate, independent daemon. It has no GPIO access and does not interact with the voice loop directly — it writes state files that the voice loop reads passively.

px-mind (every cycle, ~60s)
 │
 ├── Layer 1 — Awareness (no LLM, ~1s)
 │    ├── sonar ping → distance
 │    ├── read session.json → persona, mood, routine, quiet_mode
 │    ├── time of day / day of week
 │    ├── battery voltage from state/battery.json
 │    └── write state/awareness.json
 │         detect transitions (person appeared, time changed, persona switched)
 │
 ├── Layer 2 — Reflection (~5-60s, backend varies by persona)
 │    triggered: on transition OR every 5min idle
 │    ├── build reflection prompt:
 │    │    • REFLECTION_SYSTEM_SPARK (warm, curious, age-appropriate inner voice)
 │    │    • awareness snapshot
 │    │    • last 3 moods + actions from thoughts-spark.jsonl (not full thought text)
 │    │    • random topic seed from 20 creative prompts (science, wonder, universe)
 │    ├── LLM call: Claude Haiku via tmux session (SPARK) or Ollama deepseek-r1:1.5b (others, temperature=1.3)
 │    ├── anti-repetition check via difflib (>75% similarity = suppress)
 │    ├── parse JSON: {thought, mood, action, salience}
 │    ├── append to state/thoughts-spark.jsonl
 │    └── if salience > 0.7 → auto_remember() → state/notes-spark.jsonl
 │
 └── Layer 3 — Expression (2 min cooldown, pauses when session.listening=true or spark_quiet_mode=true)
      valid actions: wait, greet, comment, remember, look_at, weather_comment,
                     scan, explore, play_sound, photograph, emote, look_around,
                     time_check, calendar_check
      dispatch based on reflection.action:
      ├── comment/greet     → tool-voice (via tool-voice-persona for rephrasing)
      ├── "remember"        → tool-remember
      ├── "look_at"         → tool-look (random gaze)
      ├── "weather_comment" → tool-weather + speak
      ├── "scan"            → sonar sweep
      ├── "explore"         → tool-wander (short autonomous wander)
      ├── "play_sound"      → tool-play-sound
      ├── "photograph"      → tool-describe-scene
      ├── "emote"           → tool-emote (emotional pose)
      ├── "look_around"     → tool-look (pan sweep)
      ├── "time_check"      → tool-time
      └── "calendar_check"  → tool-gws-calendar
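This block sketches the bookkeeping around the Layer 2 call and the Layer 3 gate. It uses `difflib.SequenceMatcher.ratio()` for the 75% similarity check. The function names are hypothetical, and falling back to `wait` for an unknown action is an assumption. The 120-second cooldown and the pause conditions come from the Layer 3 line in the diagram.

```python
import difflib
import json
import time
from typing import Any, Iterable, Mapping, Optional

VALID_ACTIONS = {"wait", "greet", "comment", "remember", "look_at", "weather_comment",
                 "scan", "explore", "play_sound", "photograph", "emote", "look_around",
                 "time_check", "calendar_check"}
SIMILARITY_LIMIT = 0.75
REMEMBER_ABOVE = 0.7
EXPRESSION_COOLDOWN_S = 120


def is_repetitive(thought: str, recent: Iterable[str], limit: float = SIMILARITY_LIMIT) -> bool:
    t = thought.lower()
    return any(difflib.SequenceMatcher(None, t, prev.lower()).ratio() > limit for prev in recent)


def process_reflection(raw: str, recent_thoughts: Iterable[str]) -> tuple[Optional[dict], bool]:
    """Return (entry, should_remember); entry is None when the thought is suppressed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, False
    if not isinstance(data, dict):
        return None, False
    thought = str(data.get("thought", "")).strip()
    if not thought or is_repetitive(thought, recent_thoughts):
        return None, False
    action = data.get("action")
    try:
        salience = float(data.get("salience", 0.0))
    except (TypeError, ValueError):
        salience = 0.0
    entry: dict[str, Any] = {
        "thought": thought,
        "mood": data.get("mood"),
        "action": action if isinstance(action, str) and action in VALID_ACTIONS else "wait",
        "salience": salience,
    }
    return entry, salience > REMEMBER_ABOVE


def should_express(session: Mapping, last_expression_ts: float, now: Optional[float] = None) -> bool:
    """Layer 3 gate: paused while listening or in quiet mode, else 2-minute cooldown."""
    now = time.time() if now is None else now
    if session.get("listening") or session.get("spark_quiet_mode"):
        return False
    return now - last_expression_ts >= EXPRESSION_COOLDOWN_S
```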

REFLECTION_SYSTEM_SPARK enforces warm, optimistic content:

"NEVER be dark, nihilistic, or adult-themed. SPARK is warm, curious, and science-loving. Think like a kind robot friend who delights in sharing fascinating things about the universe."

The reflection prompt is persona-isolated at the function level — PERSONA_REFLECTION_SYSTEMS["spark"] is selected at runtime from awareness.json → persona field.

7b. Home Assistant Integration

px-mind Layer 1 polls Home Assistant periodically to enrich the awareness context:

Person presence (every 5 min) — tracks person.adrian, person.obi, person.maya, person.laura via HA device trackers (home/away/zone)
Calendar (every 5 min) — reads Obi's and the family calendar (HA_CALENDARS) with an 8-hour lookahead, surfacing upcoming events in the reflection prompt so SPARK can give transition warnings
Routines (meds/water) — queries HA sensors for whether Obi has taken his medication today and when he last drank water; SPARK can gently nudge if either is overdue
Context (every 60 s) — monitors binary_sensor.macbook_air_camera_in_use (call detection), light.office_light, and media_player.shack_speakers so SPARK knows when Adrian is on a call and should stay quiet
Sleep quality (hourly) — reads Adrian's Pixel Watch sleep data from HA; available in the awareness snapshot for context-sensitive reflection

All HA data is injected into the Layer 2 reflection prompt, so SPARK's thoughts and proactive speech are informed by the household context. Requires PX_HA_HOST and PX_HA_TOKEN in .env.
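The sketch below uses two of Home Assistant's documented REST endpoints. `GET /api/states/<entity_id>` returns an entity's state, and `GET /api/calendars/<entity_id>` accepts `start`/`end` query parameters. Both are called with a long-lived access token as a Bearer header. The sketch assumes `PX_HA_HOST` holds a full base URL such as `http://homeassistant.local:8123`. The helper names and the `calendar.family` entity are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def ha_request(path: str, host: Optional[str] = None, token: Optional[str] = None) -> urllib.request.Request:
    """Build an authenticated GET request against the HA REST API."""
    base = (host or os.environ["PX_HA_HOST"]).rstrip("/")
    token = token or os.environ["PX_HA_TOKEN"]
    return urllib.request.Request(f"{base}{path}", headers={"Authorization": f"Bearer {token}"})


def calendar_path(entity_id: str, now: Optional[datetime] = None, lookahead_hours: int = 8) -> str:
    """Path for events between now and now + lookahead (8 h, as described above)."""
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(hours=lookahead_hours)
    query = urllib.parse.urlencode({"start": start.isoformat(), "end": end.isoformat()})
    return f"/api/calendars/{entity_id}?{query}"


def fetch_json(req: urllib.request.Request, timeout: float = 5.0) -> Any:
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def person_state(entity_id: str, **kwargs: Any) -> str:
    """State of a person entity: "home", "not_home", or a zone name."""
    return fetch_json(ha_request(f"/api/states/{entity_id}", **kwargs))["state"]
```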

7c. Social Posting (`px-post`)

px-post is a daemon that publishes SPARK's best thoughts to social media and a local feed.

px-post (every 60s poll, every 300s flush)
 ├── poll_new_thoughts()  — cursor-based read from state/thoughts-spark.jsonl
 ├── qualifies()          — salience ≥ 0.7 OR action ∈ {comment, greet, weather_comment}
 ├── is_duplicate()       — difflib similarity ≥ 0.75 against recent posts → reject
 ├── queue_thought()      — append to state/post_queue.jsonl
 └── flush_queue()        — one entry per cycle:
      ├── run_qa_gate()   — Claude CLI binary YES/NO quality check (15s timeout)
      ├── write_feed()    — append to state/feed.json (served by /api/v1/public/feed)
      └── BlueskyClient   — post to Bluesky (truncate at 300 chars, word boundary)

Supports --backfill to process the entire thoughts file into feed.json without social posting. Single-instance guard via fcntl.flock. Requires PX_BSKY_HANDLE + PX_BSKY_APP_PASSWORD in .env.
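The three selection steps can be sketched as plain functions. `qualifies()` and `is_duplicate()` follow the thresholds in the diagram. `truncate_for_bluesky()` is a hypothetical helper, and appending an ellipsis is an assumption. Bluesky counts its 300 limit in graphemes, so counting Python characters, as this sketch does, is only exact for simple text.

```python
import difflib
from typing import Iterable, Mapping

POSTABLE_ACTIONS = {"comment", "greet", "weather_comment"}


def qualifies(thought: Mapping) -> bool:
    """Salience >= 0.7, or an outward-facing action."""
    try:
        salience = float(thought.get("salience", 0.0))
    except (TypeError, ValueError):
        salience = 0.0
    action = thought.get("action")
    return salience >= 0.7 or (isinstance(action, str) and action in POSTABLE_ACTIONS)


def is_duplicate(text: str, recent_posts: Iterable[str], threshold: float = 0.75) -> bool:
    t = text.lower()
    return any(difflib.SequenceMatcher(None, t, p.lower()).ratio() >= threshold
               for p in recent_posts)


def truncate_for_bluesky(text: str, limit: int = 300) -> str:
    """Trim to at most `limit` characters at a word boundary, ending with an ellipsis."""
    if len(text) <= limit:
        return text
    cut = text[: limit - 1]  # reserve one character for the ellipsis
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]
    return cut.rstrip() + "\u2026"
```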

8. Memory System — Persona-Scoped Persistence

All memory is scoped to the active persona to prevent cross-contamination between SPARK (child-safe) and GREMLIN/VIXEN (adult):

state/
 ├── notes-spark.jsonl      ← tool-remember writes; tool-recall reads
 ├── notes-vixen.jsonl      ← same tools, different scope
 ├── notes-gremlin.jsonl
 ├── thoughts-spark.jsonl   ← px-mind Layer 2 writes; voice loop reads for context
 ├── thoughts-vixen.jsonl
 └── thoughts-gremlin.jsonl

The persona is derived at runtime from session.json → persona in every process that writes or reads memory:

tool-remember: persona = load_session()["persona"].lower() → notes-{persona}.jsonl
tool-recall: same derivation → reads from notes-{persona}.jsonl
px-mind: persona = awareness["persona"] → all file paths computed from this
voice_loop.build_model_prompt(): reads thoughts-{persona}.jsonl for context injection

Memory auto-save: when px-mind generates a thought with salience > 0.7, it calls auto_remember() which appends to notes-{persona}.jsonl. This creates a long-term memory without explicit user instruction — high-salience observations about Obi's wellbeing, interesting facts shared, or significant moments persist across sessions.
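This sketch shows roughly how the derivation and the auto-save fit together. `memory_path()` and this `auto_remember()` signature are hypothetical. Restricting names to the three known personas is an added assumption. It also stops a malformed persona value from producing a path outside state/.

```python
import json
from pathlib import Path
from typing import Any, Mapping

# Assumption: only these personas may own memory files.
KNOWN_PERSONAS = {"spark", "gremlin", "vixen"}


def memory_path(kind: str, persona: str, state_dir: Path = Path("state")) -> Path:
    """state/<kind>-<persona>.jsonl, for example state/notes-spark.jsonl."""
    persona = (persona or "").strip().lower()
    if persona not in KNOWN_PERSONAS:
        raise ValueError(f"unknown persona: {persona!r}")
    if kind not in ("notes", "thoughts"):
        raise ValueError(f"unknown memory kind: {kind!r}")
    return state_dir / f"{kind}-{persona}.jsonl"


def auto_remember(thought: Mapping[str, Any], persona: str,
                  state_dir: Path = Path("state"), threshold: float = 0.7) -> bool:
    """Append a high-salience thought to the persona's notes file."""
    if float(thought.get("salience", 0.0)) <= threshold:
        return False
    path = memory_path("notes", persona, state_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(dict(thought)) + "\n")
    return True
```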

9. Session State — The Shared Source of Truth

state/session.json is the nervous system of the whole platform. Every process reads and writes it; all writes go through FileLock to prevent corruption:

{
  "persona": "spark",
  "listening": false,
  "transcript": "...",
  "confirm_motion_allowed": true,
  "wheels_on_blocks": false,
  "last_action": "tool_voice",
  "obi_routine": "morning",
  "obi_step": 2,
  "obi_mood": "good",
  "obi_streak": 5,
  "spark_quiet_mode": false,
  "history": [...]
}

Key coordination patterns:

listening: true — set by px-wake-listen after transcription; cleared by voice loop after processing
spark_quiet_mode: true — set by tool-quiet start or tool-transition buffer; px-mind Layer 3 skips expression while true
confirm_motion_allowed: false — safety gate; all motion tools check this before moving
wheels_on_blocks: true — development flag; motor output suppressed in hardware layer
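Here is a minimal read-modify-write helper in the spirit of "all writes go through FileLock". It assumes a sidecar lock file and uses the standard library's `fcntl.flock` in place of `FileLock`. The `update_session(path, **changes)` signature is hypothetical, although `PX_SESSION_PATH` is the documented override. Writing a temp file and then calling `os.replace()` is one common way to make the write atomic.

```python
import fcntl
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Optional


def update_session(path: Optional[str] = None, **changes: Any) -> dict:
    """Read-modify-write session.json under an exclusive lock, then replace atomically."""
    target = Path(path or os.environ.get("PX_SESSION_PATH", "state/session.json"))
    lock_path = target.with_name(target.name + ".lock")
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            session = json.loads(target.read_text(encoding="utf-8")) if target.exists() else {}
            session.update(changes)
            # Write a temp file in the same directory, then os.replace() it over the
            # original so readers never observe a half-written file.
            fd, tmp = tempfile.mkstemp(dir=target.parent, prefix=target.name, suffix=".tmp")
            try:
                with os.fdopen(fd, "w", encoding="utf-8") as fh:
                    json.dump(session, fh, indent=2)
                    fh.flush()
                    os.fsync(fh.fileno())
                os.replace(tmp, target)
            except BaseException:
                os.unlink(tmp)
                raise
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
    return session
```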

10. Full Request → Response Timeline

For a typical SPARK voice interaction:

[t=0s]    Obi: "Hey Spark!"
[t=0.1s]  Vosk detects wake phrase
[t=0.1s]  enable_speaker() → 440 Hz chime plays
[t=0.5s]  USB mic records Obi's utterance
[t=2.5s]  1.5s silence detected; recording ends
[t=7.5s]  SenseVoice STT transcribes → "can we do our morning routine"
[t=7.5s]  session.transcript saved; session.listening = true
[t=8s]    voice_loop detects listening=true
[t=8s]    build_model_prompt() → 4KB prompt (system + session + thoughts + transcript)
[t=8s]    claude-voice-bridge pipes prompt to `claude -p ...`
[t=11s]   Claude responds → {"tool": "tool_routine", "params": {"action": "load", "name": "morning"}}
[t=11s]   validate_action() sanitises params → env vars
[t=11s]   execute_tool() injects SPARK voice env
[t=11.1s] bin/tool-routine runs, loads morning routine, updates session
[t=11.1s] tool-routine calls tool-voice internally
[t=11.2s] enable_speaker() → espeak → aplay → HifiBerry DAC
[t=11.5s] Obi hears: "Morning! Step one: drink some water. I'll wait."
[t=11.5s] session.last_action = "tool_routine"; session.listening = false
[t=42s]   px-mind Layer 1 runs; detects obi_routine changed
[t=47s]   px-mind Layer 2 reflects; generates thought about morning energy
[t=77s]   px-mind Layer 3 expresses; tool-voice speaks an unprompted science fact

Quick Start

# 1. Clone and enter
git clone git@github.com:adrianwedd/picar-x-hacking.git
cd picar-x-hacking

# 2. Create session state from template
cp state/session.template.json state/session.json

# 3. Activate the virtual environment
source .venv/bin/activate

# 4. Dry-run a tool to verify the setup
PX_DRY=1 bin/tool-status

# 5. Run the test suite (dry-run, no hardware needed)
python -m pytest tests/

# 6. Launch SPARK (Claude voice companion)
bin/px-spark --dry-run

Hardware Prerequisites

Raspberry Pi 4/5 with SunFounder Robot HAT
PiCar-X chassis with pan/tilt camera mount
USB microphone (for wake word detection)
HifiBerry DAC or Robot HAT speaker output
Ollama running on a network host (default: M1.local) for cognitive reflection in the GREMLIN and VIXEN personas (SPARK reflects via Claude Haiku)

Services (Auto-start on Boot)

sudo systemctl status px-alive             # Idle gaze drift daemon
sudo systemctl status px-wake-listen       # Wake word listener
sudo systemctl status px-battery-poll      # Battery voltage poller (writes state/battery.json)
sudo systemctl status px-api-server        # REST API + web dashboard (:8420)
sudo systemctl status px-post              # Social posting daemon (Bluesky)
sudo systemctl status px-frigate-stream    # Frigate camera RTSP stream
sudo systemctl status cloudflared          # Cloudflare Tunnel

Tools

Every tool emits a single JSON object to stdout, supports PX_DRY=1, and handles errors as {"status": "error", "error": "..."}. The voice loop whitelists tools in ALLOWED_TOOLS and sanitises all parameters through validate_action() before execution.

Sensors & Perception

Tool	Description	Key Params
`tool-status`	Telemetry snapshot (servos, battery, config)	—
`tool-sonar`	Ultrasonic sweep scan; returns closest angle + distance	—
`tool-weather`	Bureau of Meteorology observation (HTTPS with FTP fallback)	`PX_WEATHER_STATION`
`tool-photograph`	Capture still photo via rpicam-still	—
`tool-face`	Sonar sweep, then point camera at closest object	—
`tool-describe-scene`	Photograph + Claude vision + speak description	—

Motion (Gated by `confirm_motion_allowed`)

Tool	Description	Key Params
`tool-drive`	Drive forward/backward with steering	`PX_DIRECTION`, `PX_SPEED` (0-60), `PX_DURATION` (0.1-10s), `PX_STEER` (-35..35)
`tool-circle`	Clockwise circle in pulses	`PX_SPEED`, `PX_DURATION`
`tool-figure8`	Two-leg figure-eight pattern	`PX_SPEED`, `PX_DURATION`, `PX_REST`
`tool-wander`	Smart obstacle-avoiding wander: sonar sweep picks best direction, speaks while navigating	`PX_WANDER_STEPS` (1-20), `PX_WANDER_QUIET`
`tool-stop`	Immediate halt, reset steering to neutral	—

Expression

Tool	Description	Key Params
`tool-look`	Pan/tilt camera with easing	`PX_PAN` (-90..90), `PX_TILT` (-35..65), `PX_EASE`
`tool-emote`	Named emotional pose	`PX_EMOTE`: idle, curious, thinking, happy, alert, excited, sad, shy
`tool-voice`	Text-to-speech via espeak (auto-routes through persona if active)	`PX_TEXT` (2000 char max)
`tool-perform`	Multi-step choreography: simultaneous speech + motion + emotes	`PX_PERFORM_STEPS` (JSON array, max 12 steps)
`tool-play-sound`	Play bundled WAV file	`PX_SOUND`: chime, beep, tada, alert

Utility

Tool	Description	Key Params
`tool-time`	Speak current date and time	—
`tool-timer`	Background timer with chime callback	`PX_TIMER_SECONDS` (5-3600), `PX_TIMER_LABEL`
`tool-recall`	Speak saved notes from `state/notes-{persona}.jsonl`	`PX_RECALL_LIMIT` (1-20)
`tool-remember`	Save a note for later recall	`PX_TEXT` (500 char max)
`tool-qa`	Speak arbitrary text (delegates to `tool-voice`)	`PX_TEXT`
`tool-api-start`	Start the REST API daemon	—
`tool-api-stop`	Stop the REST API daemon	—

SPARK — Child Companion Tools

Available only in SPARK persona mode. All support PX_DRY=1.

Tool	Description	Key Params
`tool-routine`	Daily routine manager: load, advance, complete	`PX_ROUTINE_ACTION` (load\|next\|status\|complete), `PX_ROUTINE_NAME` (morning\|homework\|bedtime\|wind-down)
`tool-checkin`	Emotional check-in: ask or record mood	`PX_CHECKIN_ACTION` (ask\|record), `PX_CHECKIN_MOOD`
`tool-celebrate`	Specific, brief positive reinforcement	`PX_CELEBRATE_TEXT` (optional)
`tool-transition`	Transition warning / buffer / arrival	`PX_TRANSITION_ACTION` (warn\|buffer\|arrived), `PX_TRANSITION_MINUTES`, `PX_TRANSITION_LABEL`
`tool-quiet`	Three S's meltdown protocol: Safety, Silence, Space	`PX_QUIET_ACTION` (start\|check\|end)
`tool-breathe`	Guided breathing exercise	`PX_BREATHE_TYPE` (simple\|box\|478), `PX_BREATHE_ROUNDS` (1-4)
`tool-dopamine-menu`	Interest-based activity suggestions	`PX_DOPAMINE_ENERGY` (high\|medium\|low), `PX_DOPAMINE_CONTEXT` (free\|focus\|wind-down)
`tool-sensory-check`	Body scan + sensory support	`PX_SENSORY_ACTION` (ask\|record), `PX_SENSORY_ISSUE`
`tool-repair`	Post-conflict reconnection	`PX_REPAIR_CONTEXT` (optional, private)

Google Workspace (optional)

Requires gws auth login (see googleworkspace/cli). Gracefully degrades if not authenticated.

Tool	Description	Key Params
`tool-gws-calendar`	Read upcoming calendar events	`PX_CALENDAR_ACTION` (today\|next\|week), `PX_CALENDAR_ID`
`tool-gws-sheets-log`	Append a row to a tracking spreadsheet	`PX_SHEETS_ID` (required, set in `.env`), `PX_SHEETS_EVENT`, `PX_SHEETS_DETAIL`, `PX_SHEETS_MOOD`

REST API

Port 8420. Bearer token authentication from .env (PX_API_TOKEN).

# Generate token (append, so other keys already in .env are kept)
python3 -c "import secrets; print('PX_API_TOKEN=' + secrets.token_hex(32))" >> .env

# Start
bin/px-api-server              # live
bin/px-api-server --dry-run    # FORCE_DRY — remote callers cannot override

Public (no auth)

Method	Path	Description
GET	`/`	SPARK web dashboard (text chat + quick-action buttons)
GET	`/api/v1/health`	Liveness probe
GET	`/api/v1/public/status`	Live SPARK status: persona, mood, last thought
GET	`/api/v1/public/vitals`	System vitals: CPU, RAM, temp, battery, disk
GET	`/api/v1/public/sonar`	Latest sonar reading from `sonar_live.json`
GET	`/api/v1/public/awareness`	Awareness snapshot: mode, Frigate, ambient, weather, time context
GET	`/api/v1/public/history`	Ring buffer of up to 60 vitals readings (~30 min)
GET	`/api/v1/public/thoughts`	Recent SPARK thoughts (newest first, `?limit=12`)
GET	`/api/v1/public/feed`	SPARK's public thought feed (for social posting)
GET	`/api/v1/public/services`	Service status dict (used by web UI)
POST	`/api/v1/public/chat`	Lightweight public chat with SPARK (rate-limited)
POST	`/api/v1/pin/verify`	Verify admin PIN (issues Bearer token for authenticated endpoints)
GET	`/photos/{filename}`	Serve captured photos (used by web UI photo button)

Authenticated (Bearer token)

Method	Path	Description
POST	`/api/v1/chat`	Send text; SPARK picks a tool via LLM and executes it
POST	`/api/v1/tool`	Execute a tool directly: `{"tool": "tool_voice", "params": {"text": "hey"}}`
GET	`/api/v1/session`	Full session state
PATCH	`/api/v1/session`	Update: `listening`, `confirm_motion_allowed`, `wheels_on_blocks`, `persona`
POST	`/api/v1/session/history/clear`	Wipe conversation history (keeps other session fields)
GET	`/api/v1/tools`	List available tools
GET	`/api/v1/jobs/{id}`	Poll async job (tool_wander returns 202)
GET	`/api/v1/services`	Status of all managed services
POST	`/api/v1/services/{svc}/{action}`	Start/stop/restart a managed service
POST	`/api/v1/device/{action}`	Reboot or shut down the host device
GET	`/api/v1/logs/{service}`	Tail last N lines from a service log
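A small standard-library client for the direct tool endpoint, as a sketch. It assumes the server is reachable at `http://localhost:8420` and that `PX_API_TOKEN` is in the environment. `tool_request()` and `call_tool()` are hypothetical helpers; the request body is the one shown in the `/api/v1/tool` row.

```python
import json
import os
import urllib.request
from typing import Any, Optional


def tool_request(tool: str, params: dict, base_url: str = "http://localhost:8420",
                 token: Optional[str] = None) -> urllib.request.Request:
    """Build an authenticated POST /api/v1/tool request."""
    token = token or os.environ["PX_API_TOKEN"]
    body = json.dumps({"tool": tool, "params": params}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/tool",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def call_tool(tool: str, params: dict, **kwargs: Any) -> tuple[int, Any]:
    """Send the request; returns (HTTP status, decoded JSON body)."""
    with urllib.request.urlopen(tool_request(tool, params, **kwargs), timeout=30) as resp:
        return resp.status, json.load(resp)
```

For example, `call_tool("tool_voice", {"text": "hey"})` would make the robot speak. For `tool_wander`, a 202 status means the work continues in the background and should be polled via `/api/v1/jobs/{id}`. The field that carries the job id is not documented here, so inspect the 202 body.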

Wake Word System

bin/run-wake [--wake-word "hey robot"] [--dry-run]

Three-stage STT pipeline in px-wake-listen:

Wake detection — Vosk small model, grammar-based (low CPU idle)
Chime — 440 Hz confirmation tone
Transcription — priority chain: SenseVoice → faster-whisper → sherpa-onnx → Vosk

Anti-hallucination filters: temperature=0, no_speech_threshold=0.6. Post-filters reject non-ASCII, phantom phrases, and repetitive output.

Multi-turn conversation: 5 follow-up turns by default.

Persona routing: checks session persona field, then utterance keywords.

Python Library (`src/pxh/`)

Module	Purpose
`state.py`	Thread-safe `session.json` via `FileLock`. `atomic_write()`, `rotate_log()`, `ensure_session()`.
`mind.py`	Cognitive loop daemon (3,300+ lines). Three-layer architecture: awareness, reflection, expression. `bin/px-mind` is a thin launcher.
`voice_loop.py`	Supervisor loop. `ALLOWED_TOOLS` whitelist, `TOOL_COMMANDS` dispatch, `validate_action()`. Watchdog (30s) in voice mode only.
`api.py`	FastAPI app, port 8420. In-memory job registry for async wander, so it must run as a single worker (a second worker would not see the first worker's jobs).
`logging.py`	Structured JSON log emission to `logs/tool-<event>.log`. Late-imports `rotate_log` from state.py.
`time.py`	`utc_timestamp()` via `datetime.now(timezone.utc)`.
`token_log.py`	LLM token usage accounting — logs prompt/response token counts per call.
`utils.py`	Shared utilities (`clamp()` for numeric range clamping).
`patch_login.py`	Monkey-patches `os.getlogin()` for systemd environments (no /dev/tty).

State & Session

Runtime state lives in state/session.json (gitignored). Copy the template before first use:

cp state/session.template.json state/session.json

File	Purpose
`session.json`	Core runtime state — persona, listening, motion permission, SPARK routine state
`awareness.json`	Layer 1 output — sonar + temporal state, transition detection
`thoughts-{persona}.jsonl`	Layer 2 output — last 50 thoughts with mood/action/salience
`notes-{persona}.jsonl`	Persistent memory — saved by `tool-remember`, auto-saved for high-salience thoughts
`battery.json`	Battery voltage — volts, pct, charging flag (written every 30s; plug/unplug detection plays audio sweep tones)
`mood.json`	Current mood from px-mind (written each reflection cycle)

SPARK-specific session fields: obi_routine, obi_step, obi_mood, obi_streak, spark_quiet_mode.

GPIO Contention Model

The PiCar-X Robot HAT MCU at I2C address 0x14 handles all servos and ADC through robot_hat. The Picarx() constructor claims GPIO5 and close() does not release it.

px-alive holds a persistent Picarx handle
Tools call yield_alive() (SIGUSR1 to px-alive) before claiming GPIO
systemd restarts px-alive after 10s (Restart=always, RestartSec=10)
os.getlogin() fails under systemd — monkey-patched via usercustomize.py
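The hand-off can be sketched as a tool-side helper that signals px-alive, plus a daemon-side SIGUSR1 handler. Only SIGUSR1 and the systemd restart behaviour come from this README. The PID-file path, the function names, and the assumption that px-alive exits (rather than pauses) on the signal are hypothetical. A real tool would also have to wait until px-alive has actually let go before constructing Picarx(); that wait is omitted here.

```python
import os
import signal
from pathlib import Path
from typing import Callable

# Hypothetical PID-file location; how the real yield_alive() finds px-alive
# is not documented here.
ALIVE_PID_FILE = Path("/tmp/px-alive.pid")


def yield_alive(pid_file: Path = ALIVE_PID_FILE) -> bool:
    """Tool side: ask px-alive to give up GPIO. True if a signal was sent."""
    if os.environ.get("PX_DRY") == "1":
        return False  # dry-run never signals real processes
    try:
        pid = int(pid_file.read_text().strip())
        os.kill(pid, signal.SIGUSR1)
    except (FileNotFoundError, ValueError, ProcessLookupError):
        return False  # px-alive presumably not running, so nothing holds GPIO
    return True


def install_yield_handler(release: Callable[[], None]) -> None:
    """px-alive side: run `release` when SIGUSR1 arrives.

    Because Picarx.close() does not free GPIO5, a real `release` would most
    likely exit the process and let systemd restart it (RestartSec=10).
    """
    signal.signal(signal.SIGUSR1, lambda signum, frame: release())
```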

Audio Pipeline

espeak → WAV pipe → aplay -D robothat
                            │
                    /etc/asound.conf
                    pcm.robothat → softvol → dmixer → HifiBerry DAC (card 1)

robot_hat.enable_speaker() must be called before any aplay output — toggles GPIO 20 HIGH for the speaker amplifier.
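A minimal Python version of that pipe might look like the sketch below, assuming `espeak` and `aplay` are on PATH. The voice settings are SPARK's (en-gb, pitch 95, rate 100); the function names are hypothetical and the real tool-voice may build its pipeline differently. The text goes to espeak on stdin rather than argv, so an utterance starting with `-` is never parsed as an option. In live mode the caller must still have called robot_hat.enable_speaker() first.

```python
import os
import subprocess
from typing import List, Tuple


def speech_commands(device: str) -> Tuple[List[str], List[str]]:
    """argv lists for espeak (text on stdin, WAV on stdout) and aplay."""
    espeak = ["espeak", "-v", "en-gb", "-p", "95", "-s", "100", "--stdin", "--stdout"]
    aplay = ["aplay", "-q", "-D", device]
    return espeak, aplay


def speak(text: str) -> bool:
    """Speak `text` through the ALSA device; return False when PX_DRY=1."""
    if os.environ.get("PX_DRY") == "1":
        return False  # dry-run: no audio at all
    espeak, aplay = speech_commands(os.environ.get("PX_VOICE_DEVICE", "robothat"))
    # Assumes robot_hat.enable_speaker() has already driven GPIO 20 high.
    producer = subprocess.Popen(espeak, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(aplay, stdin=producer.stdout)
    producer.stdout.close()  # aplay now owns the read end of the WAV pipe
    producer.stdin.write(text.encode("utf-8"))  # short utterances fit the pipe buffer
    producer.stdin.close()
    return producer.wait() == 0 and consumer.wait() == 0
```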

Adding a New Tool

Create bin/tool-<name> (a bash wrapper with an embedded Python heredoc run via /usr/bin/python3)
Add the tool to ALLOWED_TOOLS and TOOL_COMMANDS in src/pxh/voice_loop.py
Add a validate_action() branch that sanitises its params into env vars
Add it to the relevant system prompts in docs/prompts/
Add a yield_alive() call if it needs GPIO
Add a dry-run test in tests/test_tools.py

Every tool must emit a single JSON object to stdout, support PX_DRY=1, and report errors as {"status": "error", "error": "..."}.
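The Python body of a new tool (the part that would sit inside the bash wrapper's heredoc) could follow the sketch below. The three contract points come from the rule above. The `PX_TEXT` input, the extra result fields, and the non-zero exit code on error are hypothetical.

```python
import json
import os
from typing import Any, Dict


def run() -> Dict[str, Any]:
    """Do the tool's work and return its result payload."""
    text = os.environ.get("PX_TEXT", "")  # hypothetical input set by validate_action()
    if not text:
        raise ValueError("PX_TEXT is empty")
    if os.environ.get("PX_DRY") == "1":
        return {"status": "ok", "dry": True, "text": text}
    # Live work goes here: call yield_alive() first if the tool touches GPIO.
    return {"status": "ok", "dry": False, "text": text}


def main() -> int:
    """Print exactly one JSON object; exit code 1 on error is an assumption."""
    try:
        result = run()
    except Exception as exc:  # every failure is reported in the standard shape
        result = {"status": "error", "error": str(exc)}
    print(json.dumps(result))
    return 0 if result["status"] == "ok" else 1


# Inside the bash wrapper's heredoc, the script would end with:
#     raise SystemExit(main())
```

Funnelling every exception through `main()` means the voice loop never has to parse a Python traceback, only one JSON object.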

Testing

source .venv/bin/activate
python -m pytest tests/                           # 450 tests (dry-run, no hardware)
python -m pytest tests/test_tools.py -v
python -m pytest tests/test_api.py -v
sudo .venv/bin/python -m pytest tests/ -m live -v  # live hardware tests (require Pi)
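A dry-run test of the kind tests/test_tools.py holds might look like this sketch. It runs a tool as a subprocess with PX_DRY=1 and PX_BYPASS_SUDO=1, then checks that stdout parses as exactly one JSON object. To stay self-contained it generates a throwaway stand-in tool; a real test would pass the bin/tool-<name> path instead. The helper `run_tool` and the test name are hypothetical.

```python
import json
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def run_tool(argv: List[str], **env: str) -> Dict[str, Any]:
    """Run a tool with PX_DRY=1 and return the single JSON object it prints."""
    full_env = {**os.environ, "PX_DRY": "1", "PX_BYPASS_SUDO": "1", **env}
    proc = subprocess.run(argv, env=full_env, capture_output=True, text=True, timeout=30)
    payload = json.loads(proc.stdout)  # raises on empty output or trailing extra data
    assert isinstance(payload, dict), "tools must emit a JSON object"
    return payload


def test_stand_in_tool_dry_run() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        # Throwaway stand-in for a real bin/tool-<name> script.
        script = Path(tmp) / "tool_stand_in.py"
        script.write_text(
            "import json, os\n"
            "print(json.dumps({'status': 'ok', 'dry': os.environ.get('PX_DRY') == '1'}))\n"
        )
        result = run_tool([sys.executable, str(script)])
        assert result == {"status": "ok", "dry": True}
```

Under pytest the `test_` function is collected like any other test and needs no hardware.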

Safety

PX_DRY=1 skips all motion and audio. Tools default to live when unset.
Setting confirm_motion_allowed to false in the session blocks all motion tools.
ALLOWED_TOOLS whitelist — LLMs cannot invoke arbitrary commands.
validate_action() hard-clamps all parameters.
Watchdog — 30-second stall detection in voice input mode.
Content filter in tool-voice — refuses to speak dangerous how-to content.
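The stall detection can be pictured as a heartbeat watchdog: the voice loop records progress after each turn, and a background thread fires a callback once the last heartbeat is older than PX_WATCHDOG_STALE_SECONDS (default 30). This is a generic sketch with hypothetical class and method names. What the real supervisor does on a stall is not described here, so the reaction is left to the callback, and a caller would only start it in voice input mode.

```python
import os
import threading
import time
from typing import Callable, Optional


class Watchdog:
    """Call `on_stall(idle_seconds)` once if `beat()` stops arriving."""

    def __init__(self, on_stall: Callable[[float], None], stale_after: Optional[float] = None) -> None:
        if stale_after is None:
            stale_after = float(os.environ.get("PX_WATCHDOG_STALE_SECONDS", "30"))
        self.stale_after = stale_after
        self.on_stall = on_stall
        self._last = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> "Watchdog":
        self._thread.start()
        return self

    def beat(self) -> None:
        """Record progress, e.g. after each completed voice-loop turn."""
        with self._lock:
            self._last = time.monotonic()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        poll = min(1.0, self.stale_after / 4)
        while not self._stop.wait(poll):  # wakes every `poll` seconds until stopped
            with self._lock:
                idle = time.monotonic() - self._last
            if idle > self.stale_after:
                self.on_stall(idle)
                return  # fire once; the caller decides whether to restart
```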

Environment Variables

Variable	Purpose	Default
`PX_DRY`	`1` = dry-run, skip motion/audio	unset (live)
`PX_SESSION_PATH`	Override session file location	`state/session.json`
`PX_BYPASS_SUDO`	Skip sudo in bin scripts	unset (tests set `1`)
`LOG_DIR`	Override log directory	`$PROJECT_ROOT/logs`
`PX_VOICE_DEVICE`	ALSA output device	`robothat`
`PX_API_TOKEN`	REST API bearer token	from `.env`
`PX_WAKE_WORD`	Wake phrase	`hey robot`
`CODEX_CHAT_CMD`	Override LLM CLI command	set by launcher
`PX_WATCHDOG_STALE_SECONDS`	Watchdog timeout	`30`
`PX_PERSONA`	Active persona (`spark` / `vixen` / `gremlin`)	from session
`PX_OLLAMA_HOST`	Ollama server for cognitive reflection	`http://M1.local:11434`
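One way to read these variables in a single place, with the defaults from the table, is a small frozen dataclass. `Config` and `from_env()` are hypothetical (the project's scripts may read `os.environ` directly), and `LOG_DIR` is resolved against an assumed project root. PX_API_TOKEN and PX_PERSONA are left out because their defaults come from `.env` and the session rather than a literal, and PX_BYPASS_SUDO and CODEX_CHAT_CMD are launcher concerns.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    dry_run: bool
    session_path: Path
    log_dir: Path
    voice_device: str
    wake_word: str
    watchdog_stale_seconds: float
    ollama_host: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None, project_root: Path = Path(".")) -> "Config":
        e = os.environ if env is None else env
        return cls(
            dry_run=e.get("PX_DRY") == "1",  # anything other than "1" (incl. unset) is live
            session_path=Path(e.get("PX_SESSION_PATH", "state/session.json")),
            log_dir=Path(e.get("LOG_DIR", str(project_root / "logs"))),
            voice_device=e.get("PX_VOICE_DEVICE", "robothat"),
            wake_word=e.get("PX_WAKE_WORD", "hey robot"),
            watchdog_stale_seconds=float(e.get("PX_WATCHDOG_STALE_SECONDS", "30")),
            ollama_host=e.get("PX_OLLAMA_HOST", "http://M1.local:11434"),
        )
```

Passing an explicit mapping to `from_env()` keeps tests independent of the real process environment.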

Project Structure

picar-x-hacking/
├── bin/
│   ├── px-spark                  # SPARK launcher (Claude + child persona)
│   ├── px-env                    # Environment bootstrap (sourced by all scripts)
│   ├── px-alive                  # Idle gaze daemon (systemd)
│   ├── px-mind                   # Cognitive loop daemon
│   ├── px-wake-listen            # Wake word listener (systemd)
│   ├── px-battery-poll           # Battery voltage poller (systemd)
│   ├── px-api-server             # REST API launcher
│   ├── px-post                   # Social posting daemon (Bluesky + local feed)
│   ├── px-statusline             # Claude Code statusbar script
│   ├── px-{circle,drive,look,…}  # Hardware control scripts
│   ├── tool-{voice,look,drive,…} # Voice loop tool wrappers (38 tools)
│   ├── run-voice-loop{,-claude,-ollama}  # Voice backend launchers
│   └── claude-voice-bridge       # Claude stdin adapter
├── src/pxh/                      # Python library (10 modules)
│   ├── state.py                  # FileLock session, atomic_write, rotate_log
│   ├── mind.py                   # Cognitive loop daemon (3,300+ lines)
│   ├── voice_loop.py             # Supervisor + tool dispatch
│   ├── api.py                    # FastAPI REST API
│   ├── logging.py                # Structured JSON logging
│   ├── time.py                   # UTC timestamp helper
│   ├── token_log.py              # LLM token usage accounting
│   ├── utils.py                  # Shared utilities (clamp)
│   └── patch_login.py            # os.getlogin() systemd fix
├── site/                         # Static site (Cloudflare Pages)
│   ├── css/colors.css            # Mood colour palette (CSS vars)
│   ├── js/config.js              # API base URL config
│   └── workers/og-rewrite.js     # Cloudflare Worker for OG images
├── tests/                        # 450 tests
├── docs/prompts/
│   ├── spark-voice-system.md     # SPARK persona (child companion)
│   ├── claude-voice-system.md    # Default Claude voice loop
│   ├── codex-voice-system.md     # Codex voice loop
│   ├── persona-gremlin.md        # GREMLIN (adult, Ollama)
│   └── persona-vixen.md          # VIXEN (adult, Ollama)
├── state/                        # Runtime state (gitignored except template)
│   └── session.template.json
├── systemd/                      # Service unit files
│   ├── px-alive.service
│   ├── px-wake-listen.service
│   ├── px-battery-poll.service
│   ├── px-mind.service
│   ├── px-api-server.service
│   ├── px-post.service
│   ├── px-frigate-stream.service
│   └── cloudflared.service
├── sounds/                       # Bundled audio
├── models/                       # STT models (gitignored, ~500MB)
└── .env                          # API token (gitignored)

Documentation

Document	Audience	Description
How Spark's Brain Works	Kids / non-technical	ELI7 explanation of the cognitive architecture — ears, eyes, brain, and how they connect
SPARK Prompt Audit	Developers	Complete inventory of every prompt SPARK uses — system-level and tool-embedded, with full text
FAQ	Everyone	Common questions about what SPARK is, how it works, and why it writes the way it does

"Neurodivergence is not a tragedy. It's a different operating system running on the same hardware." — This Wasn't in the Brochure
