Clank listens for spoken commands, transcribes them locally with the Moonshine speech-to-text model, parses the intent with a local Ollama LLM, and sends LED control commands to an ESP32 microcontroller over your LAN.
Everything runs on your own hardware — no cloud APIs, no external data sent anywhere.
Clank is a hobby project provided as-is, with no warranty of any kind. You run it entirely at your own risk. Please understand the following before using it:
- Always-on microphone. To detect its wake word, Clank continuously captures and transcribes audio from your microphone. Audio and transcripts are processed locally, in memory, and discarded. By default nothing is sent to the cloud and raw transcripts are not written to disk (only resolved commands are logged; see security.log_transcripts). Even so, you are running a device that is always listening.
- Other people's privacy. Recording or transcribing people without their knowledge or consent may be illegal where you live (one-/two-party consent laws vary by jurisdiction). If others share the space, inform them. Compliance is your responsibility.
- Mains electricity is dangerous. Switching 230/240 V (or 120 V) loads with relays or modified plugs can cause electric shock, fire, or death if done incorrectly. Use only properly rated, fused, and enclosed hardware — or commercial smart plugs — and consult a qualified electrician for any fixed wiring. The authors accept no liability for damage, injury, or loss resulting from use of this project.
- Not a safety device. Do not use Clank to control anything where failure, a misheard command, or downtime could be hazardous (medical equipment, heating, security, etc.).
By using Clank you accept these terms.
- How it works
- Hardware requirements
- System requirements
- Step 1 — Clone and install Python dependencies
- Step 2 — Install and configure Ollama
- Step 3 — Fetch the Moonshine models
- Step 4 — Flash the ESP32 firmware
- Step 5 — Generate an API key
- Step 6 — Run Clank
- Voice commands
- Configuration reference
- Device management
- Logs and monitoring
- Security features
- Model provenance and supply-chain hardening
- Auditing the model with Netron
- Re-auditing and updating models
- Troubleshooting
- Repository layout
- License
Microphone ──► Silero VAD ──► Moonshine STT ──► Ollama LLM ──► ESP32 /led-control
- Voice activity detection (Silero VAD) watches the microphone and fires when speech starts and ends.
- Transcription (Moonshine ONNX, local) converts the audio to text.
- Intent parsing (Ollama, local) maps the text to a structured JSON LED command.
- Dispatch — the validated JSON is POSTed to the ESP32 over HTTPS (TLS) with an API key header. The client pins the device's self-signed certificate, and the ESP32 enforces rate limiting and a constant-time API-key check.
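The dispatch step can be sketched with the Python standard library. This is a minimal illustration, not Clank's actual client: the /led-control path and the X-API-Key header come from this README, while the function names and the payload fields (color, state) are hypothetical.

```python
import json
import ssl
import urllib.request


def build_led_request(ip: str, api_key: str, command: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTPS POST to the ESP32's /led-control endpoint."""
    body = json.dumps(command).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{ip}/led-control",
        data=body,
        method="POST",
        # urllib normalises header capitalisation; HTTP header names are case-insensitive.
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def pinned_context(cert_path: str) -> ssl.SSLContext:
    """Trust only the given certificate file instead of the system CA store."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # certificate and hostname checks are on
    ctx.load_verify_locations(cafile=cert_path)
    return ctx


def send(req: urllib.request.Request, cert_path: str, timeout: float = 10.0) -> int:
    """Send the request over TLS and return the HTTP status code."""
    with urllib.request.urlopen(req, context=pinned_context(cert_path), timeout=timeout) as resp:
        return resp.status


# Hypothetical payload shape, for illustration only.
req = build_led_request("192.168.0.18", "your-key-here", {"color": "red", "state": "on"})
```

Because the context loads only certs/esp32.crt, a device presenting any other certificate should fail the handshake. Hostname checking stays enabled, which is why the device IP has to be present in the certificate.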
- ESP32 development board (any variant with WiFi — ESP32-DevKitC, WROOM, etc.)
- LEDs wired to GPIO 18 (red), 23 (green), 16 (blue) — adjust pins in the firmware if needed
- A microphone accessible to your host machine (USB or built-in)
| Requirement | Notes |
|---|---|
| Python 3.10+ | 3.12 recommended |
| ~2 GB RAM | Moonshine base model |
| ~250 MB disk | ONNX model weights |
| Linux / macOS | Windows not tested |
| Ollama | Local LLM server |
| Arduino IDE 2.x or arduino-cli | For flashing the ESP32 |
System audio libraries (Linux):
sudo apt-get install libsndfile1 portaudio19-dev python3-venv
System audio libraries (macOS):
brew install portaudio

git clone https://github.com/cycloarcane/clank.git
cd clank
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
- Download and install Ollama from ollama.com.
- Start the server:
  ollama serve
- Pull the recommended model:
  ollama pull qwen3:4b

The intent-parsing task is simple (one short sentence → a fixed JSON schema), so a small model is plenty. qwen3:4b (~2.5 GB at Q4) fits comfortably on a 4 GB GPU, which is the recommended default.

Choosing a Qwen3 model for your hardware:

| GPU VRAM | Suggested model | Approx. size |
|---|---|---|
| ~4 GB | qwen3:4b | ~2.5 GB |
| ≤2 GB / CPU-only | qwen3:1.7b | ~1.4 GB |
| 12 GB+ | qwen3:14b (higher accuracy) | ~9 GB |

Any model that reliably outputs clean JSON works. qwen3:14b will not fit in 4 GB of VRAM; only choose it if you have a larger GPU.
The script downloads weights pinned to a specific audited commit and writes SHA256SUMS:
./scripts/fetch_moonshine.sh
Verify integrity:
sha256sum -c SHA256SUMS # both lines should print OK
Do not skip the verification step. If the hashes do not match, do not run Clank; re-run the fetch script and investigate.
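If sha256sum is not available on your system, the same check can be approximated in Python. This is a sketch, not the loader Clank uses: the function names are hypothetical, and it assumes the usual "digest, whitespace, path" line format, with paths resolved relative to the sums file rather than the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file: Path) -> dict[str, bool]:
    """Return {path: matches} for every '<hex digest>  <path>' line in the sums file."""
    results = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        target = sums_file.parent / name  # assumption: paths are relative to the sums file
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling verify_sums(Path("SHA256SUMS")) from the repository root should report True for both model files; treat any False exactly as you would a failed sha256sum -c.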
The firmware serves HTTPS and will not compile until two gitignored headers exist: your credentials (secrets.h) and the TLS certificate (cert.h). Do these prerequisites first:
- Create secrets.h from the template and fill in your WiFi credentials and an API key:
  cp ESP32LEDs/secrets.h.example ESP32LEDs/secrets.h
  # generate a key for CLANK_API_KEY:
  python3 -c "import secrets; print(secrets.token_urlsafe(32))"
  # then edit ESP32LEDs/secrets.h and set WIFI_SSID, WIFI_PASSWORD, CLANK_API_KEY
  secrets.h is gitignored, so your credentials are never committed. A non-empty CLANK_API_KEY enables authentication; leaving it "" disables auth (dev/testing only).
- Generate the TLS certificate. This writes ESP32LEDs/cert.h (firmware) and certs/esp32.crt (pinned by Clank). Pass the device's LAN IP so it lands in the certificate:
  python3 scripts/generate_esp32_cert.py --ip 192.168.0.18
  cert.h contains the device's private key and is gitignored; never commit it. If the ESP32's IP changes, re-run this and re-flash.

Static IP (optional but recommended). The cert is pinned to one IP, so a stable address is convenient. Either reserve a DHCP lease on your router, or set a static IP in the firmware (see the WiFi.config(...) call and local_IP near the top of ESP32LEDs.ino).
1. Install Arduino IDE 2.x from arduino.cc/en/software.
2. Add the ESP32 board package. Open File → Preferences, find Additional boards manager URLs, and add:
   https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json
3. Install the ESP32 core. Open Tools → Board → Boards Manager, search for esp32 by Espressif, and click Install.
4. Install the ArduinoJson library. Open Sketch → Include Library → Manage Libraries, search for ArduinoJson by Benoit Blanchon, and install it (v6.x or 7.x). WiFi, ESPmDNS, and the TLS server (esp_https_server) all ship with the ESP32 core 3.x, so no separate library is needed. (Requires core 3.x; the older esp32_https_server library is not used.)
5. Open the sketch. File → Open → navigate to ESP32LEDs/ESP32LEDs.ino.
6. Select your board and port.
   - Tools → Board → esp32 → ESP32 Dev Module (or whichever matches your hardware)
   - Tools → Port → select the port your ESP32 is on (e.g. /dev/ttyUSB0, /dev/ttyACM0, or COM3)
7. Upload. Click the → Upload button. The IDE compiles and flashes. When done, open Tools → Serial Monitor at 115200 baud; you should see the WiFi connection and IP address printed.
1. Install arduino-cli.
   # Linux / macOS (official install script)
   curl -fsSL https://raw.githubusercontent.com/arduino/arduino-cli/master/install.sh | sh
   # then add the installed binary to your PATH, e.g.:
   export PATH="$HOME/bin:$PATH"
2. Add the ESP32 board package URL and update the index.
   arduino-cli config init
   arduino-cli config add board_manager.additional_urls \
     https://raw.githubusercontent.com/espressif/arduino-esp32/gh-pages/package_esp32_index.json
   arduino-cli core update-index
3. Install the ESP32 core (3.x) and ArduinoJson.
   arduino-cli core install esp32:esp32
   arduino-cli lib install "ArduinoJson"
   # the TLS server (esp_https_server) ships with the ESP32 core, so no extra lib is needed
4. Find your ESP32's port.
   arduino-cli board list
   # Look for a line with "Unknown" or "ESP32" and note the port (e.g. /dev/ttyUSB0)
5. Compile and upload.
   arduino-cli compile \
     --fqbn esp32:esp32:esp32 \
     ESP32LEDs/ESP32LEDs.ino
   arduino-cli upload \
     --fqbn esp32:esp32:esp32 \
     --port /dev/ttyUSB0 \
     ESP32LEDs/ESP32LEDs.ino
   Replace /dev/ttyUSB0 with your actual port.
6. Confirm the upload.
   arduino-cli monitor --port /dev/ttyUSB0 --config baudrate=115200
   # Press Ctrl+C to exit the monitor
   You should see the device connect to WiFi and print its IP address.
The IP is printed on the serial monitor each boot. You will need it in Step 6. If your router supports it, assign a static DHCP lease to the ESP32's MAC address so the IP never changes.
Generate a cryptographically random key:
python3 -c "import secrets; print(secrets.token_urlsafe(32))"
Example output:
K3dRm9vXpQlTnYwZ8uBfA2sGcHeJiOkV1yMqCrNxDtL
Flash the key into the ESP32 by setting API_KEY in ESP32LEDs.ino to this value and re-uploading (Steps 4.5–4.7 / 4.4–4.5 above).
Alternatively, register the device through Clank's device manager, which prints the ready-to-paste C++ line:
python3 register_device.py "Living-Room-LEDs"
# Prints: const char* API_KEY = "K3dRm9vX...";

Set environment variables and start:
# Activate the virtual environment if not already active
source .venv/bin/activate
# Required: your ESP32's IP address
export ESP32_IP=192.168.0.18
# Required: the API key you generated in Step 5
export ESP32_API_KEY=K3dRm9vXpQlTnYwZ8uBfA2sGcHeJiOkV1yMqCrNxDtL
# Required for HTTPS: the pinned ESP32 certificate from Step 4
export ESP32_CA_CERT=certs/esp32.crt
# Optional: change the Ollama model (default: qwen3:4b)
export CLANK_LLM_MODEL=qwen3:4b
python3 src/voicecommand/voice_LED_control.py

Or use the startup script, which loads variables from a .env file automatically:
# Create a .env file (never commit this file)
cat > .env <<'EOF'
ESP32_IP=192.168.0.18
ESP32_API_KEY=your-key-here
ESP32_CA_CERT=certs/esp32.crt
CLANK_LLM_MODEL=qwen3:4b
EOF
./start_clank.sh

Clank prints "Listening. Press Ctrl+C to quit." when it is ready.
Speak naturally. Clank responds to LED commands addressed to "Computer":
| What you say | What happens |
|---|---|
| "Computer, turn on the red LED" | Red LED on |
| "Computer, turn off the blue light" | Blue LED off |
| "Computer, set green LED to 50%" | Green LED at 50% brightness |
| "Computer, turn on all LEDs" | All three LEDs on |
| "Computer, turn off all lights" | All three LEDs off |
Anything that does not map to an LED command is logged and discarded.
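The discard rule can be pictured as an allowlist check on the LLM's JSON output. The sketch below is illustrative only: the field names (color, state, brightness), the allowed values, and the function name are assumptions, and the real rules live in Clank's validation module.

```python
import json

# Hypothetical allowlists, chosen to match the example commands in the table above.
ALLOWED_COLORS = {"red", "green", "blue", "all"}
ALLOWED_STATES = {"on", "off"}


def parse_led_command(llm_output: str):
    """Return a cleaned command dict, or None if the output is not an allowed LED command."""
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    color, state = data.get("color"), data.get("state")
    if not isinstance(color, str) or color not in ALLOWED_COLORS:
        return None
    if not isinstance(state, str) or state not in ALLOWED_STATES:
        return None

    # Copy only known fields so unexpected keys never reach the device.
    command = {"color": color, "state": state}
    brightness = data.get("brightness")
    if brightness is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(brightness, bool) or not isinstance(brightness, int):
            return None
        if not 0 <= brightness <= 100:
            return None
        command["brightness"] = brightness
    return command
```

Returning None for anything outside the allowlist keeps the failure mode simple: the caller logs the transcript outcome and sends nothing.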
All settings live in config/default.yaml and can be overridden by environment variables.
| Variable | Default | Description |
|---|---|---|
| ESP32_IP | 192.168.0.18 | IP address of the ESP32 |
| ESP32_API_KEY | (none) | Shared API key for ESP32 authentication |
| ESP32_CA_CERT | (none) | Path to the pinned ESP32 certificate (certs/esp32.crt). When set, Clank connects over HTTPS and verifies against it. |
| ESP32_USE_HTTPS | false | Force HTTPS even without a pinned cert (uses system trust; generally only useful for testing). |
| CLANK_LLM_MODEL | qwen3:4b | Ollama model name |
| CLANK_CONFIG | config/default.yaml | Path to config file |
| CLANK_LLM_ENDPOINT | http://127.0.0.1:11434/api/generate | Ollama API endpoint |
| CLANK_LOG_LEVEL | INFO | Log level (DEBUG / INFO / WARNING / ERROR) |
If neither ESP32_CA_CERT nor ESP32_USE_HTTPS is set, Clank falls back to plain HTTP — use this only with the legacy non-TLS firmware.
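The fallback rule can be written down as a small function. This is a sketch of the behaviour described here, not Clank's config code: the function name is hypothetical, and the set of strings accepted as "true" for ESP32_USE_HTTPS is an assumption.

```python
import os


def esp32_base_url(env=None) -> str:
    """Pick https:// when a pinned cert or ESP32_USE_HTTPS is set, otherwise http://."""
    env = os.environ if env is None else env
    ip = env.get("ESP32_IP", "192.168.0.18")
    has_pinned_cert = bool(env.get("ESP32_CA_CERT"))
    # Assumption: these spellings count as enabled; the real parser may differ.
    forced = env.get("ESP32_USE_HTTPS", "false").strip().lower() in {"1", "true", "yes"}
    scheme = "https" if (has_pinned_cert or forced) else "http"
    return f"{scheme}://{ip}"
```

Passing a plain dict instead of os.environ makes the rule easy to check without touching your shell environment.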
audio:
  sampling_rate: 16000
  vad_threshold: 0.5 # 0.0–1.0, raise to reduce false triggers
  min_silence_duration_ms: 300
  max_speech_seconds: 15
llm:
  model: "qwen3:4b"
  temperature: 0.0
  max_tokens: 150
  timeout: 30.0
  response_format: "json" # force a single valid JSON object
  think: false # disable qwen3 "thinking" (set null for non-reasoning models)
network:
  use_service_discovery: true # auto-discover ESP32 via mDNS
  connection_timeout: 10.0
security:
  max_requests_per_minute: 60
  enable_audit_logging: true

The ESP32 serves HTTPS with a self-signed certificate that Clank pins. The certificate is generated in Step 4 and bound to the device's IP. Regenerate it whenever the ESP32's IP changes, then re-flash the firmware (the cert is embedded in cert.h):
python3 scripts/generate_esp32_cert.py --ip <device-ip>
export ESP32_CA_CERT=certs/esp32.crt

This writes ESP32LEDs/cert.h (firmware) and certs/esp32.crt (pinned by Clank). Both are gitignored.
Register a device and get the API key line ready for the firmware:
python3 register_device.py "Kitchen-LEDs"
# Device registered successfully!
# Device ID: device_abc123...
# API Key: K3dRm9vX...
# Add this to your ESP32 configuration:
# const char* API_KEY = "K3dRm9vX...";

List registered devices:
python3 - <<'EOF'
from src.voicecommand.auth import AuthManager
auth = AuthManager()
for d in auth.list_devices():
    status = "active" if d.is_active else "revoked"
    print(f"{d.name} ({d.device_id}) — {status}")
EOF

| File | Contents |
|---|---|
| logs/clank.log | Application activity, rotating, 10 MB max |
| logs/audit.log | Security events in JSON format |
# Follow application log
tail -f logs/clank.log
# Watch security events (requires jq)
tail -f logs/audit.log | jq '.'
# Check for authentication failures
grep '"event_type": "auth_failure"' logs/audit.log

Sensitive fields (API keys, tokens, IP addresses) are automatically redacted in clank.log.
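For a quick overview without jq, the audit log can be summarised in a few lines of Python. This sketch assumes the log holds one JSON object per line with an event_type field (the field name is taken from the grep example); the function name is hypothetical.

```python
import json
from collections import Counter


def count_events(lines) -> Counter:
    """Count audit events by event_type from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparseable>"] += 1
            continue
        if isinstance(event, dict):
            # str() keeps the key hashable even if the field holds an unexpected type.
            counts[str(event.get("event_type", "<missing>"))] += 1
        else:
            counts["<unparseable>"] += 1
    return counts


# Usage (run from the repository root):
#   with open("logs/audit.log") as fh:
#       print(count_events(fh).most_common())
```

A sudden rise in auth_failure entries relative to other event types is worth a closer look with the grep command.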
| Feature | Detail |
|---|---|
| Model integrity | SHA256 hashes verified before any model is loaded |
| Commit-locked downloads | Weights pinned to audited commit 2501abf |
| No network calls at runtime | All AI runs locally; no data leaves the machine |
| Input validation | Transcribed text is sanitised and length-bounded before reaching the LLM prompt, preventing prompt injection via crafted audio |
| LLM response validation | JSON output is checked against an allowlist of valid actions, colours, and states before dispatch |
| HTTPS (TLS) transport | The ESP32 serves HTTPS; Clank connects over TLS and pins the device's self-signed certificate (ESP32_CA_CERT) |
| ESP32 authentication | POST /led-control requires a matching X-API-Key header (constant-time comparison); unauthenticated requests get 401 |
| Rate limiting | 60 requests per minute, enforced on the ESP32 before authentication (throttles brute-force); excess requests get 429 |
| mDNS device discovery | ESP32 advertises _clank-led._tcp (with a secure TXT record) so Clank finds it without hardcoded IPs |
| Structured audit logging | Security events written to a separate JSON log with automatic redaction |
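The ordering of the last two ESP32 checks matters: rate limiting runs before authentication, so wrong-key guesses use up the same budget as valid requests. The Python model below illustrates that ordering only. The firmware itself is an Arduino sketch, and the class name, the sliding-window algorithm, and the 200 success status are assumptions.

```python
import hmac
from collections import deque


class RequestGate:
    """Toy model of 'rate limit first, then constant-time API-key check'."""

    def __init__(self, api_key: str, limit: int = 60, window_s: float = 60.0):
        self.api_key = api_key.encode()
        self.limit = limit
        self.window_s = window_s
        self.hits = deque()  # timestamps of requests admitted past the rate limiter

    def check(self, presented_key: str, now: float) -> int:
        """Return the HTTP status a request arriving at time `now` would receive."""
        # Forget requests that have aged out of the window.
        while self.hits and now - self.hits[0] >= self.window_s:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return 429  # throttled before the key is even examined
        self.hits.append(now)
        # compare_digest avoids leaking how many leading bytes matched.
        if not hmac.compare_digest(presented_key.encode(), self.api_key):
            return 401
        return 200
```

With limit=2, a wrong key followed by a valid one leaves no budget for a third request inside the same minute, which is the brute-force throttling effect the table describes.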
| Item | Value |
|---|---|
| Repository | UsefulSensors/moonshine on Hugging Face |
| Pinned commit | 2501abf |
| Encoder | onnx/merged/base/float/encoder_model.onnx (~80 MB) |
| Decoder | onnx/merged/base/float/decoder_model_merged.onnx (~166 MB) |
| Download script | scripts/fetch_moonshine.sh |
| Hash file | SHA256SUMS |
Using …/resolve/2501abf/… in the download URL guarantees every clone receives identical bytes. A silent upstream change can only reach users if we change the pinned commit and publish new checksums — intentional, reviewed, and auditable.
We visually inspected the weights for PAIT-ONNX-200 class architectural back-doors:
pip install netron
netron models/moonshine/encoder_model.onnx &
netron models/moonshine/decoder_model_merged.onnx &
# opens http://localhost:8080

- View → Layout → Hierarchical for a tall vertical graph.
- Search (Ctrl+F) for operators that do not belong in an acoustic model: If, Where, Equal, ArgMax, small MatMul with a constant input.
- Legitimate paths are hundreds of Conv / GRU blocks. A back-door path is typically fewer than 20 nodes and rejoins just before Softmax.
- Repeat whenever you update the weights.
We found no suspicious parallel branches in commit 2501abf. The hashes in SHA256SUMS reflect this vetted state.
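A scripted first pass can narrow down where to look before opening Netron. The sketch below only counts operator types named in the search step; it is not the audit procedure used for this commit, the function name is hypothetical, and a non-zero count is a prompt for manual inspection rather than evidence of a back-door.

```python
from collections import Counter

# Operator types listed in the Netron search step above.
SUSPECT_OPS = {"If", "Where", "Equal", "ArgMax"}


def suspect_op_counts(op_types) -> Counter:
    """Count occurrences of suspect operator types in an iterable of op-type names."""
    return Counter(op for op in op_types if op in SUSPECT_OPS)


# With the onnx package installed, op types can be read from a model file.
# This walks top-level graph nodes only; subgraphs nested inside control-flow
# operators are not visited.
#   import onnx
#   model = onnx.load("models/moonshine/encoder_model.onnx")
#   print(suspect_op_counts(node.op_type for node in model.graph.node))
```

Comparing the counts between the old and new pinned commits is a cheap way to spot structural changes when updating the weights.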
- Create a new branch.
- Update MOON_COMMIT inside scripts/fetch_moonshine.sh.
- Run the script, inspect graphs in Netron, update SHA256SUMS:
  sha256sum models/moonshine/encoder_model.onnx \
    models/moonshine/decoder_model_merged.onnx > SHA256SUMS
- Open a PR summarising what you checked (Netron screenshots welcome).
- Once merged, users re-run the quick-start and stay safe.
"No module named 'sounddevice'"
sudo apt-get install portaudio19-dev # Linux
brew install portaudio # macOS
pip install sounddevice

"SHA256 mismatch" on startup
The model file is corrupt or was modified. Re-run ./scripts/fetch_moonshine.sh and verify again with sha256sum -c SHA256SUMS.
"Error sending command to ESP32" / connection refused
- Confirm the ESP32 is powered and connected to WiFi (check serial monitor).
- Confirm ESP32_IP matches the address printed on the serial monitor.
- Check that ESP32_API_KEY matches API_KEY in the firmware exactly.
ESP32 serial monitor shows nothing after flashing
- Make sure the baud rate is set to 115200.
- Press the EN/RST button on the ESP32 to trigger a reboot and print the startup output.
"401 Authentication failed" in Clank logs
The ESP32_API_KEY environment variable and the API_KEY constant in ESP32LEDs.ino do not match. Re-flash the firmware with the correct key.
Ollama connection refused
ollama serve # start the server
ollama list # confirm your model is pulled

No devices found via mDNS discovery
# Linux — check the ESP32 is advertising
avahi-browse -r _clank-led._tcp
# macOS
dns-sd -B _clank-led._tcp
If nothing appears, confirm ESPmDNS is included in the firmware and the ESP32 is on the same subnet.
VAD triggers too easily on background noise
Raise vad_threshold in config/default.yaml (e.g. from 0.5 to 0.7).
clank/
├── README.md ← this file
├── LICENSE
├── requirements.txt ← Python dependencies
├── requirements-secure.txt ← extended dependency list (for development)
├── SHA256SUMS ← model digests
├── install.sh ← one-command automated installer
├── start_clank.sh ← startup script (loads .env, activates venv)
├── register_device.py ← CLI helper to register an ESP32 and get its API key
│
├── config/
│ └── default.yaml ← all tunable settings
│
├── certs/ ← auto-generated TLS certificates (gitignored)
├── logs/ ← application and audit logs (gitignored)
├── models/
│ └── moonshine/ ← ONNX weights (fetched by fetch_moonshine.sh)
│ ├── encoder_model.onnx
│ └── decoder_model_merged.onnx
│
├── scripts/
│ ├── fetch_moonshine.sh ← downloads pinned, verified model weights
│ ├── generate_esp32_cert.py ← generates the ESP32 TLS cert (cert.h + esp32.crt)
│ └── generate_certs.py ← generic self-signed cert helper
│
├── src/
│ ├── assets/
│ │ └── tokenizer.json ← Moonshine subword tokenizer
│ └── voicecommand/
│ ├── voice_LED_control.py ← main application entry point
│ ├── onnx_model.py ← SHA256-verified model loader
│ ├── config.py ← typed config with env-var overrides
│ ├── validation.py ← input/output sanitisation and allowlists
│ ├── auth.py ← device registration and API key management
│ ├── discovery.py ← mDNS auto-discovery of ESP32 devices
│ └── secure_logging.py ← rotating logs and audit log with redaction
│
└── ESP32LEDs/
├── ESP32LEDs.ino ← ESP32 firmware (HTTPS via esp_https_server, mDNS, auth, rate limiting)
├── secrets.h.example ← template for WiFi creds + API key (copy to secrets.h)
├── secrets.h ← your WiFi creds + API key (gitignored)
└── cert.h ← generated TLS cert/key for the firmware (gitignored)
MIT — see LICENSE for full text.