Cat

A fully local voice assistant. No cloud, no subscriptions. It listens, thinks, sees (when asked), and speaks — all on your machine.

Features

Continuous speech recognition via OpenAI Whisper (runs locally)
Local LLM responses via Ollama (default: llama3.1)
Two TTS backends: pyttsx3 (basic) or Piper (neural quality)
Vision on demand — webcam object/scene description via a multimodal LLM, activated only when you ask
Face blurring before any image reaches the LLM (MTCNN)
Animated cat that lip-syncs to audio amplitude (Pygame + Piper)
Two run modes: classic voice loop or deterministic LangGraph StateGraph agent
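To illustrate the lip-sync feature, here is a minimal sketch of amplitude-driven mouth animation. It assumes 16-bit signed mono PCM in native byte order (little-endian on typical desktops) and a hypothetical set of four mouth sprites; the function names, the gain value, and the frame count are illustrative assumptions, not Cat's actual code.

```python
import array
import math


def rms_amplitude(pcm_bytes: bytes) -> float:
    """Return the RMS level of 16-bit signed mono PCM, normalized to 0.0-1.0."""
    samples = array.array("h")
    # Drop a trailing odd byte so the buffer is a whole number of samples.
    samples.frombytes(pcm_bytes[: len(pcm_bytes) // 2 * 2])
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def mouth_frame(level: float, n_frames: int = 4, gain: float = 4.0) -> int:
    """Map a normalized level to a sprite index (0 = closed, n_frames - 1 = wide open)."""
    # Speech rarely reaches full scale, so boost it (gain is an assumed tuning value).
    scaled = min(1.0, max(0.0, level * gain))
    return min(n_frames - 1, int(scaled * n_frames))
```

In a render loop, each audio chunk handed to the output device would also be passed through rms_amplitude, and the resulting frame index would select which sprite to draw.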

Stack

Component           Tool
Speech recognition  Whisper
Conversation LLM    Ollama (llama3.1)
Vision LLM          Ollama (minicpm-v)
Text-to-speech      Piper / pyttsx3
Face detection      MTCNN
Animation           Pygame
Orchestration       LangGraph StateGraph

Requirements

Python 3.11+
Ollama running locally (http://localhost:11434)
A working microphone
For vision: a webcam
For Piper TTS: download a .onnx voice model (see Piper voices)

Installation

pip install -r requirements.txt

Pull the required Ollama models:

ollama pull llama3.1        # conversation
ollama pull minicpm-v       # vision (optional)

Usage

Classic mode (simple loop)

python -m cat.src.main

Agent mode with vision and animated cat

python -m cat.src.main \
    --agent-mode \
    --tts-backend piper \
    --piper-model voice_models/en_US-lessac-medium.onnx \
    --enable-vision \
    --show-cat

All options

Speech Recognition:
  --whisper-model   tiny | base | small | medium | large  (default: base)
  --listen-timeout  seconds to wait for speech to start   (default: 5.0)
  --phrase-timeout  seconds of silence that end a phrase  (default: 3.0)

AI Processing:
  --ai-model        Ollama model name                     (default: llama3.1)
  --ollama-url      Ollama server URL                     (default: http://localhost:11434)
  --temperature     response temperature 0.0–1.0          (default: 0.7)
  --system-prompt   system prompt text

Text-to-Speech:
  --tts-backend     pyttsx3 | piper                       (default: pyttsx3)
  --piper-model     path to .onnx voice model
  --speech-rate     words per minute (pyttsx3 only)       (default: 150)
  --speech-volume   volume 0.0–1.0 (pyttsx3 only)        (default: 1.0)

Agent Mode:
  --agent-mode      use LangGraph StateGraph agent
  --show-cat        show animated cat window
  --assets-dir      directory with cat sprite images
  --enable-vision   enable webcam scene description on request
  --camera-index    camera device index                   (default: 0)
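
For readers wiring up something similar, a subset of the options above could be declared with argparse roughly as follows. This is a sketch, not the project's actual parser: the function names are hypothetical, and the check that --piper-model accompanies the piper backend is an assumption about sensible behavior rather than something this README states.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of the documented flags, grouped as in the help text."""
    parser = argparse.ArgumentParser(prog="cat")

    stt = parser.add_argument_group("Speech Recognition")
    stt.add_argument("--whisper-model", default="base",
                     choices=["tiny", "base", "small", "medium", "large"])
    stt.add_argument("--listen-timeout", type=float, default=5.0)

    tts = parser.add_argument_group("Text-to-Speech")
    tts.add_argument("--tts-backend", default="pyttsx3",
                     choices=["pyttsx3", "piper"])
    tts.add_argument("--piper-model", default=None)

    agent = parser.add_argument_group("Agent Mode")
    agent.add_argument("--agent-mode", action="store_true")
    agent.add_argument("--enable-vision", action="store_true")
    agent.add_argument("--camera-index", type=int, default=0)
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed validation: Piper cannot speak without a voice model.
    if args.tts_backend == "piper" and not args.piper_model:
        parser.error("--piper-model is required with --tts-backend piper")
    return args
```

Calling parse_args([]) yields the documented defaults; passing a list of flags mirrors what the command line would supply.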

How it works

Voice loop

The core loop is: listen → process → speak → repeat. The recognizer pauses during speech output so Cat doesn't hear itself and spiral.
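
The loop and its self-hearing guard can be sketched as below. The class and attribute names are hypothetical, and the listen, think, and speak callables stand in for Whisper, Ollama, and the TTS backend; the real implementation may structure this differently.

```python
from typing import Callable, Optional


class VoiceLoop:
    """One listen -> process -> speak turn, with the recognizer muted while speaking."""

    def __init__(
        self,
        listen: Callable[[], Optional[str]],
        think: Callable[[str], str],
        speak: Callable[[str], None],
    ) -> None:
        self.listen = listen
        self.think = think
        self.speak = speak
        self.listening_enabled = True

    def step(self) -> bool:
        """Run one turn; return False if nothing was heard."""
        text = self.listen() if self.listening_enabled else None
        if not text:
            return False
        reply = self.think(text)
        # Pause recognition so the assistant does not transcribe its own voice.
        self.listening_enabled = False
        try:
            self.speak(reply)
        finally:
            # Resume listening even if TTS raised an error.
            self.listening_enabled = True
        return True
```

Repeating step() in a while loop gives the "repeat" part; the try/finally ensures a TTS failure cannot leave the recognizer permanently paused.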

Agent mode

Instead of a ReAct agent (tried it, unreliable for daily use), Cat uses a deterministic StateGraph:

listen → analyze → [route] → respond / vision / exit → react → speak → END

The analyze node classifies intent (conversational / vision / exit) and the graph routes accordingly — no ambiguity.
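
A plain-Python sketch of the same routing idea, without LangGraph, may help show why the flow is deterministic. The keyword-based classifier and all function names here are assumptions for illustration; the actual analyze node may classify intent differently, and the react and speak steps are omitted.

```python
import re
from typing import Callable, Dict

State = Dict[str, object]

EXIT_WORDS = {"goodbye", "exit", "quit"}               # assumed trigger words
VISION_PHRASES = ("what do you see", "look at this")   # assumed trigger phrases


def analyze(state: State) -> State:
    """Classify the utterance as conversational, vision, or exit."""
    text = str(state["heard"]).lower()
    words = set(re.findall(r"[a-z']+", text))
    if words & EXIT_WORDS:
        intent = "exit"
    elif any(phrase in text for phrase in VISION_PHRASES):
        intent = "vision"
    else:
        intent = "conversational"
    return {**state, "intent": intent}


def respond(state: State) -> State:
    return {**state, "reply": f"(LLM reply to: {state['heard']})"}


def vision(state: State) -> State:
    return {**state, "reply": "(description of the webcam frame)"}


def say_goodbye(state: State) -> State:
    return {**state, "reply": "Goodbye!", "done": True}


# Each intent maps to exactly one node, so the same intent always takes the same path.
ROUTES: Dict[str, Callable[[State], State]] = {
    "conversational": respond,
    "vision": vision,
    "exit": say_goodbye,
}


def run_turn(heard: str) -> State:
    state = analyze({"heard": heard})
    return ROUTES[str(state["intent"])](state)
```

Because the route is a dictionary lookup on the classified intent rather than a model-chosen tool call, a misbehaving turn can be traced to one place: the classifier.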

Vision

Vision is off by default; pass --enable-vision to turn it on. When you ask "what do you see?", the intent classifier routes to the vision node, which captures a webcam frame, blurs any faces detected by MTCNN, and sends the blurred image to a multimodal LLM (minicpm-v by default) for a natural-language description.
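
The last step, sending the frame to Ollama, might look like the sketch below. It uses Ollama's /api/generate endpoint, which accepts base64-encoded images in an "images" list. The function names and the prompt text are hypothetical, and the frame passed in is assumed to be JPEG bytes that have already been face-blurred.

```python
import base64
import json
import urllib.request


def build_vision_request(
    jpeg_bytes: bytes,
    model: str = "minicpm-v",
    base_url: str = "http://localhost:11434",
    prompt: str = "Describe what you see in one or two sentences.",
) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request with one image."""
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a stream
    }
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe(jpeg_bytes: bytes) -> str:
    """Send the already-blurred frame to a running Ollama server and return its text."""
    request = build_vision_request(jpeg_bytes)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Keeping request construction separate from sending makes it easy to check, without a running server, that only the blurred bytes end up in the payload.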

Privacy

Every face that MTCNN detects is blurred before the image is analyzed or saved, so the LLM only ever receives the blurred frame. A face the detector misses would not be blurred.

Running tests

pytest tests/
