Skip to content

eamars/KazusaAIChatbot

Repository files navigation

Kazusa avatar

Kazusa Cognitive Core

A self-evolving character cognition runtime for persistent digital presence.

简体中文 · HOWTO

Python FastAPI LangGraph MongoDB License

What Kazusa Achieves

Kazusa is not a generic assistant shell. It is a psychological model of a self-evolving character brain: a runtime that keeps identity, relationship continuity, retrieval, cognition, dialog, memory, reflection, and future follow-through inside one inspectable service core.

The same brain can be reached from Discord, NapCat QQ, the browser debug UI, or another adapter that speaks the service API. Adapters stay thin. The brain service consumes typed message-envelope fields instead of parsing raw Discord, QQ, or debug-wire syntax.

At a high level, Kazusa provides:

Capability What it means
Platform-neutral character brain Discord, QQ, debug UI, and future adapters feed the same FastAPI brain service.
Typed message boundary Platform syntax is normalized into MessageEnvelope fields before cognition or RAG sees it.
Bounded live response path Queueing, relevance, RAG, cognition, action routing, and L3 surfaces are explicit stages with caps and inspectable payloads.
Multi-horizon memory Recent chat, short-term conversation flow, retrieved evidence, durable memory, and scheduled commitments remain separate.
Internal monologue residue A short private residue lane carries bounded first-person reasons from completed episodes into the next L2a cognition pass.
RAG 2 evidence retrieval Helper agents retrieve user profiles, memories, conversation history, live facts, web evidence, and recall state.
Layered cognition Cognition decides stance, boundaries, judgment, style, action needs, and response goals before selected L3 surfaces render output.
Background consolidation Completed episodes update durable memory, relationship state, Cache2 invalidation, images, and progress from text plus action/surface traces.
Reflection outside chat Hourly, daily, and promoted reflection runs are stored as audit records and only promoted context can enter normal cognition.
Scheduled follow-through Accepted future promises can become validated scheduled tasks delivered later through registered adapters.
Event logging observability Runtime, LLM, RAG, action routing, surfaces, reflection, self-cognition, dispatcher, consolidation, and DB operations emit sanitized operational events.

What You Can Build

Use case Why Kazusa fits
Persistent character companion The runtime keeps relationship memory, short-term flow, character state, and reflection separate but connected.
Group-chat character bot Queue pruning, typed addressees, native reply hydration, and adapter-specific delivery let the brain survive noisy channels.
Local model character lab Route-specific OpenAI-compatible model settings let weaker local models handle narrower, staged prompts.
Memory and RAG experiments RAG 2, Cache2, scoped user memory, shared memory evolution, and conversation search are modular enough to inspect independently.
Cross-platform adapter experiments New adapters only need to normalize platform events into the service contract and render returned messages.
Promise and follow-through workflows Accepted future commitments can be validated, persisted, deduplicated, and delivered later through registered adapters.

Supported LLMs

Kazusa is designed around OpenAI-compatible endpoints rather than one hosted vendor. All OpenAI-compatible chat completion endpoints are technically supported, and route-specific configuration lets different stages use different models when needed.

In practice, Kazusa can be configured like a model routing table: lightweight or local models can handle most structured reasoning, while a different hosted model can be assigned to a stage where you want stronger voice or generation quality. The route names below are the configuration handles documented in the HOWTO. One working-style configuration looks like this:

Route Example model Example source
RELEVANCE_AGENT_LLM local-model http://localhost:1234/v1
VISION_DESCRIPTOR_LLM local-model http://localhost:1234/v1
MSG_DECONTEXTUALIZER_LLM local-model http://localhost:1234/v1
RAG_PLANNER_LLM local-model http://localhost:1234/v1
RAG_SUBAGENT_LLM local-model http://localhost:1234/v1
WEB_SEARCH_LLM local-model http://localhost:1234/v1
COGNITION_LLM local-model http://localhost:1234/v1
DIALOG_GENERATOR_LLM deepseek-v4-flash https://api.deepseek.com
DIALOG_EVALUATOR_LLM local-model http://localhost:1234/v1
CONSOLIDATION_LLM local-model http://localhost:1234/v1
JSON_REPAIR_LLM local-model http://localhost:1234/v1
EMBEDDING text-embedding-nomic-embed-text-v2-moe http://localhost:1234/v1

The table is an example, not a fixed requirement. Any route can point to any OpenAI-compatible endpoint that can satisfy that stage's latency and quality needs.

Tested chat model families:

  • Gemma 4 26B MoE
  • Qwen3.6 27B
  • DeepSeek v4

Kazusa also requires an OpenAI-compatible embeddings endpoint for conversation history, memory retrieval, and vector search features. Local deployments commonly use LM Studio or another OpenAI-compatible end points.

Architecture At A Glance

Discord / NapCat QQ / Debug UI / future adapters
        |
        | typed ChatRequest + MessageEnvelope
        v
FastAPI brain service
        |
        v
Process-local input queue
  - collapse nearby follow-ups
  - drop burst noise before RAG
  - persist dropped user rows without replying
        |
        v
Listen gate and perception
  - hydrate reply context
  - describe image inputs when needed
  - decide whether Kazusa should answer
        |
        v
Persona turn
  - decontextualize the current message
  - retrieve evidence through RAG 2
  - load short-term conversation progress
  - load projected private residue for L2a only
  - reason through stance, boundary, style, and intent
  - initialize zero-or-more semantic actions through L2d
  - run selected L3 text/action handlers
  - emit surface outputs and action results
        |
        +-----------------------------> adapter bridge delivers visible surfaces
        |
        v
Post-turn work
  - persist assistant surface rows and delivery tracking
  - record conversation progress
  - record compact internal monologue residue outside visible response work
  - consolidate durable memory and state from the episode trace
  - invalidate stale Cache2 entries
  - schedule accepted future promises
  - run reflection and growth workers outside live chat
        |
        v
MongoDB + model routes + optional MCP web tools + platform callbacks

Visible adapter delivery follows selected text surface outputs. Private action results, scheduled-action results, no-visible-output decisions, and private finalization still feed episode-trace consolidation without creating adapter sends.

The core boundary is deliberately narrow:

adapter/debug client -> brain service -> queue/intake -> typed episode/RAG
-> cognition/L2d -> selected L3 surfaces/action handlers
-> episode-trace consolidation -> scheduler/reflection

Design Principles

LLM-first semantics, deterministic mechanics

LLM stages judge meaning: response relevance, missing evidence, memory meaning, accepted promises, character stance, action choice, and surface intent. Deterministic code owns validation, persistence, limits, cache invalidation, scheduling, adapter delivery, and auditability.

Evidence is not persona

RAG answers "what is known?" Cognition answers "what does this mean for Kazusa right now?" L2d answers "which actions or surfaces are needed?" L3/dialog answers "how should the selected surface render it?"

Memory has ownership

Kazusa does not flatten all context into one prompt. Immediate surface text, conversation progress, retrieved evidence, durable memory, promoted reflection, and scheduled commitments each have a separate lifecycle.

Internal monologue residue is a separate short-lived lane. It stores one compact first-person reason from a completed episode and projects it only into L2a as internal_monologue_residue_context. It is not reflection_summary, durable memory, visible dialog planning, or scheduler input.

Reflection does not shortcut into live chat

Reflection is slower sense-making work. Raw reflection output is stored for inspection, but normal cognition only receives bounded, promoted, gated context.

Adapters are transport edges

Platform adapters parse platform events, normalize typed envelopes, call the brain service, and deliver returned messages. Character identity, memory, RAG, cognition, and scheduling remain in the platform-neutral core.

Runtime Layers

Layer Owns Key docs
Adapters Discord, NapCat QQ, debug UI transport and platform rendering HOWTO
Brain service HTTP API, queue, graph startup, health, delivery receipts, runtime adapter registration Brain Service ICD
Message envelope Typed inbound content, mentions, replies, attachments, addressees, broadcast state Message Envelope ICD
Conversation progress Short-term episode state used by cognition to avoid loops and stale reopenings Conversation Progress
Internal monologue residue Short-lived private first-person residue loaded only into L2a cognition Internal Monologue Residue ICD
RAG 2 Slot-driven helper-agent retrieval and Cache2 evidence projection RAG 2
Cognition and dialog Character stance, boundaries, judgment, style, visual directives, and final wording Cognition Nodes
Action spec L2d action residues, capability registry, evaluator, results, surfaces, and traces Action Spec
Consolidation Durable target routing, write-intent validation, and target-specific persistence Consolidation ICD
Database MongoDB collection ownership, embeddings, indexes, public persistence helpers Database ICD
Event logging Sanitized operational telemetry, status snapshots, statistics, and export contracts Event Logging ICD
Dispatcher and scheduler Validated delayed tool execution for accepted future promises Dispatcher
Reflection cycle Background reflection runs, promotion gates, prompt-safe reflection context Reflection Cycle ICD
Memory evolution Curated shared memory lifecycle, lineage, seed reset, promoted memory writes Memory Evolution ICD
Global character growth Slow promoted-trait drift from approved reflection memory Global Character Growth ICD
Proactive output Permissioned preview/outbox contracts for future autonomous contact paths Proactive Output ICD

Quick Start

Kazusa expects MongoDB plus OpenAI-compatible chat and embedding endpoints. LM Studio works for local development, but any compatible endpoint can be used. All route-specific model environment variables are documented in docs/HOWTO.md.

python -m venv venv
venv\Scripts\activate
pip install -U pip
pip install -e ".[dev]"

Load a character profile before starting the brain:

python -m scripts.load_character_profile personalities/kazusa.json

Run the brain service:

kazusa-brain --host 0.0.0.0 --port 8000

Or use Uvicorn directly:

uvicorn kazusa_ai_chatbot.service:app --host 0.0.0.0 --port 8000

Run the browser debug adapter:

python -m adapters.debug_adapter --brain-url http://localhost:8000 --port 8080

Then open http://localhost:8080.

Repository Map

src/
  adapters/                    Platform adapters and debug UI
  kazusa_ai_chatbot/
    brain_service/             Service API, graph, intake, health, post-turn glue
    message_envelope/          Typed adapter-to-brain message contract
    nodes/                     Persona, cognition, and dialog stages
    action_spec/               Modality-neutral action contracts, registry, results
    consolidation/             Durable consolidation helpers, target routing, and ICD
    rag/                       RAG 2 helper agents, hybrid retrieval, Cache2
    conversation_progress/     Short-term episode memory
    internal_monologue_residue/ Short-lived private residue lane for L2a
    db/                        MongoDB facade, schemas, collection owners
    event_logging/             Sanitized operational telemetry interface and ICD
    dispatcher/                Delayed task validation and adapter handoff
    reflection_cycle/          Background reflection and promotion
    memory_evolution/          Shared memory lifecycle and seed reset
    global_character_growth/   Slow promoted character-growth traits
    proactive_output/          Permissioned proactive preview contracts
  scripts/                     Operator and maintenance CLIs
docs/
  HOWTO.md                     Setup, runtime commands, environment, tests
development_plans/             Approved, archived, and reference plan registry
tests/                         Deterministic, live DB, and live LLM test suites
resources/
  avatar.png                   README avatar asset

Testing

Default test runs exclude live DB and live LLM tests through pytest.ini.

venv\Scripts\python -m pytest -q
venv\Scripts\python -m pytest -m "not live_db and not live_llm" -q

Live LLM tests must be run one case at a time with output inspected. Live DB tests require MongoDB. See docs/HOWTO.md for the project testing contract.

Project Status

Kazusa Cognitive Core is alpha-stage experimental infrastructure for a persistent digital character. The main runtime is usable as a local brain service with adapters, memory, retrieval, reflection, and scheduling, but some autonomous-contact surfaces intentionally remain permissioned preview contracts rather than production sends.

Documentation Index

Document Purpose
README.md Project overview and architecture map
README_CN.md Simplified Chinese project overview
docs/HOWTO.md Local setup, environment variables, run commands, adapters, tests
Brain Service ICD HTTP endpoint contracts and adapter obligations
Message Envelope ICD Typed inbound message contract
Database ICD Persistence ownership and collection contracts
Internal Monologue Residue ICD Short-lived private residue lifecycle and L2a-only contract
Action Spec Modality-neutral action contracts and trace handoff
Consolidation ICD Durable target routing and write-intent validation
Event Logging ICD Sanitized telemetry interface, event taxonomy, and ops statistics
RAG 2 Retrieval architecture and evidence projection
Cognition Nodes Layered cognition, dialog, and node-package design contracts
Development Plans Registry Active, archived, reference, and roadmap documents

License

Kazusa Cognitive Core is released under the GNU Affero General Public License v3.0.

About

A self-evolving character cognition runtime for persistent digital presence.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages