Kazusa Cognitive Core

A self-evolving character cognition runtime for persistent digital presence.

What Kazusa Achieves

Kazusa is not a generic assistant shell. It is a psychological model of a self-evolving character brain: a runtime that keeps identity, relationship continuity, retrieval, cognition, dialog, memory, reflection, and future follow-through inside one inspectable service core.

The same brain can be reached from Discord, NapCat QQ, the browser debug UI, or another adapter that speaks the service API. Adapters stay thin. The brain service consumes typed message-envelope fields instead of parsing raw Discord, QQ, or debug-wire syntax.
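As a rough illustration of that boundary, the sketch below shows what a typed envelope and a thin adapter-side normalizer might look like. The field categories follow the Message Envelope row under Runtime Layers (content, mentions, replies, attachments, addressees, broadcast state). The class layout, the `normalize_discord_event` helper, and the raw event keys are assumptions made for illustration; the real contract is defined in the Message Envelope ICD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MessageEnvelope:
    """Hypothetical typed envelope; not the project's actual schema."""
    platform: str
    channel_id: str
    author_id: str
    content: str
    mentions: tuple[str, ...] = ()
    reply_to_message_id: Optional[str] = None
    attachments: tuple[str, ...] = ()
    addressees: tuple[str, ...] = ()
    is_broadcast: bool = False


def normalize_discord_event(event: dict, bot_id: str) -> MessageEnvelope:
    """Turn an assumed Discord-like payload into typed fields.

    Platform syntax such as "<@123>" is resolved here, in the adapter,
    so the brain never has to parse it.
    """
    mentions = tuple(str(m["id"]) for m in event.get("mentions", []))
    content = event.get("content", "")
    for user_id in mentions:
        # Strip the raw mention token; the typed `mentions` field carries it instead.
        content = content.replace(f"<@{user_id}>", "").strip()
    return MessageEnvelope(
        platform="discord",
        channel_id=str(event["channel_id"]),
        author_id=str(event["author"]["id"]),
        content=content,
        mentions=mentions,
        reply_to_message_id=event.get("referenced_message_id"),
        attachments=tuple(a["url"] for a in event.get("attachments", [])),
        addressees=(bot_id,) if bot_id in mentions else (),
        is_broadcast=bool(event.get("mention_everyone", False)),
    )
```

Under these assumptions, everything downstream reads `envelope.addressees` or `envelope.content` and never sees platform-specific mention syntax.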

At a high level, Kazusa provides:

Capability	What it means
Platform-neutral character brain	Discord, QQ, debug UI, and future adapters feed the same FastAPI brain service.
Typed message boundary	Platform syntax is normalized into `MessageEnvelope` fields before cognition or RAG sees it.
Bounded live response path	Queueing, relevance, RAG, cognition, action routing, and L3 surfaces are explicit stages with caps and inspectable payloads.
Multi-horizon memory	Recent chat, short-term conversation flow, retrieved evidence, durable memory, and scheduled commitments remain separate.
Internal monologue residue	A short private residue lane carries bounded first-person reasons from completed episodes into the next L2a cognition pass.
RAG 2 evidence retrieval	Helper agents retrieve user profiles, memories, conversation history, live facts, web evidence, and recall state.
Layered cognition	Cognition decides stance, boundaries, judgment, style, action needs, and response goals before selected L3 surfaces render output.
Background consolidation	Completed episodes update durable memory, relationship state, Cache2 invalidation, images, and progress from text plus action/surface traces.
Reflection outside chat	Hourly, daily, and promoted reflection runs are stored as audit records and only promoted context can enter normal cognition.
Scheduled follow-through	Accepted future promises can become validated scheduled tasks delivered later through registered adapters.
Event logging observability	Runtime, LLM, RAG, action routing, surfaces, reflection, self-cognition, dispatcher, consolidation, and DB operations emit sanitized operational events.

What You Can Build

Use case	Why Kazusa fits
Persistent character companion	The runtime keeps relationship memory, short-term flow, character state, and reflection separate but connected.
Group-chat character bot	Queue pruning, typed addressees, native reply hydration, and adapter-specific delivery let the brain survive noisy channels.
Local model character lab	Route-specific OpenAI-compatible model settings let weaker local models handle narrower, staged prompts.
Memory and RAG experiments	RAG 2, Cache2, scoped user memory, shared memory evolution, and conversation search are modular enough to inspect independently.
Cross-platform adapter experiments	New adapters only need to normalize platform events into the service contract and render returned messages.
Promise and follow-through workflows	Accepted future commitments can be validated, persisted, deduplicated, and delivered later through registered adapters.
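
The promise workflow in the last row can be pictured with a small sketch. It assumes deduplication is keyed on a hash of the normalized commitment; `ScheduledPromise`, `dedup_key`, and `schedule_once` are hypothetical names, and an in-memory dict stands in for real persistence.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledPromise:
    """Hypothetical shape of an accepted future commitment."""
    user_id: str
    channel_id: str
    due_at_iso: str  # ISO 8601 timestamp of when to follow through
    action: str      # short description of what was promised


def dedup_key(promise: ScheduledPromise) -> str:
    """Build a stable key so rephrased whitespace or casing does not create duplicates."""
    normalized = "|".join([
        promise.user_id,
        promise.channel_id,
        promise.due_at_iso,
        " ".join(promise.action.lower().split()),
    ])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def schedule_once(store: dict[str, ScheduledPromise], promise: ScheduledPromise) -> bool:
    """Persist the promise unless an equivalent one is already stored."""
    key = dedup_key(promise)
    if key in store:
        return False
    store[key] = promise
    return True
```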

Supported LLMs

Kazusa is designed around OpenAI-compatible endpoints rather than one hosted vendor. Any OpenAI-compatible chat completion endpoint should work, and route-specific configuration lets different stages use different models when needed.

In practice, Kazusa can be configured like a model routing table: lightweight or local models can handle most structured reasoning, while a hosted model can be assigned to a stage where you want stronger voice or generation quality. The route names below are the configuration handles documented in the HOWTO. One working configuration looks like this:

Route	Example model	Example source
`RELEVANCE_AGENT_LLM`	`local-model`	`http://localhost:1234/v1`
`VISION_DESCRIPTOR_LLM`	`local-model`	`http://localhost:1234/v1`
`MSG_DECONTEXTUALIZER_LLM`	`local-model`	`http://localhost:1234/v1`
`RAG_PLANNER_LLM`	`local-model`	`http://localhost:1234/v1`
`RAG_SUBAGENT_LLM`	`local-model`	`http://localhost:1234/v1`
`WEB_SEARCH_LLM`	`local-model`	`http://localhost:1234/v1`
`COGNITION_LLM`	`local-model`	`http://localhost:1234/v1`
`DIALOG_GENERATOR_LLM`	`deepseek-v4-flash`	`https://api.deepseek.com`
`DIALOG_EVALUATOR_LLM`	`local-model`	`http://localhost:1234/v1`
`CONSOLIDATION_LLM`	`local-model`	`http://localhost:1234/v1`
`JSON_REPAIR_LLM`	`local-model`	`http://localhost:1234/v1`
`EMBEDDING`	`text-embedding-nomic-embed-text-v2-moe`	`http://localhost:1234/v1`

The table is an example, not a fixed requirement. Any route can point to any OpenAI-compatible endpoint that can satisfy that stage's latency and quality needs.
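
To make the routing-table idea concrete, here is a minimal sketch that resolves a route name to a model and base URL. The route names and values come from the example table above. The `ModelRoute` class, the fallback-to-default behavior, and the helper functions are assumptions; the real settings are environment variables documented in the HOWTO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRoute:
    model: str
    base_url: str


# Assumed default for any route without an explicit override.
DEFAULT_ROUTE = ModelRoute("local-model", "http://localhost:1234/v1")

# Overrides copied from the example table.
ROUTE_OVERRIDES = {
    "DIALOG_GENERATOR_LLM": ModelRoute("deepseek-v4-flash", "https://api.deepseek.com"),
    "EMBEDDING": ModelRoute(
        "text-embedding-nomic-embed-text-v2-moe", "http://localhost:1234/v1"
    ),
}


def resolve_route(name: str) -> ModelRoute:
    """Return the override for a route, or the shared local default."""
    return ROUTE_OVERRIDES.get(name, DEFAULT_ROUTE)


def chat_completions_url(route: ModelRoute) -> str:
    """Join the base URL with the OpenAI-style chat completions path."""
    return route.base_url.rstrip("/") + "/chat/completions"
```

With this layout, moving one stage to a stronger model is a one-line change to the override map and leaves every other stage untouched.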

Tested chat model families:

Gemma 4 26B MoE
Qwen3.6 27B
DeepSeek v4

Kazusa also requires an OpenAI-compatible embeddings endpoint for conversation history, memory retrieval, and vector search features. Local deployments commonly use LM Studio or another OpenAI-compatible endpoint.
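
For orientation, the sketch below builds a request in the standard OpenAI embeddings format (a POST to `/embeddings` with `model` and `input`) and compares two vectors with cosine similarity. It only constructs the request and does not send it. Cosine similarity as the comparison metric is an assumption, and both function names are hypothetical.

```python
import json
import math
import urllib.request


def build_embeddings_request(
    base_url: str, model: str, texts: list[str]
) -> urllib.request.Request:
    """Prepare (but do not send) an OpenAI-style embeddings request."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Passing the prepared request to `urllib.request.urlopen` would perform the call against a running endpoint such as LM Studio.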

Architecture At A Glance

Discord / NapCat QQ / Debug UI / future adapters
        |
        | typed ChatRequest + MessageEnvelope
        v
FastAPI brain service
        |
        v
Process-local input queue
  - collapse nearby follow-ups
  - drop burst noise before RAG
  - persist dropped user rows without replying
        |
        v
Listen gate and perception
  - hydrate reply context
  - describe image inputs when needed
  - decide whether Kazusa should answer
        |
        v
Persona turn
  - decontextualize the current message
  - retrieve evidence through RAG 2
  - load short-term conversation progress
  - load projected private residue for L2a only
  - reason through stance, boundary, style, and intent
  - initialize zero-or-more semantic actions through L2d
  - run selected L3 text/action handlers
  - emit surface outputs and action results
        |
        +-----------------------------> adapter bridge delivers visible surfaces
        |
        v
Post-turn work
  - persist assistant surface rows and delivery tracking
  - record conversation progress
  - record compact internal monologue residue outside visible response work
  - consolidate durable memory and state from the episode trace
  - invalidate stale Cache2 entries
  - schedule accepted future promises
  - run reflection and growth workers outside live chat
        |
        v
MongoDB + model routes + optional MCP web tools + platform callbacks
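
The "collapse nearby follow-ups" step of the input queue could work roughly as follows. This sketch assumes collapsing means merging consecutive messages from the same author that arrive within a short window; `QueuedMessage`, `collapse_followups`, and the five-second default are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedMessage:
    author_id: str
    text: str
    received_at: float  # seconds since an arbitrary epoch


def collapse_followups(
    messages: list[QueuedMessage], window_seconds: float = 5.0
) -> list[QueuedMessage]:
    """Merge back-to-back messages from one author into a single queue item."""
    collapsed: list[QueuedMessage] = []
    for msg in sorted(messages, key=lambda m: m.received_at):
        last = collapsed[-1] if collapsed else None
        if (
            last is not None
            and last.author_id == msg.author_id
            and msg.received_at - last.received_at <= window_seconds
        ):
            # Same author, close in time: append text and slide the window forward.
            collapsed[-1] = QueuedMessage(
                last.author_id, last.text + "\n" + msg.text, msg.received_at
            )
        else:
            collapsed.append(msg)
    return collapsed
```

Collapsing before RAG means a burst of short follow-ups costs one retrieval and one cognition pass instead of several.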

Visible adapter delivery follows selected text surface outputs. Private action results, scheduled-action results, no-visible-output decisions, and private finalization still feed episode-trace consolidation without creating adapter sends.

The core boundary is deliberately narrow:

adapter/debug client -> brain service -> queue/intake -> typed episode/RAG
-> cognition/L2d -> selected L3 surfaces/action handlers
-> episode-trace consolidation -> scheduler/reflection

Design Principles

LLM-first semantics, deterministic mechanics

LLM stages judge meaning: response relevance, missing evidence, memory meaning, accepted promises, character stance, action choice, and surface intent. Deterministic code owns validation, persistence, limits, cache invalidation, scheduling, adapter delivery, and auditability.
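
A small sketch of that split: the LLM decides whether a promise was accepted, and plain code checks the result mechanically before anything is persisted. The JSON shape (`accepted`, `action`, `due_at`), the 200-character cap, and the function name are assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def validate_promise_proposal(raw_llm_output: str, now: datetime) -> Optional[dict]:
    """Return a cleaned proposal, or None if any mechanical check fails."""
    try:
        data = json.loads(raw_llm_output)
    except json.JSONDecodeError:
        return None
    # The semantic judgment ("was this accepted?") belongs to the LLM.
    if not isinstance(data, dict) or data.get("accepted") is not True:
        return None
    action = data.get("action")
    due_at = data.get("due_at")
    if not isinstance(action, str) or not action.strip() or not isinstance(due_at, str):
        return None
    try:
        due = datetime.fromisoformat(due_at)
    except ValueError:
        return None
    # Deterministic limits: the time must be timezone-aware and in the future.
    if due.tzinfo is None or due <= now:
        return None
    return {"action": action.strip()[:200], "due_at": due.isoformat()}


if __name__ == "__main__":
    proposal = '{"accepted": true, "action": "check in", "due_at": "2999-01-01T09:00:00+00:00"}'
    print(validate_promise_proposal(proposal, datetime.now(timezone.utc)))
```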

Evidence is not persona

RAG answers "what is known?" Cognition answers "what does this mean for Kazusa right now?" L2d answers "which actions or surfaces are needed?" L3/dialog answers "how should the selected surface render it?"

Memory has ownership

Kazusa does not flatten all context into one prompt. Immediate surface text, conversation progress, retrieved evidence, durable memory, promoted reflection, and scheduled commitments each have a separate lifecycle.

Internal monologue residue is a separate short-lived lane. It stores one compact first-person reason from a completed episode and projects it only into L2a as `internal_monologue_residue_context`. It is not `reflection_summary`, durable memory, visible dialog planning, or scheduler input.
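
One way to picture lane ownership is a per-stage allow-list, sketched below. Only the L2a-only rule for `internal_monologue_residue_context` comes from this document. The other lane names, the lanes given to L2d and L3, and the `project_context` helper are assumptions.

```python
# Hypothetical allow-list of context lanes per stage.
# Only L2a may read the private residue lane.
STAGE_LANES = {
    "L2a": (
        "conversation_progress",
        "retrieved_evidence",
        "internal_monologue_residue_context",
    ),
    "L2d": ("conversation_progress", "retrieved_evidence"),
    "L3": ("conversation_progress",),
}


def project_context(stage: str, lanes: dict[str, str]) -> dict[str, str]:
    """Return only the lanes a stage is allowed to see; unknown stages get nothing."""
    allowed = STAGE_LANES.get(stage, ())
    return {name: lanes[name] for name in allowed if name in lanes}
```

Because projection is an explicit allow-list, adding a new lane never leaks it into an existing stage by default.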

Reflection does not shortcut into live chat

Reflection is slower sense-making work. Raw reflection output is stored for inspection, but normal cognition only receives bounded, promoted, gated context.
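
The gate can be sketched as a filter plus caps. The `promoted` flag, the item and character limits, and the names below are assumptions for illustration; the actual promotion rules are defined in the Reflection Cycle ICD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReflectionRecord:
    """Hypothetical audit record for one reflection run."""
    summary: str
    promoted: bool


def promoted_context(
    records: list[ReflectionRecord], max_items: int = 3, max_chars: int = 280
) -> list[str]:
    """Keep only promoted summaries, truncate each, and return the most recent few."""
    promoted = [r.summary[:max_chars] for r in records if r.promoted]
    return promoted[-max_items:]
```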

Adapters are transport edges

Platform adapters parse platform events, normalize typed envelopes, call the brain service, and deliver returned messages. Character identity, memory, RAG, cognition, and scheduling remain in the platform-neutral core.

Runtime Layers

Layer	Owns	Key docs
Adapters	Discord, NapCat QQ, debug UI transport and platform rendering	HOWTO
Brain service	HTTP API, queue, graph startup, health, delivery receipts, runtime adapter registration	Brain Service ICD
Message envelope	Typed inbound content, mentions, replies, attachments, addressees, broadcast state	Message Envelope ICD
Conversation progress	Short-term episode state used by cognition to avoid loops and stale reopenings	Conversation Progress
Internal monologue residue	Short-lived private first-person residue loaded only into L2a cognition	Internal Monologue Residue ICD
RAG 2	Slot-driven helper-agent retrieval and Cache2 evidence projection	RAG 2
Cognition and dialog	Character stance, boundaries, judgment, style, visual directives, and final wording	Cognition Nodes
Action spec	L2d action residues, capability registry, evaluator, results, surfaces, and traces	Action Spec
Consolidation	Durable target routing, write-intent validation, and target-specific persistence	Consolidation ICD
Database	MongoDB collection ownership, embeddings, indexes, public persistence helpers	Database ICD
Event logging	Sanitized operational telemetry, status snapshots, statistics, and export contracts	Event Logging ICD
Dispatcher and scheduler	Validated delayed tool execution for accepted future promises	Dispatcher
Reflection cycle	Background reflection runs, promotion gates, prompt-safe reflection context	Reflection Cycle ICD
Memory evolution	Curated shared memory lifecycle, lineage, seed reset, promoted memory writes	Memory Evolution ICD
Global character growth	Slow promoted-trait drift from approved reflection memory	Global Character Growth ICD
Proactive output	Permissioned preview/outbox contracts for future autonomous contact paths	Proactive Output ICD

Quick Start

Kazusa expects MongoDB plus OpenAI-compatible chat and embedding endpoints. LM Studio works for local development, but any compatible endpoint can be used. All route-specific model environment variables are documented in docs/HOWTO.md.

python -m venv venv
venv\Scripts\activate  # Windows; on Linux/macOS use: source venv/bin/activate
pip install -U pip
pip install -e ".[dev]"

Load a character profile before starting the brain:

python -m scripts.load_character_profile personalities/kazusa.json

Run the brain service:

kazusa-brain --host 0.0.0.0 --port 8000

Or use Uvicorn directly:

uvicorn kazusa_ai_chatbot.service:app --host 0.0.0.0 --port 8000

Run the browser debug adapter:

python -m adapters.debug_adapter --brain-url http://localhost:8000 --port 8080

Then open http://localhost:8080.

Repository Map

src/
  adapters/                    Platform adapters and debug UI
  kazusa_ai_chatbot/
    brain_service/             Service API, graph, intake, health, post-turn glue
    message_envelope/          Typed adapter-to-brain message contract
    nodes/                     Persona, cognition, and dialog stages
    action_spec/               Modality-neutral action contracts, registry, results
    consolidation/             Durable consolidation helpers, target routing, and ICD
    rag/                       RAG 2 helper agents, hybrid retrieval, Cache2
    conversation_progress/     Short-term episode memory
    internal_monologue_residue/ Short-lived private residue lane for L2a
    db/                        MongoDB facade, schemas, collection owners
    event_logging/             Sanitized operational telemetry interface and ICD
    dispatcher/                Delayed task validation and adapter handoff
    reflection_cycle/          Background reflection and promotion
    memory_evolution/          Shared memory lifecycle and seed reset
    global_character_growth/   Slow promoted character-growth traits
    proactive_output/          Permissioned proactive preview contracts
  scripts/                     Operator and maintenance CLIs
docs/
  HOWTO.md                     Setup, runtime commands, environment, tests
development_plans/             Approved, archived, and reference plan registry
tests/                         Deterministic, live DB, and live LLM test suites
resources/
  avatar.png                   README avatar asset

Testing

Default test runs exclude live DB and live LLM tests through pytest.ini.

venv\Scripts\python -m pytest -q
venv\Scripts\python -m pytest -m "not live_db and not live_llm" -q

Live LLM tests must be run one case at a time with output inspected. Live DB tests require MongoDB. See docs/HOWTO.md for the project testing contract.

Project Status

Kazusa Cognitive Core is alpha-stage experimental infrastructure for a persistent digital character. The main runtime is usable as a local brain service with adapters, memory, retrieval, reflection, and scheduling, but some autonomous-contact surfaces intentionally remain permissioned preview contracts rather than production sends.

Documentation Index

Document	Purpose
README.md	Project overview and architecture map
README_CN.md	Simplified Chinese project overview
docs/HOWTO.md	Local setup, environment variables, run commands, adapters, tests
Brain Service ICD	HTTP endpoint contracts and adapter obligations
Message Envelope ICD	Typed inbound message contract
Database ICD	Persistence ownership and collection contracts
Internal Monologue Residue ICD	Short-lived private residue lifecycle and L2a-only contract
Action Spec	Modality-neutral action contracts and trace handoff
Consolidation ICD	Durable target routing and write-intent validation
Event Logging ICD	Sanitized telemetry interface, event taxonomy, and ops statistics
RAG 2	Retrieval architecture and evidence projection
Cognition Nodes	Layered cognition, dialog, and node-package design contracts
Development Plans Registry	Active, archived, reference, and roadmap documents

License

Kazusa Cognitive Core is released under the GNU Affero General Public License v3.0.
