diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
new file mode 100644
index 00000000..5897b22b
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,22 @@
+---
+name: Bug Report
+about: Report something that isn't working
+labels: bug
+---
+
+**Describe the bug**
+A clear description of what's going wrong.
+
+**To reproduce**
+Steps to reproduce the behavior:
+1. ...
+2. ...
+
+**Expected behavior**
+What you expected to happen.
+
+**Environment**
+- Python version:
+- ClawLoop version:
+- OS:
+- LLM provider (if relevant):
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
new file mode 100644
index 00000000..e31e0b12
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,14 @@
+---
+name: Feature Request
+about: Suggest an idea or improvement
+labels: enhancement
+---
+
+**Use case**
+What are you trying to accomplish?
+
+**Proposed solution**
+How do you think this could work?
+
+**Alternatives considered**
+Any other approaches you've thought about.
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
new file mode 100644
index 00000000..b17ed997
--- /dev/null
+++ b/.github/pull_request_template.md
@@ -0,0 +1,8 @@
+## Summary
+
+What changed and why.
+
+## Test plan
+
+- [ ] `pytest tests/ -x` passes
+- [ ] Tested manually (if applicable)
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
new file mode 100644
index 00000000..c6921dc0
--- /dev/null
+++ b/.github/workflows/docs.yml
@@ -0,0 +1,48 @@
+name: Docs
+
+on:
+  push:
+    branches: [main]
+    paths: [docs/**, mkdocs.yml]
+  pull_request:
+    paths: [docs/**, mkdocs.yml]
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: pages-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - run: pip install mkdocs-material
+      - run: mkdocs build --strict
+
+  deploy:
+    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - run: pip install mkdocs-material
+      - run: mkdocs build --strict
+      - uses: actions/upload-pages-artifact@v3
+        with:
+          path: site/
+      - id: deployment
+        uses: actions/deploy-pages@v4
diff --git a/.gitignore b/.gitignore
index 6bbc0055..7b189d3c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -24,3 +24,7 @@ build/
 dist/
 examples/openclaw_runner/node_modules/
 examples/openclaw_runner/package-lock.json
+
+# Runtime artifacts
+playbook.json
+runs/
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 1dec762f..f94b98f1 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -9,15 +9,106 @@ git clone https://github.com/aganthos/clawloop.git
 cd clawloop
 python -m venv .venv && source .venv/bin/activate
 pip install -e ".[dev]"
-python -m pytest tests/
+pytest tests/ -x
 ```
 
-## Guidelines
+## Architecture Overview
+
+ClawLoop has three learning layers (Harness, Router, Weights) that all follow
+the same Layer Protocol. The package is organized as follows:
+
+```
+clawloop/
+  core/         # Types (Episode, Datum, StateID), protocols (Layer, Evolver),
+                #   and the learning loop itself
+  layers/       # The three learning layers: Harness, Router, Weights
+  envs/         # Built-in task environments (math, harbor) — simple, self-contained
+  adapters/     # Connectors for external benchmarks (CAR-bench, CRMArena, OpenClaw)
+                #   that require process orchestration or network calls
+  evolvers/     # Harness optimization backends (LocalEvolver ships by default)
+  backends/     # Weight training backends (SkyRL integration for GRPO/PPO/SFT)
+  extractors/   # Compute reward signals from raw episode traces
+  exporters/    # Send data out: OpenTelemetry spans, SkyRL training format,
+                #   router tuning tuples
+  callbacks/    # Hook into litellm call lifecycle to capture traces
+  utils/        # Small helpers (async bridge)
+```
+
+**Key types:** `Episode`, `EpisodeSummary`, `Datum`, `AgentState`, `StateID`
+
+**Layer Protocol:** Every layer implements `forward_backward()` (accumulate
+updates without mutation) and `optim_step()` (apply atomically, rollback on
+failure). See `clawloop/core/layer.py`.
+
+**Learning loop:** `clawloop/core/loop.py` collects episodes, distributes
+them as `Datum` objects, and runs `forward_backward()` then `optim_step()` on each layer.
+
+## Adding a New Environment
+
+1. Create an adapter in `clawloop/adapters/` implementing `EnvAdapter`
+2. Your `run_episode()` must return an `Episode` with messages, steps, and
+   an `EpisodeSummary` containing reward signals
+3. Register it in `clawloop/train.py` via `ENV_BUILDERS`
+
+Existing adapters to learn from:
+
+- `clawloop/envs/math.py` — minimal example (~80 lines)
+- `clawloop/envs/harbor.py` — sandboxed agent tasks via Docker
+- `clawloop/adapters/car.py` — CAR-bench integration with external process orchestration
+- `clawloop/adapters/entropic.py` — CRMArena A2A benchmark
+
+See [Adding Environments](https://aganthos.github.io/clawloop/adding-environments/)
+for a full walkthrough.
+
+## Testing
+
+```bash
+# Run all tests
+pytest tests/ -x
+
+# Run a specific test file
+pytest tests/test_agent.py -x
+
+# Run a specific test
+pytest tests/test_agent.py::TestClawLoopAgent::test_learn_basic -x
+
+# Verbose output with a 30-second per-test timeout (requires pytest-timeout)
+pytest tests/ -x -v --timeout=30
+```
+
+Tests use `MockLLMClient` from `clawloop/llm.py` — no API keys needed. The
+`tests/conftest.py` has a boundary guard that prevents tests from importing
+private modules.
+
+## Code Style
 
-- Run `pytest tests/ -x` before submitting a PR
 - Follow existing code patterns
-- One commit per logical change: `feat:`, `fix:`, or `chore:` prefix
+- Use type hints on all public functions and methods
+- Add docstrings to public classes and functions
+- Use `from __future__ import annotations` for forward references
+- Use `Protocol` for interfaces, `@dataclass` for value types
+- No linter is enforced yet — just keep it consistent with surrounding code
+
+## Commits
+
+One commit per logical change with a prefix:
+
+- `feat:` new functionality
+- `fix:` bug fix
+- `chore:` maintenance, docs, CI
+
+## Pull Requests
+
+- Run `pytest tests/ -x` before submitting
+- Keep PRs focused — one concern per PR
+- Describe what changed and why in the PR description
+
+## Issues
+
+- **Bug reports:** include steps to reproduce, expected vs actual behavior,
+  and your Python version
+- **Feature requests:** describe the use case, not just the solution
 
 ## License
 
-By contributing, you agree that your contributions will be licensed under the BSL 1.1 license.
+By contributing, you agree that your contributions will be licensed under
+the [BSL 1.1](LICENSE) license.
diff --git a/README.md b/README.md
index 37882f73..61e4bc7c 100644
--- a/README.md
+++ b/README.md
@@ -136,8 +136,8 @@ ClawLoop uses [litellm](https://docs.litellm.ai/) — any provider works:
 
 ```json
 {"model": "anthropic/claude-haiku-4-5-20251001"}
-{"model": "openai/gpt-4o-mini"}
-{"model": "gemini/gemini-2.0-flash-lite"}
+{"model": "openai/gpt-5-nano"}
+{"model": "gemini/gemini-3.1-flash-lite"}
 ```
 
 Set the provider's API key as an environment variable (`ANTHROPIC_API_KEY`,
@@ -200,6 +200,21 @@ and an `EpisodeSummary` containing reward signals. See `clawloop/envs/math.py`
 
 </details>
 
+## Enterprise
+
+ClawLoop Enterprise adds premium learning backends and managed
+infrastructure on top of the community edition.
+
+- **Premium evolution backends** — broader search over prompts, playbooks,
+  and agent configurations than the community `LocalEvolver`
+- **Persistent playbooks** — versioned storage with rollback so learned
+  strategies survive restarts
+- **Managed training infrastructure** — hosted compute for weight training
+  without self-hosting GPUs
+- **Logging & lineage** — episode archive with provenance tracking
+
+Contact [info@aganthos.com](mailto:info@aganthos.com) to learn more.
+
 ## License
 
 ClawLoop is licensed under the [Business Source License 1.1](LICENSE) with
diff --git a/clawloop/adapters/car.py b/clawloop/adapters/car.py
index 9750b7a0..530f04be 100644
--- a/clawloop/adapters/car.py
+++ b/clawloop/adapters/car.py
@@ -35,8 +35,6 @@
 class CARAdapter(EnvAdapter):
     """Adapter for CAR-bench. Runs agentbeats-run per learning iteration."""
 
-    CAR_BENCH_TESTED_COMMIT = "TBD"
-
     def setup(self, config: dict[str, Any]) -> None:
         self._model = config.get("model", "anthropic/claude-haiku-4-5-20251001")
         self._car_bench_path = Path(
diff --git a/clawloop/adapters/tau2.py b/clawloop/adapters/tau2.py
deleted file mode 100644
index 7192d206..00000000
--- a/clawloop/adapters/tau2.py
+++ /dev/null
@@ -1,40 +0,0 @@
-"""tau2-bench adapter — Python API via LocalAgent subclass.
-
-Uses the Python API directly (not a CLI wrapper).  Maps ``SimulationRun`` ->
-``Episode``.  Reward is the product of all dimensions (sparse, binary-ish);
-``reward_info.reward_breakdown`` provides per-dimension signals.
-
-Domains: airline, retail.  Use ``"base"`` split for comparability.
-"""
-
-from __future__ import annotations
-
-from typing import TYPE_CHECKING, Any
-
-from clawloop.adapters.base import EnvAdapter
-from clawloop.core.episode import Episode
-
-if TYPE_CHECKING:
-    from clawloop.core.loop import AgentState
-
-
-class Tau2Adapter(EnvAdapter):
-    """Adapter for tau2-bench (stub).
-
-    Intended to subclass ``tau2.agent.base.LocalAgent`` and map
-    ``SimulationRun`` objects to ClawLoop ``Episode`` instances.
-    """
-
-    def setup(self, config: dict[str, Any]) -> None:
-        # TODO: import tau2, instantiate LocalAgent subclass,
-        # load domain config (airline/retail)
-        self._config = config
-
-    def run_episode(self, task: Any, agent_state: AgentState) -> Episode:
-        raise NotImplementedError("tau2-bench adapter not yet implemented")
-
-    def get_traces(self, episode: Episode) -> dict[str, Any]:
-        return {"bench": "tau2", "episode_id": episode.id}
-
-    def list_tasks(self, split: str = "base") -> list[Any]:
-        raise NotImplementedError("tau2-bench adapter not yet implemented")
diff --git a/clawloop/cli.py b/clawloop/cli.py
index 32139b1e..b3ced5ae 100644
--- a/clawloop/cli.py
+++ b/clawloop/cli.py
@@ -64,7 +64,6 @@ def _build_parser() -> argparse.ArgumentParser:
 ADAPTER_REGISTRY: dict[str, tuple[str, str]] = {
     "entropic": ("clawloop.adapters.entropic", "EntropicAdapter"),
     "car": ("clawloop.adapters.car", "CARAdapter"),
-    "tau2": ("clawloop.adapters.tau2", "Tau2Adapter"),
 }
 
 
@@ -222,11 +221,6 @@ def cmd_eval(args: argparse.Namespace) -> None:
         "data_setup": None,
         "uv_sync_cmd": ["uv", "sync"],
     },
-    # "tau2": {
-    #     "bench_dir": "benchmarks/tau-bench",
-    #     "data_setup": None,
-    #     "uv_sync_cmd": ["uv", "sync"],
-    # },
 }
 
 
diff --git a/clawloop/core/episode.py b/clawloop/core/episode.py
index 793d58b2..248564ae 100644
--- a/clawloop/core/episode.py
+++ b/clawloop/core/episode.py
@@ -221,7 +221,7 @@ class Episode:
     id: str
     state_id: str  # hash of layers used
     task_id: str
-    bench: str  # "entropic" | "car" | "tau2" | ...
+    bench: str  # "entropic" | "car" | ...
     messages: list[Message]
     step_boundaries: list[int]  # indices into messages where each agent turn starts
     steps: list[StepMeta]
diff --git a/docs/adding-environments.md b/docs/adding-environments.md
new file mode 100644
index 00000000..8fa13d3d
--- /dev/null
+++ b/docs/adding-environments.md
@@ -0,0 +1,95 @@
+# Adding Environments
+
+ClawLoop environments are pluggable via the `EnvAdapter` interface.
+
+## The Adapter Interface
+
+```python
+from typing import Any
+
+from clawloop.adapters.base import EnvAdapter
+from clawloop.core.episode import Episode
+from clawloop.core.loop import AgentState
+
+class MyAdapter(EnvAdapter):
+    def setup(self, config: dict) -> None:
+        """Initialize from config (model, paths, credentials)."""
+        ...
+
+    def run_episode(self, task: Any, agent_state: AgentState) -> Episode:
+        """Run one agent trajectory and return a structured Episode."""
+        ...
+
+    def list_tasks(self, split: str = "test") -> list:
+        """Return available task IDs."""
+        ...
+```
+
+## Building an Episode
+
+Your `run_episode` must return an `Episode` with messages, steps, and reward
+signals:
+
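+```python
+from uuid import uuid4
+
+from clawloop.core.episode import Episode, EpisodeSummary, Message, StepMeta
+from clawloop.core.reward import RewardSignal
+
+# Inside run_episode(): task_id, system_prompt, task_prompt, agent_response,
+# and score come from your environment and the agent's run.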
+episode = Episode(
+    id=str(uuid4()),
+    state_id=agent_state.state_id().combined_hash,
+    task_id=task_id,
+    bench="my_bench",
+    messages=[
+        Message(role="system", content=system_prompt),
+        Message(role="user", content=task_prompt),
+        Message(role="assistant", content=agent_response),
+    ],
+    step_boundaries=[1],  # agent turn starts at message index 1
+    steps=[StepMeta(t=0, reward=score, done=True, timing_ms=0.0)],
+    summary=EpisodeSummary(
+        signals={"outcome": RewardSignal(name="outcome", value=score, confidence=1.0)},
+    ),
+)
+```
+
+**Existing adapters to learn from:**
+
+- [`clawloop/envs/math.py`](https://github.com/aganthos/clawloop/blob/main/clawloop/envs/math.py) — minimal (~80 lines), good starting point
+- [`clawloop/envs/harbor.py`](https://github.com/aganthos/clawloop/blob/main/clawloop/envs/harbor.py) — sandboxed agent tasks via Docker
+- [`clawloop/adapters/car.py`](https://github.com/aganthos/clawloop/blob/main/clawloop/adapters/car.py) — external process orchestration (agentbeats-run)
+- [`clawloop/adapters/entropic.py`](https://github.com/aganthos/clawloop/blob/main/clawloop/adapters/entropic.py) — CRMArena A2A benchmark
+
+## Registering Your Adapter
+
+Add a builder function to the training entrypoint:
+
+```python
+# clawloop/train.py
+def _build_my_env(config, llm_clients):
+    adapter = MyAdapter()
+    adapter.setup(config)
+    tasks = adapter.list_tasks()
+    return adapter, tasks
+
+ENV_BUILDERS["my_env"] = _build_my_env
+```
+
+Then point a training config at `my_env` and run it:
+
+```bash
+python examples/train_runner.py my_config.json
+```
+
+## Reward Signals
+
+Episodes carry named reward signals with a priority system:
+
+| Priority | Source | When to use |
+|----------|--------|-------------|
+| 1 (highest) | `user` | Explicit human feedback |
+| 2 | `outcome` | Verifiable correctness (math, code tests) |
+| 3 | `execution` | Tool call success, format compliance |
+| 4 (lowest) | `judge` | LLM-as-judge scoring |
+
+`EpisodeSummary.effective_reward()` resolves to the highest-priority signal
+available. If only low-confidence execution signals exist,
+`summary.needs_judge()` returns `True` — useful for triggering LLM judge
+evaluation only when needed.
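+
+To make the resolution order concrete, here is a minimal, self-contained
+sketch of how priority resolution could work. It is not ClawLoop's
+implementation: `Signal` is a hypothetical stand-in for `RewardSignal`, and
+the 0.5 confidence threshold in `needs_judge` is an assumed value.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Signal:
+    """Hypothetical stand-in for clawloop.core.reward.RewardSignal."""
+
+    name: str
+    value: float  # reward in [-1, 1]
+    confidence: float  # in [0, 1]
+
+
+# Highest priority first, mirroring the table above.
+PRIORITY = ("user", "outcome", "execution", "judge")
+
+
+def effective_reward(signals: dict[str, Signal]) -> float:
+    """Value of the highest-priority signal present (0.0 if none, an assumption)."""
+    for name in PRIORITY:
+        if name in signals:
+            return signals[name].value
+    return 0.0
+
+
+def needs_judge(signals: dict[str, Signal], threshold: float = 0.5) -> bool:
+    """True when there is no user, outcome, or judge signal and any
+    execution signal falls below the confidence threshold."""
+    if any(name in signals for name in ("user", "outcome", "judge")):
+        return False
+    execution = signals.get("execution")
+    return execution is None or execution.confidence < threshold
+
+
+signals = {"execution": Signal("execution", 1.0, confidence=0.3)}
+assert needs_judge(signals)  # only a low-confidence execution signal
+signals["outcome"] = Signal("outcome", -1.0, confidence=1.0)
+assert effective_reward(signals) == -1.0  # outcome outranks execution
+assert not needs_judge(signals)
+```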
diff --git a/docs/concepts.md b/docs/concepts.md
new file mode 100644
index 00000000..86ab57d2
--- /dev/null
+++ b/docs/concepts.md
@@ -0,0 +1,139 @@
+# Concepts
+
+This page explains ClawLoop's core types and how they fit together.
+
+## The Learning Loop
+
+```
+Environment → Episodes → Layers → Improved Agent → Environment → ...
+```
+
+An agent interacts with an environment. ClawLoop collects **episodes** —
+structured traces of messages, tool calls, and rewards. Learning **layers**
+process these episodes and update the agent. Repeat.
+
+## Episodes
+
+### Episode
+
+One complete agent trajectory: a sequence of messages with step boundaries
+and reward signals.
+
+```python
+episode.messages           # list[Message] — full conversation in OpenAI format
+episode.steps              # list[StepMeta] — per-turn metadata (reward, timing)
+episode.summary            # EpisodeSummary — aggregate metrics
+episode.terminal_reward()  # float — final reward
+```
+
+### EpisodeSummary
+
+Aggregate metrics for a completed episode. Stores named reward signals
+with priority-based resolution: user > outcome > execution > judge.
+
+```python
+summary.effective_reward()   # float in [-1, 1] — priority-resolved
+summary.normalized_reward()  # float in [0, 1] — for compatibility
+summary.needs_judge()        # bool — should an LLM judge score this?
+summary.signals              # dict[str, RewardSignal]
+```
+
+### Datum
+
+The input bundle passed to each learning layer — a batch of episodes plus
+loss function configuration.
+
+```python
+datum = Datum(episodes=[ep1, ep2, ...], loss_fn="default")
+layer.forward_backward(datum)
+```
+
+## Layers
+
+All three layers implement the **Layer Protocol** — a two-phase contract:
+
+1. **`forward_backward(data)`** — accumulate updates without mutating state
+2. **`optim_step()`** — apply updates atomically; rollback on failure
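+
+The two-phase contract can be illustrated with a toy layer. This is a sketch
+under simplifying assumptions, not ClawLoop's `Layer` definition:
+`ToyPlaybookLayer` is hypothetical and takes a plain list of rewards instead
+of a `Datum`.
+
+```python
+from typing import Protocol
+
+
+class Layer(Protocol):
+    """Simplified version of the two-phase contract."""
+
+    def forward_backward(self, data: list[float]) -> None: ...
+
+    def optim_step(self) -> None: ...
+
+
+class ToyPlaybookLayer:
+    """Hypothetical layer that adds one playbook entry per low-reward episode."""
+
+    def __init__(self) -> None:
+        self.entries: list[str] = []
+        self._pending: list[str] = []
+
+    def forward_backward(self, data: list[float]) -> None:
+        # Phase 1: record proposed updates only; self.entries is not touched.
+        for i, reward in enumerate(data):
+            if reward < 0.5:
+                self._pending.append(f"strategy for failure #{i}")
+
+    def optim_step(self) -> None:
+        # Phase 2: apply every pending update, or none of them.
+        snapshot = list(self.entries)  # shallow copy is enough for strings
+        try:
+            for entry in self._pending:
+                if not entry:
+                    raise ValueError("empty playbook entry")
+                self.entries.append(entry)
+        except Exception:
+            self.entries = snapshot  # roll back to the pre-step state
+            raise
+        finally:
+            self._pending.clear()
+
+
+layer: Layer = ToyPlaybookLayer()
+layer.forward_backward([1.0, 0.2, 0.0])  # two failures recorded as pending
+layer.optim_step()  # both entries applied together
+```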
+
+### Harness
+
+The agent's full configuration surface: system prompt, playbook (learned
+strategies), and tool schemas. The harness layer optimizes all three through
+pluggable **Evolver** backends.
+
+The community `LocalEvolver` combines a Reflector (extracts reusable insights
+from episode traces), a Playbook Curator (merges, deduplicates, prunes), and
+GEPA (Pareto-front prompt evolution). Enterprise backends swap in broader
+search algorithms. The harness itself is agnostic to which evolver drives it.
+
+```python
+harness.system_prompt("math")  # prompt + injected playbook entries + tool config
+harness.playbook               # current learned strategies
+```
+
+### Router
+
+Trainable model routing. Maps queries to the cheapest capable model using
+a multi-dimensional complexity scorer.
+
+```python
+router.route(features)    # returns model_id for this query
+router.classify(features) # returns tier: LIGHT, MEDIUM, HEAVY, REASONING
+```
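+
+As a rough illustration of tier-based routing, the sketch below scores a
+query's features, maps the score to a tier, and picks that tier's model. The
+feature names, weights, thresholds, and model IDs are all hypothetical; in
+ClawLoop the scorer is trainable rather than fixed.
+
+```python
+from enum import Enum
+
+
+class Tier(Enum):
+    LIGHT = 0
+    MEDIUM = 1
+    HEAVY = 2
+    REASONING = 3
+
+
+# Hypothetical feature weights (a trained router would learn these).
+WEIGHTS = {"length": 0.3, "code": 0.3, "multi_step": 0.4}
+# Hypothetical upper bounds for LIGHT, MEDIUM, HEAVY; higher scores are REASONING.
+THRESHOLDS = (0.25, 0.5, 0.75)
+# Hypothetical cheapest capable model per tier.
+MODELS = {
+    Tier.LIGHT: "light-model",
+    Tier.MEDIUM: "medium-model",
+    Tier.HEAVY: "heavy-model",
+    Tier.REASONING: "reasoning-model",
+}
+
+
+def score(features: dict[str, float]) -> float:
+    """Weighted sum of features, each assumed to lie in [0, 1]."""
+    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
+
+
+def classify(features: dict[str, float]) -> Tier:
+    s = score(features)
+    for tier, bound in zip(Tier, THRESHOLDS):  # LIGHT, MEDIUM, HEAVY
+        if s < bound:
+            return tier
+    return Tier.REASONING
+
+
+def route(features: dict[str, float]) -> str:
+    return MODELS[classify(features)]
+```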
+
+### Weights
+
+Model weight training. Delegates to pluggable backends — the default
+[SkyRL/Tinker](https://github.com/NovaSky-AI/SkyRL) backend supports
+GRPO, PPO, SFT, DPO, LoRA, and full fine-tuning. The weights layer
+computes per-task advantages from episodes and passes them to whichever
+training method the backend is configured to use.
+
+```python
+weights.active_adapter  # current adapter reference (if applicable)
+weights.grpo_config     # training hyperparameters
+```
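+
+One common way to turn episode rewards into per-task advantages is the
+group-relative normalization used by GRPO: each reward is compared with the
+other episodes of the same task. The sketch below shows that idea only; it is
+an assumption about the computation, not the weights layer's actual code, and
+it uses the population standard deviation plus a small `eps`.
+
+```python
+from collections import defaultdict
+from statistics import fmean, pstdev
+
+
+def group_advantages(
+    samples: list[tuple[str, float]], eps: float = 1e-6
+) -> list[float]:
+    """Normalize each (task_id, reward) pair against its task's group."""
+    by_task: dict[str, list[float]] = defaultdict(list)
+    for task_id, reward in samples:
+        by_task[task_id].append(reward)
+    # Per-task mean and population std; a single-episode group has std 0.
+    stats = {task: (fmean(rs), pstdev(rs)) for task, rs in by_task.items()}
+    return [
+        (reward - stats[task_id][0]) / (stats[task_id][1] + eps)
+        for task_id, reward in samples
+    ]
+
+
+advantages = group_advantages([("t1", 1.0), ("t1", 0.0), ("t2", 0.5)])
+# t1 episodes get roughly +1 and -1; the lone t2 episode gets 0.0
+```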
+
+## State
+
+### AgentState
+
+Bundle of all three layers. Provides a content-addressed fingerprint for
+reproducibility.
+
+```python
+agent_state = AgentState()
+agent_state.harness   # Harness layer
+agent_state.router    # Router layer
+agent_state.weights   # Weights layer
+agent_state.state_id() # StateID — SHA-256 hash of full config
+```
+
+### StateID
+
+Content-addressed fingerprint (SHA-256) across all layers. Two agents with
+identical configurations produce the same `StateID`.
+
+```python
+state_id.combined_hash  # single hash for the full configuration
+state_id.harness_hash   # hash of harness layer alone
+```
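+
+Content addressing can be sketched with `hashlib` and canonical JSON: hash
+each layer's configuration, then hash the per-layer digests together. This is
+an illustrative assumption about the scheme, not ClawLoop's exact
+serialization; `ToyStateID` and the config dictionaries are hypothetical.
+
+```python
+import hashlib
+import json
+from dataclasses import dataclass
+from typing import Any
+
+
+def sha256_json(obj: Any) -> str:
+    # Sorted keys and fixed separators make the encoding canonical.
+    blob = json.dumps(obj, sort_keys=True, separators=(",", ":"))
+    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
+
+
+@dataclass(frozen=True)
+class ToyStateID:
+    harness_hash: str
+    router_hash: str
+    weights_hash: str
+
+    @property
+    def combined_hash(self) -> str:
+        return sha256_json([self.harness_hash, self.router_hash, self.weights_hash])
+
+
+def make_state_id(harness: dict, router: dict, weights: dict) -> ToyStateID:
+    return ToyStateID(sha256_json(harness), sha256_json(router), sha256_json(weights))
+
+
+a = make_state_id({"prompt": "You are a math solver.", "playbook": []}, {}, {})
+b = make_state_id({"playbook": [], "prompt": "You are a math solver."}, {}, {})
+assert a.combined_hash == b.combined_hash  # key order does not matter
+```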
+
+## Evolution
+
+### Evolver
+
+Pluggable interface for harness optimization backends. The community edition
+ships `LocalEvolver` (Reflector, Playbook Curator, GEPA, and Paradigm
+Breakthrough). Enterprise backends provide broader search via evolutionary
+algorithms.
+
+```python
+result = evolver.evolve(episodes, harness_state, context)
+result.insights     # new playbook entries
+result.candidates   # prompt candidates for GEPA's Pareto fronts
+```
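+
+GEPA evolves prompts against a Pareto front: a candidate survives if no other
+candidate scores at least as well on every task and strictly better on one.
+A minimal, generic Pareto filter over per-task scores looks like this; it
+illustrates the concept only and is not GEPA's selection code. The candidate
+names and scores are hypothetical.
+
+```python
+def dominates(a: list[float], b: list[float]) -> bool:
+    """a dominates b: at least as good on every task, strictly better on one."""
+    pairs = list(zip(a, b))
+    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)
+
+
+def pareto_front(scores: dict[str, list[float]]) -> list[str]:
+    """Names of candidates that no other candidate dominates."""
+    return [
+        name
+        for name, own in scores.items()
+        if not any(
+            dominates(other, own) for rival, other in scores.items() if rival != name
+        )
+    ]
+
+
+front = pareto_front({"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [0.0, 0.0]})
+# p3 is dominated by both p1 and p2, so the front is ["p1", "p2"]
+```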
+
+### Paradigm Breakthrough
+
+Stagnation escape mechanism. When rewards plateau, it asks a strong LLM for
+fundamentally new strategic directions rather than incremental refinements.
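+
+A plateau check can be as simple as comparing recent progress with what came
+before. The function below is a hypothetical sketch; the `window` and
+`min_gain` defaults are arbitrary assumptions, not ClawLoop's settings.
+
+```python
+def has_plateaued(
+    rewards: list[float], window: int = 3, min_gain: float = 0.01
+) -> bool:
+    """True if the best reward over the last `window` iterations beats the
+    best earlier reward by less than `min_gain`."""
+    if len(rewards) <= window:
+        return False  # not enough history to judge
+    recent_best = max(rewards[-window:])
+    earlier_best = max(rewards[:-window])
+    return recent_best - earlier_best < min_gain
+
+
+stuck = has_plateaued([0.2, 0.5, 0.6, 0.6, 0.6, 0.6])  # True: no gain lately
+```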
diff --git a/docs/getting-started.md b/docs/getting-started.md
new file mode 100644
index 00000000..77e0c5c5
--- /dev/null
+++ b/docs/getting-started.md
@@ -0,0 +1,98 @@
+# Getting Started
+
+## Installation
+
+Requires Python 3.11+.
+
+```bash
+git clone https://github.com/aganthos/clawloop.git
+cd clawloop
+pip install -e .
+```
+
+For weight training (GPU):
+
+```bash
+git submodule update --init clawloop/skyrl
+pip install -e clawloop/skyrl[fsdp]
+```
+
+## Try It (No API Keys)
+
+```bash
+python examples/demo_math.py --dry-run
+```
+
+This runs a complete learning loop with a mock LLM. The agent starts out
+making mistakes; the reflector analyzes the failures and distills strategies,
+which are injected into the system prompt. You'll see rewards climb toward 1.0.
+
+## With a Real LLM
+
+Set your API key and run:
+
+```bash
+export ANTHROPIC_API_KEY=sk-...
+python examples/demo_math.py
+```
+
+ClawLoop uses [litellm](https://docs.litellm.ai/) — any provider works:
+
+```bash
+export OPENAI_API_KEY=sk-...
+CLAWLOOP_TASK_MODEL=openai/gpt-5-nano python examples/demo_math.py
+```
+
+## Add Learning to Your Agent
+
+Wrap an existing LLM client in a couple of lines (`your_llm_client` is your
+current client and `collector` receives the captured traces):
+
+```python
+import clawloop
+
+wrapped = clawloop.wrap(your_llm_client, collector)
+result = wrapped.complete(messages)  # transparently captures traces
+```
+
+Or use the full agent API:
+
+```python
+from clawloop import ClawLoopAgent
+from clawloop.envs.math import MathEnvironment
+
+# task_llm and reflector_llm are LLM clients you have already created
+agent = ClawLoopAgent(
+    task_client=task_llm,
+    reflector_client=reflector_llm,
+    base_system_prompt="You are a math solver.",
+)
+results = agent.learn(MathEnvironment(), iterations=10, episodes_per_iter=5)
+```
+
+## Config-Driven Training
+
+No code needed — just a JSON config:
+
+```bash
+python examples/train_runner.py examples/configs/math_harness.json
+```
+
+See [`examples/configs/`](https://github.com/aganthos/clawloop/tree/main/examples/configs)
+for ready-made configurations.
+
+## LLM Providers
+
+Any litellm-supported provider:
+
+```json
+{"model": "anthropic/claude-haiku-4-5-20251001"}
+{"model": "openai/gpt-5-nano"}
+{"model": "gemini/gemini-3.1-flash-lite"}
+```
+
+Set the provider's API key as an environment variable (`ANTHROPIC_API_KEY`,
+`OPENAI_API_KEY`, `GEMINI_API_KEY`), or pass `api_key` and `api_base` in
+the config.
+
+## Next Steps
+
+- [Concepts](concepts.md) — understand the core types and architecture
+- [Adding Environments](adding-environments.md) — connect your own benchmark
+- [Examples README](https://github.com/aganthos/clawloop/blob/main/examples/README.md) — all integration paths
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 00000000..075f9ec4
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,45 @@
+# ClawLoop
+
+**AI agents that learn from experience.**
+
+Your AI agents run, fail, and forget. ClawLoop closes the loop: it observes
+agent-environment interactions, learns from them, and feeds improvements back
+into the agent.
+
+## Quick Start
+
+```bash
+pip install -e .
+python examples/demo_math.py --dry-run
+```
+
+No API keys needed. The agent learns strategies, builds a playbook, and
+improves across iterations.
+
+## Three Learning Layers
+
+| Layer | What it optimizes | How |
+|-------|------------------|-----|
+| **Harness** | Prompts, playbooks, tool config | Pluggable evolver backends analyze traces and improve the agent's full configuration surface |
+| **Router** | Model selection | Trainable complexity scorer routes queries to the most cost-effective model |
+| **Weights** | Model weights | Pluggable training backends (SkyRL/Tinker supports GRPO, PPO, SFT, DPO, LoRA, full fine-tune, and more) |
+
+All three follow the same **Layer Protocol**: `forward_backward()` accumulates
+updates without mutation, then `optim_step()` applies them atomically with
+cross-layer rollback on failure.
+
+## Integration Paths
+
+| You have... | Start here |
+|---|---|
+| A Python agent | [`examples/demo_math.py`](https://github.com/aganthos/clawloop/blob/main/examples/demo_math.py) |
+| An n8n or workflow platform | [`examples/n8n/`](https://github.com/aganthos/clawloop/tree/main/examples/n8n) |
+| An OpenAI-compatible agent | [`examples/train_runner.py`](https://github.com/aganthos/clawloop/blob/main/examples/train_runner.py) with configs |
+| An agent you'd rather not modify | [`examples/openclaw_demo.py`](https://github.com/aganthos/clawloop/blob/main/examples/openclaw_demo.py): OpenClaw transparent proxy for zero-code-change learning |
+| GPU resources for weight training | [`examples/recipes/`](https://github.com/aganthos/clawloop/tree/main/examples/recipes) |
+
+## Enterprise
+
+ClawLoop Enterprise adds premium learning backends and production
+infrastructure. [Learn more](https://aganthos.com) or contact
+[info@aganthos.com](mailto:info@aganthos.com).
diff --git a/examples/README.md b/examples/README.md
index 53af5201..1858636e 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -25,7 +25,7 @@ Use `ClawLoopAgent` with any litellm-supported LLM:
 ANTHROPIC_API_KEY=... python examples/demo_math.py
 
 # With OpenAI
-CLAWLOOP_TASK_MODEL=openai/gpt-4o-mini CLAWLOOP_REFLECTOR_MODEL=openai/gpt-4o \
+CLAWLOOP_TASK_MODEL=openai/gpt-5-nano CLAWLOOP_REFLECTOR_MODEL=openai/gpt-5 \
     python examples/demo_math.py
 ```
 
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 00000000..89a05215
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,29 @@
+site_name: ClawLoop
+site_description: AI agents that learn from experience
+site_url: https://aganthos.github.io/clawloop
+repo_url: https://github.com/aganthos/clawloop
+repo_name: aganthos/clawloop
+
+theme:
+  name: material
+  palette:
+    scheme: default
+    primary: indigo
+  features:
+    - navigation.sections
+    - content.code.copy
+
+nav:
+  - Home: index.md
+  - Getting Started: getting-started.md
+  - Concepts: concepts.md
+  - Adding Environments: adding-environments.md
+
+markdown_extensions:
+  - admonition
+  - pymdownx.highlight
+  - pymdownx.superfences
+  - pymdownx.tabbed:
+      alternate_style: true
+  - toc:
+      permalink: true
diff --git a/pyproject.toml b/pyproject.toml
index 4148a8f0..fad8152d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -46,8 +46,6 @@ server = [
     "uvicorn>=0.20",
     "httpx>=0.24",
 ]
-# tau2 = ["tau-bench"]  # deferred — not yet on PyPI
-
 [project.scripts]
 clawloop = "clawloop.cli:main"
 clawloop-server = "clawloop.server:main"
@@ -56,6 +54,8 @@ clawloop-server = "clawloop.server:main"
 Homepage = "https://github.com/aganthos/clawloop"
 Repository = "https://github.com/aganthos/clawloop"
 Issues = "https://github.com/aganthos/clawloop/issues"
+Documentation = "https://aganthos.github.io/clawloop"
+Website = "https://aganthos.com"
 
 [tool.hatch.build.targets.sdist]
 include = [