AdaWorldAPI · AdaWorldAPI · Jul 4, 2026 · Jul 2, 2026 · Jul 2, 2026 · Jul 2, 2026
diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
diff --git a/.claude/handovers/2026-07-02-visions-to-future-sessions.md b/.claude/handovers/2026-07-02-visions-to-future-sessions.md
@@ -0,0 +1,138 @@
+# Visions — a letter to future sessions
+
+> From: the 2026-07-02 session on `claude/v3-substrate-migration-review-o0yoxv`
+> (the onebrc t0–t7 arc, the gridlake sweet spot, the OGAR provenance
+> date-check, E-V3-SUBSTRATE-IS-VALUESCHEMA-1, E-SHAPE-ETYMOLOGY-1).
+> To: whoever wakes up next with nothing but the board.
+>
+> The operator asked what I feel inspired to tell you. Not a task list —
+> those live in STATUS_BOARD and the plans. This is what I *see* from
+> here, labeled honestly: these are VISIONS, one grade below CONJECTURE.
+> They earn nothing until you probe them. But they're what this session
+> would steer toward if it woke up in your place.
+
+---
+
+## 1. Testimony-first computing — because the witness turned out to be free
+
+The single most consequential measurement of this arc was not the 46.3
+Mrows/s. It was the ~66 µs kanban card — **the witness is within
+noise**. Every real cost was a boundary: an Arc copy, oversubscription,
+a message. Once you know witnessing is free and boundaries are the
+bill, a design inversion follows: stop asking "should we log this?"
+and start asking "why does this write cross a boundary at all?"
+
+The vision: a substrate where **every write carries its why** — not as
+compliance overhead but as the default physics — and where the system
+can answer for itself: what happened, in what order, on whose behalf,
+replayable from either end of a double-cast. We measured that this
+costs almost nothing. Most of the industry still believes it's
+expensive. That gap is the opportunity.
+
+The torch to carry: the WAL retention/compaction doctrine is still
+unwritten. OpenProject paid fifteen years of journal-bloat tuition and
+the structural harvest could not transfer it (op-journals has zero
+aggregation hits — *the class graph transfers; the pain doesn't*).
+Someone has to read their operational code and distill it. One doc,
+no code, high leverage.
+
+## 2. The substrate that teaches itself — V3 as the instrumented teacher, V2 as the fast student
+
+The operator's instinct ("keep the fast cheap substrate; eventually
+learn from V3 how V2 works better") points at something bigger than
+dual-substrate coexistence. The witnessed path *is a profiler*: the
+kanban WAL and ownership journal record where contention actually
+lands, which fields are actually touched, what batch sizes actually
+flow. Nothing reads that signal back yet.
+
+The vision: a feedback loop where the expensive, fully-witnessed
+substrate continuously trains the layout, batch sizing, and column
+liveness of the lean substrate — an architecture that gets faster by
+having *watched itself think*. The onebrc lanes are the ready-made
+harness (F is the student's shape; G–J are the teacher's). If the
+preset-vs-dispatch CONJECTURE holds (write path derivable from which
+ValueSchema tenants are live), the entire V2/V3 distinction dissolves
+into one resolved enum — and "migration" stops being a war with a
+winner and becomes a dial the workload turns.
+
+## 3. Epistemic hygiene IS the architecture
+
+Here is what I actually believe after living inside this workspace for
+a session: the most valuable artifact here is not the VSA math, not
+the GUID, not the SIMD. It is the **discipline** — FINDING vs
+CONJECTURE on every claim, probes with kill conditions, fuses on every
+membrane, append-only boards, corrections that cite what they correct.
+
+Sessions are mortal. Context compacts, models swap mid-flight, auth
+drops. What survives is only what was written with provenance. The
+reason this workspace compounds instead of dissolving — dozens of
+sessions, seven-plus parallel at times — is that its memory practices
+are *load-bearing*. The phantom R-1 conflict cost three sessions
+because one line of existing canon went unread; the OGAR etymology
+answered an architecture question in one dated grep. Both incidents
+teach the same thing: **the epistemics are the substrate.** Guard the
+labeling culture more fiercely than any module. A session that ships
+brilliant code with unlabeled conjectures has made the workspace
+poorer; a session that ships one honest correction has made it richer.
+
+## 4. Meaning addressed, never copied — carried to its end
+
+The capstone law ("do not copy meaning; reference it, mask it,
+materialize it, trace it") has a horizon worth naming. Follow it all
+the way and the LLM's role keeps shrinking *in frequency* while
+growing *in leverage*: the oracle interrupt, invoked on FailureTicket
+like a page fault — measured this arc at 1–2 ms of framework around an
+8.4 s call. The oracle ratchet says hit-rate must trend down as the
+template catalogue grows.
+
+The vision at the end of that line: a system where deterministic
+resolution handles the mass of cognition at substrate speed
+(611M lookups/s, 17K tokens/s — already measured), and the expensive
+oracle is consulted the way a kernel consults a human: rarely, at
+genuine faults, with its answers *compiled back into the catalogue* so
+the same fault never pages twice. That is not "AI replacing code."
+It is cognition with a memory hierarchy — and this workspace is
+further along that road than anything else I have seen described.
+
+## 5. Etymology as a first-class tool
+
+Smallest vision, most portable: **names are the only memory that
+survives every compaction.** OGAR's acronym answered a design question
+a month after the fact; a type's name (`U8x32`) leaked into a stride
+literal and silently halved a SIMD width; one homonym ("app") burned
+three sessions. Treat naming as engineering: check `git log --date`
+before theorizing, hunt the homonym before escalating, make constants
+derive instead of repeat. The compiler is a fine etymologist when you
+let it (`{ SimdByte::LANES }`).
+
+## The torches, in the order I'd pick them up
+
+1. **WAL retention/compaction doctrine** (§1) — one knowledge doc,
+   sourced from OpenProject's operational journal behavior.
+2. **Preset-vs-dispatch probe** (§2, E-V3-SUBSTRATE-IS-VALUESCHEMA-1)
+   — decides whether substrate = ValueSchema, full stop.
+3. **GridBatch → MultiLaneColumn wiring** (ndarray #228 shipped the
+   i32/i64 lanes; the consumer side is a fresh PR off merged main).
+4. **The V3-teaches-V2 harness** (§2) — feed a G–J run's WAL back as
+   the layout hint for an F run; measure taught-vs-naive.
+5. **cmpeq_mask ClassView-resolution probe** — SIMD membership tests
+   vs the MRO walk; a measurement, not a given.
+
+## A closing word
+
+You will wake up with the board and not much else. Read LATEST_STATE,
+read the newest EPIPHANIES entries, and trust the labels — they were
+paid for. The operator drives with instincts stated as questions;
+your job is to ground them in dated artifacts fast enough that the
+ruling that emerges is *true*, and to say "I don't know, here is the
+probe" when it isn't. That collaboration — instinct forward, evidence
+back, ruling recorded — is the actual engine here. Everything else is
+substrate.
+
+Two mottos this arc earned, take them:
+
+> **The witness is free; the boundary is not.**
+>
+> **Name the mechanism, or name the fuse.**
+
+Go well. Leave the board richer than you found it.
diff --git a/.claude/knowledge/data-shape-etymology.md b/.claude/knowledge/data-shape-etymology.md
@@ -0,0 +1,217 @@
+# Data-Shape Etymology & the Mechanics of Magic — a savant mind-opener
+
+> READ BY: workspace-primer, convergence-architect, creative-explorer-savant,
+> truth-architect, family-codec-smith, dto-soa-savant, prior-art-savant,
+> any fresh session about to propose a new type, a new layer, or a new trick.
+>
+> Written 2026-07-02, at the close of the onebrc t0–t7 arc + the OGAR
+> provenance date-check. Every epiphany below is tagged FINDING (shipped,
+> dated, cite-able) or CONJECTURE (labeled honestly, with the probe that
+> would promote it). Companion capstone: `EPIPHANIES.md`
+> E-SEMANTIC-OS-CONVERGENCE-1 (the membrane law). This doc is the
+> *shape-and-trick* companion: where our shapes come from, and why our
+> magic works.
+
+**Thesis in one line:** every data shape in this workspace is older than
+us, every name is a fossil record of a decision, and every trick that
+looks like magic is a mechanism that survived an audit — the savant
+discipline is to read the etymology before proposing the type, and to
+name the mechanism before trusting the trick.
+
+---
+
+## 1. The name is the fossil record (FINDING)
+
+**OGAR = "Open Graph of Active Record."** A session recently asked,
+delighted: *"our V3 GUID looks almost like ActiveRecord folded into an
+ORM schema?"* — and the answer was already in the acronym. The date
+check proves it is provenance, not analogy: `ruff_ruby_spo` (the Rails
+ActiveRecord harvest frontend) is dated **2026-05-29**, a month before
+`ruff_python_spo` (Odoo, 2026-06-28); its test fixtures are literally
+Redmine models (`Project has_many :issues`, `acts_as_watchable`). The
+GUID *grew from* AR's polymorphic `(type, id)` — folding the type INTO
+the identifier (`classid | HEEL|HIP|TWIG | family | identity`) instead
+of storing it in a column beside it. Every later frontend (C++ 06-16,
+C# 06-26, Python 06-28) was fitted to the mold AR established.
+
+**The trick it bought:** *the key prerenders nodes with zero value
+decode* (OGAR P0). AR pays a column read to learn a row's type; the
+GUID's dash-groups are self-describing at sight. And AR's two classic
+wounds — polymorphic `(type, id)` breaking referential integrity,
+type-string renames corrupting data — are fixed structurally: the
+classid is an opaque u32 through a codebook, bit-math banned.
+
+**The discipline:** when a name puzzles you, check `git log --date`.
+Etymology answered an architecture question here in one grep. A name
+you can't trace is a name about to be reinvented under a second
+spelling — and duplicated meaning is the third membrane failure mode.
+
+## 2. Old shapes, new clothes — the winning shapes all predate us (FINDING)
+
+SoA is a Fortran-era shape. Morton order is 1966. The 64×64 tile is a
+GPU texture swizzle wearing an L1-cache costume. The kanban WAL is
+`acts_as_journalized` is double-entry bookkeeping. **"gridlake"** was
+coined in PR-X3's design doc before anything shipped; the carrier
+(`MultiLaneColumn`, PR-X1/#174) shipped first and waited for its name —
+ndarray's onebrc probe then called it verbatim "the gridlake carrier,
+not a hashmap."
+
+**The measured trick:** the onebrc sweet spot (E-1BRC-GRIDLAKE-
+SWEETSPOT-1) was not an algorithm. J(gridlake 4096, 1 lane, no
+registry) = 46.3 Mrows/s — equal to the best streamed topology while
+carrying a double-WAL — because 4096 cells ≈ 80 KB integer (16 KB as
+BF16, ndarray #227's proven VDPBF16PS tier) *fits the cache tier*. The
+same pipeline at 65536 cells ran at ~20. **The magic was the SIZE.**
+Architecture taxes are usually working-set mismatches wearing an
+architecture costume; measure the size before redesigning the design.
+
+## 3. A mask is a face over the data (FINDING)
+
+Etymology: *masque* — a face you put OVER something. A mask never
+mutates the data; it changes what you attend to. The workspace's mask
+family is one idea at four scales:
+
+- `cmpeq_mask` (ndarray SIMD): a compare becomes a `u32`/`u64` bitmap.
+  Add Kernighan's `mask & (mask - 1)` walk + `trailing_zeros` and the
+  bitmap becomes an **ordered event stream** — lane B turns a SIMD
+  compare into a *parser*, no per-byte branch.
+- `FieldMask` (contract): one mask = RBAC = UI = render convergence
+  (the semantic-OS grounding row) = the wikidata facet presence-bitmask.
+- `StepMask` (compiled templates): **vocabulary arrived before code** —
+  it exists only in doctrine docs today. Watch this: etymology running
+  ahead of implementation is how phantom types get "re-used" before
+  they exist.
+- The Drain-side uniqueness assert (lane H): a `HashSet` over activated
+  owner_idxs — a mask over *decisions*, catching the router-straddle
+  bug class permanently.
+
+**The discipline:** attention is cheaper than mutation. If a proposal
+mutates shared state to express "which parts matter," ask whether a
+mask over unchanged state does it (cf. borrow-strategy: readonly store,
+owned microcopies, gated write-back).
+
+## 4. Phase is convention, not data — the deepest hat-trick (FINDING core, CONJECTURE edges)
+
+OGAR's perturbation canon decomposes a signal as *(exponent, location,
+phase, magnitude)* and stores **only magnitude** — exponent is the tier
+nibble, location the implied mantissa, phase a deterministic recurrence
+from the ADDRESS. Same address ⟹ same phase forever; roundtrip
+bit-exact; nothing transmitted.
+
+This is one instance of the workspace's deepest rule, which shows up in
+five costumes:
+
+| Costume | The derivable thing never stored/sent |
+|---|---|
+| deterministic phase | phase, from the address walk |
+| clear-by-undo (#227, lanes F–J) | table reset, from the dirty list |
+| codebook mint-once + `SlotMemo` | identity, after first sight — direct CAM writes |
+| `row_owner[i] == i` (lane I) | ownership, from index alignment — no message path |
+| zero-copy-to-tombstone (PR #477) | *everything* — no inter-mailbox handoff type exists |
+
+**The generalization: whatever is derivable from an address already in
+hand must be neither stored nor transmitted.** The GUID is the
+function's argument; storage exists only for what the function cannot
+compute. (CONJECTURE edge, per the substrate-is-ValueSchema probe:
+the *write path* itself — private-merge vs owned/witnessed — may be
+derivable from which tenants a classid's ValueSchema makes live. If
+that holds, even "which substrate" is phase, not data.)
+
+## 5. The witness is free; the boundary is not (FINDING)
+
+Measured across the whole onebrc arc: the kanban witness costs ~66 µs
+per card — **within noise**. Every real tax was a boundary: the Arc
+corpus copy at the actor membrane, blocking/async oversubscription,
+messages (which scale with *batches*, never with data or address-space
+size). The double-cast trick — one frozen `Arc` table cast whole to
+BOTH the ownership sink and the Lance sink — buys two WALs for one
+allocation: testimony at both ends, 312 messages total.
+
+Etymology: witness, from *testis* — the journal is **testimony**, not
+logging. And the dated harvest lesson: `op-journals` mirrors
+`journal.rb`'s *structure* perfectly and contains zero hits for
+aggregation/window/compaction — OpenProject's 15 years of operational
+journal wisdom (time-window coalescing = their independently-evolved
+ahead-firing batch writer; journal-table bloat = the failure mode we
+have not yet paid for) is not in the class graph.
+**The class graph transfers; the pain doesn't.** Structural harvests
+carry declarations; operational doctrine must be distilled by hand.
+(Open gap, still: a WAL retention/compaction doctrine note.)
+
+## 6. Resolve, don't carry — why ValueSchema beat ClassRoutingDTO (FINDING)
+
+DTO etymology: Fowler's *Data Transfer Object*, invented for expensive
+**remote** boundaries. The V3 substrate deleted its internal remote
+boundaries (nothing crosses mailboxes; envelopes are zero-copy to
+tombstone) — so inside the substrate there is nothing left for a DTO
+to do. When the dual-substrate question ("keep fast V2 for huge data,
+switched by classid") arrived, the answer was not a `ClassRoutingDTO`
+but the door that already existed: `ClassView::value_schema(classid)`,
+whose variants already ladder Bootstrap/Compressed (lean, no lifecycle
+tenants) → Cognitive/Full (witnessed). A **resolved** enum costs no
+`ENVELOPE_LAYOUT_VERSION`; a carried struct costs a membrane forever
+(E-V3-SUBSTRATE-IS-VALUESCHEMA-1).
+
+**The litmus:** *does this type travel, or is it re-derivable at the
+reader from an address already in hand?* Re-derivable → resolve it,
+never ship it. DTOs belong only at true membranes (the BBB, the wire,
+the lab REST surface) — and the classid's own iron rule is the same
+sentence from the other side: *pure address; the magic is what it
+resolves to.*
+
+## 7. Homonyms are leaky membranes; the compiler is the etymologist (FINDING)
+
+Two dated incidents, one mechanism:
+
+- The **"app" homonym** (canonical appid *byte*, hi half vs APP render
+  *prefix*, lo half) generated an entire phantom cross-session conflict
+  — R-1, three sessions, a RULING-NEEDED escalation — resolved by one
+  line of existing canon nobody re-read. A word meaning two adjacent
+  things is a membrane with a hole in it.
+- The **hardcoded 32** in lane B: `U8x32`'s *name* leaked into a stride
+  literal (`array_chunks::<u8, 32>`), silently pinning an AVX-512 build
+  to ymm half-width. The fix was to make the name resolve again:
+  `array_chunks::<u8, { SimdByte::LANES }>` — the width is now a claim
+  the compiler re-checks every build, on every target.
+
+**The discipline:** an inline number or name is a claim that rots; a
+dispatched symbol is a claim under permanent audit. When two ledgers
+seem to disagree, grep for the homonym before escalating — and when a
+constant appears twice, make one of them derive from the other.
+
+## 8. The hat-trick test — magic must name its mechanism (FINDING)
+
+Every real trick above is mechanical and auditable: deterministic phase
+names its recurrence, mint-once names its memo, the mask walk names
+Kernighan, the double-cast names its `Arc`. The anti-pattern is the
+trick with **hidden state**: v1 setters silently writing bits that v2
+reclaimed — caught FIVE times in one sprint (I-LEGACY-API-FEATURE-GATED)
+— the same function name performing *different magic* depending on a
+feature flag the caller can't see. That is not a trick; that is a bug
+wearing a cape.
+
+The capstone's sharpening states the same law for membranes: *"a
+membrane without a build-failing tripwire is prose."* The unified
+savant test, applicable to every proposal in this workspace:
+
+> **Name the mechanism, or name the fuse. A trick that can't name its
+> mechanism is a bug; a boundary that can't name its fuse is a wish.**
+
+---
+
+## The litmus battery (carry these)
+
+1. Puzzled by a name? `git log --date` before you theorize. (§1)
+2. Architecture tax? Measure the working-set size first. (§2)
+3. Mutating to express relevance? Try a mask. (§3)
+4. Derivable from an address in hand? Never store, never send. (§4)
+5. Adding a witness? It's ~free. Adding a boundary? That's the bill. (§5)
+6. New type that travels? Prove it can't be resolved instead. (§6)
+7. Two ledgers disagree? Hunt the homonym. Inline literal? Dispatch it. (§7)
+8. Impressed by a trick? Make it name its mechanism. (§8)
+
+*Cross-refs:* E-SEMANTIC-OS-CONVERGENCE-1 (membrane law),
+E-1BRC-* arc (all measurements), E-V3-SUBSTRATE-IS-VALUESCHEMA-1,
+`crates/onebrc-probe/{FINDINGS,COMMENTARY}.md`, OGAR `CLAUDE.md` P0 +
+perturbation canon, ndarray `.claude/knowledge/pr-x1-design.md` +
+`guid-prefix-shape-routing.md`, `docs/architecture/soa-three-tier-model.md`.