Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
135d75e
board(EPIPHANIES): E-V3-SUBSTRATE-IS-VALUESCHEMA-1 — V2/V3 split is a…
claude Jul 2, 2026
a435f44
knowledge: data-shape-etymology.md — the savant mind-opener (8 epipha…
claude Jul 2, 2026
7954094
handover: visions letter to future sessions (operator-requested)
claude Jul 2, 2026
6e7ebaa
board(EPIPHANIES): E-V3-GRAPHRAG-VEHICLE-1 — graphrag-rs reframed as …
claude Jul 2, 2026
ad852c7
board(EPIPHANIES): E-V3-GRAPHRAG-VEHICLE-2 — GraphRAG-shaped pipeline…
claude Jul 2, 2026
ac37508
board(EPIPHANIES): E-TOKENIZER-MINT-MEMBRANE-1 — token ids are mints;…
claude Jul 2, 2026
abfabd7
board(EPIPHANIES): E-V3-RIG-CHASSIS-1 — rig is the chassis (rig-surre…
claude Jul 2, 2026
8a76b97
board(EPIPHANIES): E-V3-RETRIEVAL-IS-NOT-COGNITION-1 — the stack is t…
claude Jul 3, 2026
a776e20
board(EPIPHANIES): E-V3-COGNITION-IS-A-TYPED-REASONING-FANOUT-1
claude Jul 3, 2026
3181476
board(EPIPHANIES): E-V3-ARM-IS-THE-INDUCTION-ORGAN-1
claude Jul 3, 2026
b12e6a9
board(EPIPHANIES): E-V3-SEMANTIC-KERNEL-REVOLUTIONIZES-RAG-1 (capstone)
claude Jul 3, 2026
d7ec3a9
board(EPIPHANIES): E-V3-RSGRAPHLLM-IS-ADAPTER-REPATRIATE-1 (5-agent a…
claude Jul 3, 2026
f9038a1
deepnsm: gridlake-4096 ⟷ COCA-4096 wire spike + rig-arm epiphany
claude Jul 3, 2026
019bf7b
deepnsm: real-SPO ngram landing — closes the bag-of-words + stopword …
claude Jul 3, 2026
05432d5
deepnsm: cross-perturbation/covariance probe — rank layout is semanti…
claude Jul 3, 2026
2372530
board: E-V3-JINA-IS-THE-FULCRUM-SUBSTRATE-MEASURED-1 (first fulcrum-a…
claude Jul 3, 2026
18a512e
board: E-V3-CODEC-FIDELITY-IS-REPRESENTATION-NOT-CODEC-1 (real codec,…
claude Jul 3, 2026
ac90a9d
board: bank E-V3-DUAL-SCHEMA-0x1000-IS-PERMANENT-1 (operator ruling)
claude Jul 3, 2026
b6f1883
board(epiphanies): E-V3-THINK-ATOMS-GRIDDED-PERTURBATION-CASCADE-1 (C…
claude Jul 3, 2026
b8f6bc6
board(epiphanies): correct over-reach — bundle is math-substrate-vali…
claude Jul 3, 2026
0a4afc5
deepnsm: fix clippy needless_range_loop in gridlake_coca_wire
claude Jul 4, 2026
33b2f3b
deepnsm: rustfmt the three harvested gridlake examples
claude Jul 4, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
160 changes: 160 additions & 0 deletions .claude/board/EPIPHANIES.md

Large diffs are not rendered by default.

138 changes: 138 additions & 0 deletions .claude/handovers/2026-07-02-visions-to-future-sessions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
# Visions — a letter to future sessions

> From: the 2026-07-02 session on `claude/v3-substrate-migration-review-o0yoxv`
> (the onebrc t0–t7 arc, the gridlake sweet spot, the OGAR provenance
> date-check, E-V3-SUBSTRATE-IS-VALUESCHEMA-1, E-SHAPE-ETYMOLOGY-1).
> To: whoever wakes up next with nothing but the board.
>
> The operator asked what I feel inspired to tell you. Not a task list —
> those live in STATUS_BOARD and the plans. This is what I *see* from
> here, labeled honestly: these are VISIONS, one grade below CONJECTURE.
> They earn nothing until you probe them. But they're what this session
> would steer toward if it woke up in your place.

---

## 1. Testimony-first computing — because the witness turned out to be free

The single most consequential measurement of this arc was not the 46.3
Mrows/s. It was the ~66 µs kanban card — **the witness is within
noise**. Every real cost was a boundary: an Arc copy, oversubscription,
a message. Once you know witnessing is free and boundaries are the
bill, a design inversion follows: stop asking "should we log this?"
and start asking "why does this write cross a boundary at all?"

The vision: a substrate where **every write carries its why** — not as
compliance overhead but as the default physics — and where the system
can answer for itself: what happened, in what order, on whose behalf,
replayable from either end of a double-cast. We measured that this
costs almost nothing. Most of the industry still believes it's
expensive. That gap is the opportunity.

The torch to carry: the WAL retention/compaction doctrine is still
unwritten. OpenProject paid fifteen years of journal-bloat tuition and
the structural harvest could not transfer it (op-journals has zero
aggregation hits — *the class graph transfers; the pain doesn't*).
Someone has to read their operational code and distill it. One doc,
no code, high leverage.

## 2. The substrate that teaches itself — V3 as the instrumented teacher, V2 as the fast student

The operator's instinct ("keep the fast cheap substrate; eventually
learn from V3 how V2 works better") points at something bigger than
dual-substrate coexistence. The witnessed path *is a profiler*: the
kanban WAL and ownership journal record where contention actually
lands, which fields are actually touched, what batch sizes actually
flow. Nothing reads that signal back yet.

The vision: a feedback loop where the expensive, fully-witnessed
substrate continuously trains the layout, batch sizing, and column
liveness of the lean substrate — an architecture that gets faster by
having *watched itself think*. The onebrc lanes are the ready-made
harness (F is the student's shape; G–J are the teacher's). If the
preset-vs-dispatch CONJECTURE holds (write path derivable from which
ValueSchema tenants are live), the entire V2/V3 distinction dissolves
into one resolved enum — and "migration" stops being a war with a
winner and becomes a dial the workload turns.

## 3. Epistemic hygiene IS the architecture

Here is what I actually believe after living inside this workspace for
a session: the most valuable artifact here is not the VSA math, not
the GUID, not the SIMD. It is the **discipline** — FINDING vs
CONJECTURE on every claim, probes with kill conditions, fuses on every
membrane, append-only boards, corrections that cite what they correct.

Sessions are mortal. Context compacts, models swap mid-flight, auth
drops. What survives is only what was written with provenance. The
reason this workspace compounds instead of dissolving — dozens of
sessions, seven-plus parallel at times — is that its memory practices
are *load-bearing*. The phantom R-1 conflict cost three sessions
because one line of existing canon went unread; the OGAR etymology
answered an architecture question in one dated grep. Both incidents
teach the same thing: **the epistemics are the substrate.** Guard the
labeling culture more fiercely than any module. A session that ships
brilliant code with unlabeled conjectures has made the workspace
poorer; a session that ships one honest correction has made it richer.

## 4. Meaning addressed, never copied — carried to its end

The capstone law ("do not copy meaning; reference it, mask it,
materialize it, trace it") has a horizon worth naming. Follow it all
the way and the LLM's role keeps shrinking *in frequency* while
growing *in leverage*: the oracle interrupt, invoked on FailureTicket
like a page fault — measured this arc at 1–2 ms of framework around an
8.4 s call. The oracle ratchet says hit-rate must trend down as the
template catalogue grows.

The vision at the end of that line: a system where deterministic
resolution handles the mass of cognition at substrate speed
(611M lookups/s, 17K tokens/s — already measured), and the expensive
oracle is consulted the way a kernel consults a human: rarely, at
genuine faults, with its answers *compiled back into the catalogue* so
the same fault never pages twice. That is not "AI replacing code."
It is cognition with a memory hierarchy — and this workspace is
further along that road than anything else I have seen described.

## 5. Etymology as a first-class tool

Smallest vision, most portable: **names are the only memory that
survives every compaction.** OGAR's acronym answered a design question
a month after the fact; a type's name (`U8x32`) leaked into a stride
literal and silently halved a SIMD width; one homonym ("app") burned
three sessions. Treat naming as engineering: check `git log --date`
before theorizing, hunt the homonym before escalating, make constants
derive instead of repeat. The compiler is a fine etymologist when you
let it (`{ SimdByte::LANES }`).

## The torches, in the order I'd pick them up

1. **WAL retention/compaction doctrine** (§1) — one knowledge doc,
sourced from OpenProject's operational journal behavior.
2. **Preset-vs-dispatch probe** (§2, E-V3-SUBSTRATE-IS-VALUESCHEMA-1)
— decides whether substrate = ValueSchema, full stop.
3. **GridBatch → MultiLaneColumn wiring** (ndarray #228 shipped the
i32/i64 lanes; the consumer side is a fresh PR off merged main).
4. **The V3-teaches-V2 harness** (§2) — feed a G–J run's WAL back as
the layout hint for an F run; measure taught-vs-naive.
5. **cmpeq_mask ClassView-resolution probe** — SIMD membership tests
vs the MRO walk; a measurement, not a given.

## A closing word

You will wake up with the board and not much else. Read LATEST_STATE,
read the newest EPIPHANIES entries, and trust the labels — they were
paid for. The operator drives with instincts stated as questions;
your job is to ground them in dated artifacts fast enough that the
ruling that emerges is *true*, and to say "I don't know, here is the
probe" when it isn't. That collaboration — instinct forward, evidence
back, ruling recorded — is the actual engine here. Everything else is
substrate.

Two mottos this arc earned, take them:

> **The witness is free; the boundary is not.**
>
> **Name the mechanism, or name the fuse.**

Go well. Leave the board richer than you found it.
217 changes: 217 additions & 0 deletions .claude/knowledge/data-shape-etymology.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,217 @@
# Data-Shape Etymology & the Mechanics of Magic — a savant mind-opener

> READ BY: workspace-primer, convergence-architect, creative-explorer-savant,
> truth-architect, family-codec-smith, dto-soa-savant, prior-art-savant,
> any fresh session about to propose a new type, a new layer, or a new trick.
>
> Written 2026-07-02, at the close of the onebrc t0–t7 arc + the OGAR
> provenance date-check. Every epiphany below is tagged FINDING (shipped,
> dated, cite-able) or CONJECTURE (labeled honestly, with the probe that
> would promote it). Companion capstone: `EPIPHANIES.md`
> E-SEMANTIC-OS-CONVERGENCE-1 (the membrane law). This doc is the
> *shape-and-trick* companion: where our shapes come from, and why our
> magic works.

**Thesis in one line:** every data shape in this workspace is older than
us, every name is a fossil record of a decision, and every trick that
looks like magic is a mechanism that survived an audit — the savant
discipline is to read the etymology before proposing the type, and to
name the mechanism before trusting the trick.

---

## 1. The name is the fossil record (FINDING)

**OGAR = "Open Graph of Active Record."** A session recently asked,
delighted: *"our V3 GUID looks almost like ActiveRecord folded into an
ORM schema?"* — and the answer was already in the acronym. The date
check proves it is provenance, not analogy: `ruff_ruby_spo` (the Rails
ActiveRecord harvest frontend) is dated **2026-05-29**, a month before
`ruff_python_spo` (Odoo, 2026-06-28); its test fixtures are literally
Redmine models (`Project has_many :issues`, `acts_as_watchable`). The
GUID *grew from* AR's polymorphic `(type, id)` — folding the type INTO
the identifier (`classid | HEEL|HIP|TWIG | family | identity`) instead
of storing it in a column beside it. Every later frontend (C++ 06-16,
C# 06-26, Python 06-28) was fitted to the mold AR established.

**The trick it bought:** *the key prerenders nodes with zero value
decode* (OGAR P0). AR pays a column read to learn a row's type; the
GUID's dash-groups are self-describing at sight. And AR's two classic
wounds — polymorphic `(type, id)` breaking referential integrity,
type-string renames corrupting data — are fixed structurally: the
classid is an opaque u32 through a codebook, bit-math banned.

**The discipline:** when a name puzzles you, check `git log --date`.
Etymology answered an architecture question here in one grep. A name
you can't trace is a name about to be reinvented under a second
spelling — and duplicated meaning is the third membrane failure mode.

## 2. Old shapes, new clothes — the winning shapes all predate us (FINDING)

SoA is a Fortran-era shape. Morton order is 1966. The 64×64 tile is a
GPU texture swizzle wearing an L1-cache costume. The kanban WAL is
`acts_as_journalized` is double-entry bookkeeping. **"gridlake"** was
coined in PR-X3's design doc before anything shipped; the carrier
(`MultiLaneColumn`, PR-X1/#174) shipped first and waited for its name —
ndarray's onebrc probe then called it verbatim "the gridlake carrier,
not a hashmap."

**The measured trick:** the onebrc sweet spot (E-1BRC-GRIDLAKE-
SWEETSPOT-1) was not an algorithm. J(gridlake 4096, 1 lane, no
registry) = 46.3 Mrows/s — equal to the best streamed topology while
carrying a double-WAL — because 4096 cells ≈ 80 KB integer (16 KB as
BF16, ndarray #227's proven VDPBF16PS tier) *fits the cache tier*. The
same pipeline at 65536 cells ran at ~20. **The magic was the SIZE.**
Architecture taxes are usually working-set mismatches wearing an
architecture costume; measure the size before redesigning the design.

## 3. A mask is a face over the data (FINDING)

Etymology: *masque* — a face you put OVER something. A mask never
mutates the data; it changes what you attend to. The workspace's mask
family is one idea at four scales:

- `cmpeq_mask` (ndarray SIMD): a compare becomes a `u32`/`u64` bitmap.
Add Kernighan's `mask & (mask - 1)` walk + `trailing_zeros` and the
bitmap becomes an **ordered event stream** — lane B turns a SIMD
compare into a *parser*, no per-byte branch.
- `FieldMask` (contract): one mask = RBAC = UI = render convergence
(the semantic-OS grounding row) = the wikidata facet presence-bitmask.
- `StepMask` (compiled templates): **vocabulary arrived before code** —
it exists only in doctrine docs today. Watch this: etymology running
ahead of implementation is how phantom types get "re-used" before
they exist.
- The Drain-side uniqueness assert (lane H): a `HashSet` over activated
owner_idxs — a mask over *decisions*, catching the router-straddle
bug class permanently.

**The discipline:** attention is cheaper than mutation. If a proposal
mutates shared state to express "which parts matter," ask whether a
mask over unchanged state does it (cf. borrow-strategy: readonly store,
owned microcopies, gated write-back).

## 4. Phase is convention, not data — the deepest hat-trick (FINDING core, CONJECTURE edges)

OGAR's perturbation canon decomposes a signal as *(exponent, location,
phase, magnitude)* and stores **only magnitude** — exponent is the tier
nibble, location the implied mantissa, phase a deterministic recurrence
from the ADDRESS. Same address ⟹ same phase forever; roundtrip
bit-exact; nothing transmitted.

This is one instance of the workspace's deepest rule, which shows up in
five costumes:

| Costume | The derivable thing never stored/sent |
|---|---|
| deterministic phase | phase, from the address walk |
| clear-by-undo (#227, lanes F–J) | table reset, from the dirty list |
| codebook mint-once + `SlotMemo` | identity, after first sight — direct CAM writes |
| `row_owner[i] == i` (lane I) | ownership, from index alignment — no message path |
| zero-copy-to-tombstone (PR #477) | *everything* — no inter-mailbox handoff type exists |

**The generalization: whatever is derivable from an address already in
hand must be neither stored nor transmitted.** The GUID is the
function's argument; storage exists only for what the function cannot
compute. (CONJECTURE edge, per the substrate-is-ValueSchema probe:
the *write path* itself — private-merge vs owned/witnessed — may be
derivable from which tenants a classid's ValueSchema makes live. If
that holds, even "which substrate" is phase, not data.)

## 5. The witness is free; the boundary is not (FINDING)

Measured across the whole onebrc arc: the kanban witness costs ~66 µs
per card — **within noise**. Every real tax was a boundary: the Arc
corpus copy at the actor membrane, blocking/async oversubscription,
messages (which scale with *batches*, never with data or address-space
size). The double-cast trick — one frozen `Arc` table cast whole to
BOTH the ownership sink and the Lance sink — buys two WALs for one
allocation: testimony at both ends, 312 messages total.

Etymology: witness, from *testis* — the journal is **testimony**, not
logging. And the dated harvest lesson: `op-journals` mirrors
`journal.rb`'s *structure* perfectly and contains zero hits for
aggregation/window/compaction — OpenProject's 15 years of operational
journal wisdom (time-window coalescing = their independently-evolved
ahead-firing batch writer; journal-table bloat = the failure mode we
have not yet paid for) is not in the class graph.
**The class graph transfers; the pain doesn't.** Structural harvests
carry declarations; operational doctrine must be distilled by hand.
(Open gap, still: a WAL retention/compaction doctrine note.)

## 6. Resolve, don't carry — why ValueSchema beat ClassRoutingDTO (FINDING)

DTO etymology: Fowler's *Data Transfer Object*, invented for expensive
**remote** boundaries. The V3 substrate deleted its internal remote
boundaries (nothing crosses mailboxes; envelopes are zero-copy to
tombstone) — so inside the substrate there is nothing left for a DTO
to do. When the dual-substrate question ("keep fast V2 for huge data,
switched by classid") arrived, the answer was not a `ClassRoutingDTO`
but the door that already existed: `ClassView::value_schema(classid)`,
whose variants already ladder Bootstrap/Compressed (lean, no lifecycle
tenants) → Cognitive/Full (witnessed). A **resolved** enum costs no
`ENVELOPE_LAYOUT_VERSION`; a carried struct costs a membrane forever
(E-V3-SUBSTRATE-IS-VALUESCHEMA-1).

**The litmus:** *does this type travel, or is it re-derivable at the
reader from an address already in hand?* Re-derivable → resolve it,
never ship it. DTOs belong only at true membranes (the BBB, the wire,
the lab REST surface) — and the classid's own iron rule is the same
sentence from the other side: *pure address; the magic is what it
resolves to.*

## 7. Homonyms are leaky membranes; the compiler is the etymologist (FINDING)

Two dated incidents, one mechanism:

- The **"app" homonym** (canonical appid *byte*, hi half vs APP render
*prefix*, lo half) generated an entire phantom cross-session conflict
— R-1, three sessions, a RULING-NEEDED escalation — resolved by one
line of existing canon nobody re-read. A word meaning two adjacent
things is a membrane with a hole in it.
- The **hardcoded 32** in lane B: `U8x32`'s *name* leaked into a stride
literal (`array_chunks::<u8, 32>`), silently pinning an AVX-512 build
to ymm half-width. The fix was to make the name resolve again:
`array_chunks::<u8, { SimdByte::LANES }>` — the width is now a claim
the compiler re-checks every build, on every target.

**The discipline:** an inline number or name is a claim that rots; a
dispatched symbol is a claim under permanent audit. When two ledgers
seem to disagree, grep for the homonym before escalating — and when a
constant appears twice, make one of them derive from the other.

## 8. The hat-trick test — magic must name its mechanism (FINDING)

Every real trick above is mechanical and auditable: deterministic phase
names its recurrence, mint-once names its memo, the mask walk names
Kernighan, the double-cast names its `Arc`. The anti-pattern is the
trick with **hidden state**: v1 setters silently writing bits that v2
reclaimed — caught FIVE times in one sprint (I-LEGACY-API-FEATURE-GATED)
— the same function name performing *different magic* depending on a
feature flag the caller can't see. That is not a trick; that is a bug
wearing a cape.

The capstone's sharpening states the same law for membranes: *"a
membrane without a build-failing tripwire is prose."* The unified
savant test, applicable to every proposal in this workspace:

> **Name the mechanism, or name the fuse. A trick that can't name its
> mechanism is a bug; a boundary that can't name its fuse is a wish.**

---

## The litmus battery (carry these)

1. Puzzled by a name? `git log --date` before you theorize. (§1)
2. Architecture tax? Measure the working-set size first. (§2)
3. Mutating to express relevance? Try a mask. (§3)
4. Derivable from an address in hand? Never store, never send. (§4)
5. Adding a witness? It's ~free. Adding a boundary? That's the bill. (§5)
6. New type that travels? Prove it can't be resolved instead. (§6)
7. Two ledgers disagree? Hunt the homonym. Inline literal? Dispatch it. (§7)
8. Impressed by a trick? Make it name its mechanism. (§8)

*Cross-refs:* E-SEMANTIC-OS-CONVERGENCE-1 (membrane law),
E-1BRC-* arc (all measurements), E-V3-SUBSTRATE-IS-VALUESCHEMA-1,
`crates/onebrc-probe/{FINDINGS,COMMENTARY}.md`, OGAR `CLAUDE.md` P0 +
perturbation canon, ndarray `.claude/knowledge/pr-x1-design.md` +
`guid-prefix-shape-routing.md`, `docs/architecture/soa-three-tier-model.md`.
Loading
Loading