From 2f1df8d56c6182b9248e7cde10a82373f01e9bd9 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Sat, 4 Jul 2026 11:33:34 +0000
Subject: [PATCH 1/7] contract: transcode the Tesseract recoder load side
 (UnicharCompress)

New zero-dep module lance_graph_contract::unicharcompress -- the load side of
Tesseract's UnicharCompress (ccutil/unicharcompress.{h,cpp}), the LSTM
recognizer's recoded-code <-> unichar-id table. First binary-format leaf: a
little-endian TFile reader (u32 count + per-RecodedCharID
[i8 self_normalized][i32 length][i32*length code]), then ComputeCodeRange
(max+1) and the decode map (last-writer-wins on a shared code). Load side only
(DeSerialize + Encode/Decode/code_range); ComputeEncoding + beam-search maps
are deferred to training/recognizer leaves.

Byte-parity GREEN on real eng.lstm-recoder: encode 112/112 + decode 112/112 +
code_range=111 (examples/recoder_dump.rs {encode,decode} diffed vs a
libtesseract 5.3.4 oracle; the 1012-byte size = 4 + 112*9 was derived before
the parse). Strict where C++ is UB: rejects a length outside 0..=kMaxCodeLen
(9) and short buffers.

+10 unit tests; clippy -D warnings + fmt clean (-p lance-graph-contract).
Board: EPIPHANIES E-CPP-PARITY-7, LATEST_STATE contract inventory. Resolves the
OGAR #148 recoder=0x0802 concept to its content-store module.

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
---
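
Reviewer note (ignored by `git am`, not part of the commit): a minimal,
self-contained sketch of the wire form and the two load-time recomputations
described in the message above. It is an illustration under stated
assumptions, not the module's code: `build_wire`, `parse_wire`, `code_range`
and `decode_map` are hypothetical names, entries are plain `Vec<i32>` rather
than `RecodedCharId`, and the 112-entry table is synthetic, not the real
eng.lstm-recoder contents.

```rust
use std::collections::HashMap;

/// `kMaxCodeLen` as stated in the commit message.
const MAX_CODE_LEN: i32 = 9;

/// Hypothetical helper: write a table in the wire form
/// `u32 count` + per entry `[i8 self_normalized][i32 length][i32 x length code]`.
fn build_wire(entries: &[Vec<i32>]) -> Vec<u8> {
    let mut out = (entries.len() as u32).to_le_bytes().to_vec();
    for codes in entries {
        out.push(1); // self_normalized, always 1 in this sketch
        out.extend_from_slice(&(codes.len() as i32).to_le_bytes());
        for code in codes {
            out.extend_from_slice(&code.to_le_bytes());
        }
    }
    out
}

/// Read one little-endian i32 at `*pos`, or `None` if the buffer is short.
fn read_i32(bytes: &[u8], pos: &mut usize) -> Option<i32> {
    let end = (*pos).checked_add(4)?;
    let s = bytes.get(*pos..end)?;
    *pos = end;
    Some(i32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Parse the encoder table; `None` on a short buffer or an out-of-range length.
fn parse_wire(bytes: &[u8]) -> Option<Vec<Vec<i32>>> {
    let mut pos = 0usize;
    let count = read_i32(bytes, &mut pos)? as u32; // same 4 bytes, read as u32
    let mut table = Vec::new();
    for _ in 0..count {
        let _self_normalized = *bytes.get(pos)? as i8;
        pos += 1;
        let length = read_i32(bytes, &mut pos)?;
        if !(0..=MAX_CODE_LEN).contains(&length) {
            return None; // would not fit the fixed 9-slot code array
        }
        let mut codes = Vec::new();
        for _ in 0..length {
            codes.push(read_i32(bytes, &mut pos)?);
        }
        table.push(codes);
    }
    Some(table)
}

/// `1 + max code`, seeded with -1 so an empty table yields 0.
fn code_range(table: &[Vec<i32>]) -> i32 {
    table.iter().flatten().copied().fold(-1, i32::max) + 1
}

/// code -> id, inserted in ascending id order, so the last writer wins.
fn decode_map(table: &[Vec<i32>]) -> HashMap<Vec<i32>, usize> {
    let mut map = HashMap::new();
    for (id, codes) in table.iter().enumerate() {
        map.insert(codes.clone(), id);
    }
    map
}

fn main() {
    // Synthetic stand-in: 112 length-1 entries, so 4 + 112 * 9 bytes.
    let synthetic: Vec<Vec<i32>> = (0..112).map(|id| vec![id]).collect();
    assert_eq!(build_wire(&synthetic).len(), 4 + 112 * (1 + 4 + 4));

    // ids 1 and 2 share code [5], mirroring the shared-code case in the tests.
    let bytes = build_wire(&[vec![0], vec![5], vec![5]]);
    let table = parse_wire(&bytes).expect("well-formed");
    assert_eq!(code_range(&table), 6);
    assert_eq!(decode_map(&table)[&vec![5]], 2);
    assert!(parse_wire(&bytes[..bytes.len() - 1]).is_none());
    println!("ok");
}
```

The sketch keeps only what the size arithmetic and the last-writer-wins rule
need; the patch itself additionally preserves `self_normalized` and reports
typed `RecoderError` values instead of `None`.
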
 .claude/board/EPIPHANIES.md                   |  10 +
 .claude/board/LATEST_STATE.md                 |   6 +
 .../examples/recoder_dump.rs                  |  51 ++
 crates/lance-graph-contract/src/lib.rs        |   1 +
 .../src/unicharcompress.rs                    | 559 ++++++++++++++++++
 5 files changed, 627 insertions(+)
 create mode 100644 crates/lance-graph-contract/examples/recoder_dump.rs
 create mode 100644 crates/lance-graph-contract/src/unicharcompress.rs

diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index e3d2c802..7aae6eaf 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -177,6 +177,16 @@ New knowledge doc `.claude/knowledge/data-shape-etymology.md` — the shape-and-
 **Status:** FINDING (operator ruling on the shape — "yes valueschema") + embedded CONJECTURE (the preset-vs-dispatch probe)
 
 Operator floated keeping the fast/cheap V2 substrate for huge data alongside V3, "switched by classid," so V3 can eventually teach V2 how to be better. Resolved: the switch is NOT a new carrier. `ClassView::value_schema(classid) -> ValueSchema` (`canonical_node.rs:894`, `class_view.rs:395`) is ALREADY classid→substrate-shape resolution by trait dispatch — resolved, never stored on-wire (adding a variant costs NO `ENVELOPE_LAYOUT_VERSION` bump), and the four existing variants ALREADY form a substrate ladder: `Bootstrap`(empty, key+edges only) / `Compressed`(cold codec, **no hot lifecycle columns**) / `Cognitive`(hot thinking: Meta+Qualia+Fingerprint+Energy+Plasticity+EntityType) / `Full`(every tenant). So "V2 fast/cheap bulk" = classids that resolve to the LEAN end (Bootstrap/Compressed — no ownership/lifecycle tenants); "V3 witnessed/owned" = Cognitive/Full. **A `ClassRoutingDTO` is rejected:** a DTO is a serialized carried payload, but substrate choice is a RESOLUTION (firewall ADR-022, "contracts compile types, the event never leaves"); and per the three-tier canon nothing crosses mailbox boundaries — every reader re-resolves the substrate from the classid already in the 16-byte key, so there is no boundary for a carrier to travel. `dto-soa-savant` + AGI-as-glove name the new-struct-instead-of-resolution shape exactly. **0x1000 is NOT the switch:** canon fixes it as a temporary adoption MONITOR ("monitor, never a semantic"; retires at P4/100%; MODULE-TABLE flags that a future canon==0x1000 aliases the marker) — substrate routes on the classid's concept-half → ValueSchema, never on the monitor bit. **The deep form (CONJECTURE — PROBE preset-vs-dispatch):** the WRITE PATH may be a pure FUNCTION of the schema — a class whose ValueSchema carries no ownership/lifecycle tenants has nothing for the kanban/WAL to witness, so it naturally collapses to the fast private-merge write; Cognitive/Full carry the tenants that REQUIRE the owned/witnessed path. 
If that holds, substrate = ValueSchema full stop (no separate `Substrate` enum, no flag). The gate: confirm the write path is derivable from which tenants are live vs needing an independent resolution — evidence base is the onebrc arc itself (lane F private-merge/no-tenants vs lanes G–J owned/witnessed = the two write paths already measured). Open sub-question: whether bulk needs a variant leaner than `Compressed`, or Bootstrap/Compressed already suffice. **"V3 teaches V2" (deferred, needs mechanism):** V3's kanban WAL + ownership journal is the profiling signal (where contention lands, which fields are touched) to optimize the lean V2 layout — the instrumented-teacher / stripped-student loop; no code reads the WAL back into a layout optimizer yet. Net: at most a new `ValueSchema` variant through the existing `value_schema(classid)` door; possibly not even that.
+## 2026-07-04 — E-CPP-PARITY-7 — the UNICHARCOMPRESS (recoder) load side is byte-identical to libtesseract; the seventh leaf, and the FIRST binary-format transcode (`TFile` little-endian)
+**Status:** FINDING (byte-parity proven vs libtesseract 5.3.4; in-contract, tested)
+
+The recoder (`ccutil/unicharcompress.{h,cpp}`) is the LSTM recognizer's code↔id table — the first non-UNICHARSET Core type and the first BINARY leaf (every prior leaf parsed text). `lance_graph_contract::unicharcompress::UnicharCompress` transcodes the load side only (`DeSerialize` → `from_le_bytes`; `EncodeUnichar`/`DecodeUnichar`/`code_range`); byte-parity GREEN on real `/tmp/eng.lstm-recoder` — encode 112/112 + decode 112/112 + code_range=111, via the committed `examples/recoder_dump.rs {encode,decode}` diffed against a libtesseract oracle.
+
+Two firsts + one correction: (1) FIRST binary format — `TFile` LE: `u32 count` + per-`RecodedCharID` `[i8 self_normalized][i32 length][i32×length code]`; the 1012-byte on-disk size = `4 + 112·9` was derived from the format BEFORE the parse (a first-principles pre-registration of correctness). (2) The 5.5.0-header / 5.3.4-lib ABI skew is a NEW object layout not covered by the UNICHARSET bijection: the oracle's `Encode∘Decode` round-trip (1 explained mismatch — ids 1,2 share code 110, last-wins `decode→2`) + `enc_size=112` self-validated the layout. (3) `kMaxCodeLen = 9` — the recoder-plan summary said 3; Hangul/Han USE length-3 but the array is sized 9.
+
+**Pattern holds (E-CPP-KEYSTONE-1).** A new Core type, but the SAME shape: content-store tier (zero-dep, rides the keystone), one `diff` per mode, no Core gap. +10 contract tests. Consumed by `tesseract-core::{Recoder, recoded_to_text}` (codes→decode→ids→`ids_to_text`; +1 boundary test, 8/8). The recoder keystone (`invoke_recoder`, the E-CPP-KEYSTONE-1 analog) is UNBLOCKED — OGAR #148 minted concept `recoder`=0x0802 (mirrored in `ogar_codebook`) — but DEFERRED: the `classid→ClassView→content` dispatch is already proven generically, so a recoder keystone would re-prove a pattern with no new byte-parity information.
+
+Routing re-verified LIVE against OGAR (not the plan's cached answers): SURREAL-AST-TRAP-PREFLIGHT 5Q (data-shaped table, zero lifecycle vocabulary → content-store is honest) + OGAR-AS-IR §3 (adds no `Class`/`ActionDef`/`KausalSpec` → rerouted to the content tier, NOT `emit_rust`). ndarray and `ruff_cpp_spo` were correctly NOT used: the recoder is zero-SIMD data, and `UnicharCompress`/`RecodedCharID` have no inheritance/vtable for the harvest to resolve. Cross-ref: `E-CPP-PARITY-1..6`, `E-CPP-KEYSTONE-1`, `.claude/knowledge/core-first-transcode-doctrine.md`, OGAR #148 (0x08 OCR mint). Branch `claude/happy-hamilton-0azlw4`, lance-graph + tesseract-rs.
 ## 2026-07-02 — E-1BRC-GRIDLAKE-SWEETSPOT-1: the 64×64 gridlake SoA is the measured sweet spot — the batch pipeline at tile scale equals the best streamed topology while carrying the double-WAL
 **Status:** FINDING (measured, onebrc-probe lane J t7; closes the operator's four follow-up questions and the t4→t7 kanban-update arc)
 
diff --git a/.claude/board/LATEST_STATE.md b/.claude/board/LATEST_STATE.md
index d90788f9..353f1672 100644
--- a/.claude/board/LATEST_STATE.md
+++ b/.claude/board/LATEST_STATE.md
@@ -10,6 +10,12 @@
 
 ---
 
+## 2026-07-04 — branch `claude/happy-hamilton-0azlw4` — `contract::unicharcompress` — the Tesseract recoder load side (byte-parity vs libtesseract)
+
+**NEW** `lance_graph_contract::unicharcompress`: `UnicharCompress` (the LSTM recoder's code↔id table) + `RecodedCharId` + `RecoderError`, load side only (`from_le_bytes` / `load_from_file` = C++ `DeSerialize`; `encode` / `decode` / `code_range`; `dump_encode` / `dump_decode` parity surfaces). The FIRST binary-format leaf (`TFile` little-endian: `u32 count` + per-entry `[i8 self_normalized][i32 length][i32×length code]`). Byte-parity **GREEN** on real `/tmp/eng.lstm-recoder` — encode 112/112 + decode 112/112 + code_range=111 — via the committed `examples/recoder_dump.rs`, diffed vs a libtesseract 5.3.4 oracle (the 5.5.0-header ABI skew self-validated by the `Encode∘Decode` round-trip + `enc_size=112`). +10 contract tests; `-p lance-graph-contract` clippy `-D warnings` + fmt clean. Consumed by `tesseract-core::{Recoder, recoded_to_text}` (codes→decode→ids→`ids_to_text`; +1 boundary test, 8/8). Resolves the `recoder`=0x0802 concept (OGAR #148 mint, mirrored in the "0x08XX OCR rows" line below) to its content-store module. The recoder keystone (`invoke_recoder`) is UNBLOCKED but deferred (dispatch already proven generically by E-CPP-KEYSTONE-1). Refs: EPIPHANIES `E-CPP-PARITY-7`. Not yet a PR.
+
+---
+
 ## 2026-06-23 — IN PR (`claude/medcare-bridge-lance-graph-wmx76z`) — ActionHandler⟷RBAC⟷orchestration spine
 
 `contract::rbac`: `ScopeSpec` (axis-3 Copy token) + `ClassRbac` §4 default methods (`roles_reaching`/`row_scope`/`field_mask`; backward-compat, probe green). `contract::class_view::FieldMask::union`. `contract::action::ActionInvocation::commit_via<R: ClassRbac>` (no-admin-bypass convergence of the inline gate). `lance-graph-rbac::{authorize_scoped, ScopedDecision}` (§5 two-stage). `lance-graph-ogar::{OgarRbac<S: GrantSource>, GrantSource}` (Q5 local newtype, §6 evaporation seam). rs-graph-llm: `graph-flow-kanban::{run_cycle, CycleOutcome}` + `graph-flow-action::dispatch_via`. Plan: integration-actionhandler-rbac-orchestration-v1.
diff --git a/crates/lance-graph-contract/examples/recoder_dump.rs b/crates/lance-graph-contract/examples/recoder_dump.rs
new file mode 100644
index 00000000..1f2eed26
--- /dev/null
+++ b/crates/lance-graph-contract/examples/recoder_dump.rs
@@ -0,0 +1,51 @@
+//! Dump a `.lstm-recoder`'s encoder table (`encode`) or decode round-trip
+//! (`decode`) — the Rust side of the recoder byte-parity leaf, sibling to
+//! `unicharset_dump`.
+//!
+//! ```sh
+//! # on a box with libtesseract + libleptonica installed:
+//! combine_tessdata -u $(dpkg -L tesseract-ocr-eng | grep eng.traineddata) /tmp/eng.
+//! # C++ oracle (recoder_oracle.cpp): loads the SAME component via TFile and dumps
+//! # EncodeUnichar / DecodeUnichar / code_range. It also prints, per id, the
+//! # UNICHARSET bijection + an Encode.Decode round-trip so the NEW UnicharCompress
+//! # object layout self-validates against the 5.5.0-header / 5.3.4-lib ABI skew.
+//! #   ./recoder_oracle /tmp/eng.lstm-unicharset /tmp/eng.lstm-recoder encode > /tmp/oracle_recoder_encode.tsv
+//! #   ./recoder_oracle /tmp/eng.lstm-unicharset /tmp/eng.lstm-recoder decode > /tmp/oracle_recoder_decode.tsv
+//! # Rust side:
+//! cargo run -p lance-graph-contract --example recoder_dump -- /tmp/eng.lstm-recoder encode > /tmp/rust_recoder_encode.tsv
+//! cargo run -p lance-graph-contract --example recoder_dump -- /tmp/eng.lstm-recoder decode > /tmp/rust_recoder_decode.tsv
+//! diff /tmp/oracle_recoder_encode.tsv /tmp/rust_recoder_encode.tsv \
+//!   && diff /tmp/oracle_recoder_decode.tsv /tmp/rust_recoder_decode.tsv
+//! # both byte-identical => the recoder load-side is byte-parity green
+//! ```
+
+#![allow(
+    clippy::print_stdout,
+    reason = "a dump CLI example writes to stdout by design"
+)]
+
+use std::path::Path;
+use std::process::ExitCode;
+
+use lance_graph_contract::unicharcompress::UnicharCompress;
+
+fn main() -> ExitCode {
+    let Some(path) = std::env::args().nth(1) else {
+        eprintln!("usage: recoder_dump <path/to/eng.lstm-recoder> [encode|decode]");
+        return ExitCode::FAILURE;
+    };
+    let mode = std::env::args().nth(2).unwrap_or_default();
+    match UnicharCompress::load_from_file(Path::new(&path)) {
+        Ok(recoder) => {
+            match mode.as_str() {
+                "decode" => print!("{}", recoder.dump_decode()),
+                _ => print!("{}", recoder.dump_encode()),
+            }
+            ExitCode::SUCCESS
+        }
+        Err(err) => {
+            eprintln!("error: {err}");
+            ExitCode::FAILURE
+        }
+    }
+}
diff --git a/crates/lance-graph-contract/src/lib.rs b/crates/lance-graph-contract/src/lib.rs
index 12b9101c..f206e7aa 100644
--- a/crates/lance-graph-contract/src/lib.rs
+++ b/crates/lance-graph-contract/src/lib.rs
@@ -133,6 +133,7 @@ pub mod tax;
 pub mod tenant_counter;
 pub mod thinking;
 pub mod unichar;
+pub mod unicharcompress;
 pub mod unicharset;
 pub mod unicharset_adapter;
 pub mod view_angle;
diff --git a/crates/lance-graph-contract/src/unicharcompress.rs b/crates/lance-graph-contract/src/unicharcompress.rs
new file mode 100644
index 00000000..a2e32c3f
--- /dev/null
+++ b/crates/lance-graph-contract/src/unicharcompress.rs
@@ -0,0 +1,559 @@
+//! `UNICHARCOMPRESS` (the recoder) content store — the Rust side of the recoder
+//! byte-parity leaf, sibling to [`crate::unicharset`].
+//!
+//! Tesseract's `UnicharCompress` (`ccutil/unicharcompress.{h,cpp}`) re-encodes
+//! each unichar-id as a short sequence of small codes (Han radical-stroke,
+//! Hangul Jamo, ligature dissection; pass-through for simple scripts). The LSTM
+//! recognizer's output lattice speaks these **recoded codes, not raw
+//! unichar-ids**, so `ids_to_text` only becomes real OCR output once the decode
+//! table exists. Per the Core-First doctrine this is a **classid-keyed
+//! content-store tier** (a loaded codec table — id ↔ code-sequence bijection +
+//! bounds), exactly like [`crate::unicharset::UniCharSet`]: data-shaped, no
+//! lifecycle vocabulary, no effects. It rides the existing keystone; it is NOT
+//! IR-surface (`docs/OGAR-AS-IR.md` §3: adds no `Class` field, no `ActionDef`,
+//! no `KausalSpec` slot).
+//!
+//! # Load-side scope
+//!
+//! This module transcodes the **load side only** — `DeSerialize` +
+//! `EncodeUnichar` + `DecodeUnichar` + `code_range` (the recognizer runtime
+//! surface). `ComputeEncoding` (the training-side table builder) is out of
+//! scope. `SetupDecoder`'s beam-search maps (`is_valid_start_` / `next_codes_` /
+//! `final_codes_`, `unicharcompress.cpp:396-434`) are the recognizer's, not the
+//! decode table's — they are deferred to the recognizer leaf; only the
+//! `decoder_` map (code → id) is built here.
+//!
+//! # Binary format (byte-parity surface)
+//!
+//! Every prior leaf parsed text; the recoder is **binary** (`serialis.h` `TFile`
+//! conventions). `UnicharCompress::Serialize` writes exactly the `encoder_`
+//! vector (`unicharcompress.cpp:318-320`, comment `unicharcompress.h:229`: "the
+//! only part that is serialized. The rest is computed on load"). The wire form
+//! (little-endian; `TFile::swap_ == false` on x86) is:
+//!
+//! ```text
+//! u32  count                         // TFile::DeSerialize(vector<T>), serialis.h:90
+//! count × RecodedCharID:
+//!   i8   self_normalized             // RecodedCharID::DeSerialize, unicharcompress.h:75
+//!   i32  length                      // number of codes in use (<= kMaxCodeLen=9)
+//!   i32 × length  code               // only `length` codes are written, not all 9
+//! ```
+//!
+//! For real `eng.lstm-recoder` (112 pass-through entries, all length-1):
+//! `4 + 112·(1+4+4) = 1012` bytes — the exact on-disk size, a first-principles
+//! pre-registration of a correct parse. On load, `ComputeCodeRange`
+//! (`unicharcompress.cpp:383`, `max(code)+1`) and the `decoder_` map
+//! (`unicharcompress.cpp:400-402`, `decoder_[code]=id` in ascending-id order, so
+//! **last writer wins** on a shared code) are recomputed.
+//!
+//! [`UnicharCompress::dump_encode`] / [`UnicharCompress::dump_decode`] are the
+//! byte-parity surfaces, diffed against the C++ `UnicharCompress` oracle
+//! (`recoder_oracle.cpp`, which links libtesseract, loads the same component via
+//! `TFile`, and dumps `EncodeUnichar` / `DecodeUnichar` / `code_range`). The
+//! oracle's `Encode∘Decode` round-trip + the `UNICHARSET` bijection guard the
+//! 5.5.0-header / 5.3.4-lib ABI skew for this NEW object layout.
+//!
+//! # Strict-vs-lenient
+//!
+//! C++ `RecodedCharID::DeSerialize` reads `length` then reads that many `i32`
+//! into the fixed `code_[9]` — a buffer overflow (UB) if `length > 9` on hostile
+//! input. This reader instead rejects `length < 0 || length > kMaxCodeLen`
+//! ([`RecoderError::BadCodeLength`]) and a truncated buffer
+//! ([`RecoderError::UnexpectedEof`]). On well-formed trained data (`length` is
+//! always 1..=3) the byte-parity diff is unaffected; the guard only fires on
+//! corruption.
+
+use std::collections::HashMap;
+use std::hash::{Hash, Hasher};
+use std::path::Path;
+
+/// `RecodedCharID::kMaxCodeLen` (tesseract `unicharcompress.h:35`) — the fixed
+/// capacity of a code array. Hangul/Han use length 3; the array is sized 9.
+const K_MAX_CODE_LEN: usize = 9;
+
+/// The C++ `INVALID_UNICHAR_ID` sentinel (tesseract `unichar.h`) — what
+/// [`UnicharCompress::decode`] returns for a code with no matching id, mirroring
+/// `DecodeUnichar` (`unicharcompress.cpp:305-315`).
+const INVALID_UNICHAR_ID: i32 = -1;
+
+/// The `TFile::DeSerialize(vector<T>)` sanity cap (tesseract `serialis.h:96`):
+/// a declared element count above this is treated as corrupt input.
+const MAX_ELEMENTS: u32 = 50_000_000;
+
+/// The code sequence for one recoded unichar-id — the transcription of
+/// tesseract's `RecodedCharID` (`unicharcompress.h:32-109`).
+///
+/// Equality and hashing mirror the C++ `operator==` / `RecodedCharIDHash`
+/// (`unicharcompress.h:79-99`): **only `length` + the used `code[0..length]`
+/// participate**; `self_normalized` and any trailing array slots are ignored, so
+/// this is a sound [`HashMap`] key for the decoder (`decoder_[code]`).
+#[derive(Debug, Clone)]
+pub struct RecodedCharId {
+    /// True (`1`) if this is the master entry for ids sharing one code; stored as
+    /// `i8` for serialization (`unicharcompress.h:104`). Preserved on load for
+    /// round-trip fidelity; not part of identity.
+    self_normalized: i8,
+    /// The number of codes in use in `code` (`unicharcompress.h:106`).
+    length: i32,
+    /// The re-encoded form (`unicharcompress.h:108`). Only `code[0..length]` is
+    /// meaningful; trailing slots are `0`.
+    code: [i32; K_MAX_CODE_LEN],
+}
+
+impl Default for RecodedCharId {
+    /// Mirrors the C++ default ctor (`unicharcompress.h:37`): `self_normalized =
+    /// 1`, `length = 0`, all codes `0`.
+    fn default() -> Self {
+        Self {
+            self_normalized: 1,
+            length: 0,
+            code: [0; K_MAX_CODE_LEN],
+        }
+    }
+}
+
+impl RecodedCharId {
+    /// The codes in use — `code[0..length]`. The only bytes that carry identity.
+    #[must_use]
+    pub fn codes(&self) -> &[i32] {
+        let len = self.length.max(0) as usize;
+        // `length` is bounded to `<= K_MAX_CODE_LEN` at load; `min` keeps this
+        // total even for a hand-built value.
+        &self.code[..len.min(K_MAX_CODE_LEN)]
+    }
+
+    /// The number of codes in use (the C++ `length()`, `unicharcompress.h:62`).
+    #[must_use]
+    pub fn length(&self) -> i32 {
+        self.length
+    }
+
+    /// Whether this code is empty (`length == 0`), the C++ `empty()`
+    /// (`unicharcompress.h:58`).
+    #[must_use]
+    pub fn is_empty(&self) -> bool {
+        self.length == 0
+    }
+
+    /// Whether this is the self-normalizing master entry (`unicharcompress.h:104`).
+    #[must_use]
+    pub fn self_normalized(&self) -> bool {
+        self.self_normalized != 0
+    }
+
+    /// Read one `RecodedCharID` from the little-endian cursor. Rejects a
+    /// `length` outside `0..=kMaxCodeLen` (UB in C++) and a short buffer.
+    fn read(r: &mut ByteReader<'_>) -> Result<Self, RecoderError> {
+        let self_normalized = r.read_i8()?;
+        let length = r.read_i32()?;
+        if length < 0 || length as usize > K_MAX_CODE_LEN {
+            return Err(RecoderError::BadCodeLength(length));
+        }
+        let mut code = [0_i32; K_MAX_CODE_LEN];
+        for slot in code.iter_mut().take(length as usize) {
+            *slot = r.read_i32()?;
+        }
+        Ok(Self {
+            self_normalized,
+            length,
+            code,
+        })
+    }
+}
+
+impl PartialEq for RecodedCharId {
+    /// `operator==` (`unicharcompress.h:79-89`): compares `length` +
+    /// `code[0..length]` only.
+    fn eq(&self, other: &Self) -> bool {
+        self.codes() == other.codes()
+    }
+}
+
+impl Eq for RecodedCharId {}
+
+impl Hash for RecodedCharId {
+    /// Consistent with [`PartialEq`]: hash the used codes only. (The C++
+    /// `RecodedCharIDHash` folds the same `code[0..length]`; the Rust hasher need
+    /// only agree with `eq`, not reproduce the C++ bit-mix.)
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        self.codes().hash(state);
+    }
+}
+
+/// A loaded `UnicharCompress` (the recoder): the `encoder_` table (id → codes),
+/// its inverse `decoder_` (codes → id), and `code_range` — the transcription of
+/// tesseract's `UnicharCompress` load side (`unicharcompress.{h,cpp}`).
+#[derive(Debug, Clone, Default)]
+pub struct UnicharCompress {
+    /// id → code sequence (index IS the unichar-id). The only serialized part
+    /// (`unicharcompress.h:229-230`).
+    encoder: Vec<RecodedCharId>,
+    /// code → unichar-id, recomputed on load (`SetupDecoder`,
+    /// `unicharcompress.cpp:400-402`). Last-writer-wins on a shared code.
+    decoder: HashMap<RecodedCharId, u32>,
+    /// `1 + max code value` (`ComputeCodeRange`, `unicharcompress.cpp:383-393`);
+    /// the lattice width. `0` for an empty encoder (`-1 + 1`).
+    code_range: i32,
+}
+
+impl UnicharCompress {
+    /// Load a recoder from the raw little-endian bytes of a `.lstm-recoder`
+    /// component (the C++ `DeSerialize`, `unicharcompress.cpp:323-330`): read the
+    /// `encoder_` vector, then recompute `code_range` and the decode map.
+    ///
+    /// # Errors
+    ///
+    /// [`RecoderError::UnexpectedEof`] on a truncated buffer,
+    /// [`RecoderError::TooManyElements`] if the declared count exceeds the
+    /// `serialis.h` sanity cap, and [`RecoderError::BadCodeLength`] if any entry
+    /// declares a code length outside `0..=9`.
+    pub fn from_le_bytes(bytes: &[u8]) -> Result<Self, RecoderError> {
+        let mut r = ByteReader::new(bytes);
+        let count = r.read_u32()?;
+        if count > MAX_ELEMENTS {
+            return Err(RecoderError::TooManyElements(count));
+        }
+        // Each entry is >= 5 bytes (i8 + i32), so cap the pre-allocation by the
+        // buffer size: a hostile `count` cannot force a huge up-front allocation.
+        let mut encoder = Vec::with_capacity((count as usize).min(bytes.len() / 5));
+        for _ in 0..count {
+            encoder.push(RecodedCharId::read(&mut r)?);
+        }
+        // Trailing bytes are ignored on purpose: in a TFile stream the next component
+        // may follow (as in C++); a standalone `.lstm-recoder` is consumed exactly.
+        let mut this = Self {
+            encoder,
+            decoder: HashMap::new(),
+            code_range: 0,
+        };
+        this.compute_code_range();
+        this.setup_decoder();
+        Ok(this)
+    }
+
+    /// Load a recoder from a `.lstm-recoder` file (a thin wrapper over
+    /// [`Self::from_le_bytes`]). Extract one via
+    /// `combine_tessdata -u eng.traineddata /tmp/eng.`.
+    ///
+    /// # Errors
+    ///
+    /// [`RecoderError::Io`] if the file cannot be read, else the parse errors of
+    /// [`Self::from_le_bytes`].
+    pub fn load_from_file(path: &Path) -> Result<Self, RecoderError> {
+        let bytes = std::fs::read(path).map_err(|e| RecoderError::Io(e.to_string()))?;
+        Self::from_le_bytes(&bytes)
+    }
+
+    /// `1 + max code value` — the lattice width (`code_range`,
+    /// `unicharcompress.h:171`).
+    #[must_use]
+    pub fn code_range(&self) -> i32 {
+        self.code_range
+    }
+
+    /// The number of encoded unichar-ids (`encoder_.size()`).
+    #[must_use]
+    pub fn len(&self) -> usize {
+        self.encoder.len()
+    }
+
+    /// Whether the encoder is empty.
+    #[must_use]
+    pub fn is_empty(&self) -> bool {
+        self.encoder.is_empty()
+    }
+
+    /// The code sequence for `unichar_id`, or `None` if out of range — the C++
+    /// `EncodeUnichar` (`unicharcompress.cpp:295-301`; a `None` here is the C++
+    /// return of length `0`).
+    #[must_use]
+    pub fn encode(&self, unichar_id: u32) -> Option<&RecodedCharId> {
+        self.encoder.get(unichar_id as usize)
+    }
+
+    /// The unichar-id for `code`, or [`INVALID_UNICHAR_ID`] (`-1`) if the code is
+    /// ill-formed or unknown — the C++ `DecodeUnichar`
+    /// (`unicharcompress.cpp:305-315`).
+    #[must_use]
+    pub fn decode(&self, code: &RecodedCharId) -> i32 {
+        let len = code.length();
+        if len <= 0 || len as usize > K_MAX_CODE_LEN {
+            return INVALID_UNICHAR_ID;
+        }
+        self.decoder
+            .get(code)
+            .map_or(INVALID_UNICHAR_ID, |&id| id as i32)
+    }
+
+    /// `ComputeCodeRange` (`unicharcompress.cpp:383-393`): `code_range = 1 + max`
+    /// code value over every position of every entry (`0` for an empty encoder).
+    fn compute_code_range(&mut self) {
+        let mut max = -1_i32;
+        for entry in &self.encoder {
+            for &c in entry.codes() {
+                if c > max {
+                    max = c;
+                }
+            }
+        }
+        self.code_range = max + 1;
+    }
+
+    /// The decode-map half of `SetupDecoder` (`unicharcompress.cpp:400-402`):
+    /// `decoder_[encoder_[id]] = id` in ascending id order, so **last writer
+    /// wins** when two ids share a code. The beam-search maps are the
+    /// recognizer's and are not built here (see module docs).
+    fn setup_decoder(&mut self) {
+        self.decoder.clear();
+        self.decoder.reserve(self.encoder.len());
+        for (id, code) in self.encoder.iter().enumerate() {
+            self.decoder.insert(code.clone(), id as u32);
+        }
+    }
+
+    /// Render the id→code table as `"<id>\t<len>\t<c0>[,<c1>...]\n"` lines — the
+    /// exact shape the C++ recoder oracle's `encode` mode prints, so the
+    /// byte-parity diff is `diff oracle_recoder_encode.tsv rust_recoder_encode.tsv`.
+    #[must_use]
+    pub fn dump_encode(&self) -> String {
+        let mut out = String::new();
+        for (id, entry) in self.encoder.iter().enumerate() {
+            out.push_str(&id.to_string());
+            out.push('\t');
+            out.push_str(&entry.length().to_string());
+            out.push('\t');
+            for (i, &c) in entry.codes().iter().enumerate() {
+                if i > 0 {
+                    out.push(',');
+                }
+                out.push_str(&c.to_string());
+            }
+            out.push('\n');
+        }
+        out
+    }
+
+    /// Render `"code_range\t<N>\n"` then `"<id>\t<decoded>\n"` lines (where
+    /// `decoded = decode(encode(id))`) — the exact shape the C++ recoder oracle's
+    /// `decode` mode prints, so the byte-parity diff is
+    /// `diff oracle_recoder_decode.tsv rust_recoder_decode.tsv`. On a shared code
+    /// the decoded id is the last-writer, matching the C++ map.
+    #[must_use]
+    pub fn dump_decode(&self) -> String {
+        let mut out = String::new();
+        out.push_str("code_range\t");
+        out.push_str(&self.code_range.to_string());
+        out.push('\n');
+        for (id, entry) in self.encoder.iter().enumerate() {
+            out.push_str(&id.to_string());
+            out.push('\t');
+            out.push_str(&self.decode(entry).to_string());
+            out.push('\n');
+        }
+        out
+    }
+}
+
+/// A little-endian byte cursor over the recoder component — the reader half of
+/// the `TFile` primitives this leaf needs (`FReadEndian` with `swap_ == false`).
+struct ByteReader<'a> {
+    bytes: &'a [u8],
+    pos: usize,
+}
+
+impl<'a> ByteReader<'a> {
+    fn new(bytes: &'a [u8]) -> Self {
+        Self { bytes, pos: 0 }
+    }
+
+    /// Advance over `n` bytes, or [`RecoderError::UnexpectedEof`] if short.
+    fn take(&mut self, n: usize) -> Result<&'a [u8], RecoderError> {
+        let end = self.pos.checked_add(n).ok_or(RecoderError::UnexpectedEof)?;
+        let slice = self
+            .bytes
+            .get(self.pos..end)
+            .ok_or(RecoderError::UnexpectedEof)?;
+        self.pos = end;
+        Ok(slice)
+    }
+
+    fn read_i8(&mut self) -> Result<i8, RecoderError> {
+        Ok(self.take(1)?[0] as i8)
+    }
+
+    fn read_u32(&mut self) -> Result<u32, RecoderError> {
+        let arr: [u8; 4] = self
+            .take(4)?
+            .try_into()
+            .map_err(|_| RecoderError::UnexpectedEof)?;
+        Ok(u32::from_le_bytes(arr))
+    }
+
+    fn read_i32(&mut self) -> Result<i32, RecoderError> {
+        let arr: [u8; 4] = self
+            .take(4)?
+            .try_into()
+            .map_err(|_| RecoderError::UnexpectedEof)?;
+        Ok(i32::from_le_bytes(arr))
+    }
+}
+
+/// A failure loading a `UnicharCompress` (recoder).
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub enum RecoderError {
+    /// The buffer ended mid-field.
+    UnexpectedEof,
+    /// The declared element count exceeded the `serialis.h` sanity cap.
+    TooManyElements(u32),
+    /// A `RecodedCharID` declared a code length outside `0..=9` (the C++ fixed
+    /// array capacity `kMaxCodeLen`).
+    BadCodeLength(i32),
+    /// The file could not be read (message from the underlying I/O error).
+    Io(String),
+}
+
+impl std::fmt::Display for RecoderError {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            Self::UnexpectedEof => write!(f, "recoder buffer ended mid-field"),
+            Self::TooManyElements(n) => {
+                write!(
+                    f,
+                    "recoder declared {n} elements (over the {MAX_ELEMENTS} cap)"
+                )
+            }
+            Self::BadCodeLength(len) => {
+                write!(
+                    f,
+                    "recoded code length {len} out of range 0..={K_MAX_CODE_LEN}"
+                )
+            }
+            Self::Io(msg) => write!(f, "recoder read failed: {msg}"),
+        }
+    }
+}
+
+impl std::error::Error for RecoderError {}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    /// Build a `.lstm-recoder` byte buffer from `(self_normalized, codes)`
+    /// entries, in the exact little-endian wire form the C++ `Serialize` writes.
+    fn build(entries: &[(i8, &[i32])]) -> Vec<u8> {
+        let mut b = Vec::new();
+        b.extend_from_slice(&u32::try_from(entries.len()).unwrap().to_le_bytes());
+        for (self_norm, codes) in entries {
+            b.push(*self_norm as u8);
+            b.extend_from_slice(&i32::try_from(codes.len()).unwrap().to_le_bytes());
+            for &c in *codes {
+                b.extend_from_slice(&c.to_le_bytes());
+            }
+        }
+        b
+    }
+
+    #[test]
+    fn parses_count_and_entries() {
+        let bytes = build(&[(1, &[0]), (1, &[5]), (1, &[5])]);
+        let rec = UnicharCompress::from_le_bytes(&bytes).expect("valid");
+        assert_eq!(rec.len(), 3);
+        assert_eq!(rec.encode(0).unwrap().codes(), &[0]);
+        assert_eq!(rec.encode(2).unwrap().codes(), &[5]);
+        assert!(rec.encode(3).is_none(), "out-of-range id -> None");
+    }
+
+    #[test]
+    fn code_range_is_max_plus_one() {
+        // max code value 5 -> code_range 6.
+        let rec = UnicharCompress::from_le_bytes(&build(&[(1, &[0]), (1, &[5]), (1, &[3])]))
+            .expect("valid");
+        assert_eq!(rec.code_range(), 6);
+        // Empty encoder -> -1 + 1 = 0 (matches ComputeCodeRange's seed).
+        let empty = UnicharCompress::from_le_bytes(&build(&[])).expect("valid");
+        assert_eq!(empty.code_range(), 0);
+    }
+
+    #[test]
+    fn decode_is_last_writer_wins_on_shared_code() {
+        // ids 1 and 2 both encode to code [5]; decoder keeps the last (id 2) —
+        // exactly the eng.lstm-recoder id1/id2 -> code 110 case.
+        let rec = UnicharCompress::from_le_bytes(&build(&[(1, &[0]), (1, &[5]), (1, &[5])]))
+            .expect("valid");
+        assert_eq!(rec.decode(rec.encode(0).unwrap()), 0);
+        assert_eq!(
+            rec.decode(rec.encode(1).unwrap()),
+            2,
+            "shared code -> last id"
+        );
+        assert_eq!(rec.decode(rec.encode(2).unwrap()), 2);
+    }
+
+    #[test]
+    fn decode_unknown_or_illformed_is_invalid() {
+        let rec = UnicharCompress::from_le_bytes(&build(&[(1, &[0])])).expect("valid");
+        // An empty code (length 0) is ill-formed for decode.
+        assert_eq!(rec.decode(&RecodedCharId::default()), INVALID_UNICHAR_ID);
+    }
+
+    #[test]
+    fn equality_ignores_self_normalized_and_trailing() {
+        // Same code, different self_normalized -> equal (C++ operator==).
+        let a = UnicharCompress::from_le_bytes(&build(&[(1, &[7])])).expect("valid");
+        let b = UnicharCompress::from_le_bytes(&build(&[(0, &[7])])).expect("valid");
+        assert_eq!(a.encode(0).unwrap(), b.encode(0).unwrap());
+    }
+
+    #[test]
+    fn dump_encode_matches_oracle_shape() {
+        // A multi-code entry exercises the comma join.
+        let rec = UnicharCompress::from_le_bytes(&build(&[(1, &[0]), (1, &[5]), (1, &[1, 2, 3])]))
+            .expect("valid");
+        assert_eq!(rec.dump_encode(), "0\t1\t0\n1\t1\t5\n2\t3\t1,2,3\n");
+    }
+
+    #[test]
+    fn dump_decode_matches_oracle_shape() {
+        let rec = UnicharCompress::from_le_bytes(&build(&[(1, &[0]), (1, &[5]), (1, &[5])]))
+            .expect("valid");
+        // code_range = 6; id1 decodes to 2 (last-writer on shared code [5]).
+        assert_eq!(rec.dump_decode(), "code_range\t6\n0\t0\n1\t2\n2\t2\n");
+    }
+
+    #[test]
+    fn truncated_buffer_errors() {
+        let mut bytes = build(&[(1, &[0])]);
+        bytes.pop(); // drop the last code byte
+        assert_eq!(
+            UnicharCompress::from_le_bytes(&bytes).unwrap_err(),
+            RecoderError::UnexpectedEof
+        );
+        // A declared count of 3 with no entry bytes following it.
+        assert_eq!(
+            UnicharCompress::from_le_bytes(&[3, 0, 0, 0]).unwrap_err(),
+            RecoderError::UnexpectedEof
+        );
+    }
+
+    #[test]
+    fn bad_code_length_errors() {
+        // count=1, self_norm=1, length=10 (> kMaxCodeLen) — the C++ UB case.
+        let mut bytes = vec![1, 0, 0, 0, 1];
+        bytes.extend_from_slice(&10_i32.to_le_bytes());
+        assert_eq!(
+            UnicharCompress::from_le_bytes(&bytes).unwrap_err(),
+            RecoderError::BadCodeLength(10)
+        );
+    }
+
+    #[test]
+    fn too_many_elements_errors() {
+        // A declared count over the cap fails fast without allocating.
+        let bytes = (MAX_ELEMENTS + 1).to_le_bytes();
+        assert_eq!(
+            UnicharCompress::from_le_bytes(&bytes).unwrap_err(),
+            RecoderError::TooManyElements(MAX_ELEMENTS + 1)
+        );
+    }
+}

From ba5ce72f9a91aabad6f60643fa81f684d76bab20 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Sat, 4 Jul 2026 12:45:42 +0000
Subject: [PATCH 2/7] board: E-OCR-COMPUTE-NDARRAY-SEAM-1 -- recognizer int8
 path maps 1:1 onto ndarray

CONJECTURE (design-pass finding; byte-parity probe = recognizer Leaf 1). The
OCR recognizer is COMPUTE (dense int8 GEMM), not content -- it consumes
ndarray's existing matmul_i8_to_i32 / quantize / dequantize with no Core gap.
int8->i32 is exact + bit-reproducible across AMX/VNNI/scalar. Corrects the
"OCR is ndarray-free" framing. Cross-ref E-CPP-PARITY-7, the recognizer plan.

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
---
 .claude/board/EPIPHANIES.md | 8 ++++++++
 1 file changed, 8 insertions(+)
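
Note (commentary only, not part of the commit): the claim above that int8 x
int8 -> i32 accumulation is exact and order-independent can be sketched in
scalar Rust. This is a minimal illustration that does not call ndarray; the
function names are hypothetical. It assumes the row is short enough that the
i32 sum cannot overflow (each product has magnitude at most 16384).

```rust
/// Scalar reference: widen each int8 pair to i32, multiply, sum front to back.
fn dot_i8_i32(w: &[i8], u: &[i8]) -> i32 {
    w.iter()
        .zip(u)
        .map(|(&a, &b)| i32::from(a) * i32::from(b))
        .sum()
}

/// The same products summed back to front.
fn dot_i8_i32_reversed(w: &[i8], u: &[i8]) -> i32 {
    w.iter()
        .zip(u)
        .rev()
        .map(|(&a, &b)| i32::from(a) * i32::from(b))
        .sum()
}

/// The same products split over four partial sums, loosely imitating how a
/// SIMD tier accumulates in lanes before a final horizontal reduction.
fn dot_i8_i32_lanes(w: &[i8], u: &[i8]) -> i32 {
    let mut lanes = [0i32; 4];
    for (i, (&a, &b)) in w.iter().zip(u).enumerate() {
        lanes[i % 4] += i32::from(a) * i32::from(b);
    }
    lanes.iter().sum()
}
```

Because integer addition is associative and nothing is rounded, all three
functions return the same i32 for the same inputs, which is the property that
lets different SIMD tiers agree bit for bit.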

diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index 7aae6eaf..21344c47 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -177,6 +177,14 @@ New knowledge doc `.claude/knowledge/data-shape-etymology.md` — the shape-and-
 **Status:** FINDING (operator ruling on the shape — "yes valueschema") + embedded CONJECTURE (the preset-vs-dispatch probe)
 
 Operator floated keeping the fast/cheap V2 substrate for huge data alongside V3, "switched by classid," so V3 can eventually teach V2 how to be better. Resolved: the switch is NOT a new carrier. `ClassView::value_schema(classid) -> ValueSchema` (`canonical_node.rs:894`, `class_view.rs:395`) is ALREADY classid→substrate-shape resolution by trait dispatch — resolved, never stored on-wire (adding a variant costs NO `ENVELOPE_LAYOUT_VERSION` bump), and the four existing variants ALREADY form a substrate ladder: `Bootstrap`(empty, key+edges only) / `Compressed`(cold codec, **no hot lifecycle columns**) / `Cognitive`(hot thinking: Meta+Qualia+Fingerprint+Energy+Plasticity+EntityType) / `Full`(every tenant). So "V2 fast/cheap bulk" = classids that resolve to the LEAN end (Bootstrap/Compressed — no ownership/lifecycle tenants); "V3 witnessed/owned" = Cognitive/Full. **A `ClassRoutingDTO` is rejected:** a DTO is a serialized carried payload, but substrate choice is a RESOLUTION (firewall ADR-022, "contracts compile types, the event never leaves"); and per the three-tier canon nothing crosses mailbox boundaries — every reader re-resolves the substrate from the classid already in the 16-byte key, so there is no boundary for a carrier to travel. `dto-soa-savant` + AGI-as-glove name the new-struct-instead-of-resolution shape exactly. **0x1000 is NOT the switch:** canon fixes it as a temporary adoption MONITOR ("monitor, never a semantic"; retires at P4/100%; MODULE-TABLE flags that a future canon==0x1000 aliases the marker) — substrate routes on the classid's concept-half → ValueSchema, never on the monitor bit. **The deep form (CONJECTURE — PROBE preset-vs-dispatch):** the WRITE PATH may be a pure FUNCTION of the schema — a class whose ValueSchema carries no ownership/lifecycle tenants has nothing for the kanban/WAL to witness, so it naturally collapses to the fast private-merge write; Cognitive/Full carry the tenants that REQUIRE the owned/witnessed path. 
If that holds, substrate = ValueSchema full stop (no separate `Substrate` enum, no flag). The gate: confirm the write path is derivable from which tenants are live vs needing an independent resolution — evidence base is the onebrc arc itself (lane F private-merge/no-tenants vs lanes G–J owned/witnessed = the two write paths already measured). Open sub-question: whether bulk needs a variant leaner than `Compressed`, or Bootstrap/Compressed already suffice. **"V3 teaches V2" (deferred, needs mechanism):** V3's kanban WAL + ownership journal is the profiling signal (where contention lands, which fields are touched) to optimize the lean V2 layout — the instrumented-teacher / stripped-student loop; no code reads the WAL back into a layout optimizer yet. Net: at most a new `ValueSchema` variant through the existing `value_schema(classid)` door; possibly not even that.
+## 2026-07-04 — E-OCR-COMPUTE-NDARRAY-SEAM-1 — the OCR recognizer's int8 hot path maps 1:1 onto ndarray's existing `matmul_i8_to_i32`; no Core gap, and int8→i32 is bit-reproducible across every SIMD tier
+**Status:** CONJECTURE (design-pass finding; the byte-parity probe is recognizer Leaf 1, not yet run). Corrects the earlier "OCR transcode is ndarray-free" framing (operator sanity check: *OCR without hardware acceleration isn't smart*).
+
+The recoder/unicharset leaves were codec TABLES (correctly zero-dep content tier); the RECOGNIZER is COMPUTE — Tesseract's LSTM forward pass is dense int8 GEMM (`IntSimdMatrix::MatrixDotVector`, `WeightMatrix`; `src/arch` + `src/lstm`). Surveyed 2026-07-04 against ndarray master: it maps ONE-TO-ONE onto primitives ndarray ALREADY ships — `IntSimdMatrix::MatrixDotVector` (int8 W × int8 u → i32) ↔ `simd_runtime::matmul_i8_to_i32` (AMX TDPBUSD → VPDPBUSD → scalar); `WeightMatrix::ConvertToInt` (row max-abs → INT8_MAX + per-row float scale) ↔ `simd_amx::quantize_energy_i8`; the scale-back ↔ `dequantize_result_f64`. **No Core gap** — the recognizer CONSUMES ndarray's GEMM (the `simd-savant` "all SIMD from `ndarray::simd`" invariant), never re-transcodes SIMD.
+
+Two load-bearing properties: (1) int8×int8→i32 accumulation is EXACT + order-independent, so AMX / AVX512-VNNI / AVX2-VNNI / scalar all yield the IDENTICAL i32 — the recognizer's integer matmul is **bit-reproducible across every SIMD tier** (unlike float/BF16 GEMM), which is what makes byte-parity clean. (2) The base `MatrixDotVector` bias is `w(i,num_in)·INT8_MAX` NOT `·1` (intsimdmatrix.cpp:101 — the input's imaginary `1.0` is int8-quantized to 127); `TFloat` = `double` unless `FAST_FLOAT` (tesstypes.h) → the float half is a Leaf-1 probe, the i32 half is exact.
+
+Two-foundations architecture (the correction): `lance-graph-contract` = CONTENT (codec tables, zero-dep) · `ndarray` = COMPUTE (int8/bf16 SIMD GEMM, already shipped) · `tesseract-core` = content consumer (zero-dep) · NEW `tesseract-recognizer` = compute consumer (deps `ndarray` + `tesseract-core`). Plan: `tesseract-rs/.claude/plans/recognizer-core-shape-v1.md` (Leaf 1 = `MatrixDotVector` byte-parity on synthetic int8, no `Pix`). Cross-ref: `E-CPP-PARITY-7` (recoder), `.claude/knowledge/core-first-transcode-doctrine.md`, ndarray `vertical-simd-consumer-contract.md`. Branch `claude/happy-hamilton-0azlw4`.
 ## 2026-07-04 — E-CPP-PARITY-7 — the UNICHARCOMPRESS (recoder) load side is byte-identical to libtesseract; the seventh leaf, and the FIRST binary-format transcode (`TFile` little-endian)
 **Status:** FINDING (byte-parity proven vs libtesseract 5.3.4; in-contract, tested)
 

From 856358a273b3629fd04799c8b20aa6ccb6055a99 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Sat, 4 Jul 2026 13:12:32 +0000
Subject: [PATCH 3/7] board: E-OCR-MATDOTVEC-1 -- recognizer Leaf 1 byte-parity
 green (promotes seam FINDING)

The int8 MatrixDotVector, via ndarray's matmul_i8_to_i32, equals libtesseract
exactly on synthetic int8 (integer-combined diff, TFloat-agnostic). Promotes
E-OCR-COMPUTE-NDARRAY-SEAM-1 CONJECTURE->FINDING. New crate tesseract-recognizer
(compute tier). The in-environment libtesseract is a FAST_FLOAT build.

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
---
 .claude/board/EPIPHANIES.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
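
Note (commentary only, not part of the commit): the board entry in this patch
says the bias "falls out of one matmul" when the input is padded with a
trailing INT8_MAX. A minimal scalar sketch of that identity follows. The names
are hypothetical and the row-major layout with the bias weight in the last
column is an assumption for illustration; the real crate consumes ndarray's
GEMM instead.

```rust
const INT8_MAX: i8 = 127;

/// One output row computed the long way: dot(w, u) + w_bias * 127.
/// `row` holds the input weights followed by one bias weight.
fn row_with_explicit_bias(row: &[i8], u: &[i8]) -> i32 {
    let (w, bias) = row.split_at(u.len());
    let dot: i32 = w
        .iter()
        .zip(u)
        .map(|(&a, &b)| i32::from(a) * i32::from(b))
        .sum();
    dot + i32::from(bias[0]) * i32::from(INT8_MAX)
}

/// The same result from a plain matrix-vector product over a padded input:
/// appending 127 to `u` makes the bias column an ordinary column.
fn matvec_padded(w: &[i8], rows: usize, cols_with_bias: usize, u: &[i8]) -> Vec<i32> {
    let mut padded = u.to_vec();
    padded.push(INT8_MAX);
    assert_eq!(padded.len(), cols_with_bias);
    assert_eq!(w.len(), rows * cols_with_bias);
    w.chunks_exact(cols_with_bias)
        .map(|row| {
            row.iter()
                .zip(&padded)
                .map(|(&a, &b)| i32::from(a) * i32::from(b))
                .sum::<i32>()
        })
        .collect()
}
```

For a 2x3 weight block `[1, 2, 3, 4, 5, 6]` and input `[10, -10]`, both paths
give 371 for the first row and 752 for the second.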

diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index 21344c47..58967f0a 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -177,8 +177,16 @@ New knowledge doc `.claude/knowledge/data-shape-etymology.md` — the shape-and-
 **Status:** FINDING (operator ruling on the shape — "yes valueschema") + embedded CONJECTURE (the preset-vs-dispatch probe)
 
 Operator floated keeping the fast/cheap V2 substrate for huge data alongside V3, "switched by classid," so V3 can eventually teach V2 how to be better. Resolved: the switch is NOT a new carrier. `ClassView::value_schema(classid) -> ValueSchema` (`canonical_node.rs:894`, `class_view.rs:395`) is ALREADY classid→substrate-shape resolution by trait dispatch — resolved, never stored on-wire (adding a variant costs NO `ENVELOPE_LAYOUT_VERSION` bump), and the four existing variants ALREADY form a substrate ladder: `Bootstrap`(empty, key+edges only) / `Compressed`(cold codec, **no hot lifecycle columns**) / `Cognitive`(hot thinking: Meta+Qualia+Fingerprint+Energy+Plasticity+EntityType) / `Full`(every tenant). So "V2 fast/cheap bulk" = classids that resolve to the LEAN end (Bootstrap/Compressed — no ownership/lifecycle tenants); "V3 witnessed/owned" = Cognitive/Full. **A `ClassRoutingDTO` is rejected:** a DTO is a serialized carried payload, but substrate choice is a RESOLUTION (firewall ADR-022, "contracts compile types, the event never leaves"); and per the three-tier canon nothing crosses mailbox boundaries — every reader re-resolves the substrate from the classid already in the 16-byte key, so there is no boundary for a carrier to travel. `dto-soa-savant` + AGI-as-glove name the new-struct-instead-of-resolution shape exactly. **0x1000 is NOT the switch:** canon fixes it as a temporary adoption MONITOR ("monitor, never a semantic"; retires at P4/100%; MODULE-TABLE flags that a future canon==0x1000 aliases the marker) — substrate routes on the classid's concept-half → ValueSchema, never on the monitor bit. **The deep form (CONJECTURE — PROBE preset-vs-dispatch):** the WRITE PATH may be a pure FUNCTION of the schema — a class whose ValueSchema carries no ownership/lifecycle tenants has nothing for the kanban/WAL to witness, so it naturally collapses to the fast private-merge write; Cognitive/Full carry the tenants that REQUIRE the owned/witnessed path. 
If that holds, substrate = ValueSchema full stop (no separate `Substrate` enum, no flag). The gate: confirm the write path is derivable from which tenants are live vs needing an independent resolution — evidence base is the onebrc arc itself (lane F private-merge/no-tenants vs lanes G–J owned/witnessed = the two write paths already measured). Open sub-question: whether bulk needs a variant leaner than `Compressed`, or Bootstrap/Compressed already suffice. **"V3 teaches V2" (deferred, needs mechanism):** V3's kanban WAL + ownership journal is the profiling signal (where contention lands, which fields are touched) to optimize the lean V2 layout — the instrumented-teacher / stripped-student loop; no code reads the WAL back into a layout optimizer yet. Net: at most a new `ValueSchema` variant through the existing `value_schema(classid)` door; possibly not even that.
+## 2026-07-04 — E-OCR-MATDOTVEC-1 — recognizer Leaf 1 is byte-parity green: the int8 `MatrixDotVector`, via ndarray's `matmul_i8_to_i32`, equals libtesseract exactly (promotes `E-OCR-COMPUTE-NDARRAY-SEAM-1` CONJECTURE→FINDING)
+**Status:** FINDING (byte-parity proven vs libtesseract 5.3.4; new crate `tesseract-recognizer`, tested)
+
+The recognizer's first COMPUTE leaf ships. `tesseract-recognizer::matrix_dot_vector` transcodes Tesseract's base `IntSimdMatrix::MatrixDotVector` (intsimdmatrix.cpp:78-117) by **consuming** `ndarray::simd_runtime::matmul_i8_to_i32` (AMX `TDPBUSD` → `VPDPBUSD` → scalar) — the bias falls out of one matmul by padding the input with a trailing `INT8_MAX` (127), the int8 quantization of the imaginary `1.0` bias. Byte-parity GREEN on synthetic int8 across two shapes (48×49, 7×5) vs a libtesseract oracle, diffing the EXACT INTEGER combined value (`Σ w·u + w_bias·127`, scales=1.0, exact in float) so the diff is `TFloat`-agnostic. +4 unit tests; clippy `-D warnings` + fmt clean (`-p tesseract-recognizer`, scoped).
+
+Two plan unknowns resolved: (1) `matmul_i8_to_i32` is behind ndarray's `runtime-dispatch` feature (stable, NOT nightly); the cold ndarray compile is only ~36 s on 1.95. (2) libtesseract 5.3.4 in-env is **FAST_FLOAT → `TFloat = float`** (self-check `sizeof(TFloat)=4`, `lib=157585 hand=157585`): the ABI probe was the DOUBLE-signature link error, then the FLOAT rebuild self-validated. The integer accumulate (the transcode's core) is exact; the scaled float is an adapter float-type choice (f64, documented for a later leaf).
+
+Toolchain: operator policy "always bump to 1.95" cleared ndarray's `rust-version = 1.95` manifest gate (env was 1.94.1; 1.95.0 set default). CI updated to sibling-checkout ndarray + a 1.95 step. The two-foundations architecture is now REAL: `tesseract-recognizer` (deps ndarray) = compute tier next to `tesseract-core` (deps lance-graph-contract) = content tier. Plan: `tesseract-rs/.claude/plans/recognizer-core-shape-v1.md` (Leaf 1 EXECUTED; next = `WeightMatrix::DeSerialize` + the network graph → `recodebeam` → the code lattice `recoded_to_text` eats). Cross-ref: `E-OCR-COMPUTE-NDARRAY-SEAM-1` (now FINDING), `E-CPP-PARITY-7`. Branch `claude/happy-hamilton-0azlw4`.
 ## 2026-07-04 — E-OCR-COMPUTE-NDARRAY-SEAM-1 — the OCR recognizer's int8 hot path maps 1:1 onto ndarray's existing `matmul_i8_to_i32`; no Core gap, and int8→i32 is bit-reproducible across every SIMD tier
-**Status:** CONJECTURE (design-pass finding; the byte-parity probe is recognizer Leaf 1, not yet run). Corrects the earlier "OCR transcode is ndarray-free" framing (operator sanity check: *OCR without hardware acceleration isn't smart*).
+**Status:** FINDING (2026-07-04 — byte-parity proven by recognizer Leaf 1, `E-OCR-MATDOTVEC-1`; was CONJECTURE at design-pass time). Corrects the earlier "OCR transcode is ndarray-free" framing (operator sanity check: *OCR without hardware acceleration isn't smart*).
 
 The recoder/unicharset leaves were codec TABLES (correctly zero-dep content tier); the RECOGNIZER is COMPUTE — Tesseract's LSTM forward pass is dense int8 GEMM (`IntSimdMatrix::MatrixDotVector`, `WeightMatrix`; `src/arch` + `src/lstm`). Surveyed 2026-07-04 against ndarray master: it maps ONE-TO-ONE onto primitives ndarray ALREADY ships — `IntSimdMatrix::MatrixDotVector` (int8 W × int8 u → i32) ↔ `simd_runtime::matmul_i8_to_i32` (AMX TDPBUSD → VPDPBUSD → scalar); `WeightMatrix::ConvertToInt` (row max-abs → INT8_MAX + per-row float scale) ↔ `simd_amx::quantize_energy_i8`; the scale-back ↔ `dequantize_result_f64`. **No Core gap** — the recognizer CONSUMES ndarray's GEMM (the `simd-savant` "all SIMD from `ndarray::simd`" invariant), never re-transcodes SIMD.
 

From 4af9162d53064334d39ff0d3c8f4c8feb7877ca3 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Sat, 4 Jul 2026 13:34:08 +0000
Subject: [PATCH 4/7] board: E-OCR-WEIGHTMATRIX-1 -- recognizer Leaf 2
 byte-parity green

WeightMatrix::DeSerialize (int mode) transcoded + byte-parity vs libtesseract
(f32 bit-patterns, two shapes). forward() chains Leaf 1's proven int8 GEMM,
scaling in f32 to match FAST_FLOAT. The proof is independent: Rust writes the
serialized bytes and libtesseract reads them.

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
---
 .claude/board/EPIPHANIES.md | 8 ++++++++
 1 file changed, 8 insertions(+)
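
Note (commentary only, not part of the commit): the int-mode wire layout the
board entry describes (mode byte, two u32 dims, one fill byte, the int8
weights, a u32 scale count, then f64 scales stored as scale*127) can be read
with a small cursor-style parser. This is a sketch under stated assumptions,
not the crate's `from_le_bytes`: the type and function names are hypothetical,
dim1 is assumed to be the output count, and only the exact mode byte 0x81
quoted in the entry is accepted.

```rust
#[derive(Debug, PartialEq)]
struct IntWeights {
    rows: usize,
    cols: usize,
    weights: Vec<i8>,
    scales: Vec<f64>,
}

/// Split `n` bytes off the front of the cursor, or None if it is too short.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    if buf.len() < n {
        return None;
    }
    let (head, tail) = buf.split_at(n);
    *buf = tail;
    Some(head)
}

fn read_u32(buf: &mut &[u8]) -> Option<u32> {
    let b = take(buf, 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn parse_int_weight_matrix(mut buf: &[u8]) -> Option<IntWeights> {
    // Assumption: accept only the int8 + double-flag mode byte from the entry.
    if take(&mut buf, 1)?[0] != 0x81 {
        return None;
    }
    let rows = read_u32(&mut buf)? as usize;
    let cols = read_u32(&mut buf)? as usize;
    let _fill = take(&mut buf, 1)?[0]; // the fill byte sits between dims and data
    let weights: Vec<i8> = take(&mut buf, rows.checked_mul(cols)?)?
        .iter()
        .map(|&b| b as i8)
        .collect();
    let num_scales = read_u32(&mut buf)? as usize;
    let mut scales = Vec::new();
    for _ in 0..num_scales {
        let b = take(&mut buf, 8)?;
        let raw = f64::from_le_bytes([b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]]);
        scales.push(raw / 127.0); // stored as scale * INT8_MAX
    }
    scales.truncate(rows); // drop any SIMD padding past the output count
    Some(IntWeights { rows, cols, weights, scales })
}
```

Reading the scales one at a time, rather than pre-allocating from the declared
count, keeps a corrupt count from triggering a large allocation before the
buffer runs out.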

diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index 58967f0a..0e7fa213 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -177,6 +177,14 @@ New knowledge doc `.claude/knowledge/data-shape-etymology.md` — the shape-and-
 **Status:** FINDING (operator ruling on the shape — "yes valueschema") + embedded CONJECTURE (the preset-vs-dispatch probe)
 
 Operator floated keeping the fast/cheap V2 substrate for huge data alongside V3, "switched by classid," so V3 can eventually teach V2 how to be better. Resolved: the switch is NOT a new carrier. `ClassView::value_schema(classid) -> ValueSchema` (`canonical_node.rs:894`, `class_view.rs:395`) is ALREADY classid→substrate-shape resolution by trait dispatch — resolved, never stored on-wire (adding a variant costs NO `ENVELOPE_LAYOUT_VERSION` bump), and the four existing variants ALREADY form a substrate ladder: `Bootstrap`(empty, key+edges only) / `Compressed`(cold codec, **no hot lifecycle columns**) / `Cognitive`(hot thinking: Meta+Qualia+Fingerprint+Energy+Plasticity+EntityType) / `Full`(every tenant). So "V2 fast/cheap bulk" = classids that resolve to the LEAN end (Bootstrap/Compressed — no ownership/lifecycle tenants); "V3 witnessed/owned" = Cognitive/Full. **A `ClassRoutingDTO` is rejected:** a DTO is a serialized carried payload, but substrate choice is a RESOLUTION (firewall ADR-022, "contracts compile types, the event never leaves"); and per the three-tier canon nothing crosses mailbox boundaries — every reader re-resolves the substrate from the classid already in the 16-byte key, so there is no boundary for a carrier to travel. `dto-soa-savant` + AGI-as-glove name the new-struct-instead-of-resolution shape exactly. **0x1000 is NOT the switch:** canon fixes it as a temporary adoption MONITOR ("monitor, never a semantic"; retires at P4/100%; MODULE-TABLE flags that a future canon==0x1000 aliases the marker) — substrate routes on the classid's concept-half → ValueSchema, never on the monitor bit. **The deep form (CONJECTURE — PROBE preset-vs-dispatch):** the WRITE PATH may be a pure FUNCTION of the schema — a class whose ValueSchema carries no ownership/lifecycle tenants has nothing for the kanban/WAL to witness, so it naturally collapses to the fast private-merge write; Cognitive/Full carry the tenants that REQUIRE the owned/witnessed path. 
If that holds, substrate = ValueSchema full stop (no separate `Substrate` enum, no flag). The gate: confirm the write path is derivable from which tenants are live vs needing an independent resolution — evidence base is the onebrc arc itself (lane F private-merge/no-tenants vs lanes G–J owned/witnessed = the two write paths already measured). Open sub-question: whether bulk needs a variant leaner than `Compressed`, or Bootstrap/Compressed already suffice. **"V3 teaches V2" (deferred, needs mechanism):** V3's kanban WAL + ownership journal is the profiling signal (where contention lands, which fields are touched) to optimize the lean V2 layout — the instrumented-teacher / stripped-student loop; no code reads the WAL back into a layout optimizer yet. Net: at most a new `ValueSchema` variant through the existing `value_schema(classid)` door; possibly not even that.
+## 2026-07-04 — E-OCR-WEIGHTMATRIX-1 — recognizer Leaf 2: `WeightMatrix::DeSerialize` (int mode) is byte-parity green vs libtesseract; the forward chains Leaf 1's proven int8 GEMM
+**Status:** FINDING (byte-parity proven vs libtesseract 5.3.4; `tesseract-recognizer`, tested)
+
+The recognizer's second leaf loads the int-mode `WeightMatrix`. `tesseract_recognizer::WeightMatrix::from_le_bytes` transcodes `WeightMatrix::DeSerialize` (weightmatrix.cpp:280-320, int-mode arm): the little-endian `TFile` layout `u8 mode(0x81) | wi_[GENERIC_2D_ARRAY<int8>: u32 dim1, u32 dim2, i8 empty_, dim1·dim2 i8] | u32 num_scales | num_scales × f64 (=scale·127)`. `forward()` runs the int8 forward by consuming the byte-parity-proven `matrix_dot_vector_i32` (Leaf 1) then scaling in **f32** to match Tesseract's FAST_FLOAT build.
+
+Byte-parity GREEN vs a libtesseract oracle on two shapes (8×5, 24×17), comparing **f32 bit-patterns** exactly. Proof design: Rust WRITES the serialized bytes, libtesseract READS them via the REAL `DeSerialize` + `MatrixDotVector` — a wrong wire layout would make the real parser diverge, so the diff is an independent proof (no `InitWeightsFloat`/`TRand` oracle-build needed). +5 unit tests (hand-built bytes); clippy `-D warnings` + fmt clean.
+
+Three source-only format details captured: `mode` always carries `kDoubleFlag` (its absence = old float layout → `UnsupportedFormat`); the `empty_` fill byte sits BETWEEN the dims and the data; **scales are doubles on disk regardless of `FAST_FLOAT`** (weightmatrix.cpp:257, loaded `/INT8_MAX`). Plus: `Init` may pad `scales_` past `num_out` (SIMD layout) — the loader keeps only the first `num_out`; and the SIMD `MatrixDotVector` OVER-READS the input to `RoundInputs` padding (the oracle zero-pads `u`). Next Leaf 3+: the network graph forward (`Series`/`LSTM`/`FullyConnected`/`Convolve`) → `recodebeam` → the code lattice `recoded_to_text` eats. Cross-ref: `E-OCR-MATDOTVEC-1` (Leaf 1), `E-OCR-COMPUTE-NDARRAY-SEAM-1`. Plan: `recognizer-core-shape-v1.md` (Leaf 2 EXECUTED). Branch `claude/happy-hamilton-0azlw4`.
 ## 2026-07-04 — E-OCR-MATDOTVEC-1 — recognizer Leaf 1 is byte-parity green: the int8 `MatrixDotVector`, via ndarray's `matmul_i8_to_i32`, equals libtesseract exactly (promotes `E-OCR-COMPUTE-NDARRAY-SEAM-1` CONJECTURE→FINDING)
 **Status:** FINDING (byte-parity proven vs libtesseract 5.3.4; new crate `tesseract-recognizer`, tested)
 

From c60d8f55a1c8bc9ccf27b1279e34ef43d2857644 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Sat, 4 Jul 2026 13:41:02 +0000
Subject: [PATCH 5/7] board: E-OCR-ACTIVATION-1 -- recognizer Leaf 3
 byte-parity green

The LUT activations (Tanh/Logistic + Relu/Clip/Softmax) transcoded + byte-parity
vs libtesseract on a 4096-pt sweep; the regenerated tables match the baked ones.
All f32 (FAST_FLOAT). Leaf 2 + Leaf 3 = the pieces of a FullyConnected forward.

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
---
 .claude/board/EPIPHANIES.md | 8 ++++++++
 1 file changed, 8 insertions(+)
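
Note (commentary only, not part of the commit): the LUT activation the board
entry describes (a 4096-entry table at scale factor 256, linear interpolation,
negative inputs reflected) can be sketched as below. This is an illustration
under assumptions, not the crate's code: the names are hypothetical, the table
is passed in rather than held in a `LazyLock`, and no byte-parity with
libtesseract is claimed for it.

```rust
const TABLE_SIZE: usize = 4096;
const SCALE_FACTOR: f32 = 256.0;

/// table[i] = tanh(i / 256), computed in f64 and stored as f32.
fn build_tanh_table() -> Vec<f32> {
    (0..TABLE_SIZE)
        .map(|i| (i as f64 / 256.0).tanh() as f32)
        .collect()
}

/// Table-driven tanh: reflect negatives, saturate to 1.0 past the table,
/// otherwise interpolate linearly between the two neighbouring entries.
fn tanh_lut(table: &[f32], x: f32) -> f32 {
    debug_assert_eq!(table.len(), TABLE_SIZE);
    if x < 0.0 {
        return -tanh_lut(table, -x);
    }
    let scaled = x * SCALE_FACTOR;
    let index = scaled as usize; // float-to-int casts saturate in Rust
    if index >= TABLE_SIZE - 1 {
        return 1.0;
    }
    let lo = table[index];
    let hi = table[index + 1];
    lo + (hi - lo) * (scaled - index as f32)
}
```

With 256 samples per unit the interpolation error stays far below f32's
useful precision for recognition, which is why a table lookup can stand in for
the libm call on the hot path.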

diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index 0e7fa213..8c35bfbb 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -177,6 +177,14 @@ New knowledge doc `.claude/knowledge/data-shape-etymology.md` — the shape-and-
 **Status:** FINDING (operator ruling on the shape — "yes valueschema") + embedded CONJECTURE (the preset-vs-dispatch probe)
 
 Operator floated keeping the fast/cheap V2 substrate for huge data alongside V3, "switched by classid," so V3 can eventually teach V2 how to be better. Resolved: the switch is NOT a new carrier. `ClassView::value_schema(classid) -> ValueSchema` (`canonical_node.rs:894`, `class_view.rs:395`) is ALREADY classid→substrate-shape resolution by trait dispatch — resolved, never stored on-wire (adding a variant costs NO `ENVELOPE_LAYOUT_VERSION` bump), and the four existing variants ALREADY form a substrate ladder: `Bootstrap`(empty, key+edges only) / `Compressed`(cold codec, **no hot lifecycle columns**) / `Cognitive`(hot thinking: Meta+Qualia+Fingerprint+Energy+Plasticity+EntityType) / `Full`(every tenant). So "V2 fast/cheap bulk" = classids that resolve to the LEAN end (Bootstrap/Compressed — no ownership/lifecycle tenants); "V3 witnessed/owned" = Cognitive/Full. **A `ClassRoutingDTO` is rejected:** a DTO is a serialized carried payload, but substrate choice is a RESOLUTION (firewall ADR-022, "contracts compile types, the event never leaves"); and per the three-tier canon nothing crosses mailbox boundaries — every reader re-resolves the substrate from the classid already in the 16-byte key, so there is no boundary for a carrier to travel. `dto-soa-savant` + AGI-as-glove name the new-struct-instead-of-resolution shape exactly. **0x1000 is NOT the switch:** canon fixes it as a temporary adoption MONITOR ("monitor, never a semantic"; retires at P4/100%; MODULE-TABLE flags that a future canon==0x1000 aliases the marker) — substrate routes on the classid's concept-half → ValueSchema, never on the monitor bit. **The deep form (CONJECTURE — PROBE preset-vs-dispatch):** the WRITE PATH may be a pure FUNCTION of the schema — a class whose ValueSchema carries no ownership/lifecycle tenants has nothing for the kanban/WAL to witness, so it naturally collapses to the fast private-merge write; Cognitive/Full carry the tenants that REQUIRE the owned/witnessed path. 
If that holds, substrate = ValueSchema full stop (no separate `Substrate` enum, no flag). The gate: confirm the write path is derivable from which tenants are live vs needing an independent resolution — evidence base is the onebrc arc itself (lane F private-merge/no-tenants vs lanes G–J owned/witnessed = the two write paths already measured). Open sub-question: whether bulk needs a variant leaner than `Compressed`, or Bootstrap/Compressed already suffice. **"V3 teaches V2" (deferred, needs mechanism):** V3's kanban WAL + ownership journal is the profiling signal (where contention lands, which fields are touched) to optimize the lean V2 layout — the instrumented-teacher / stripped-student loop; no code reads the WAL back into a layout optimizer yet. Net: at most a new `ValueSchema` variant through the existing `value_schema(classid)` door; possibly not even that.
+## 2026-07-04 — E-OCR-ACTIVATION-1 — recognizer Leaf 3: the LUT activations (Tanh/Logistic + Relu/Clip/Softmax) are byte-parity green vs libtesseract; the regenerated tables match the baked ones
+**Status:** FINDING (byte-parity proven vs libtesseract 5.3.4; `tesseract-recognizer`, tested)
+
+The recognizer's activation non-linearities (lstm/functions.h). `tesseract_recognizer::activation::{tanh, logistic}` transcode the 4096-entry LUT sigmoids (`kScaleFactor=256`, linear interp; functions.h:44-72), regenerating `TanhTable[i]=tanh(i/256)` / `LogisticTable[i]=logistic(i/256)` (generate_lut.py's exact formula — f64 compute → f32 store) in a `LazyLock`, plus `relu`/`clip_f`/`clip_g`/`identity`/`softmax_in_place` (functions.h:85-207). All in f32 to match the FAST_FLOAT build.
+
+Byte-parity GREEN vs a libtesseract oracle on a 4096-point x-sweep (x ∈ [-16,16), tanh + logistic), f32 bit-patterns identical — which ALSO proves the regenerated LUTs match libtesseract's BAKED `TanhTable`/`LogisticTable` byte-exactly (this env's libm == the build's for these values), and that the negative-reflection + f32 interp match. +4 unit tests; clippy `-D warnings` + fmt clean.
+
+With Leaf 2 (`WeightMatrix`) + Leaf 3 (activations), the recognizer now holds both pieces of a FullyConnected layer forward — **Leaf 4 = `FullyConnected::Forward`** composing them (`activation(WeightMatrix·input)`), the first COMPLETE network layer; then LSTM/Series/Parallel → `recodebeam` → the code lattice `recoded_to_text` eats. Cross-ref: `E-OCR-WEIGHTMATRIX-1` (Leaf 2), `E-OCR-MATDOTVEC-1` (Leaf 1). Plan: recognizer-core-shape-v1.md (Leaf 3 EXECUTED). Branch `claude/happy-hamilton-0azlw4`.
 ## 2026-07-04 — E-OCR-WEIGHTMATRIX-1 — recognizer Leaf 2: `WeightMatrix::DeSerialize` (int mode) is byte-parity green vs libtesseract; the forward chains Leaf 1's proven int8 GEMM
 **Status:** FINDING (byte-parity proven vs libtesseract 5.3.4; `tesseract-recognizer`, tested)
 

From a7dba3a8d4f5c91d9b06e6ceee6d6cf2ea2ce581 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Sat, 4 Jul 2026 14:36:57 +0000
Subject: [PATCH 6/7] =?UTF-8?q?contract::network=20=E2=80=94=20sink=20the?=
 =?UTF-8?q?=20Tesseract=20Network=20layer=20graph=20onto=20V3=20SoA=20(byt?=
 =?UTF-8?q?e-parity)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Executes the operator directive "6x8:8, 16 B tenant = classid + 12 B,
ruff>OGAR transpiler sink-in". The polymorphic Network subclass tree lands on
the OGAR Core the right way — NOT a hand-rolled enum (that draft was the
parallel-object-model anti-pattern).

- NEW src/network.rs: NetworkType (27 layer types; ordinal == on-wire kTypeNames
  discriminant, network.h:41-78 / network.cpp:60-75) + NetworkHeader::from_le_bytes
  (the base header Network::CreateFromFile reads before subclass dispatch,
  network.cpp:214-248) + to_facet() (sinks each node onto facet::FacetCascade,
  16 B = classid + 6x8:8, CascadeShape::G6D2) + NetworkType::classid() (the
  invoke_network dispatch seed). facet_classid = compose_classid(network_layer,
  ntype) canon-high; subclass in the classid custom-low half, not 27 slots.
- ogar_codebook: ONE mint network_layer=0x0804 in the 0x08 OCR domain.
- NEW examples/network_dump.rs: the byte-parity surface.

Byte-parity GREEN on real eng.lstm: Rust NetworkHeader::from_le_bytes ==
libtesseract Network::CreateFromFile for the outer node
(Series ni=36 no=111 num_weights=385807 name=Series); the oracle's spec() ==
the model spec string (known-answer self-check, 5.5.0-hdr/5.3.4-lib ABI skew
guarded, oracle built -DFAST_FLOAT). The facet 0x08040009 decodes losslessly.

Reviewed by core-first-architect (TARGETS-CORE), v3-envelope-auditor
(LAYOUT-CLEAN, no version bump), brutally-honest-tester (LAND). Folded in:
compile-lock test (NETWORK_LAYER == codebook mint), custom-half invariant doc,
to_facet debug_assert on the ni/no u16 range. +7 contract tests; clippy
-D warnings + fmt clean (scoped -p lance-graph-contract).

Board: EPIPHANIES E-OCR-NETWORK-SINK-1, LATEST_STATE contract inventory.

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
---
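Note (not part of the commit): a minimal sketch of the facet arithmetic quoted
in the message above. It assumes compose_classid puts the canon concept in the
high 16 bits and the custom half in the low 16 bits (inferred from the
0x08040009 example, not copied from ogar_codebook). The function names are
hypothetical stand-ins, and tier 5 follows the lo:hi order documented on
to_facet.

```rust
/// Hypothetical stand-in for `ogar_codebook::compose_classid`.
/// Assumption: canon concept in the high 16 bits, custom half in the low 16.
fn compose_classid(canon: u16, custom: u16) -> u32 {
    (u32::from(canon) << 16) | u32::from(custom)
}

/// Split the base-header fields into six u16 tiers in the documented order:
/// ni, no, flags, num_weights lo, num_weights hi, lifecycle.
fn tiers(
    ni: i32,
    no: i32,
    flags: i32,
    num_weights: i32,
    training: i8,
    needs_backprop: bool,
) -> [u16; 6] {
    let nw = num_weights as u32;
    [
        ni as u16, // truncating cast; real eng.lstm dims fit in u16
        no as u16,
        (flags & 0xFFFF) as u16,
        (nw & 0xFFFF) as u16,
        (nw >> 16) as u16,
        // lifecycle: training in the low byte, needs_backprop in the high byte
        u16::from(training as u8) | (u16::from(needs_backprop) << 8),
    ]
}

fn main() {
    // Series is ordinal 9; NETWORK_LAYER is 0x0804.
    let classid = compose_classid(0x0804, 9);
    let t = tiers(36, 111, 192, 385_807, 0, false);
    // Reassemble num_weights from tiers 3 and 4 to show the split is lossless.
    let nw = (u32::from(t[4]) << 16) | u32::from(t[3]);
    println!("classid={classid:#010x} tiers={t:?} num_weights={nw}");
}
```

The 32-bit num_weights is the only field that spans two tiers, which is why the
decode stays lossless for the eng.lstm outer node.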
 .claude/board/EPIPHANIES.md                   |  13 +
 .claude/board/LATEST_STATE.md                 |   6 +-
 .../examples/network_dump.rs                  |  56 ++
 crates/lance-graph-contract/src/lib.rs        |   2 +
 crates/lance-graph-contract/src/network.rs    | 620 ++++++++++++++++++
 .../lance-graph-contract/src/ogar_codebook.rs |   6 +
 6 files changed, 702 insertions(+), 1 deletion(-)
 create mode 100644 crates/lance-graph-contract/examples/network_dump.rs
 create mode 100644 crates/lance-graph-contract/src/network.rs
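
Note (not part of the commit): a self-contained sketch of the base-header read
order described in the message (i8 tag, u32+str type_name, i8 training,
i8 needs_backprop, i32 flags, i32 ni, i32 no, i32 num_weights, u32+str name).
Cursor, Header, parse_header and sample are hypothetical names for
illustration; the real parser is NetworkHeader::from_le_bytes, which also
resolves the type name and reports typed errors. The sample bytes are
hand-built from the values quoted above, not read from eng.lstm.

```rust
/// Minimal little-endian cursor (hypothetical; not the module's own reader).
struct Cursor<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    /// Take `n` bytes, or `None` if the buffer is too short.
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let s = self.buf.get(self.pos..end)?;
        self.pos = end;
        Some(s)
    }
    fn i8(&mut self) -> Option<i8> {
        Some(self.take(1)?[0] as i8)
    }
    fn i32(&mut self) -> Option<i32> {
        let s = self.take(4)?;
        Some(i32::from_le_bytes([s[0], s[1], s[2], s[3]]))
    }
    /// `std::string` on the wire: u32 length, then that many raw bytes.
    fn string(&mut self) -> Option<String> {
        let s = self.take(4)?;
        let len = u32::from_le_bytes([s[0], s[1], s[2], s[3]]) as usize;
        String::from_utf8(self.take(len)?.to_vec()).ok()
    }
}

#[derive(Debug, PartialEq)]
struct Header {
    type_name: String,
    training: i8,
    needs_backprop: bool,
    flags: i32,
    ni: i32,
    no: i32,
    num_weights: i32,
    name: String,
}

/// Read the shared prefix; returns the header and the bytes consumed.
fn parse_header(bytes: &[u8]) -> Option<(Header, usize)> {
    let mut c = Cursor { buf: bytes, pos: 0 };
    if c.i8()? != 0 {
        return None; // tag must be NT_NONE(0)
    }
    let type_name = c.string()?;
    let training = c.i8()?;
    let needs_backprop = c.i8()? != 0;
    let flags = c.i32()?;
    let ni = c.i32()?;
    let no = c.i32()?;
    let num_weights = c.i32()?;
    let name = c.string()?;
    let h = Header { type_name, training, needs_backprop, flags, ni, no, num_weights, name };
    Some((h, c.pos))
}

/// Hand-built header carrying the outer-node values quoted in the message.
fn sample() -> Vec<u8> {
    let mut v = vec![0u8]; // tag
    v.extend_from_slice(&6u32.to_le_bytes());
    v.extend_from_slice(b"Series");
    v.extend_from_slice(&[0, 0]); // training, needs_backprop
    for x in [192i32, 36, 111, 385_807] {
        v.extend_from_slice(&x.to_le_bytes());
    }
    v.extend_from_slice(&6u32.to_le_bytes());
    v.extend_from_slice(b"Series");
    v
}

fn main() {
    let bytes = sample();
    match parse_header(&bytes) {
        Some((h, consumed)) => println!("{h:?} consumed={consumed}"),
        None => println!("short or malformed header"),
    }
}
```

Returning the consumed offset alongside the header is what lets a later leaf
start the subclass payload parse at the right byte.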

diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index 8c35bfbb..80d7447b 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -177,6 +177,19 @@ New knowledge doc `.claude/knowledge/data-shape-etymology.md` — the shape-and-
 **Status:** FINDING (operator ruling on the shape — "yes valueschema") + embedded CONJECTURE (the preset-vs-dispatch probe)
 
 Operator floated keeping the fast/cheap V2 substrate for huge data alongside V3, "switched by classid," so V3 can eventually teach V2 how to be better. Resolved: the switch is NOT a new carrier. `ClassView::value_schema(classid) -> ValueSchema` (`canonical_node.rs:894`, `class_view.rs:395`) is ALREADY classid→substrate-shape resolution by trait dispatch — resolved, never stored on-wire (adding a variant costs NO `ENVELOPE_LAYOUT_VERSION` bump), and the four existing variants ALREADY form a substrate ladder: `Bootstrap`(empty, key+edges only) / `Compressed`(cold codec, **no hot lifecycle columns**) / `Cognitive`(hot thinking: Meta+Qualia+Fingerprint+Energy+Plasticity+EntityType) / `Full`(every tenant). So "V2 fast/cheap bulk" = classids that resolve to the LEAN end (Bootstrap/Compressed — no ownership/lifecycle tenants); "V3 witnessed/owned" = Cognitive/Full. **A `ClassRoutingDTO` is rejected:** a DTO is a serialized carried payload, but substrate choice is a RESOLUTION (firewall ADR-022, "contracts compile types, the event never leaves"); and per the three-tier canon nothing crosses mailbox boundaries — every reader re-resolves the substrate from the classid already in the 16-byte key, so there is no boundary for a carrier to travel. `dto-soa-savant` + AGI-as-glove name the new-struct-instead-of-resolution shape exactly. **0x1000 is NOT the switch:** canon fixes it as a temporary adoption MONITOR ("monitor, never a semantic"; retires at P4/100%; MODULE-TABLE flags that a future canon==0x1000 aliases the marker) — substrate routes on the classid's concept-half → ValueSchema, never on the monitor bit. **The deep form (CONJECTURE — PROBE preset-vs-dispatch):** the WRITE PATH may be a pure FUNCTION of the schema — a class whose ValueSchema carries no ownership/lifecycle tenants has nothing for the kanban/WAL to witness, so it naturally collapses to the fast private-merge write; Cognitive/Full carry the tenants that REQUIRE the owned/witnessed path. 
If that holds, substrate = ValueSchema full stop (no separate `Substrate` enum, no flag). The gate: confirm the write path is derivable from which tenants are live vs needing an independent resolution — evidence base is the onebrc arc itself (lane F private-merge/no-tenants vs lanes G–J owned/witnessed = the two write paths already measured). Open sub-question: whether bulk needs a variant leaner than `Compressed`, or Bootstrap/Compressed already suffice. **"V3 teaches V2" (deferred, needs mechanism):** V3's kanban WAL + ownership journal is the profiling signal (where contention lands, which fields are touched) to optimize the lean V2 layout — the instrumented-teacher / stripped-student loop; no code reads the WAL back into a layout optimizer yet. Net: at most a new `ValueSchema` variant through the existing `value_schema(classid)` door; possibly not even that.
+## 2026-07-04 — E-OCR-NETWORK-SINK-1 — the Tesseract `Network` layer graph sinks onto V3 SoA via ruff→OGAR: base-header parse byte-parity green + `FacetCascade` (16 B) sink, NOT a hand-rolled enum
+**Status:** FINDING (byte-parity proven vs libtesseract 5.3.4; `lance-graph-contract`, tested)
+
+The operator directive — *"use new V3 substrate AR rail shaped (6x8:8), 16 bytes tenant = classid + 12 bytes, use ruff>OGAR transpiler sink-in substrate"* — is executed and proven. The polymorphic `Network` subclass tree is sunk onto the Core the RIGHT way (a hand-rolled `enum NetworkKind` was rejected earlier this arc as the parallel-object-model anti-pattern):
+
+1. **ruff→OGAR harvest** (`ruff/crates/ruff_cpp_spo/examples/harvest_network.rs`, committed) — the libclang walker over the 11 network layer headers emits the `has_function`/`inherits_from`/`virtually_overrides` SPO manifest: **62 classes, 5060 triples** on real Tesseract 5.5.0 src. The `Forward` override set (FullyConnected/LSTM/Series/Parallel/Convolve/Maxpool/Reversed/Reconfig/Input) = the compute-leaf list; the `DeSerialize` set (FullyConnected/LSTM/Plumbing/Convolve/Maxpool/Reconfig/Input) = the binary-leaf list. This IS the `classid → ClassView` method-resolution manifest (the vtable the enum would have faked).
+2. **Base-header leaf** (`lance_graph_contract::network`) — `NetworkHeader::from_le_bytes` transcodes the shared serialization prefix EVERY layer writes (`network.cpp:214-248` `Network::CreateFromFile`: `i8 tag(0) | u32+str type_name | i8 training | i8 needs_backprop | i32 flags | i32 ni | i32 no | i32 num_weights | u32+str name`). `NetworkType` (27 types, ordinal == discriminant, `kTypeNames` on-wire strings) + `to_facet()`.
+3. **V3 SoA sink** — each node → `crate::facet::FacetCascade` (16 B = `classid(4) | 6×(8:8)`), read under `CascadeShape::G6D2` (the "6x8:8"): tier0=ni, tier1=no, tier2=flags, tiers3-4=num_weights u32, tier5=lifecycle. `facet_classid = compose_classid(NETWORK_LAYER=0x0804, ntype)` — canon-high, ONE `network_layer` OCR-domain mint (the 27 subclasses live in the classid custom-low half, NOT 27 codebook slots). Name + weight blob are out-of-line (`I-VSA-IDENTITIES`).
+
+**Byte-parity GREEN** on real `/tmp/eng.lstm`: Rust parse == libtesseract `Network::CreateFromFile` for the outer node — `Series ni=36 no=111 num_weights=385807 name=Series` — with the oracle's `spec()` == the model spec `[1,36,0,1[C3,3Ft16]Mp3,3TxyLfys48Lfx96RxLrx96Lfx192Fc111]` (the known-answer self-check guarding the 5.5.0-hdr/5.3.4-lib ABI skew; oracle built `-DFAST_FLOAT`). The facet `0x08040009` decodes losslessly (ni=36/no=111/flags=192/nw=385807/lifecycle=0). Example `network_dump.rs`; +5 contract tests; clippy `-D warnings` + fmt clean (`-p lance-graph-contract` scoped).
+
+Deferred (follow-ups): per-subclass payload parse + tree recursion (Plumbing children → `EdgeBlock`, weights → out-of-line Lance column); the `invoke_network` keystone (dispatch already proven generically by E-CPP-KEYSTONE-1); the recognizer COMPUTE leaves (`tesseract-recognizer`, deps ndarray — Leaf 4 `FullyConnected::Forward`, Leaf 5 `LSTM::Forward`, then `recodebeam`). Plan: `tesseract-rs/.claude/plans/network-ruff-ogar-sink-v1.md`. Cross-ref: `E-CPP-PARITY-7` (recoder, the sibling binary leaf), `E-OCR-MATDOTVEC-1`/`E-OCR-WEIGHTMATRIX-1`/`E-OCR-ACTIVATION-1` (the compute leaves), `E-CPP-KEYSTONE-1` (classid→ClassView dispatch). Branch `claude/happy-hamilton-0azlw4`.
+
 ## 2026-07-04 — E-OCR-ACTIVATION-1 — recognizer Leaf 3: the LUT activations (Tanh/Logistic + Relu/Clip/Softmax) are byte-parity green vs libtesseract; the regenerated tables match the baked ones
 **Status:** FINDING (byte-parity proven vs libtesseract 5.3.4; `tesseract-recognizer`, tested)
 
diff --git a/.claude/board/LATEST_STATE.md b/.claude/board/LATEST_STATE.md
index 353f1672..68d79495 100644
--- a/.claude/board/LATEST_STATE.md
+++ b/.claude/board/LATEST_STATE.md
@@ -10,6 +10,10 @@
 
 ---
 
+## 2026-07-04 — branch `claude/happy-hamilton-0azlw4` — `contract::network` — the Tesseract `Network` layer graph sunk onto V3 SoA via ruff→OGAR (byte-parity vs libtesseract)
+
+**NEW** `lance_graph_contract::network`: `NetworkType` (27 layer types, ordinal == on-wire `kTypeNames` discriminant) + `NetworkHeader` (`from_le_bytes` = the base header `Network::CreateFromFile` reads before subclass dispatch: `i8 tag | u32+str type_name | i8 training | i8 needs_backprop | i32 flags | i32 ni | i32 no | i32 num_weights | u32+str name`) + `to_facet()` (the V3 SoA sink) + `NetworkType::classid()` (the `invoke_network` dispatch seed). Executes the operator directive *"6x8:8, 16 B tenant = classid + 12 B, ruff>OGAR sink-in"*: (1) the `ruff_cpp_spo` `harvest_network` example (committed to ruff) walks the 11 network headers via libclang → the `has_function`/`virtually_overrides` SPO manifest (62 classes, 5060 triples) = the `classid → ClassView` method-resolution table, NOT a hand-rolled enum; (2) each node sinks onto `crate::facet::FacetCascade` (16 B = `classid(4) | 6×(8:8)`, read `CascadeShape::G6D2`): tier0=ni, tier1=no, tier2=flags, tiers3-4=num_weights u32, tier5=lifecycle; `facet_classid = compose_classid(network_layer=0x0804, ntype)` canon-high. Byte-parity **GREEN** on real `/tmp/eng.lstm`: Rust parse == libtesseract `Network::CreateFromFile` — `Series ni=36 no=111 num_weights=385807 name=Series` — oracle `spec()` == the model spec string (known-answer self-check, 5.5.0-hdr/5.3.4-lib ABI skew guarded). Example `network_dump.rs`; +5 contract tests; clippy `-D warnings` + fmt clean (scoped `-p lance-graph-contract`). ONE `network_layer`=0x0804 OCR-domain mint added (subclasses in classid custom-low, not 27 slots). Deferred: per-subclass payload + tree recursion, the `invoke_network` keystone, the recognizer COMPUTE leaves. Refs: EPIPHANIES `E-OCR-NETWORK-SINK-1`; plan `tesseract-rs/.claude/plans/network-ruff-ogar-sink-v1.md`. Not yet a PR.
+
 ## 2026-07-04 — branch `claude/happy-hamilton-0azlw4` — `contract::unicharcompress` — the Tesseract recoder load side (byte-parity vs libtesseract)
 
 **NEW** `lance_graph_contract::unicharcompress`: `UnicharCompress` (the LSTM recoder's code↔id table) + `RecodedCharId` + `RecoderError`, load side only (`from_le_bytes` / `load_from_file` = C++ `DeSerialize`; `encode` / `decode` / `code_range`; `dump_encode` / `dump_decode` parity surfaces). The FIRST binary-format leaf (`TFile` little-endian: `u32 count` + per-entry `[i8 self_normalized][i32 length][i32×length code]`). Byte-parity **GREEN** on real `/tmp/eng.lstm-recoder` — encode 112/112 + decode 112/112 + code_range=111 — via the committed `examples/recoder_dump.rs`, diffed vs a libtesseract 5.3.4 oracle (the 5.5.0-header ABI skew self-validated by the `Encode∘Decode` round-trip + `enc_size=112`). +10 contract tests; `-p lance-graph-contract` clippy `-D warnings` + fmt clean. Consumed by `tesseract-core::{Recoder, recoded_to_text}` (codes→decode→ids→`ids_to_text`; +1 boundary test, 8/8). Resolves the `recoder`=0x0802 concept (OGAR #148 mint, mirrored in the "0x08XX OCR rows" line below) to its content-store module. The recoder keystone (`invoke_recoder`) is UNBLOCKED but deferred (dispatch already proven generically by E-CPP-KEYSTONE-1). Refs: EPIPHANIES `E-CPP-PARITY-7`. Not yet a PR.
@@ -699,5 +703,5 @@ PR sequence: #360 → #361 → post-#360 substrate-sweep (this PR).
 
 - **`codegen_spine::RouteBucketTyped`** (NEW; C6 merged verbatim from op-nexgen's vendored diff, codex-reviewed on nexgen PR #8). Kind-generic sibling of `RouteBucket` (`type Kind: Copy + Eq`) + `?Sized` blanket bridge (`impl<T: RouteBucket + ?Sized> RouteBucketTyped for T { type Kind = OdooMethodKind; }`) so non-Odoo codegen targets bring their own kind enum additively. Coherence rule: a type needing a different Kind skips the legacy trait. 12/12 module tests incl. dyn-object coverage.
 - **`emission_scan`** (NEW; op-nexgen L2). Zero-dep typed-DDL adoption counter, `classid_scan`'s design-language sibling: `TypedForm {Typed, AnyTyped, RecordLink, Stub}` (#[non_exhaustive]) + tokenizer `classify_ddl_type` (precedence Stub > RecordLink > AnyTyped > Typed; word-boundary tokens so `many`/`recording` never false-match) + `EmissionCounts` fold with `typed_ratio()` (f64, mirrors `adoption_pct`). 15 tests. Module doc NAMES the contract scan-family pattern (Form enum + classify_* + fold-to-counts): the next governance counter mirrors it.
-- **`ogar_codebook` 0x08XX OCR rows** — `unicharset` (0x0801) / `recoder` (0x0802) / `charset` (0x0803) mirroring OGAR #148's mint (container kinds only; content never becomes concepts — Osint zero-rows precedent). Drift-guard test extended. CODEBOOK now 68 entries.
+- **`ogar_codebook` 0x08XX OCR rows** — `unicharset` (0x0801) / `recoder` (0x0802) / `charset` (0x0803) / `network_layer` (0x0804) mirroring OGAR #148's mint (container kinds only; content never becomes concepts — Osint zero-rows precedent). `network_layer` = the KIND "a Tesseract recognizer network layer"; the 27 subclasses live in the classid custom-low half (`NetworkType` ordinal), NOT 27 slots. Drift-guard test extended. CODEBOOK now 69 entries.
 - **Rulings + intake record:** EPIPHANIES E-V3-XSESSION-INTAKE-1(+RULINGS), E-V3-GRAPHRAG-INV-1; handover `.claude/handovers/2026-07-02-cross-session-wishlist-intake.md`; plan Addendum-10/11 (per-consumer classid ownership + tripwires ratified; R-1 naming phantom closed — `domain:appid:classview`; R-2 closed — 512-byte row frozen, edges via strided view; L3 new-Arrow-schema design killed; five post-fuse workstreams enumerated). Knowledge: `graphrag-rs-inventory.md`.
diff --git a/crates/lance-graph-contract/examples/network_dump.rs b/crates/lance-graph-contract/examples/network_dump.rs
new file mode 100644
index 00000000..c15b192f
--- /dev/null
+++ b/crates/lance-graph-contract/examples/network_dump.rs
@@ -0,0 +1,56 @@
+//! Dump the base `Network` header at the front of a serialized recognizer
+//! component (`eng.lstm`) — the Rust side of the network base-header byte-parity
+//! leaf, sibling to `recoder_dump`. Also prints the [`FacetCascade`] the node
+//! sinks onto (the ruff→OGAR harvest → V3 SoA target).
+//!
+//! ```sh
+//! # Extract the lstm component (starts with the network, lstmrecognizer.cpp:135):
+//! combine_tessdata -u $(dpkg -L tesseract-ocr-eng | grep eng.traineddata) /tmp/eng.
+//! # C++ oracle (network_spec_oracle.cpp): links libtesseract, calls the REAL
+//! # Network::CreateFromFile on the same bytes and dumps the loaded top node's
+//! # type / ni / no / num_weights / name + spec() (the known-answer self-check).
+//! #   ./network_spec_oracle /tmp/eng.lstm > /tmp/oracle_network.txt
+//! # Rust side (parses only the base header — the shared prefix of every layer):
+//! cargo run -p lance-graph-contract --example network_dump -- /tmp/eng.lstm > /tmp/rust_network.txt
+//! # The "header:" line is byte-identical between the two => the base header
+//! # parse is byte-parity green.
+//! ```
+
+#![allow(
+    clippy::print_stdout,
+    reason = "a dump CLI example writes to stdout by design"
+)]
+
+use std::process::ExitCode;
+
+use lance_graph_contract::network::NetworkHeader;
+
+fn main() -> ExitCode {
+    let Some(path) = std::env::args().nth(1) else {
+        eprintln!("usage: network_dump <path/to/eng.lstm>");
+        return ExitCode::FAILURE;
+    };
+    let bytes = match std::fs::read(&path) {
+        Ok(b) => b,
+        Err(err) => {
+            eprintln!("error reading {path}: {err}");
+            return ExitCode::FAILURE;
+        }
+    };
+    match NetworkHeader::from_le_bytes(&bytes) {
+        Ok((header, consumed)) => {
+            // The byte-parity line (diffed against the oracle's loaded top node).
+            println!("header: {}", header.dump());
+            // The V3 SoA sink: the 16-byte FacetCascade (classid + 6×8:8), hex.
+            let f = header.to_facet();
+            let hex: String = f.to_bytes().iter().map(|b| format!("{b:02x}")).collect();
+            println!("facet:  classid={:#010x} bytes={hex}", f.facet_classid);
+            println!("consumed: {consumed} bytes (base header; subclass payload follows)");
+            ExitCode::SUCCESS
+        }
+        Err(err) => {
+            eprintln!("error parsing header: {err:?}");
+            ExitCode::FAILURE
+        }
+    }
+}
diff --git a/crates/lance-graph-contract/src/lib.rs b/crates/lance-graph-contract/src/lib.rs
index f206e7aa..a25071a5 100644
--- a/crates/lance-graph-contract/src/lib.rs
+++ b/crates/lance-graph-contract/src/lib.rs
@@ -96,6 +96,8 @@ pub mod manifest;
 pub mod mul;
 pub mod nan_projection;
 pub mod nars;
+/// LSTM `Network` layer-graph structure — base-header parse + `FacetCascade` sink.
+pub mod network;
 pub mod ocr;
 /// D-OVC-1 — OGAR concept codebook (`0xDDCC` domain layout), wire-compat mirror.
 pub mod ogar_codebook;
diff --git a/crates/lance-graph-contract/src/network.rs b/crates/lance-graph-contract/src/network.rs
new file mode 100644
index 00000000..df41479d
--- /dev/null
+++ b/crates/lance-graph-contract/src/network.rs
@@ -0,0 +1,620 @@
+//! LSTM `Network` layer-graph structure — the Rust side of the network
+//! base-header byte-parity leaf, and the **sink of the ruff→OGAR harvest onto
+//! the V3 SoA** ([`crate::facet::FacetCascade`]).
+//!
+//! Tesseract's recognizer is a tree of `Network` subclasses (`lstm/network.{h,cpp}`
+//! + `series.cpp` / `parallel.cpp` / `fullyconnected.cpp` / `lstm.cpp` / …). Every
+//! node — whatever its subclass — is serialized with the SAME base header, written
+//! by `Network::Serialize` and read back by the factory `Network::CreateFromFile`
+//! (`network.cpp:155-248`). This module transcodes that **base header** (the shared
+//! prefix of every layer) + the `kTypeNames` on-wire type discriminant, and sinks
+//! each parsed node onto a content-blind [`FacetCascade`] — the operator's
+//! "16-byte tenant, classid + 12 bytes" V3 substrate.
+//!
+//! # Core-First placement
+//!
+//! Per the Core-First doctrine this is **structure** (identity + typed dims), not
+//! compute: the recognizer's `Forward`/weight math lives in `tesseract-recognizer`
+//! (deps ndarray); the layer *graph* — which types, nested how, with what
+//! `ni`/`no` — is content the OGAR Core owns, exactly like the recoder
+//! ([`crate::unicharcompress`]) and the unicharset ([`crate::unicharset`]). The
+//! `ruff_cpp_spo` harvest (`has_function` / `virtually_overrides`) is the
+//! `classid → ClassView` method-resolution manifest; THIS is where a harvested
+//! node lands as a typed SoA row. No parallel object model: a network node is a
+//! [`FacetCascade`], its type a `classid`, never a bespoke `enum NetworkKind`.
+//!
+//! # Base-header wire format (byte-parity surface)
+//!
+//! The factory reads, in order (`network.cpp:214-248`; little-endian,
+//! `TFile::swap_ == false` on x86; `std::string` = `u32 len` + `len` raw bytes,
+//! `serialis.cpp:94-110`):
+//!
+//! ```text
+//! i8   tag                 // always NT_NONE(0); getNetworkType, network.cpp:191
+//! u32  type_name_len       // then the ASCII type name (kTypeNames entry)
+//! …    type_name bytes     // "Series" / "Input" / "LSTM" / … — the discriminant
+//! i8   training            // TrainingState (recognizer = TS_DISABLED)
+//! i8   needs_to_backprop   // 0/1
+//! i32  network_flags       // NetworkFlags bits
+//! i32  ni                  // number of inputs
+//! i32  no                  // number of outputs
+//! i32  num_weights         // weights in THIS node and its sub-network (cumulative)
+//! u32  name_len            // then the layer's unique name
+//! …    name bytes
+//! ```
+//!
+//! then the subclass's own `DeSerialize` payload (weights / children) — DEFERRED
+//! to follow-up leaves (the per-subclass payloads: `Plumbing` reads its child
+//! vector, `FullyConnected`/`LSTM` read `WeightMatrix` blobs). This leaf proves the
+//! shared base header, exactly as the recoder leaf proved the header before the
+//! beam maps.
+//!
+//! For real `eng.lstm` (the extracted `lstm` component; `LSTMRecognizer::DeSerialize`
+//! calls `Network::CreateFromFile` FIRST, `lstmrecognizer.cpp:135`) the outermost
+//! node parses to `type=Series, ni=36, no=111, num_weights=385807` — matching the
+//! model spec `[1,36,0,1[C3,3Ft16]Mp3,3TxyLfys48Lfx96RxLrx96Lfx192Fc111]` (ni=36
+//! feature rows, no=111 = the Fc111 softmax classes). That is the first-principles
+//! pre-registration of a correct parse (the recoder-leaf method).
+//!
+//! [`NetworkHeader::dump`] is the byte-parity surface, diffed against the C++
+//! `network_spec_oracle` (which links libtesseract, calls the real
+//! `Network::CreateFromFile`, and dumps `spec()` / `ni()` / `no()` /
+//! `num_weights()` / `name()` of the loaded top node).
+
+use crate::facet::{FacetCascade, FacetTier};
+use crate::ogar_codebook::compose_classid;
+
+/// The `network_layer` container concept in the `0x08XX` OCR domain
+/// ([`crate::ogar_codebook`]). One canon-high slot for the KIND "a Tesseract
+/// network layer"; the SPECIFIC subclass (Series / LSTM / …) is the classid's
+/// custom-low half = the [`NetworkType`] ordinal, NOT 27 codebook slots (the
+/// "container kinds, not content" mint discipline). `compose_classid(NETWORK_LAYER,
+/// nt as u16)` is the node's `facet_classid`.
+///
+/// **Custom-half invariant:** a network-layer classid's custom-low half is the
+/// [`NetworkType`] ordinal — a recognizer-INTERNAL facet discriminant, never a
+/// render/RBAC app-prefix ([`classid_app_prefix`](crate::ogar_codebook::classid_app_prefix)).
+/// These facet classids stay inside the OCR recognizer's SoA; they are never fed
+/// to the app-prefix render path (which would misread ordinal 14 as an `AppPrefix`).
+/// The value is kept in lock-step with the codebook by
+/// [`tests::network_layer_const_matches_codebook`].
+pub const NETWORK_LAYER: u16 = 0x0804;
+
+/// `NetworkType` — the serialized layer-type discriminant (`network.h:41-78`,
+/// `enum NetworkType`). The ordinal IS the discriminant and is stable across
+/// versions (the `kTypeNames` string, written on the wire, decouples the on-disk
+/// form from the enum order — `network.cpp:56-75`). `NT_NONE`(0) is the naked base
+/// class / "invalid" sentinel; `NT_COUNT` is the array size, not a real type.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
+#[repr(u8)]
+pub enum NetworkType {
+    /// The naked base class ("Invalid" on the wire) — the 0 sentinel.
+    None = 0,
+    /// Inputs from an image.
+    Input = 1,
+    /// Duplicates inputs in a sliding-window neighborhood.
+    Convolve = 2,
+    /// Chooses the max result from a rectangle.
+    Maxpool = 3,
+    /// Runs networks in parallel.
+    Parallel = 4,
+    /// Runs identical networks in parallel.
+    Replicated = 5,
+    /// Runs LTR and RTL LSTMs in parallel.
+    ParRlLstm = 6,
+    /// Runs Up and Down LSTMs in parallel.
+    ParUdLstm = 7,
+    /// Runs 4 LSTMs in parallel.
+    Par2dLstm = 8,
+    /// Executes a sequence of layers.
+    Series = 9,
+    /// Scales the time/y size but makes the output deeper.
+    Reconfig = 10,
+    /// Reverses the x direction of the inputs/outputs.
+    XReversed = 11,
+    /// Reverses the y-direction of the inputs/outputs.
+    YReversed = 12,
+    /// Transposes x and y (for a single op).
+    XyTranspose = 13,
+    /// Long-Short-Term-Memory block.
+    Lstm = 14,
+    /// LSTM that only keeps its last output.
+    LstmSummary = 15,
+    /// Fully connected logistic nonlinearity.
+    Logistic = 16,
+    /// Fully connected rect-lin version of logistic.
+    PosClip = 17,
+    /// Fully connected rect-lin version of tanh.
+    SymClip = 18,
+    /// Fully connected with tanh nonlinearity.
+    Tanh = 19,
+    /// Fully connected with rectifier nonlinearity.
+    Relu = 20,
+    /// Fully connected with no nonlinearity.
+    Linear = 21,
+    /// Softmax with exponential normalization, with CTC.
+    Softmax = 22,
+    /// Softmax with exponential normalization, no CTC.
+    SoftmaxNoCtc = 23,
+    /// 1-d LSTM with built-in fully connected softmax.
+    LstmSoftmax = 24,
+    /// 1-d LSTM with built-in binary-encoded softmax.
+    LstmSoftmaxEncoded = 25,
+    /// A TensorFlow graph encapsulated as a Tesseract network.
+    TensorFlow = 26,
+}
+
+/// The number of `NetworkType` ordinals, `None` sentinel included (`NT_COUNT`,
+/// `network.h:78`): the length of the [`NetworkType::TYPE_NAMES`] table.
+pub const NT_COUNT: usize = 27;
+
+impl NetworkType {
+    /// The on-wire `kTypeNames` strings (`network.cpp:60-75`), indexed by ordinal.
+    /// This is the serialization discriminant matched by `getNetworkType`
+    /// (`network.cpp:191-209`) — index-aligned with the enum, so
+    /// `TYPE_NAMES[nt as usize] == nt.type_name()`.
+    pub const TYPE_NAMES: [&'static str; NT_COUNT] = [
+        "Invalid",
+        "Input",
+        "Convolve",
+        "Maxpool",
+        "Parallel",
+        "Replicated",
+        "ParBidiLSTM",
+        "DepParUDLSTM",
+        "Par2dLSTM",
+        "Series",
+        "Reconfig",
+        "RTLReversed",
+        "TTBReversed",
+        "XYTranspose",
+        "LSTM",
+        "SummLSTM",
+        "Logistic",
+        "LinLogistic",
+        "LinTanh",
+        "Tanh",
+        "Relu",
+        "Linear",
+        "Softmax",
+        "SoftmaxNoCTC",
+        "LSTMSoftmax",
+        "LSTMBinarySoftmax",
+        "TensorFlow",
+    ];
+
+    /// This type's `kTypeNames` string (the inverse of [`from_type_name`]).
+    ///
+    /// [`from_type_name`]: NetworkType::from_type_name
+    #[inline]
+    #[must_use]
+    pub const fn type_name(self) -> &'static str {
+        Self::TYPE_NAMES[self as usize]
+    }
+
+    /// Resolve an ordinal (`0..NT_COUNT`) to a [`NetworkType`] — the enum
+    /// discriminant. `None` for `NT_COUNT` or beyond.
+    #[inline]
+    #[must_use]
+    pub const fn from_ordinal(o: u8) -> Option<NetworkType> {
+        // Hand-listed arms; the `_` fallback means the compiler does NOT prove coverage.
+        Some(match o {
+            0 => NetworkType::None,
+            1 => NetworkType::Input,
+            2 => NetworkType::Convolve,
+            3 => NetworkType::Maxpool,
+            4 => NetworkType::Parallel,
+            5 => NetworkType::Replicated,
+            6 => NetworkType::ParRlLstm,
+            7 => NetworkType::ParUdLstm,
+            8 => NetworkType::Par2dLstm,
+            9 => NetworkType::Series,
+            10 => NetworkType::Reconfig,
+            11 => NetworkType::XReversed,
+            12 => NetworkType::YReversed,
+            13 => NetworkType::XyTranspose,
+            14 => NetworkType::Lstm,
+            15 => NetworkType::LstmSummary,
+            16 => NetworkType::Logistic,
+            17 => NetworkType::PosClip,
+            18 => NetworkType::SymClip,
+            19 => NetworkType::Tanh,
+            20 => NetworkType::Relu,
+            21 => NetworkType::Linear,
+            22 => NetworkType::Softmax,
+            23 => NetworkType::SoftmaxNoCtc,
+            24 => NetworkType::LstmSoftmax,
+            25 => NetworkType::LstmSoftmaxEncoded,
+            26 => NetworkType::TensorFlow,
+            _ => return None,
+        })
+    }
+
+    /// Resolve an on-wire type name to a [`NetworkType`] — the exact
+    /// `getNetworkType` match loop (`network.cpp:201`): linear scan of
+    /// [`TYPE_NAMES`]. `None` is `getNetworkType`'s `data == NT_COUNT` "Invalid
+    /// network layer type" path.
+    ///
+    /// [`TYPE_NAMES`]: NetworkType::TYPE_NAMES
+    #[inline]
+    #[must_use]
+    pub fn from_type_name(name: &str) -> Option<NetworkType> {
+        let mut i = 0;
+        while i < NT_COUNT {
+            if Self::TYPE_NAMES[i] == name {
+                return Self::from_ordinal(i as u8);
+            }
+            i += 1;
+        }
+        None
+    }
+
+    /// This layer type's full `classid` in the OCR domain: canon =
+    /// [`NETWORK_LAYER`], custom = the type ordinal. The node's `facet_classid`;
+    /// the `invoke_network` dispatch (the `invoke_unicharset` keystone analog)
+    /// resolves the subclass by [`classid_custom`](crate::ogar_codebook::classid_custom).
+    #[inline]
+    #[must_use]
+    pub fn classid(self) -> u32 {
+        compose_classid(NETWORK_LAYER, self as u16)
+    }
+}
+
+/// A parse error in a serialized [`NetworkHeader`].
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum NetworkError {
+    /// The buffer ended before the base header was fully read.
+    UnexpectedEof,
+    /// The `tag` byte was not `NT_NONE`(0) — an unversioned/foreign blob
+    /// (`getNetworkType` only branches into the string path when `tag == 0`).
+    BadTag(i8),
+    /// The `type_name` string did not match any [`NetworkType::TYPE_NAMES`]
+    /// entry (`getNetworkType`'s `data == NT_COUNT` path).
+    UnknownType,
+    /// A negative dimension (`ni`/`no`/`num_weights` are non-negative for any
+    /// serialized model).
+    NegativeDim,
+}
+
+/// The base `Network` header shared by every layer node — the fields
+/// `Network::CreateFromFile` reads before dispatching to the subclass
+/// (`network.cpp:214-248`).
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct NetworkHeader {
+    /// The layer subclass, from the `kTypeNames` on-wire discriminant.
+    pub ntype: NetworkType,
+    /// `TrainingState` byte (recognizer models serialize `TS_DISABLED`).
+    pub training: i8,
+    /// Whether the node needs to output back-deltas (`0`/`1`).
+    pub needs_backprop: bool,
+    /// `NetworkFlags` bits.
+    pub network_flags: i32,
+    /// Number of input values.
+    pub ni: i32,
+    /// Number of output values.
+    pub no: i32,
+    /// Number of weights in THIS node and its sub-network (cumulative).
+    pub num_weights: i32,
+    /// The layer's unique name.
+    pub name: String,
+}
+
+impl NetworkHeader {
+    /// Parse the base header from the front of `bytes`, returning the header and
+    /// the number of bytes consumed (the offset at which the subclass payload
+    /// begins). Rejects a non-zero tag, an unknown type name, and negative dims
+    /// — a serialized model never carries them, so they signal a bad/foreign
+    /// blob rather than silently mis-parsing (stricter than the C++ factory,
+    /// which trusts its own output).
+    pub fn from_le_bytes(bytes: &[u8]) -> Result<(NetworkHeader, usize), NetworkError> {
+        let mut r = ByteReader::new(bytes);
+        let tag = r.read_i8()?;
+        if tag != 0 {
+            return Err(NetworkError::BadTag(tag));
+        }
+        let type_name = r.read_string()?;
+        let ntype = NetworkType::from_type_name(&type_name).ok_or(NetworkError::UnknownType)?;
+        let training = r.read_i8()?;
+        let needs_backprop = r.read_i8()? != 0;
+        let network_flags = r.read_i32()?;
+        let ni = r.read_i32()?;
+        let no = r.read_i32()?;
+        let num_weights = r.read_i32()?;
+        if ni < 0 || no < 0 || num_weights < 0 {
+            return Err(NetworkError::NegativeDim);
+        }
+        let name = r.read_string()?;
+        Ok((
+            NetworkHeader {
+                ntype,
+                training,
+                needs_backprop,
+                network_flags,
+                ni,
+                no,
+                num_weights,
+                name,
+            },
+            r.pos,
+        ))
+    }
+
+    /// Sink this node onto the V3 SoA as a content-blind [`FacetCascade`] — the
+    /// "16-byte tenant, classid + 12 bytes" substrate, read under
+    /// [`CascadeShape::G6D2`](crate::facet::CascadeShape::G6D2) (six `u16` tiers).
+    ///
+    /// The `network_layer` ClassView projection of the 6 tiers:
+    ///
+    /// | tier | 8:8 `u16` | field |
+    /// |---|---|---|
+    /// | 0 | `ni` | inputs |
+    /// | 1 | `no` | outputs |
+    /// | 2 | `network_flags & 0xFFFF` | behaviour flags |
+    /// | 3 | `num_weights` low 16 | cumulative weight count (lo) |
+    /// | 4 | `num_weights` high 16 | cumulative weight count (hi) |
+    /// | 5 | `training : needs_backprop` | lifecycle bytes (`lo:hi`) |
+    ///
+    /// `facet_classid` = [`NetworkType::classid`] (`NETWORK_LAYER : ntype`). The
+    /// **name** is NOT bundled (`I-VSA-IDENTITIES`: the facet is the identity +
+    /// typed dims; the name string is content in an out-of-line store keyed by the
+    /// classid+identity). The **weights** are out-of-line too — only their `count`
+    /// rides tiers 3-4; the blob is a separate Lance column. `ni`/`no`/flags are
+    /// truncated to `u16` (every real eng.lstm dim is `< 65536`); a hypothetical
+    /// `> u16` model would carry the overflow out-of-line, same as the weights.
+    #[inline]
+    #[must_use]
+    pub fn to_facet(&self) -> FacetCascade {
+        // ni/no are the semantic dims that MUST round-trip; every real eng.lstm dim
+        // is < 65536, but a hypothetical wider model would truncate here silently.
+        // Fail loudly in debug (mirrors the CANON mint-path `debug_assert`); a real
+        // out-of-range dim is the trigger to add an out-of-line escape. `ni`/`no` are
+        // non-negative (`NegativeDim` is rejected in `from_le_bytes`). `network_flags`
+        // is a bitmask whose low-16 is the documented projection, not a dim, so it is
+        // deliberately not asserted. The prefix-routing redouts (`hi_distance` etc.)
+        // are NOT meaningful across the tiers-3/4 `num_weights` split — this facet is
+        // read as 6× concatenated-`u16`, not as `hi`/`lo` prefix chains.
+        debug_assert!(
+            (self.ni as u32) <= u16::MAX as u32 && (self.no as u32) <= u16::MAX as u32,
+            "network ni/no exceeds u16 — needs an out-of-line escape (network.rs::to_facet)"
+        );
+        let nw = self.num_weights as u32;
+        FacetCascade {
+            facet_classid: self.ntype.classid(),
+            tiers: [
+                tier_u16(self.ni as u32 as u16),
+                tier_u16(self.no as u32 as u16),
+                tier_u16(self.network_flags as u32 as u16),
+                tier_u16((nw & 0xFFFF) as u16),
+                tier_u16((nw >> 16) as u16),
+                FacetTier {
+                    lo: self.training as u8,
+                    hi: u8::from(self.needs_backprop),
+                },
+            ],
+        }
+    }
+
+    /// A one-line byte-parity dump (`type ni no num_weights name`) — the surface
+    /// diffed against the C++ `network_spec_oracle`.
+    #[must_use]
+    pub fn dump(&self) -> String {
+        format!(
+            "{} ni={} no={} num_weights={} name={}",
+            self.ntype.type_name(),
+            self.ni,
+            self.no,
+            self.num_weights,
+            self.name
+        )
+    }
+}
+
+/// One 8:8 [`FacetTier`] carrying a `u16` as `(hi, lo)` — the concatenated-`u16`
+/// projection ([`FacetTier::as_u16`] is its inverse).
+#[inline]
+const fn tier_u16(v: u16) -> FacetTier {
+    FacetTier {
+        lo: (v & 0xFF) as u8,
+        hi: (v >> 8) as u8,
+    }
+}
+
+/// A forward-only little-endian cursor (the Core's per-module binary-read idiom;
+/// mirrors [`crate::unicharcompress`]'s reader). `TFile::swap_ == false` on a LE
+/// host, so scalars are raw `from_le_bytes`; a `std::string` is a `u32` length
+/// prefix then that many raw bytes (`serialis.cpp:94-110`).
+struct ByteReader<'a> {
+    bytes: &'a [u8],
+    pos: usize,
+}
+
+impl<'a> ByteReader<'a> {
+    fn new(bytes: &'a [u8]) -> Self {
+        Self { bytes, pos: 0 }
+    }
+
+    fn take(&mut self, n: usize) -> Result<&'a [u8], NetworkError> {
+        let end = self.pos.checked_add(n).ok_or(NetworkError::UnexpectedEof)?;
+        let slice = self
+            .bytes
+            .get(self.pos..end)
+            .ok_or(NetworkError::UnexpectedEof)?;
+        self.pos = end;
+        Ok(slice)
+    }
+
+    fn read_i8(&mut self) -> Result<i8, NetworkError> {
+        Ok(self.take(1)?[0] as i8)
+    }
+
+    fn read_i32(&mut self) -> Result<i32, NetworkError> {
+        let arr: [u8; 4] = self
+            .take(4)?
+            .try_into()
+            .map_err(|_| NetworkError::UnexpectedEof)?;
+        Ok(i32::from_le_bytes(arr))
+    }
+
+    fn read_u32(&mut self) -> Result<u32, NetworkError> {
+        let arr: [u8; 4] = self
+            .take(4)?
+            .try_into()
+            .map_err(|_| NetworkError::UnexpectedEof)?;
+        Ok(u32::from_le_bytes(arr))
+    }
+
+    /// A `TFile` `std::string`: `u32 len` then `len` raw bytes (`serialis.cpp:94-110`).
+    fn read_string(&mut self) -> Result<String, NetworkError> {
+        let len = self.read_u32()? as usize;
+        let bytes = self.take(len)?;
+        Ok(String::from_utf8_lossy(bytes).into_owned())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::facet::CascadeShape;
+    use crate::ogar_codebook::{canonical_concept_id, classid_canon, classid_custom};
+
+    #[test]
+    fn network_layer_const_matches_codebook() {
+        // The compile-lock: NETWORK_LAYER (used to build every facet_classid) must
+        // equal the codebook's `network_layer` mint — else a rename/renumber on one
+        // side silently mis-routes every network node's classid (core-first-architect
+        // hygiene finding). The codebook is the single source of truth.
+        assert_eq!(
+            canonical_concept_id("network_layer"),
+            Some(NETWORK_LAYER),
+            "network_layer const drifted from the ogar_codebook mint"
+        );
+    }
+
+    /// Build the base header a `Network::Serialize` would write for a node.
+    fn header_bytes(type_name: &str, ni: i32, no: i32, num_weights: i32, name: &str) -> Vec<u8> {
+        let mut b = Vec::new();
+        b.push(0u8); // tag = NT_NONE
+        b.extend_from_slice(&(type_name.len() as u32).to_le_bytes());
+        b.extend_from_slice(type_name.as_bytes());
+        b.push(0u8); // training = TS_DISABLED
+        b.push(0u8); // needs_backprop = false
+        b.extend_from_slice(&192i32.to_le_bytes()); // network_flags
+        b.extend_from_slice(&ni.to_le_bytes());
+        b.extend_from_slice(&no.to_le_bytes());
+        b.extend_from_slice(&num_weights.to_le_bytes());
+        b.extend_from_slice(&(name.len() as u32).to_le_bytes());
+        b.extend_from_slice(name.as_bytes());
+        b
+    }
+
+    #[test]
+    fn type_names_round_trip_and_are_ordinal_aligned() {
+        assert_eq!(NetworkType::TYPE_NAMES.len(), NT_COUNT);
+        for o in 0..NT_COUNT as u8 {
+            let nt = NetworkType::from_ordinal(o).expect("real ordinal");
+            assert_eq!(nt as u8, o, "discriminant == ordinal");
+            assert_eq!(nt.type_name(), NetworkType::TYPE_NAMES[o as usize]);
+            assert_eq!(NetworkType::from_type_name(nt.type_name()), Some(nt));
+        }
+        assert_eq!(NetworkType::from_ordinal(NT_COUNT as u8), None);
+        assert_eq!(NetworkType::from_type_name("NotAType"), None);
+        // The wire discriminant is decoupled from the enum name (kTypeNames).
+        assert_eq!(
+            NetworkType::from_type_name("SummLSTM"),
+            Some(NetworkType::LstmSummary)
+        );
+        assert_eq!(NetworkType::None.type_name(), "Invalid");
+    }
+
+    #[test]
+    fn parses_pre_registered_eng_lstm_outer_header() {
+        // The first-principles pre-registration: eng.lstm's outermost node
+        // (module docs) — Series, ni=36, no=111, num_weights=385807. Built here
+        // as the exact bytes Network::Serialize writes; the real-file parity is
+        // the network_dump example vs the libtesseract oracle.
+        let bytes = header_bytes("Series", 36, 111, 385807, "root");
+        let (h, consumed) = NetworkHeader::from_le_bytes(&bytes).expect("valid header");
+        assert_eq!(h.ntype, NetworkType::Series);
+        assert_eq!(h.ni, 36);
+        assert_eq!(h.no, 111);
+        assert_eq!(h.num_weights, 385807);
+        assert_eq!(h.name, "root");
+        assert_eq!(
+            consumed,
+            bytes.len(),
+            "base header consumes the whole prefix"
+        );
+        assert_eq!(h.dump(), "Series ni=36 no=111 num_weights=385807 name=root");
+    }
+
+    #[test]
+    fn header_sinks_onto_g6d2_facet_losslessly() {
+        let (h, _) = NetworkHeader::from_le_bytes(&header_bytes("LSTM", 48, 96, 55296, "L1"))
+            .expect("valid");
+        let f = h.to_facet();
+
+        // facet_classid = network_layer(0x0804) canon : LSTM(14) custom.
+        assert_eq!(classid_canon(f.facet_classid), NETWORK_LAYER);
+        assert_eq!(classid_custom(f.facet_classid), NetworkType::Lstm as u16);
+        assert_eq!(f.facet_classid, NetworkType::Lstm.classid());
+
+        // Read the tiers back under the operator's 6x8:8 (G6D2) shape.
+        let s = CascadeShape::G6D2;
+        assert_eq!(s.levels(), 2, "6x8:8 = 6 groups x 2 levels");
+        assert_eq!(f.tiers[0].as_u16(), 48, "tier0 = ni");
+        assert_eq!(f.tiers[1].as_u16(), 96, "tier1 = no");
+        assert_eq!(f.tiers[2].as_u16(), 192, "tier2 = network_flags low16");
+        // num_weights 55296 = 0x0000_D800 → lo=0xD800(55296), hi=0.
+        let nw = (f.tiers[3].as_u16() as u32) | ((f.tiers[4].as_u16() as u32) << 16);
+        assert_eq!(nw, 55296, "tiers 3-4 = num_weights u32");
+        assert_eq!(f.tiers[5].lo, 0, "training byte");
+        assert_eq!(f.tiers[5].hi, 0, "needs_backprop byte");
+
+        // The facet is exactly 16 bytes: classid(4) + 6x(8:8)=12.
+        assert_eq!(f.to_bytes().len(), 16);
+    }
+
+    #[test]
+    fn num_weights_high_half_survives_the_two_tiers() {
+        // A cumulative count above u16 (the eng.lstm root is 385807) round-trips
+        // through tiers 3-4 — the reason num_weights takes two 8:8 tiers.
+        let (h, _) = NetworkHeader::from_le_bytes(&header_bytes("Series", 36, 111, 385807, "r"))
+            .expect("ok");
+        let f = h.to_facet();
+        let nw = (f.tiers[3].as_u16() as u32) | ((f.tiers[4].as_u16() as u32) << 16);
+        assert_eq!(nw, 385807);
+        assert!(f.tiers[4].as_u16() > 0, "high half is non-zero for 385807");
+    }
+
+    #[test]
+    fn rejects_bad_tag_and_short_and_unknown() {
+        // Non-zero tag → BadTag.
+        let mut b = header_bytes("Series", 1, 1, 0, "x");
+        b[0] = 7;
+        assert_eq!(
+            NetworkHeader::from_le_bytes(&b),
+            Err(NetworkError::BadTag(7))
+        );
+
+        // Truncated mid-header → UnexpectedEof.
+        let full = header_bytes("Series", 1, 1, 0, "x");
+        assert_eq!(
+            NetworkHeader::from_le_bytes(&full[..10]),
+            Err(NetworkError::UnexpectedEof)
+        );
+
+        // Unknown type string → UnknownType.
+        let b = header_bytes("Frobnicate", 1, 1, 0, "x");
+        assert_eq!(
+            NetworkHeader::from_le_bytes(&b),
+            Err(NetworkError::UnknownType)
+        );
+
+        // Negative dim → NegativeDim.
+        let b = header_bytes("Series", -1, 1, 0, "x");
+        assert_eq!(
+            NetworkHeader::from_le_bytes(&b),
+            Err(NetworkError::NegativeDim)
+        );
+    }
+}
diff --git a/crates/lance-graph-contract/src/ogar_codebook.rs b/crates/lance-graph-contract/src/ogar_codebook.rs
index 8ebf7450..4f83a9aa 100644
--- a/crates/lance-graph-contract/src/ogar_codebook.rs
+++ b/crates/lance-graph-contract/src/ogar_codebook.rs
@@ -486,6 +486,12 @@ pub const CODEBOOK: &[(&str, u16)] = &[
     ("unicharset", 0x0801),
     ("recoder", 0x0802),
     ("charset", 0x0803),
+    // `network_layer` = the KIND "a Tesseract recognizer network layer" (Series /
+    // LSTM / Convolve / …). ONE container slot: the specific subclass is the
+    // classid's custom-low half (the `NetworkType` ordinal, `network::NETWORK_LAYER`),
+    // not 27 slots — the layer graph sinks onto `FacetCascade` tenants (the
+    // ruff→OGAR network harvest lands here).
+    ("network_layer", 0x0804),
     // ── 0x09XX — Health domain (MedCare; OGIT NTO/Healthcare promotion) ──
     ("patient", 0x0901),
     ("diagnosis", 0x0902),

From 4e14db01b00e23de19a1e6b8316cb3ef0c71326d Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Sat, 4 Jul 2026 14:45:59 +0000
Subject: [PATCH 7/7] =?UTF-8?q?board:=20E-OCR-FULLYCONNECTED-1=20=E2=80=94?=
 =?UTF-8?q?=20recognizer=20Leaf=204=20byte-parity=20green?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

FullyConnected::Forward (int8 path) = activation(WeightMatrix·input), the first
complete network layer, composing the two proven halves (Leaf 2 WeightMatrix +
Leaf 3 activations). Byte-parity green across all 7 activations + 2 shapes vs a
libtesseract oracle running the REAL MatrixDotVector+FuncInplace. Code lands in
tesseract-recognizer (the compute crate); board hygiene lands here per the
CLAUDE.md rule.

Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
---
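Reviewer note (not part of the commit message): the composition this entry
records, matmul first and activation second with nothing in between, can be
sketched as below. This is a minimal illustration under stated assumptions,
not the `tesseract-recognizer` code: `Activation`, `matrix_dot_vector`,
`apply_activation` and this `fully_connected_forward` signature are
hypothetical names, and plain `f32` weights stand in for the int8 path, so
the sketch makes no byte-parity claim. The only details taken from the entry
are the order of the two steps and the clip ranges (posclip = clamp[0,1],
symclip = clamp[-1,1]).

```rust
/// Hypothetical activation set for this sketch (not the patch's `FcActivation`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Activation {
    Tanh,
    Logistic,
    Relu,
    PosClip, // clamp to [0, 1]
    SymClip, // clamp to [-1, 1]
    Linear,
    Softmax,
}

/// W·u for a row-major weight matrix: one output per row.
/// Assumption: plain f32 weights, no bias column, no per-row scales.
pub fn matrix_dot_vector(w: &[Vec<f32>], u: &[f32]) -> Vec<f32> {
    w.iter()
        .map(|row| row.iter().zip(u).map(|(wi, ui)| wi * ui).sum::<f32>())
        .collect()
}

/// Apply the non-linearity in place over the whole output line.
pub fn apply_activation(act: Activation, v: &mut [f32]) {
    match act {
        Activation::Tanh => v.iter_mut().for_each(|x| *x = x.tanh()),
        Activation::Logistic => v.iter_mut().for_each(|x| *x = 1.0 / (1.0 + (-*x).exp())),
        Activation::Relu => v.iter_mut().for_each(|x| *x = x.max(0.0)),
        Activation::PosClip => v.iter_mut().for_each(|x| *x = x.clamp(0.0, 1.0)),
        Activation::SymClip => v.iter_mut().for_each(|x| *x = x.clamp(-1.0, 1.0)),
        Activation::Linear => {}
        Activation::Softmax => {
            // Subtract the max before exponentiating for numerical stability.
            let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
            let mut sum = 0.0f32;
            for x in v.iter_mut() {
                *x = (*x - max).exp();
                sum += *x;
            }
            for x in v.iter_mut() {
                *x /= sum;
            }
        }
    }
}

/// The composition: activation(W·u), with no scaling or re-quantization
/// between the two steps.
pub fn fully_connected_forward(w: &[Vec<f32>], u: &[f32], act: Activation) -> Vec<f32> {
    let mut out = matrix_dot_vector(w, u);
    apply_activation(act, &mut out);
    out
}
```

With an identity weight matrix and input `[2.0, -3.0]`, the `Relu` variant
zeroes the negative output and `SymClip` saturates both outputs at the clip
bounds, which makes the two clip ranges easy to tell apart in a quick check.
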
 .claude/board/EPIPHANIES.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md
index 80d7447b..8bef0d94 100644
--- a/.claude/board/EPIPHANIES.md
+++ b/.claude/board/EPIPHANIES.md
@@ -177,6 +177,15 @@ New knowledge doc `.claude/knowledge/data-shape-etymology.md` — the shape-and-
 **Status:** FINDING (operator ruling on the shape — "yes valueschema") + embedded CONJECTURE (the preset-vs-dispatch probe)
 
 Operator floated keeping the fast/cheap V2 substrate for huge data alongside V3, "switched by classid," so V3 can eventually teach V2 how to be better. Resolved: the switch is NOT a new carrier. `ClassView::value_schema(classid) -> ValueSchema` (`canonical_node.rs:894`, `class_view.rs:395`) is ALREADY classid→substrate-shape resolution by trait dispatch — resolved, never stored on-wire (adding a variant costs NO `ENVELOPE_LAYOUT_VERSION` bump), and the four existing variants ALREADY form a substrate ladder: `Bootstrap`(empty, key+edges only) / `Compressed`(cold codec, **no hot lifecycle columns**) / `Cognitive`(hot thinking: Meta+Qualia+Fingerprint+Energy+Plasticity+EntityType) / `Full`(every tenant). So "V2 fast/cheap bulk" = classids that resolve to the LEAN end (Bootstrap/Compressed — no ownership/lifecycle tenants); "V3 witnessed/owned" = Cognitive/Full. **A `ClassRoutingDTO` is rejected:** a DTO is a serialized carried payload, but substrate choice is a RESOLUTION (firewall ADR-022, "contracts compile types, the event never leaves"); and per the three-tier canon nothing crosses mailbox boundaries — every reader re-resolves the substrate from the classid already in the 16-byte key, so there is no boundary for a carrier to travel. `dto-soa-savant` + AGI-as-glove name the new-struct-instead-of-resolution shape exactly. **0x1000 is NOT the switch:** canon fixes it as a temporary adoption MONITOR ("monitor, never a semantic"; retires at P4/100%; MODULE-TABLE flags that a future canon==0x1000 aliases the marker) — substrate routes on the classid's concept-half → ValueSchema, never on the monitor bit. **The deep form (CONJECTURE — PROBE preset-vs-dispatch):** the WRITE PATH may be a pure FUNCTION of the schema — a class whose ValueSchema carries no ownership/lifecycle tenants has nothing for the kanban/WAL to witness, so it naturally collapses to the fast private-merge write; Cognitive/Full carry the tenants that REQUIRE the owned/witnessed path. 
If that holds, substrate = ValueSchema full stop (no separate `Substrate` enum, no flag). The gate: confirm the write path is derivable from which tenants are live vs needing an independent resolution — evidence base is the onebrc arc itself (lane F private-merge/no-tenants vs lanes G–J owned/witnessed = the two write paths already measured). Open sub-question: whether bulk needs a variant leaner than `Compressed`, or Bootstrap/Compressed already suffice. **"V3 teaches V2" (deferred, needs mechanism):** V3's kanban WAL + ownership journal is the profiling signal (where contention lands, which fields are touched) to optimize the lean V2 layout — the instrumented-teacher / stripped-student loop; no code reads the WAL back into a layout optimizer yet. Net: at most a new `ValueSchema` variant through the existing `value_schema(classid)` door; possibly not even that.
+## 2026-07-04 — E-OCR-FULLYCONNECTED-1 — recognizer Leaf 4: `FullyConnected::Forward` (int8 path) is byte-parity green vs libtesseract — the first COMPLETE layer, the composition of the two proven halves
+**Status:** FINDING (byte-parity proven vs libtesseract 5.3.4; `tesseract-recognizer`, tested)
+
+The first COMPLETE network layer ships. `tesseract_recognizer::fully_connected_forward` transcodes `FullyConnected::ForwardTimeStep(const int8_t*, …)` (`fullyconnected.cpp:230-234`) — which is EXACTLY two operations in order with NO intermediate step: `weights_.MatrixDotVector(i_input, output_line)` (Leaf 2 `WeightMatrix::forward`) then `ForwardTimeStep(t, output_line)` (Leaf 3 activation, dispatched on the layer's `NetworkType`, `fullyconnected.cpp:203-219`). So Leaf 4 = `activation(W·u)`, composing the two independently-proven halves; what it NEWLY proves is the composition (order + no scaling/quant between matmul and activation).
+
+Byte-parity **GREEN** across all 7 activations (`tanh`/`logistic`/`relu`/`softmax`/`posclip`/`symclip`/`linear`) at 8×5 AND the larger 48×49 shape, diffing f32 bit-patterns. The oracle (`/tmp/fc_oracle.cpp`, built `-DFAST_FLOAT`) runs the REAL `WeightMatrix::MatrixDotVector` then the REAL `FuncInplace<GFunc/FFunc/ClipFFunc/ClipGFunc/Relu>` / `SoftmaxInPlace` — the exact two library calls `ForwardTimeStep` makes — so the diff is an independent proof of the composition, not a re-implementation. The `NT_POSCLIP→clip_f(clamp[0,1])` / `NT_SYMCLIP→clip_g(clamp[-1,1])` mapping was verified against `functions.h:85-95/124-134` (not swapped). +5 unit tests (18 total); clippy `-D warnings` + fmt clean (`-p tesseract-recognizer` scoped).
+
+Design detail: `FcActivation` (the 8 FullyConnected variants) is named LOCALLY in the compute crate — it does NOT depend on the Core's `NetworkType`; the boundary is the stable u8 ordinal (`FcActivation::from_network_type_ordinal`). This is the compute vocabulary of "which non-linearity," NOT a parallel network model (the graph structure stays in `lance_graph_contract::network`, E-OCR-NETWORK-SINK-1). **Next Leaf 5 = `LSTM::Forward`** (the gates CI/GI/GF1/GO + cell `c=clip(f·c+i·g,±100)` + `h=o·tanh(c)` recurrent state) — the recurrent counterpart, reusing `FullyConnected::Forward` for each gate. Then `Series`/`Parallel` graph walk → `recodebeam` → the code lattice `recoded_to_text` eats. Cross-ref: `E-OCR-WEIGHTMATRIX-1` (Leaf 2), `E-OCR-ACTIVATION-1` (Leaf 3), `E-OCR-MATDOTVEC-1` (Leaf 1), `E-OCR-NETWORK-SINK-1` (the structure side). Plan: `tesseract-rs/.claude/plans/recognizer-core-shape-v1.md` (Leaf 4 EXECUTED). Branch `claude/happy-hamilton-0azlw4`.
+
 ## 2026-07-04 — E-OCR-NETWORK-SINK-1 — the Tesseract `Network` layer graph sinks onto V3 SoA via ruff→OGAR: base-header parse byte-parity green + `FacetCascade` (16 B) sink, NOT a hand-rolled enum
 **Status:** FINDING (byte-parity proven vs libtesseract 5.3.4; `lance-graph-contract`, tested)