64 changes: 64 additions & 0 deletions .claude/board/EPIPHANIES.md


12 changes: 11 additions & 1 deletion .claude/board/LATEST_STATE.md
@@ -10,6 +10,16 @@

---

## 2026-07-04 — branch `claude/happy-hamilton-0azlw4` — `contract::network` — the Tesseract `Network` layer graph sunk onto V3 SoA via ruff→OGAR (byte-parity vs libtesseract)

**NEW** `lance_graph_contract::network`: `NetworkType` (27 layer types, ordinal == on-wire `kTypeNames` discriminant) + `NetworkHeader` (`from_le_bytes` = the base header `Network::CreateFromFile` reads before subclass dispatch: `i8 tag | u32+str type_name | i8 training | i8 needs_backprop | i32 flags | i32 ni | i32 no | i32 num_weights | u32+str name`) + `to_facet()` (the V3 SoA sink) + `NetworkType::classid()` (the `invoke_network` dispatch seed). Executes the operator directive *"6x8:8, 16 B tenant = classid + 12 B, ruff>OGAR sink-in"*: (1) the `ruff_cpp_spo` `harvest_network` example (committed to ruff) walks the 11 network headers via libclang → the `has_function`/`virtually_overrides` SPO manifest (62 classes, 5060 triples) = the `classid → ClassView` method-resolution table, NOT a hand-rolled enum; (2) each node sinks onto `crate::facet::FacetCascade` (16 B = `classid(4) | 6×(8:8)`, read `CascadeShape::G6D2`): tier0=ni, tier1=no, tier2=flags, tiers3-4=num_weights u32, tier5=lifecycle; `facet_classid = compose_classid(network_layer=0x0804, ntype)` canon-high. Byte-parity **GREEN** on real `/tmp/eng.lstm`: Rust parse == libtesseract `Network::CreateFromFile` — `Series ni=36 no=111 num_weights=385807 name=Series` — oracle `spec()` == the model spec string (known-answer self-check, 5.5.0-hdr/5.3.4-lib ABI skew guarded). Example `network_dump.rs`; +5 contract tests; clippy `-D warnings` + fmt clean (scoped `-p lance-graph-contract`). ONE `network_layer`=0x0804 OCR-domain mint added (subclasses in classid custom-low, not 27 slots). Deferred: per-subclass payload + tree recursion, the `invoke_network` keystone, the recognizer COMPUTE leaves. Refs: EPIPHANIES `E-OCR-NETWORK-SINK-1`; plan `tesseract-rs/.claude/plans/network-ruff-ogar-sink-v1.md`. Not yet a PR.
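
The base-header layout listed in the entry above can be read with a small bounds-checked little-endian cursor. The following is a minimal sketch, not the crate's code: `BaseHeader`, `ParseError`, `Cursor`, and `parse_base_header` are hypothetical names, and only the field order and the `(header, consumed)` return shape (as used by `network_dump.rs`) are taken from this PR. It also assumes the `u32+str` fields are a `u32` byte length followed by that many UTF-8 bytes.

```rust
// Sketch only: BaseHeader, ParseError, Cursor and parse_base_header are
// hypothetical names, not the lance_graph_contract::network API.

/// Fields in the on-wire order given in the entry above.
#[derive(Debug, PartialEq)]
struct BaseHeader {
    tag: i8,
    type_name: String,
    training: i8,
    needs_backprop: i8,
    flags: i32,
    ni: i32,
    no: i32,
    num_weights: i32,
    name: String,
}

#[derive(Debug, PartialEq)]
enum ParseError {
    /// The buffer ended before the field was complete.
    Truncated,
    /// A string field was not valid UTF-8 (assumption: strings are UTF-8).
    BadUtf8,
}

/// Bounds-checked reader over a byte slice; never indexes past the end.
struct Cursor<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn take(&mut self, n: usize) -> Result<&'a [u8], ParseError> {
        let end = self.pos.checked_add(n).ok_or(ParseError::Truncated)?;
        let slice = self.buf.get(self.pos..end).ok_or(ParseError::Truncated)?;
        self.pos = end;
        Ok(slice)
    }

    fn read_i8(&mut self) -> Result<i8, ParseError> {
        Ok(self.take(1)?[0] as i8)
    }

    fn read_i32(&mut self) -> Result<i32, ParseError> {
        let s = self.take(4)?;
        Ok(i32::from_le_bytes([s[0], s[1], s[2], s[3]]))
    }

    fn read_u32(&mut self) -> Result<u32, ParseError> {
        let s = self.take(4)?;
        Ok(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
    }

    /// Assumed encoding of `u32+str`: byte length, then the bytes.
    fn read_string(&mut self) -> Result<String, ParseError> {
        let len = self.read_u32()? as usize;
        let raw = self.take(len)?;
        String::from_utf8(raw.to_vec()).map_err(|_| ParseError::BadUtf8)
    }
}

/// Parses the base header and returns it with the number of bytes consumed,
/// so a caller can continue with the subclass payload that follows.
fn parse_base_header(bytes: &[u8]) -> Result<(BaseHeader, usize), ParseError> {
    let mut c = Cursor { buf: bytes, pos: 0 };
    // Struct-literal fields are evaluated in written order, which matches
    // the on-wire order.
    let header = BaseHeader {
        tag: c.read_i8()?,
        type_name: c.read_string()?,
        training: c.read_i8()?,
        needs_backprop: c.read_i8()?,
        flags: c.read_i32()?,
        ni: c.read_i32()?,
        no: c.read_i32()?,
        num_weights: c.read_i32()?,
        name: c.read_string()?,
    };
    Ok((header, c.pos))
}
```

Returning the consumed byte count alongside the header keeps the deferred per-subclass payload parse independent: a later reader can start at `bytes[consumed..]` without re-deriving the header length.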

## 2026-07-04 — branch `claude/happy-hamilton-0azlw4` — `contract::unicharcompress` — the Tesseract recoder load side (byte-parity vs libtesseract)

**NEW** `lance_graph_contract::unicharcompress`: `UnicharCompress` (the LSTM recoder's code↔id table) + `RecodedCharId` + `RecoderError`, load side only (`from_le_bytes` / `load_from_file` = C++ `DeSerialize`; `encode` / `decode` / `code_range`; `dump_encode` / `dump_decode` parity surfaces). The FIRST binary-format leaf (`TFile` little-endian: `u32 count` + per-entry `[i8 self_normalized][i32 length][i32×length code]`). Byte-parity **GREEN** on real `/tmp/eng.lstm-recoder` — encode 112/112 + decode 112/112 + code_range=111 — via the committed `examples/recoder_dump.rs`, diffed vs a libtesseract 5.3.4 oracle (the 5.5.0-header ABI skew self-validated by the `Encode∘Decode` round-trip + `enc_size=112`). +10 contract tests; `-p lance-graph-contract` clippy `-D warnings` + fmt clean. Consumed by `tesseract-core::{Recoder, recoded_to_text}` (codes→decode→ids→`ids_to_text`; +1 boundary test, 8/8). Resolves the `recoder`=0x0802 concept (OGAR #148 mint, mirrored in the "0x08XX OCR rows" line below) to its content-store module. The recoder keystone (`invoke_recoder`) is UNBLOCKED but deferred (dispatch already proven generically by E-CPP-KEYSTONE-1). Refs: EPIPHANIES `E-CPP-PARITY-7`. Not yet a PR.
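
The `TFile` layout named above (`u32 count`, then per entry `i8 self_normalized`, `i32 length`, `length` × `i32` code) can be sketched as follows. This is an illustration under stated assumptions rather than the crate's `from_le_bytes`: `RecodedEntry`, `TableError`, and `parse_recoder_table` are hypothetical names, and the sketch covers only the entry table as described in this entry, with no decoder setup.

```rust
// Sketch only: RecodedEntry, TableError and parse_recoder_table are
// hypothetical names, not the lance_graph_contract::unicharcompress API.

#[derive(Debug, PartialEq)]
struct RecodedEntry {
    self_normalized: i8,
    code: Vec<i32>,
}

#[derive(Debug, PartialEq)]
enum TableError {
    /// The buffer ended before the field was complete.
    Truncated,
    /// An entry declared a negative code length.
    NegativeLength(i32),
}

/// Returns the next `n` bytes and advances `pos`, or fails without panicking.
fn take<'a>(buf: &'a [u8], pos: &mut usize, n: usize) -> Result<&'a [u8], TableError> {
    let end = (*pos).checked_add(n).ok_or(TableError::Truncated)?;
    let slice = buf.get(*pos..end).ok_or(TableError::Truncated)?;
    *pos = end;
    Ok(slice)
}

fn read_i32(buf: &[u8], pos: &mut usize) -> Result<i32, TableError> {
    let s = take(buf, pos, 4)?;
    Ok(i32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

fn read_u32(buf: &[u8], pos: &mut usize) -> Result<u32, TableError> {
    let s = take(buf, pos, 4)?;
    Ok(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Parses `u32 count` followed by `count` entries of
/// `[i8 self_normalized][i32 length][i32 x length code]`, all little-endian.
fn parse_recoder_table(bytes: &[u8]) -> Result<Vec<RecodedEntry>, TableError> {
    let mut pos = 0usize;
    let count = read_u32(bytes, &mut pos)?;
    // Deliberately no with_capacity(count): the count is untrusted input.
    let mut entries = Vec::new();
    for _ in 0..count {
        let self_normalized = take(bytes, &mut pos, 1)?[0] as i8;
        let length = read_i32(bytes, &mut pos)?;
        if length < 0 {
            return Err(TableError::NegativeLength(length));
        }
        let mut code = Vec::new();
        for _ in 0..length {
            code.push(read_i32(bytes, &mut pos)?);
        }
        entries.push(RecodedEntry { self_normalized, code });
    }
    Ok(entries)
}
```

Growing the vectors as entries are read, instead of reserving from the declared counts, means a corrupt `count` or `length` fails with `Truncated` rather than attempting a very large allocation.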

---

## 2026-06-23 — IN PR (`claude/medcare-bridge-lance-graph-wmx76z`) — ActionHandler⟷RBAC⟷orchestration spine

`contract::rbac`: `ScopeSpec` (axis-3 Copy token) + `ClassRbac` §4 default methods (`roles_reaching`/`row_scope`/`field_mask`; backward-compat, probe green). `contract::class_view::FieldMask::union`. `contract::action::ActionInvocation::commit_via<R: ClassRbac>` (no-admin-bypass convergence of the inline gate). `lance-graph-rbac::{authorize_scoped, ScopedDecision}` (§5 two-stage). `lance-graph-ogar::{OgarRbac<S: GrantSource>, GrantSource}` (Q5 local newtype, §6 evaporation seam). rs-graph-llm: `graph-flow-kanban::{run_cycle, CycleOutcome}` + `graph-flow-action::dispatch_via`. Plan: integration-actionhandler-rbac-orchestration-v1.
@@ -693,5 +703,5 @@ PR sequence: #360 → #361 → post-#360 substrate-sweep (this PR).

- **`codegen_spine::RouteBucketTyped`** (NEW; C6 merged verbatim from op-nexgen's vendored diff, codex-reviewed on nexgen PR #8). Kind-generic sibling of `RouteBucket` (`type Kind: Copy + Eq`) + `?Sized` blanket bridge (`impl<T: RouteBucket + ?Sized> RouteBucketTyped for T { type Kind = OdooMethodKind; }`) so non-Odoo codegen targets bring their own kind enum additively. Coherence rule: a type needing a different Kind skips the legacy trait. 12/12 module tests incl. dyn-object coverage.
- **`emission_scan`** (NEW; op-nexgen L2). Zero-dep typed-DDL adoption counter, `classid_scan`'s design-language sibling: `TypedForm {Typed, AnyTyped, RecordLink, Stub}` (#[non_exhaustive]) + tokenizer `classify_ddl_type` (precedence Stub > RecordLink > AnyTyped > Typed; word-boundary tokens so `many`/`recording` never false-match) + `EmissionCounts` fold with `typed_ratio()` (f64, mirrors `adoption_pct`). 15 tests. Module doc NAMES the contract scan-family pattern (Form enum + classify_* + fold-to-counts): the next governance counter mirrors it.
- **`ogar_codebook` 0x08XX OCR rows** — `unicharset` (0x0801) / `recoder` (0x0802) / `charset` (0x0803) mirroring OGAR #148's mint (container kinds only; content never becomes concepts — Osint zero-rows precedent). Drift-guard test extended. CODEBOOK now 68 entries.
- **`ogar_codebook` 0x08XX OCR rows** — `unicharset` (0x0801) / `recoder` (0x0802) / `charset` (0x0803) / `network_layer` (0x0804) mirroring OGAR #148's mint (container kinds only; content never becomes concepts — Osint zero-rows precedent). `network_layer` = the KIND "a Tesseract recognizer network layer"; the 27 subclasses live in the classid custom-low half (`NetworkType` ordinal), NOT 27 slots. Drift-guard test extended. CODEBOOK now 69 entries.
- **Rulings + intake record:** EPIPHANIES E-V3-XSESSION-INTAKE-1(+RULINGS), E-V3-GRAPHRAG-INV-1; handover `.claude/handovers/2026-07-02-cross-session-wishlist-intake.md`; plan Addendum-10/11 (per-consumer classid ownership + tripwires ratified; R-1 naming phantom closed — `domain:appid:classview`; R-2 closed — 512-byte row frozen, edges via strided view; L3 new-Arrow-schema design killed; five post-fuse workstreams enumerated). Knowledge: `graphrag-rs-inventory.md`.
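
The `emission_scan` bullet above describes a precedence-ordered classifier over word-boundary tokens. A rough sketch of that idea follows. The trigger tokens (`stub`, `record`, `any`), the `Form` enum, and `classify_sketch` are assumptions made for illustration; the real `classify_ddl_type` and `TypedForm` may use different tokens and rules. Only the precedence order (Stub > RecordLink > AnyTyped > Typed) and the whole-token matching come from the bullet.

```rust
// Sketch only: Form, classify_sketch and the trigger tokens are hypothetical,
// not the emission_scan API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Form {
    Typed,
    AnyTyped,
    RecordLink,
    Stub,
}

/// Classifies a DDL type string by whole-token match, checking the
/// highest-precedence form first.
fn classify_sketch(ddl_type: &str) -> Form {
    let lower = ddl_type.to_ascii_lowercase();
    // Split on anything that is not a word character, so that "many" and
    // "recording" are single tokens and never match "any" or "record".
    let has = |keyword: &str| {
        lower
            .split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .any(|token| token == keyword)
    };
    if has("stub") {
        Form::Stub
    } else if has("record") {
        Form::RecordLink
    } else if has("any") {
        Form::AnyTyped
    } else {
        Form::Typed
    }
}
```

Comparing whole tokens instead of using substring search is what keeps identifiers such as `many` or `recording` in the `Typed` bucket, and the if/else chain encodes the precedence directly.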
56 changes: 56 additions & 0 deletions crates/lance-graph-contract/examples/network_dump.rs
@@ -0,0 +1,56 @@
//! Dump the base `Network` header at the front of a serialized recognizer
//! component (`eng.lstm`) — the Rust side of the network base-header byte-parity
//! leaf, sibling to `recoder_dump`. Also prints the [`FacetCascade`] the node
//! sinks onto (the ruff→OGAR harvest → V3 SoA target).
//!
//! ```sh
//! # Extract the lstm component (starts with the network, lstmrecognizer.cpp:135):
//! combine_tessdata -u $(dpkg -L tesseract-ocr-eng | grep eng.traineddata) /tmp/eng.
//! # C++ oracle (network_spec_oracle.cpp): links libtesseract, calls the REAL
//! # Network::CreateFromFile on the same bytes and dumps the loaded top node's
//! # type / ni / no / num_weights / name + spec() (the known-answer self-check).
//! # ./network_spec_oracle /tmp/eng.lstm > /tmp/oracle_network.txt
//! # Rust side (parses only the base header — the shared prefix of every layer):
//! cargo run -p lance-graph-contract --example network_dump -- /tmp/eng.lstm > /tmp/rust_network.txt
//! # The "header:" line is byte-identical between the two => the base header
//! # parse is byte-parity green.
//! ```

#![allow(
    clippy::print_stdout,
    reason = "a dump CLI example writes to stdout by design"
)]

use std::process::ExitCode;

use lance_graph_contract::network::NetworkHeader;

fn main() -> ExitCode {
    let Some(path) = std::env::args().nth(1) else {
        eprintln!("usage: network_dump <path/to/eng.lstm>");
        return ExitCode::FAILURE;
    };
    let bytes = match std::fs::read(&path) {
        Ok(b) => b,
        Err(err) => {
            eprintln!("error reading {path}: {err}");
            return ExitCode::FAILURE;
        }
    };
    match NetworkHeader::from_le_bytes(&bytes) {
        Ok((header, consumed)) => {
            // The byte-parity line (diffed against the oracle's loaded top node).
            println!("header: {}", header.dump());
            // The V3 SoA sink: the 16-byte FacetCascade (classid + 6×8:8), hex.
            let f = header.to_facet();
            let hex: String = f.to_bytes().iter().map(|b| format!("{b:02x}")).collect();
            println!("facet: classid={:#010x} bytes={hex}", f.facet_classid);
            println!("consumed: {consumed} bytes (base header; subclass payload follows)");
            ExitCode::SUCCESS
        }
        Err(err) => {
            eprintln!("error parsing header: {err:?}");
            ExitCode::FAILURE
        }
    }
}
51 changes: 51 additions & 0 deletions crates/lance-graph-contract/examples/recoder_dump.rs
@@ -0,0 +1,51 @@
//! Dump a `.lstm-recoder`'s encoder table (`encode`) or decode round-trip
//! (`decode`) — the Rust side of the recoder byte-parity leaf, sibling to
//! `unicharset_dump`.
//!
//! ```sh
//! # on a box with libtesseract + libleptonica installed:
//! combine_tessdata -u $(dpkg -L tesseract-ocr-eng | grep eng.traineddata) /tmp/eng.
//! # C++ oracle (recoder_oracle.cpp): loads the SAME component via TFile and dumps
//! # EncodeUnichar / DecodeUnichar / code_range. It also prints, per id, the
//! # UNICHARSET bijection + an Encode.Decode round-trip so the NEW UnicharCompress
//! # object layout self-validates against the 5.5.0-header / 5.3.4-lib ABI skew.
//! # ./recoder_oracle /tmp/eng.lstm-unicharset /tmp/eng.lstm-recoder encode > /tmp/oracle_recoder_encode.tsv
//! # ./recoder_oracle /tmp/eng.lstm-unicharset /tmp/eng.lstm-recoder decode > /tmp/oracle_recoder_decode.tsv
//! # Rust side:
//! cargo run -p lance-graph-contract --example recoder_dump -- /tmp/eng.lstm-recoder encode > /tmp/rust_recoder_encode.tsv
//! cargo run -p lance-graph-contract --example recoder_dump -- /tmp/eng.lstm-recoder decode > /tmp/rust_recoder_decode.tsv
//! diff /tmp/oracle_recoder_encode.tsv /tmp/rust_recoder_encode.tsv \
//! && diff /tmp/oracle_recoder_decode.tsv /tmp/rust_recoder_decode.tsv
//! # both byte-identical => the recoder load-side is byte-parity green
//! ```

#![allow(
    clippy::print_stdout,
    reason = "a dump CLI example writes to stdout by design"
)]

use std::path::Path;
use std::process::ExitCode;

use lance_graph_contract::unicharcompress::UnicharCompress;

fn main() -> ExitCode {
    let Some(path) = std::env::args().nth(1) else {
        eprintln!("usage: recoder_dump <path/to/eng.lstm-recoder> [encode|decode]");
        return ExitCode::FAILURE;
    };
    let mode = std::env::args().nth(2).unwrap_or_default();
    match UnicharCompress::load_from_file(Path::new(&path)) {
        Ok(recoder) => {
            match mode.as_str() {
                "decode" => print!("{}", recoder.dump_decode()),
                _ => print!("{}", recoder.dump_encode()),
            }
            ExitCode::SUCCESS
        }
        Err(err) => {
            eprintln!("error: {err}");
            ExitCode::FAILURE
        }
    }
}
3 changes: 3 additions & 0 deletions crates/lance-graph-contract/src/lib.rs
@@ -96,6 +96,8 @@ pub mod manifest;
pub mod mul;
pub mod nan_projection;
pub mod nars;
/// LSTM `Network` layer-graph structure — base-header parse + `FacetCascade` sink.
pub mod network;
pub mod ocr;
/// D-OVC-1 — OGAR concept codebook (`0xDDCC` domain layout), wire-compat mirror.
pub mod ogar_codebook;
@@ -133,6 +135,7 @@ pub mod tax;
pub mod tenant_counter;
pub mod thinking;
pub mod unichar;
pub mod unicharcompress;
pub mod unicharset;
pub mod unicharset_adapter;
pub mod view_angle;