contract: Tesseract recoder + recognizer-leaf boards + network→V3-SoA sink (byte-parity) #643
New zero-dep module lance_graph_contract::unicharcompress -- the load side of
Tesseract's UnicharCompress (ccutil/unicharcompress.{h,cpp}), the LSTM
recognizer's recoded-code <-> unichar-id table. First binary-format leaf: a
little-endian TFile reader (u32 count + per-RecodedCharID
[i8 self_normalized][i32 length][i32*length code]), then ComputeCodeRange
(max+1) and the decode map (last-writer-wins on a shared code). Load side only
(DeSerialize + Encode/Decode/code_range); ComputeEncoding + beam-search maps
are deferred to training/recognizer leaves.
Byte-parity GREEN on real eng.lstm-recoder: encode 112/112 + decode 112/112 +
code_range=111 (examples/recoder_dump.rs {encode,decode} diffed vs a
libtesseract 5.3.4 oracle; the 1012-byte size = 4 + 112*9 was derived before
the parse). Strict where C++ is UB: rejects length > kMaxCodeLen(9) and short
buffers.
+10 unit tests; clippy -D warnings + fmt clean (-p lance-graph-contract).
Board: EPIPHANIES E-CPP-PARITY-7, LATEST_STATE contract inventory. Resolves the
OGAR #148 recoder=0x0802 concept to its content-store module.
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
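The wire layout described in this commit (a `u32` count, then per entry an `i8` flag, an `i32` length, and `length` little-endian `i32` codes) can be sketched as below. This is a minimal illustration, not the crate's code: `ParseError`, `Entry`, `take`, `read4`, and `parse_recoder` are hypothetical names, and the strictness (rejecting lengths outside `0..=9` and short buffers) follows the commit text. The 1012-byte figure is consistent with every entry carrying one code: 4 + 112 × (1 + 4 + 4) = 1012.

```rust
/// Hypothetical error type for this sketch (not the crate's `RecoderError`).
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    UnexpectedEof,
    BadCodeLength(i32),
}

/// Upper bound on codes per entry (`kMaxCodeLen` in the commit text).
const MAX_CODE_LEN: i32 = 9;

/// One parsed entry: the `self_normalized` flag plus its code sequence.
#[derive(Debug, PartialEq, Eq)]
pub struct Entry {
    pub self_normalized: i8,
    pub codes: Vec<i32>,
}

/// Split `n` bytes off the front of the cursor, or fail on a short buffer.
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Result<&'a [u8], ParseError> {
    let cur: &'a [u8] = *buf;
    if cur.len() < n {
        return Err(ParseError::UnexpectedEof);
    }
    let (head, tail) = cur.split_at(n);
    *buf = tail;
    Ok(head)
}

/// Read four raw bytes; the caller decides signedness.
fn read4(buf: &mut &[u8]) -> Result<[u8; 4], ParseError> {
    let b = take(buf, 4)?;
    Ok([b[0], b[1], b[2], b[3]])
}

/// Parse the whole table and return the entries plus `max code + 1`.
pub fn parse_recoder(mut buf: &[u8]) -> Result<(Vec<Entry>, i32), ParseError> {
    let count = u32::from_le_bytes(read4(&mut buf)?);
    let mut entries = Vec::new();
    let mut max = -1_i32;
    for _ in 0..count {
        let self_normalized = take(&mut buf, 1)?[0] as i8;
        let length = i32::from_le_bytes(read4(&mut buf)?);
        if !(0..=MAX_CODE_LEN).contains(&length) {
            return Err(ParseError::BadCodeLength(length));
        }
        let mut codes = Vec::with_capacity(length as usize);
        for _ in 0..length {
            let c = i32::from_le_bytes(read4(&mut buf)?);
            max = max.max(c);
            codes.push(c);
        }
        entries.push(Entry { self_normalized, codes });
    }
    Ok((entries, max.saturating_add(1)))
}
```

The sketch deliberately grows `entries` as it reads rather than reserving `count` slots up front, so a hostile count fails on the first short read instead of forcing a large allocation.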
…onto ndarray CONJECTURE (design-pass finding; byte-parity probe = recognizer Leaf 1).
The OCR recognizer is COMPUTE (dense int8 GEMM), not content -- it consumes ndarray's existing matmul_i8_to_i32 / quantize / dequantize with no Core gap. int8->i32 is exact + bit-reproducible across AMX/VNNI/scalar. Corrects the "OCR is ndarray-free" framing. Cross-ref E-CPP-PARITY-7, the recognizer plan.
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
…otes seam FINDING)
The int8 MatrixDotVector, via ndarray's matmul_i8_to_i32, equals libtesseract exactly on synthetic int8 (integer-combined diff, TFloat-agnostic). Promotes E-OCR-COMPUTE-NDARRAY-SEAM-1 CONJECTURE->FINDING. New crate tesseract-recognizer (compute tier). in-env libtesseract is FAST_FLOAT.
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
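The exactness claim rests on plain integer arithmetic: each int8 product has magnitude at most 128 × 128 = 16384, so an i32 accumulator cannot overflow for rows shorter than 2^17 columns, and integer addition is associative, so lane order does not change the result. A scalar reference of that idea is sketched below. `matvec_i8_i32` is a hypothetical name for illustration only; it is not ndarray's `matmul_i8_to_i32` and it omits whatever bias and scale handling the real `MatrixDotVector` performs.

```rust
/// Scalar reference: y[r] = sum over c of w[r][c] * x[c], accumulated in i32.
/// `w` is row-major with `rows * cols` elements.
pub fn matvec_i8_i32(w: &[i8], rows: usize, cols: usize, x: &[i8]) -> Vec<i32> {
    assert_eq!(w.len(), rows * cols, "weight buffer must be rows * cols");
    assert_eq!(x.len(), cols, "input length must equal cols");
    (0..rows)
        .map(|r| {
            w[r * cols..(r + 1) * cols]
                .iter()
                .zip(x)
                // Widen before multiplying so the product is exact.
                .map(|(&a, &b)| i32::from(a) * i32::from(b))
                .sum::<i32>()
        })
        .collect()
}
```

Any SIMD kernel that widens to i32 before accumulating should produce the same integers as this loop, which is what makes an integer-level diff against an oracle meaningful.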
WeightMatrix::DeSerialize (int mode) transcoded + byte-parity vs libtesseract (f32 bit-patterns, two shapes). forward() chains Leaf 1's proven int8 GEMM, scaling in f32 to match FAST_FLOAT. Rust-writes / lib-reads independent proof.
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
The LUT activations (Tanh/Logistic + Relu/Clip/Softmax) transcoded + byte-parity vs libtesseract on a 4096-pt sweep; the regenerated tables match the baked ones. All f32 (FAST_FLOAT). Leaf 2 + Leaf 3 = the pieces of a FullyConnected forward.
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
…A (byte-parity)
Executes the operator directive "6x8:8, 16 B tenant = classid + 12 B, ruff>OGAR transpiler sink-in". The polymorphic Network subclass tree lands on the OGAR Core the right way -- NOT a hand-rolled enum (that draft was the parallel-object-model anti-pattern).
- NEW src/network.rs: NetworkType (27 layer types; ordinal == on-wire kTypeNames discriminant, network.h:41-78 / network.cpp:60-75) + NetworkHeader::from_le_bytes (the base header Network::CreateFromFile reads before subclass dispatch, network.cpp:214-248) + to_facet() (sinks each node onto facet::FacetCascade, 16 B = classid + 6x8:8, CascadeShape::G6D2) + NetworkType::classid() (the invoke_network dispatch seed). facet_classid = compose_classid(network_layer, ntype) canon-high; subclass in the classid custom-low half, not 27 slots.
- ogar_codebook: ONE mint network_layer=0x0804 in the 0x08 OCR domain.
- NEW examples/network_dump.rs: the byte-parity surface.
Byte-parity GREEN on real eng.lstm: Rust NetworkHeader::from_le_bytes == libtesseract Network::CreateFromFile for the outer node (Series ni=36 no=111 num_weights=385807 name=Series); the oracle's spec() == the model spec string (known-answer self-check, 5.5.0-hdr/5.3.4-lib ABI skew guarded, oracle built -DFAST_FLOAT). The facet 0x08040009 decodes losslessly.
Reviewed by core-first-architect (TARGETS-CORE), v3-envelope-auditor (LAYOUT-CLEAN, no version bump), brutally-honest-tester (LAND). Folded in: compile-lock test (NETWORK_LAYER == codebook mint), custom-half invariant doc, to_facet debug_assert on the ni/no u16 range.
+7 contract tests; clippy -D warnings + fmt clean (scoped -p lance-graph-contract).
Board: EPIPHANIES E-OCR-NETWORK-SINK-1, LATEST_STATE contract inventory.
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
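The "16 B = classid + 6x8:8" layout can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: `compose_classid`, `Tier`, `tier_u16`, `tier_value`, `Facet`, and `pack` are hypothetical stand-ins, not the crate's `facet` API, and the byte order inside a tier is a guess. The dimensions come from the commit text (Series, ni=36, no=111, num_weights=385807); the ordinal 9 is read off the low half of the quoted facet 0x08040009, and the flag and training values are placeholders.

```rust
/// Hypothetical stand-in for `compose_classid`: concept id in the high 16
/// bits ("canon-high"), subclass ordinal in the low 16 bits ("custom-low").
pub const fn compose_classid(canon: u16, custom: u16) -> u32 {
    ((canon as u32) << 16) | (custom as u32)
}

/// One 8:8 tier. Low byte first is an assumption of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Tier {
    pub lo: u8,
    pub hi: u8,
}

pub const fn tier_u16(v: u16) -> Tier {
    Tier { lo: (v & 0xFF) as u8, hi: (v >> 8) as u8 }
}

pub const fn tier_value(t: Tier) -> u16 {
    (t.lo as u16) | ((t.hi as u16) << 8)
}

/// 4-byte classid + 6 tiers x 2 bytes = 16 bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Facet {
    pub classid: u32,
    pub tiers: [Tier; 6],
}

/// Pack one header into a facet; `num_weights` is split across tiers 3 and 4.
pub fn pack(
    ordinal: u16,
    ni: u16,
    no: u16,
    flags: u16,
    num_weights: u32,
    training: u8,
    needs_backprop: bool,
) -> Facet {
    Facet {
        classid: compose_classid(0x0804, ordinal),
        tiers: [
            tier_u16(ni),
            tier_u16(no),
            tier_u16(flags),
            tier_u16((num_weights & 0xFFFF) as u16),
            tier_u16((num_weights >> 16) as u16),
            Tier { lo: training, hi: u8::from(needs_backprop) },
        ],
    }
}

/// Rejoin the two `num_weights` halves.
pub fn num_weights(f: &Facet) -> u32 {
    u32::from(tier_value(f.tiers[3])) | (u32::from(tier_value(f.tiers[4])) << 16)
}
```

Under these assumptions, `pack(9, 36, 111, 0, 385_807, 0, false)` yields classid `0x0804_0009` and a `num_weights` that rejoins to 385807 (low half 58127, high half 5), which is the sense in which the projection is lossless for dimensions that fit in 16 bits.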
FullyConnected::Forward (int8 path) = activation(WeightMatrix·input), the first complete network layer, composing the two proven halves (Leaf 2 WeightMatrix + Leaf 3 activations). Byte-parity green across all 7 activations + 2 shapes vs a libtesseract oracle running the REAL MatrixDotVector+FuncInplace. Code lands in tesseract-recognizer (the compute crate); board hygiene lands here per the CLAUDE.md rule. Co-Authored-By: Claude <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
📝 Walkthrough
Adds two new contract modules to lance-graph-contract:
Changes
Network and recoder contract modules
Estimated code review effort: 3 (Moderate) | ~30 minutes
🚥 Pre-merge checks | ✅ 5 passed
Actionable comments posted: 3
🧹 Nitpick comments (1)
crates/lance-graph-contract/src/unicharcompress.rs (1)
210-219: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win
Bound the pre-allocation to the available buffer.
`count` is only checked against the 50M `MAX_ELEMENTS` cap, then passed straight to `Vec::with_capacity`. A tiny hostile file (just the 4-byte `count` header declaring, say, 50M) forces a ~2 GB upfront allocation before the very first `RecodedCharId::read` fails with `UnexpectedEof`. Each entry needs at least 5 bytes on the wire, so you can cheaply bound the reservation to what the buffer could actually contain. This matches the module's stated hostile-input hardening posture.
♻️ Suggested bound
- let mut encoder = Vec::with_capacity(count as usize);
+ // Each entry is at least 5 bytes (i8 self_normalized + i32 length), so a
+ // declared count larger than the remaining buffer can hold is corrupt.
+ let max_possible = r.remaining() / 5;
+ let mut encoder = Vec::with_capacity((count as usize).min(max_possible));
You'd add a small `remaining()` helper to `ByteReader`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/lance-graph-contract/src/unicharcompress.rs` around lines 210 - 219, The pre-allocation in from_le_bytes currently trusts count after only checking MAX_ELEMENTS, so a small input can trigger a huge Vec::with_capacity before any RecodedCharId::read occurs. Add a ByteReader remaining() helper and use it in from_le_bytes to cap the reserved encoder size to the maximum number of entries that can fit in the available buffer, while keeping the existing RecoderError::TooManyElements guard and the RecodedCharId::read loop intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/lance-graph-contract/src/network.rs`:
- Around line 364-395: The issue is that `to_facet()` only enforces the
`ni`/`no` `u16::MAX` invariant with `debug_assert!`, so release builds can
silently truncate invalid values. Update `Network::to_facet` to perform a
production check and return a `Result` (or a dedicated `NetworkError`) when `ni`
or `no` exceed `u16::MAX`, and propagate that error from the caller paths
instead of constructing `FacetCascade` unconditionally. Keep the existing
`FacetCascade`/`tier_u16` mapping logic for valid values, but make the
out-of-line escape mentioned in the doc comment explicit and enforced in release
builds.
- Around line 264-277: `NetworkError` is currently a plain enum, so update it to
follow the crate’s existing snafu-based error pattern instead of relying on a
bare type. Add the appropriate snafu error derive/annotations to `NetworkError`,
define per-variant messages for `UnexpectedEof`, `BadTag`, `UnknownType`, and
`NegativeDim`, and make sure the type still supports standard error usage
through the generated `Display` and `std::error::Error` behavior. Use
`NetworkError` and its variants in `network.rs` as the main anchor when updating
the error definition.
In `@crates/lance-graph-contract/src/unicharcompress.rs`:
- Around line 290-300: The compute_code_range method can overflow when it sets
self.code_range to max + 1 after scanning self.encoder for raw code values. Add
validation or a checked/saturating increment so hostile i32::MAX codes do not
panic in debug or wrap in release, and handle the invalid input consistently
with the existing BadCodeLength/UnexpectedEof corruption checks. Keep the fix
localized to compute_code_range and its code_range assignment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 0ac97b88-4e98-4885-af9b-7a69775ac83d
📒 Files selected for processing (8)
- .claude/board/EPIPHANIES.md
- .claude/board/LATEST_STATE.md
- crates/lance-graph-contract/examples/network_dump.rs
- crates/lance-graph-contract/examples/recoder_dump.rs
- crates/lance-graph-contract/src/lib.rs
- crates/lance-graph-contract/src/network.rs
- crates/lance-graph-contract/src/ogar_codebook.rs
- crates/lance-graph-contract/src/unicharcompress.rs
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkError {
    /// The buffer ended before the base header was fully read.
    UnexpectedEof,
    /// The `tag` byte was not `NT_NONE`(0) — an unversioned/foreign blob
    /// (`getNetworkType` only branches into the string path when `tag == 0`).
    BadTag(i8),
    /// The `type_name` string did not match any [`NetworkType::TYPE_NAMES`]
    /// entry (`getNetworkType`'s `data == NT_COUNT` path).
    UnknownType,
    /// A negative dimension (`ni`/`no`/`num_weights` are non-negative for any
    /// serialized model).
    NegativeDim,
}
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
NetworkError doesn't follow the crate's snafu error pattern.
NetworkError is a bare enum with no Display/std::error::Error impl at all (its sibling RecoderError in unicharcompress.rs at least hand-rolls Display/Error, still not via snafu). The coding guidelines call for reusing snafu error patterns for Rust error types in this crate.
♻️ Suggested snafu-based error
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub enum NetworkError {
- /// The buffer ended before the base header was fully read.
- UnexpectedEof,
- /// The `tag` byte was not `NT_NONE`(0) — an unversioned/foreign blob
- /// (`getNetworkType` only branches into the string path when `tag == 0`).
- BadTag(i8),
- /// The `type_name` string did not match any [`NetworkType::TYPE_NAMES`]
- /// entry (`getNetworkType`'s `data == NT_COUNT` path).
- UnknownType,
- /// A negative dimension (`ni`/`no`/`num_weights` are non-negative for any
- /// serialized model).
- NegativeDim,
-}
+#[derive(Debug, Clone, Copy, PartialEq, Eq, snafu::Snafu)]
+pub enum NetworkError {
+ #[snafu(display("network header buffer ended before it was fully read"))]
+ UnexpectedEof,
+ #[snafu(display("network header tag {tag} was not NT_NONE(0)"))]
+ BadTag { tag: i8 },
+ #[snafu(display("network header type name did not match any known NetworkType"))]
+ UnknownType,
+ #[snafu(display("network header contained a negative dimension"))]
+ NegativeDim,
+}
As per coding guidelines, crates/**/*.rs: "reuse snafu error patterns" — this applies to the new error type here.
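Since the PR describes `lance-graph-contract` as zero-dep, one alternative to pulling in snafu is to hand-roll `Display` and `std::error::Error`, as the comment notes the sibling `RecoderError` already does. The sketch below is only an illustration of that option, reusing the message strings from the suggestion above; whether it satisfies the crate's guideline is for the maintainers to decide.

```rust
use std::fmt;

/// Same variants as the reviewed enum; doc comments omitted for brevity.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkError {
    UnexpectedEof,
    BadTag(i8),
    UnknownType,
    NegativeDim,
}

impl fmt::Display for NetworkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::UnexpectedEof => {
                f.write_str("network header buffer ended before it was fully read")
            }
            Self::BadTag(tag) => write!(f, "network header tag {tag} was not NT_NONE(0)"),
            Self::UnknownType => {
                f.write_str("network header type name did not match any known NetworkType")
            }
            Self::NegativeDim => f.write_str("network header contained a negative dimension"),
        }
    }
}

// No source error to chain, so the default trait methods suffice.
impl std::error::Error for NetworkError {}
```

This keeps the tuple variant `BadTag(i8)` unchanged, so existing call sites such as `Err(NetworkError::BadTag(tag))` would not need to be rewritten.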
Source: Coding guidelines
#[inline]
#[must_use]
pub fn to_facet(&self) -> FacetCascade {
    // ni/no are the semantic dims that MUST round-trip; every real eng.lstm dim
    // is < 65536, but a hypothetical wider model would truncate here silently.
    // Fail loudly in debug (mirrors the CANON mint-path `debug_assert`); a real
    // out-of-range dim is the trigger to add an out-of-line escape. `ni`/`no` are
    // non-negative (`NegativeDim` is rejected in `from_le_bytes`). `network_flags`
    // is a bitmask whose low-16 is the documented projection, not a dim, so it is
    // deliberately not asserted. The prefix-routing redouts (`hi_distance` etc.)
    // are NOT meaningful across the tiers-3/4 `num_weights` split — this facet is
    // read as 6× concatenated-`u16`, not as `hi`/`lo` prefix chains.
    debug_assert!(
        (self.ni as u32) <= u16::MAX as u32 && (self.no as u32) <= u16::MAX as u32,
        "network ni/no exceeds u16 — needs an out-of-line escape (network.rs::to_facet)"
    );
    let nw = self.num_weights as u32;
    FacetCascade {
        facet_classid: self.ntype.classid(),
        tiers: [
            tier_u16(self.ni as u32 as u16),
            tier_u16(self.no as u32 as u16),
            tier_u16(self.network_flags as u32 as u16),
            tier_u16((nw & 0xFFFF) as u16),
            tier_u16((nw >> 16) as u16),
            FacetTier {
                lo: self.training as u8,
                hi: u8::from(self.needs_backprop),
            },
        ],
    }
}
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Silent truncation of ni/no in release builds.
to_facet() guards the ni/no ≤ u16::MAX invariant only with debug_assert!, which is compiled out in release builds. If from_le_bytes is ever fed a header where ni/no exceed u16::MAX (e.g. a corrupted or unexpected future model), release builds will silently truncate the values into the facet with no error signal — a data-integrity gap for what's meant to be a byte-parity contract surface.
Consider returning a Result (or a dedicated NetworkError variant) instead of relying on debug_assert! for this invariant, so the out-of-line escape mentioned in the doc comment is actually enforced in production.
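One way to turn the debug-only invariant into a release-mode check is a fallible narrowing step that runs before the facet is built. The following is a minimal sketch under stated assumptions: `DimError`, `dim_to_u16`, and `checked_dims` are hypothetical names, and how the error would be folded into `NetworkError` or propagated from `to_facet` is left open.

```rust
use std::convert::TryFrom;

/// Hypothetical error for a dimension that does not fit the 16-bit tier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DimError {
    OutOfRange { field: &'static str, value: i32 },
}

/// Narrow one `i32` dimension to `u16`, rejecting negatives and wide values
/// instead of wrapping them the way an `as` cast does.
pub fn dim_to_u16(field: &'static str, value: i32) -> Result<u16, DimError> {
    u16::try_from(value).map_err(|_| DimError::OutOfRange { field, value })
}

/// Validate both semantic dims up front; a caller would build the facet
/// only from the `Ok` values.
pub fn checked_dims(ni: i32, no: i32) -> Result<(u16, u16), DimError> {
    Ok((dim_to_u16("ni", ni)?, dim_to_u16("no", no)?))
}
```

Because `u16::try_from` fails on both negative and too-large inputs, the same helper also covers the `NegativeDim` case if a header ever reaches this point unvalidated.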
fn compute_code_range(&mut self) {
    let mut max = -1_i32;
    for entry in &self.encoder {
        for &c in entry.codes() {
            if c > max {
                max = c;
            }
        }
    }
    self.code_range = max + 1;
}
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
max + 1 can overflow on hostile code values.
Code values are read as raw i32 with no upper-bound validation (unlike length). A single entry with a code of i32::MAX makes max + 1 overflow — panic in debug, wrap to i32::MIN in release. Given the module explicitly guards BadCodeLength/UnexpectedEof against corrupt input, this path deserves the same treatment.
🛡️ Proposed fix
- self.code_range = max + 1;
+ self.code_range = max.saturating_add(1);
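The saturating increment above avoids the panic but leaves `code_range` pinned at `i32::MAX` for a corrupt file. If the maintainers prefer to reject such input outright, in line with the other corruption checks, a checked variant could look like the sketch below. `RangeError` and the free function `code_range` are hypothetical; the real method mutates `self` and would need its signature changed to return a `Result`.

```rust
/// Hypothetical error: the largest code leaves no room for `max + 1`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RangeError {
    CodeOutOfRange(i32),
}

/// Compute `max code + 1` over all entries, failing instead of overflowing.
/// An empty table yields 0, matching the `-1 + 1` starting point.
pub fn code_range<'a, I>(entries: I) -> Result<i32, RangeError>
where
    I: IntoIterator<Item = &'a [i32]>,
{
    let mut max = -1_i32;
    for codes in entries {
        for &c in codes {
            max = max.max(c);
        }
    }
    // `checked_add` returns `None` only when `max == i32::MAX`.
    max.checked_add(1).ok_or(RangeError::CodeOutOfRange(max))
}
```

The trade-off is scope: saturating keeps the fix local to one line, while the checked form surfaces the corruption to the caller at the cost of touching the load path's return type.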
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e14db01b0
if tag != 0 {
    return Err(NetworkError::BadTag(tag));
Accept ordinal-encoded network headers
For legacy/order-coded network blobs, Tesseract's getNetworkType treats a non-zero first byte as the NetworkType ordinal and continues reading the rest of the header without a type-name string. Rejecting every non-zero byte here means NetworkHeader::from_le_bytes fails on network files that Network::CreateFromFile still accepts, so the Rust byte-parity loader is narrower than the C++ reader for those serialized models.
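The two-path dispatch this comment describes can be sketched as follows. This is an illustration under assumptions, not the crate's or Tesseract's code: `TypeError` and `read_type_ordinal` are hypothetical, the name string is assumed to be serialized as a little-endian `u32` length followed by that many bytes, and the caller supplies the name table. The sketch is also stricter than the behaviour described here, since it rejects negative or out-of-table ordinals.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeError {
    UnexpectedEof,
    UnknownType,
    BadOrdinal(i8),
}

/// Returns `(ordinal, bytes_consumed)` for the type field at the start of `buf`.
pub fn read_type_ordinal(buf: &[u8], names: &[&str]) -> Result<(usize, usize), TypeError> {
    let tag = *buf.first().ok_or(TypeError::UnexpectedEof)? as i8;
    if tag != 0 {
        // Ordinal-coded path: the byte itself is the type, no string follows.
        let ord = usize::try_from(tag).map_err(|_| TypeError::BadOrdinal(tag))?;
        if ord >= names.len() {
            return Err(TypeError::BadOrdinal(tag));
        }
        return Ok((ord, 1));
    }
    // String path: assumed u32 LE length, then the name bytes.
    let lb = buf.get(1..5).ok_or(TypeError::UnexpectedEof)?;
    let len = u32::from_le_bytes([lb[0], lb[1], lb[2], lb[3]]) as usize;
    let end = 5usize.checked_add(len).ok_or(TypeError::UnexpectedEof)?;
    let name = buf.get(5..end).ok_or(TypeError::UnexpectedEof)?;
    let ord = names
        .iter()
        .position(|n| n.as_bytes() == name)
        .ok_or(TypeError::UnknownType)?;
    Ok((ord, end))
}
```

Returning the consumed byte count lets the caller continue reading the rest of the base header from the right offset on either path.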
tier_u16(self.ni as u32 as u16),
tier_u16(self.no as u32 as u16),
Avoid silently truncating wide layer dimensions
When a valid custom network has ni or no above u16::MAX, release builds skip the debug_assert! and these casts wrap the dimensions into the facet. That silently corrupts the SoA projection for those models even though the header format stores the dimensions as i32; this should either return an error or preserve the overflow out-of-line before constructing a FacetCascade.
…perator ruling)
Operator ruling 2026-07-04 ("mark all as migration mandatory"): the V1
contiguous-u24 node-key tail (family:u24 ++ identity:u24) is forbidden and its
migration to the V3 6×(u8:u8) facet is mandatory on every surface — upgrading the
le-contract §L7 #2 reconciliation from optional to a hard mandate.
- ISSUES.md ISS-V1-U24-TAIL-MIGRATION-MANDATORY: the full residue enumerated with
file:line (ocr.rs:121, soa_graph.rs:412, aiwar.rs:104, action.rs:417/693,
callcenter graph_table + OWL bytes[13..16] writers, ogar lib.rs:195, and the
CLAUDE.md CANON doc), each mandatory, each gated per-site on v3-envelope-auditor.
Records the mechanism (no new_v3 constructor; classid tail_variant resolves V3)
and the gotcha (NodeGuid::new byte-packing does NOT align with the V3 reading —
classid swap alone is insufficient). Test-only fold assertions exempt.
- EPIPHANIES.md E-V3-V1-U24-MIGRATION-MANDATORY: the ruling as policy.
Confirmed already V3-clean (no action): the Tesseract transcode arc
(contract::network FacetCascade #643, recoder, tesseract-recognizer, ruff
harvest) + OGAR render_class_with_methods (#150) — zero contiguous-u24.
Board-only; no code, no build step.
Co-Authored-By: Claude <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1
The Core-side of the post-#633 Tesseract-transcode plateau: the recoder content-store leaf, the `Network` layer-graph → V3-SoA sink, and the board record for the recognizer compute leaves (whose code lives in the companion tesseract-rs PR, per the board-hygiene rule). All additive to `lance-graph-contract` (zero-dep); no `NodeRow`/`ValueTenant`/`ValueSchema`/stride/`ENVELOPE_LAYOUT_VERSION` impact.

What ships (7 commits)

- `contract::unicharcompress` (2f1df8d5): the LSTM recoder load side (`UnicharCompress` + `RecodedCharId`, `from_le_bytes` = C++ `DeSerialize`; `encode`/`decode`/`code_range`). The FIRST binary-format leaf (`TFile` LE). Byte-parity GREEN 112 enc + 112 dec on real `eng.lstm-recoder`. EPIPHANIES `E-CPP-PARITY-7`.
- `contract::network` (a7dba3a8): the operator directive "6x8:8, 16-byte tenant = classid + 12 bytes, ruff→OGAR sink-in", executed the right way (NOT a hand-rolled enum). `NetworkType` (27 layer types, ordinal == on-wire `kTypeNames`) + `NetworkHeader::from_le_bytes` (the base header `Network::CreateFromFile` reads, `network.cpp:214-248`) + `to_facet()` → `facet::FacetCascade` (16 B = classid + 6×8:8, `CascadeShape::G6D2`); `facet_classid = compose_classid(network_layer=0x0804, ntype)` canon-high (ONE OCR-domain mint; the 27 subclasses live in the classid custom-low, not 27 slots). Byte-parity GREEN vs libtesseract `Network::CreateFromFile` on real `eng.lstm` (Series ni=36 no=111 num_weights=385807; oracle `spec()` == the model spec string). Reviewed by core-first-architect (TARGETS-CORE), v3-envelope-auditor (LAYOUT-CLEAN, no version bump), brutally-honest-tester (LAND); their advisories folded in (compile-lock test `NETWORK_LAYER == codebook mint`, custom-half invariant doc, `to_facet` `ni`/`no` `debug_assert`). EPIPHANIES `E-OCR-NETWORK-SINK-1`.
- Recognizer-leaf boards (ba5ce72f, 856358a2, 4af9162d, c60d8f55, 4e14db01): EPIPHANIES `E-OCR-{COMPUTE-NDARRAY-SEAM,MATDOTVEC,WEIGHTMATRIX,ACTIVATION,FULLYCONNECTED}-1`, the byte-parity record for the recognizer compute crate (Leaves 1-4). The code lands in the companion tesseract-rs PR; the boards land here per the workspace hygiene rule ("recognizer boards land in lance-graph").

Proof / gates

- …`-DFAST_FLOAT`): recoder 112+112, network base-header vs `Network::CreateFromFile`.
- …`-p lance-graph-contract`: network tests green; `clippy -D warnings` + fmt clean. Rebased onto current `main` (post-onebrc/lane-j: typed GridlakeCarrierError (addresses #641 review) #642, +29 commits) with the `compose_classid`/`facet`/`canonical_concept_id` surface verified intact.
- …`contract::network` + `contract::unicharcompress` inventory + `network_layer=0x0804` codebook row).

Merge order

Merge this FIRST. The companion tesseract-rs PR (recoder consumer surface + recognizer Leaves 1-4 + network docs) builds its `lance-graph-contract` path dep against lance-graph `main`, so its CI is red until this merges. The ruff PR (`harvest_network`, the ruff→OGAR harvester that produced the network manifest) is independent.

🤖 Generated with Claude Code
https://claude.ai/code/session_016b33swuXE23hKtqxsHu9p1