From bf8402137e65df8a6f5095745fd3c73b17a9a73a Mon Sep 17 00:00:00 2001
From: AdaWorldAPI
Date: Sun, 28 Jun 2026 22:42:36 +0200
Subject: [PATCH 1/3] docs: docs(adr): V3 as spine + polyglot transpiler (Rust/Python/C#)

---
 docs/V3-TRANSPILER-ADR.md | 125 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100644 docs/V3-TRANSPILER-ADR.md

diff --git a/docs/V3-TRANSPILER-ADR.md b/docs/V3-TRANSPILER-ADR.md
new file mode 100644
index 0000000..f41b7de
--- /dev/null
+++ b/docs/V3-TRANSPILER-ADR.md
@@ -0,0 +1,125 @@
+# ADR: V3 as the spine + the polyglot transpiler (Rust / Python / C#)
+
+**Status:** Proposed (RFC). Design contract; not yet implemented or
+compile-verified.
+**Date:** 2026-06-28
+**Context:** completes the "spine vs adapter" question left open by
+`SURREAL-AST-AS-ADAPTER.md` + `SURREAL-AST-TRAP-PREFLIGHT.md`, and names the
+transpiler superpower (re-emit the OGAR AST to any language via adapter).
+
+---
+
+## Decision
+
+1. **V3 (the content-addressed rail record) is the spine.** SurrealQL /
+   ClickHouse / PostgreSQL / TTL DDL are demoted to **peer adapters** that
+   lower *from* V3 + `ClassView`. SurrealQL stops being a spine candidate.
+2. **The V3 record is dual-mode and tenant-structured** (below).
+3. **Codegen is an adapter family**: just as DDL adapters project the schema,
+   `LangBackend` adapters re-emit *source code* (Rust / Python / C#) from the
+   same IR. The IR is the interlingua; codegen is the transpiler.
+
+---
+
+## 1 · The V3 record (the spine primitive)
+
+### 1.1 Dual-mode facet — `12 B = 96 bits = 6×16 = 4×24`, classid tags which
+```
+FacetCascade { facet_classid: u32, payload: [u8; 12] }   // 16 B, content address
+
+  classid tag = Cascade → [FacetTier; 6]   // 6 × (part_of:8, is_a:8) — POSITION (hierarchy)
+  classid tag = Triplet → [SpoTriple; 4]   // 4 × (subject:8, pred:8, object:8) — LOCAL EDGES (graph)
+```
+- Cascade = depth-with-implied-predicates (mereology:taxonomy); subsumption is
+  a bit-op.
Triplet = breadth-with-explicit-predicates; **an SPO triple is a
+  triplet-mode facet**, which unifies the SPO corpus with the facet primitive
+  (today they are unjoined substrates).
+- The tag rides in the classid (zero extra bytes; precedent: `TailVariant`).
+
+### 1.2 The 512-byte record = 32 tenants
+```
+NodeRow 512 B  ≡ [Facet; 32]              (AoS row)
+               ≡ 32 tenants × [GUID; N]   (SoA — "tenant" = a GUID member column)
+  tenant 0        Self GUID
+  tenant 1        Edges (EdgeBlock 12+4)
+  tenants 2..31   30 composition slots → GUID references to other classes
+```
+`ClassView::tenant_schema(classid) -> [TenantRole; 32]`, **static per classid**
+(keeps each tenant a homogeneous, SIMD-scannable GUID column). Roles:
+`{ Self, Edges, Structural, Do, Think, Adapter }` (+ `nested`). The
+`Do` (ActionDef / do-arm) · `Think` (cognitive plane) · `Adapter` (projection)
+tenants are the three arms reached *through* the classid. Nesting = a
+content-addressed FK column → a columnar composition DAG.
+
+> Reconciliation with current code: today `NodeRow` = `key(16) | edges(16) |
+> value(480)` with `value` **opaque**. The `[Facet; 32]` / `tenant_schema` is
+> the typed schema this ADR imposes on those same bytes — `ClassView` is the
+> missing brick that turns the 480-byte slab into 30 typed tenant slots.
+
+### 1.3 Capacity is the SoC lint, not a limit
+`>64 fields` · `>256/tier` · `>6 deep` · `>4 edges` · `>30 slots` → the class
+lacks separation of concerns. **The encoding makes good SoC the only
+representable shape**: overflow in any dimension is the signal; "reference
+another class" (grow a limb) is always the fix. We own OGAR, so minting the
+new limb is free and convergence keeps it shared. Detector and refactor are
+the same mechanism. (The law is already written as a falsifier in
+`ruff_spo_address/examples/medcare_probe.rs` §[G]; promote it to a `ruff`
+diagnostic.)
+
+---
+
+## 2 · The transpiler (the superpower)
+
+The IR (`ruff_spo_triplet::ModelGraph`) is already bidirectional
+(`expand` ⇄ `reassemble`), and `ruff_cpp_codegen` already proves
+`ModelGraph → Rust source`. Generalize that one backend into an adapter family:
+
+```
+SOURCE (py/cpp/cs) ─ruff_*_spo─▶ ModelGraph ─mint─▶ Facet (content address, dedup across langs)
+                                      │
+TARGET (py/rust/cs) ◀─LangBackend─── ModelGraph
+```
+- `LangBackend { fn render(&self, &ModelGraph) -> String }` — one adapter per
+  target, peers of the DDL adapters.
+- Rust ◀ `ruff_cpp_codegen` (exists) · Python ◀ extend `ruff_python_codegen`
+  (the formatter's generator) · C# ◀ new `ruff_csharp_codegen`.
+- Content-addressing gives **cross-language dedup**: the same construct in
+  Python/C++/C# mints the same `Facet` (CI convergence test).
+
+### Honest boundary — structure transpiles, behaviour does not
+`OGAR-AS-IR.md`: "the behavioural arm cannot survive lowering and stays in the
+IR." The existing backend renders `MethodSig` *signatures*, not method bodies.
+So the deliverable is a **schema / interface / DTO / ORM-model transpiler**
+(API contracts, type defs, model shells) — enormous on its own. Full behaviour
+transpilation (method bodies → executable logic) is a later arm via
+`ActionDef` / `KausalSpec`, explicitly out of this ADR.
+
+---
+
+## 3 · Consequences
+
+- **Positive:** one content-addressed spine; SurrealQL/DDL become honest peer
+  adapters (closes the trap); the SPO corpus and the facet primitive unify;
+  capacity-as-lint is enforced structurally; codegen-via-adapter gives polyglot
+  re-export; cross-app/cross-language dedup for free.
+- **Costs / risks:** (1) the rail address is **lossy** — it is a CAM *key*, not
+  the content; the lossless shape lives in `ClassView` + the value tenants.
+  (2) **Minting governance** — a content address is only stable if the
+  rank-minter is frozen; the cross-language convergence test must be
+  CI-enforced *before* scaling.
(3) **"Everything in OGAR" = OGAR is the fleet
+  bottleneck** — the zero-dep contract crate must be the only stable surface;
+  the `#[deprecated]` `*Bridge` churn already shows the strain.
+- **Scale honesty:** the substrate is ~11 K nodes / ~24 K triples today, not
+  the aspirational 2 M; the `ruff_spo_triplet` per-language pipeline is the
+  lever that scales it.
+
+## 4 · Status of the pieces (verified `main`)
+Real: `ModelGraph` interlingua (bidirectional), `ruff_cpp_spo` / `ruff_ruby_spo`
+frontends, `ruff_csharp_spo` loader, the 16-byte mint (`ruff_spo_address`), one
+backend (`ruff_cpp_codegen` → Rust), `bridge_codebook_convergence` (identity).
+To build: Python→ModelGraph normalization, C# harvester generalization, the
+`LangBackend` trait + Python/C# backends, the dual-mode `FacetMode`, the
+`tenant_schema`, the round-trip + convergence CI, the `OGAR-SOC` lint.
+
+## 5 · Companion
+Implementation plan: `ruff` PR "OGAR Polyglot AST Integration (RFC)".

From 8dd45cdf7bdb4e9d03651a00f0010a631993a4f7 Mon Sep 17 00:00:00 2001
From: AdaWorldAPI
Date: Sun, 28 Jun 2026 23:20:15 +0200
Subject: [PATCH 2/3] =?UTF-8?q?docs(adr):=20align=20bidirectional=20claim?=
 =?UTF-8?q?=20with=20ruff=20reality=20(reassemble=20C++-only;=20extract()?=
 =?UTF-8?q?=20is=20todo!())=20=E2=80=94=20mirrors=20ruff=20#32=20Codex=20f?=
 =?UTF-8?q?ixes?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/V3-TRANSPILER-ADR.md | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/docs/V3-TRANSPILER-ADR.md b/docs/V3-TRANSPILER-ADR.md
index f41b7de..a38ab17 100644
--- a/docs/V3-TRANSPILER-ADR.md
+++ b/docs/V3-TRANSPILER-ADR.md
@@ -70,9 +70,11 @@ diagnostic.)
 
 ## 2 · The transpiler (the superpower)
 
-The IR (`ruff_spo_triplet::ModelGraph`) is already bidirectional
-(`expand` ⇄ `reassemble`), and `ruff_cpp_codegen` already proves
-`ModelGraph → Rust source`.
Generalize that one backend into an adapter family:
+The IR (`ruff_spo_triplet::ModelGraph`) is bidirectional *by intent* — `expand`
+is general, but **`reassemble` today recovers only the C++ projection** (a
+general reassembler is a prerequisite; see the ruff RFC) — and
+`ruff_cpp_codegen` already proves `ModelGraph → Rust source`. Generalize that
+one backend into an adapter family:
 
 ```
 SOURCE (py/cpp/cs) ─ruff_*_spo─▶ ModelGraph ─mint─▶ Facet (content address, dedup across langs)
@@ -114,9 +116,11 @@ transpilation (method bodies → executable logic) is a later arm via
   lever that scales it.
 
 ## 4 · Status of the pieces (verified `main`)
-Real: `ModelGraph` interlingua (bidirectional), `ruff_cpp_spo` / `ruff_ruby_spo`
-frontends, `ruff_csharp_spo` loader, the 16-byte mint (`ruff_spo_address`), one
-backend (`ruff_cpp_codegen` → Rust), `bridge_codebook_convergence` (identity).
+Real: `ModelGraph` interlingua (`expand` general; `reassemble` C++-projection-
+only today), `ruff_cpp_spo` (`extract_dir`/`extract_tree`; `extract()` is a
+`todo!()`) / `ruff_ruby_spo` frontends, `ruff_csharp_spo` loader, the 16-byte
+mint (`ruff_spo_address`), one backend (`ruff_cpp_codegen` → Rust),
+`bridge_codebook_convergence` (identity).
 To build: Python→ModelGraph normalization, C# harvester generalization, the
 `LangBackend` trait + Python/C# backends, the dual-mode `FacetMode`, the
 `tenant_schema`, the round-trip + convergence CI, the `OGAR-SOC` lint.
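
The `LangBackend` trait listed under "To build" could look roughly like the sketch below. It keeps the `render(&self, &ModelGraph) -> String` shape quoted in the ADR and renders structure only (type shells, no method bodies). The `ModelGraph`, `ClassNode`, `Field`, and `Scalar` types here are hypothetical, simplified stand-ins, not the real `ruff_spo_triplet` types, and the three backend structs and `sample_graph` are illustrative names.

```rust
// Hypothetical, simplified stand-ins for the IR; the real
// `ruff_spo_triplet::ModelGraph` is not reproduced here.
#[derive(Clone, Copy)]
pub enum Scalar { Int, Text, Bool }
pub struct Field { pub name: &'static str, pub ty: Scalar }
pub struct ClassNode { pub name: &'static str, pub fields: Vec<Field> }
pub struct ModelGraph { pub classes: Vec<ClassNode> }

/// One adapter per target language (trait shape taken from the ADR).
pub trait LangBackend {
    fn render(&self, graph: &ModelGraph) -> String;
}

pub struct RustBackend;
pub struct PythonBackend;
pub struct CSharpBackend;

impl LangBackend for RustBackend {
    fn render(&self, graph: &ModelGraph) -> String {
        let mut out = String::new();
        for class in &graph.classes {
            out.push_str(&format!("pub struct {} {{\n", class.name));
            for f in &class.fields {
                let ty = match f.ty {
                    Scalar::Int => "i64",
                    Scalar::Text => "String",
                    Scalar::Bool => "bool",
                };
                out.push_str(&format!("    pub {}: {},\n", f.name, ty));
            }
            out.push_str("}\n");
        }
        out
    }
}

impl LangBackend for PythonBackend {
    fn render(&self, graph: &ModelGraph) -> String {
        let mut out = String::from("from dataclasses import dataclass\n");
        for class in &graph.classes {
            out.push_str(&format!("\n@dataclass\nclass {}:\n", class.name));
            if class.fields.is_empty() {
                // An empty class body is not valid Python without `pass`.
                out.push_str("    pass\n");
            }
            for f in &class.fields {
                let ty = match f.ty {
                    Scalar::Int => "int",
                    Scalar::Text => "str",
                    Scalar::Bool => "bool",
                };
                out.push_str(&format!("    {}: {}\n", f.name, ty));
            }
        }
        out
    }
}

impl LangBackend for CSharpBackend {
    fn render(&self, graph: &ModelGraph) -> String {
        let mut out = String::new();
        for class in &graph.classes {
            out.push_str(&format!("public class {}\n{{\n", class.name));
            for f in &class.fields {
                let ty = match f.ty {
                    Scalar::Int => "long",
                    Scalar::Text => "string",
                    Scalar::Bool => "bool",
                };
                out.push_str(&format!("    public {} {} {{ get; set; }}\n", ty, f.name));
            }
            out.push_str("}\n");
        }
        out
    }
}

/// A one-class graph used to exercise all three backends.
pub fn sample_graph() -> ModelGraph {
    ModelGraph {
        classes: vec![ClassNode {
            name: "Patient",
            fields: vec![
                Field { name: "id", ty: Scalar::Int },
                Field { name: "name", ty: Scalar::Text },
            ],
        }],
    }
}
```

Calling `RustBackend.render(&sample_graph())`, and likewise for the other two backends, yields one type shell per target from the same graph. This is the "structure transpiles, behaviour does not" boundary in miniature: nothing here attempts method bodies.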
From 1ba2b0f4336b9f0cbe405a791ce04407508012b4 Mon Sep 17 00:00:00 2001
From: AdaWorldAPI
Date: Sun, 28 Jun 2026 23:27:58 +0200
Subject: [PATCH 3/3] =?UTF-8?q?docs(adr):=20address=20Codex=20P1+P2=20?=
 =?UTF-8?q?=E2=80=94=20canon=20is=20key(16)+value(496),=20EdgeBlock=20supe?=
 =?UTF-8?q?rseded=20(F-5);=20tenant=20typing=20on=20Class=20not=20ClassVie?=
 =?UTF-8?q?w;=20cite=20=E2=89=A464/paginate-via-hierarchy=20canon?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/V3-TRANSPILER-ADR.md | 67 +++++++++++++++++++++++++--------------
 1 file changed, 43 insertions(+), 24 deletions(-)

diff --git a/docs/V3-TRANSPILER-ADR.md b/docs/V3-TRANSPILER-ADR.md
index a38ab17..c69c105 100644
--- a/docs/V3-TRANSPILER-ADR.md
+++ b/docs/V3-TRANSPILER-ADR.md
@@ -36,35 +36,54 @@ FacetCascade { facet_classid: u32, payload: [u8; 12] } // 16 B, content addres
   (today they are unjoined substrates).
 - The tag rides in the classid (zero extra bytes; precedent: `TailVariant`).
 
-### 1.2 The 512-byte record = 32 tenants
+### 1.2 The 512-byte record — canon is `key(16) + value(496)`
 ```
-NodeRow 512 B  ≡ [Facet; 32]              (AoS row)
-               ≡ 32 tenants × [GUID; N]   (SoA — "tenant" = a GUID member column)
-  tenant 0        Self GUID
-  tenant 1        Edges (EdgeBlock 12+4)
-  tenants 2..31   30 composition slots → GUID references to other classes
+NodeRow 512 B  = key(16) | value(496)     // OGAR canon (CLAUDE.md:51-52)
+               ≡ 32 × 16-byte slots ≡ 32 tenants × [GUID; N]   ("tenant" = a GUID member column)
+  slot 0        Self GUID — the 16-byte key; never compressed, addressable with zero value decode
+  slots 1..31   31 value tenants → GUID references / facets
 ```
-`ClassView::tenant_schema(classid) -> [TenantRole; 32]`, **static per classid**
-(keeps each tenant a homogeneous, SIMD-scannable GUID column). Roles:
-`{ Self, Edges, Structural, Do, Think, Adapter }` (+ `nested`).
The
-`Do` (ActionDef / do-arm) · `Think` (cognitive plane) · `Adapter` (projection)
-tenants are the three arms reached *through* the classid. Nesting = a
-content-addressed FK column → a columnar composition DAG.
-
-> Reconciliation with current code: today `NodeRow` = `key(16) | edges(16) |
-> value(480)` with `value` **opaque**. The `[Facet; 32]` / `tenant_schema` is
-> the typed schema this ADR imposes on those same bytes — `ClassView` is the
-> missing brick that turns the 480-byte slab into 30 typed tenant slots.
+**No separate `EdgeBlock`.** The `12+4` EdgeBlock is **superseded** canon
+(`NODEGUID-CANON-AUDIT.md` F-5; operator 2026-06-23: "don't use 12-4, that's
+the old taxonomy before family nodes"). **Relations ARE the addressing** — a
+shared family prefix is a local edge; a GUID reference to another node is a
+cross edge. So "edges" are simply GUID-reference tenants, not a dedicated block.
+
+**Type info lives on `Class`, never on `ClassView`** (`CLASSVIEW-MATERIALIZATION-
+PLAN.md` §3 + anti-pattern #2: `Class` carries types, `ClassView` is label-only;
+"the right tool for codegen is `Class` directly"). So the tenant typing is an
+**expansion of the existing `lance_graph_contract::Class`** — not a new type,
+not on `ClassView`:
+```rust
+// lance-graph-contract — expand the existing Class; keep ClassView label-only
+impl Class {
+    fn tenant_schema(&self) -> [TenantRole; 31];   // static per classid; SIMD-scannable columns
+}
+enum TenantRole { Structural, Edge, Do, Think, Adapter }   // + nested: bool
+```
+The `Do` (ActionDef / do-arm) · `Think` (cognitive plane) · `Adapter`
+(projection) tenants are the three arms reached *through* the classid; nesting =
+a content-addressed FK column → a columnar composition DAG.
+
+> Cross-repo divergence to reconcile (not here): lance-graph's
+> `canonical_node.rs` still ships the superseded `key(16) | edges(16) |
+> value(480)` with a `12+4` EdgeBlock.
Per `NODEGUID-CANON-AUDIT.md` F-5 this is
+> "a genuine canon-vs-operator divergence to resolve at the lance-graph level."
+> This ADR follows the OGAR canon (`16 + 496`); the lance-graph-contract
+> expansion reconciles `canonical_node.rs` against the family-node supersession
+> in the same change.
 
 ### 1.3 Capacity is the SoC lint, not a limit
-`>64 fields` · `>256/tier` · `>6 deep` · `>4 edges` · `>30 slots` → the class
-lacks separation of concerns. **The encoding makes good SoC the only
+`>64 fields` · `>256/tier` · `>6 deep` · `>4 edges` · `>31 value tenants` → the
+class lacks separation of concerns. **The encoding makes good SoC the only
 representable shape**: overflow in any dimension is the signal; "reference
-another class" (grow a limb) is always the fix. We own OGAR, so minting the
-new limb is free and convergence keeps it shared. Detector and refactor are
-the same mechanism. (The law is already written as a falsifier in
-`ruff_spo_address/examples/medcare_probe.rs` §[G]; promote it to a `ruff`
-diagnostic.)
+another class" (grow a limb) is always the fix. This is **already canon** —
+`CLASSVIEW-MATERIALIZATION-PLAN.md` §5: "No promoted class may have more than 64
+slots… if a future class crosses, **paginate via class hierarchy**" (enforced by
+`field_basis_fits_in_one_u64_mask`). "Paginate via class hierarchy" *is* "grow a
+limb via another class." Detector and refactor are the same mechanism. The law
+is also written as a falsifier in `ruff_spo_address/examples/medcare_probe.rs`
+§[G]; promote it to a registered `ruff` diagnostic (`OGAR-SOC`).
 
 ---
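
A minimal sketch of the `16 + 496` row arithmetic and the capacity-as-lint rule from §1.2 and §1.3, under stated assumptions. The numeric limits (64 fields, 256 per tier, 6 deep, 4 edges, 31 value tenants) come from the ADR text. `Class` here is a hypothetical stand-in for `lance_graph_contract::Class`, and `ClassShape`, `SocViolation`, `slot`, and `soc_lint` are illustrative names, not existing `ruff` or lance-graph APIs.

```rust
pub const ROW_BYTES: usize = 512;
pub const SLOT_BYTES: usize = 16;
pub const SLOTS: usize = ROW_BYTES / SLOT_BYTES; // 32 slots of 16 bytes
pub const VALUE_TENANTS: usize = SLOTS - 1; // slot 0 is the key, 31 remain

/// Roles as listed in the ADR; the `nested` flag is left out of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TenantRole { Structural, Edge, Do, Think, Adapter }

/// Hypothetical stand-in for `lance_graph_contract::Class`.
pub struct Class {
    pub roles: [TenantRole; VALUE_TENANTS],
}

impl Class {
    /// Static per class: one role per value tenant (slots 1..=31).
    pub fn tenant_schema(&self) -> [TenantRole; VALUE_TENANTS] {
        self.roles
    }
}

/// Borrow one 16-byte slot of a row; slot 0 is the Self GUID key.
pub fn slot(row: &[u8; ROW_BYTES], index: usize) -> Option<&[u8]> {
    if index < SLOTS {
        Some(&row[index * SLOT_BYTES..(index + 1) * SLOT_BYTES])
    } else {
        None
    }
}

/// Hypothetical summary of a class, as a lint pass might see it.
#[derive(Debug, Clone, Copy)]
pub struct ClassShape {
    pub fields: usize,
    pub widest_tier: usize,
    pub depth: usize,
    pub local_edges: usize,
    pub value_tenants: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SocViolation {
    Fields(usize),
    Tier(usize),
    Depth(usize),
    Edges(usize),
    Tenants(usize),
}

/// Report every dimension that overflows the encoding; each hit means
/// "reference another class" rather than "widen the record".
pub fn soc_lint(s: &ClassShape) -> Vec<SocViolation> {
    let mut hits = Vec::new();
    if s.fields > 64 { hits.push(SocViolation::Fields(s.fields)); }
    if s.widest_tier > 256 { hits.push(SocViolation::Tier(s.widest_tier)); }
    if s.depth > 6 { hits.push(SocViolation::Depth(s.depth)); }
    if s.local_edges > 4 { hits.push(SocViolation::Edges(s.local_edges)); }
    if s.value_tenants > VALUE_TENANTS { hits.push(SocViolation::Tenants(s.value_tenants)); }
    hits
}
```

A class sitting exactly at the limits passes, and one extra field produces a single `Fields` violation. That matches the ADR's reading of overflow as a refactoring signal instead of a hard error in the encoder.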