Merged
22 changes: 22 additions & 0 deletions Dockerfile
@@ -73,6 +73,28 @@ RUN git clone https://github.com/AdaWorldAPI/lance-graph.git \
&& git clone --depth 1 https://github.com/AdaWorldAPI/ndarray.git \
&& git clone --depth 1 https://github.com/AdaWorldAPI/neo4j-rs.git

# CPU baseline: x86-64-v4 (the 4th microarch level — AVX-512F/BW/CD/DQ/VL on top
# of v3's AVX2+FMA). This is the compile FLOOR; it flips on `target_feature =
# "avx512f"`, so q2-ndarray's `simd.rs` dispatch selects its native `simd_avx512`
# backend (`__m512`/`__m512d`/`__m512i`) instead of the v3 AVX2 default.
#
# BF16 + AMX 16x16 tile GEMM are NOT gated by this flag — they ride q2-ndarray's
# CPU-AGNOSTIC runtime autodetect polyfill (`simd_caps()` + the AMX `arch_prctl`
# XTILEDATA enable + CPU-model detect). The polyfill opportunistically lights them
# up only when the *runtime* host actually has them, and always keeps the AVX2 /
# scalar paths it compiled in as fallback. So: AVX-512 = compile baseline here;
# BF16/AMX = runtime-detected; everything below v4 = polyfill fallback.
#
# ⚠ REQUIREMENT: a v4 floor makes the binary REQUIRE AVX-512 at run time — it
# SIGILLs on the first `__m512` op on a host without it (the PR #170 failure mode,
# one level up). The Railway *build* machine needs no AVX-512 (compiling != run),
# but the *deploy* host does. AMX additionally needs a Sapphire/Emerald/Granite
# Rapids Xeon at run time; on anything older the autodetect simply skips AMX (that
# is the agnostic polyfill working as intended, not an error). If a deploy target
# may lack AVX-512, drop this to `x86-64-v3` and rely on runtime dispatch for the
# AVX-512/AMX paths — one portable binary, same hot paths when the silicon allows.
ENV CARGO_BUILD_RUSTFLAGS="-C target-cpu=x86-64-v4"

# Build the q2 binary with embedded frontend
WORKDIR /build/q2
RUN cargo build --release -p cockpit-server --features embed-cockpit,planner \
109 changes: 109 additions & 0 deletions claude-notes/plans/2026-06-24-fma-torso-bodyparts3d-splat.md
@@ -108,3 +108,112 @@ Validates the design before wiring it into the render. Next increments:
(node_row-bounded + normal-oriented = crisp colours in the render)
- [ ] animation: deform node anchors -> motion-skinned gaussians follow
(Motion-Blender GS; the partonomy is the rig)

## Best shading + lazylock + adaptive-FPS + SPL4 (branch claude/torso-shading)

User: "best possible shading and lazylock buffering to mitigate batching", then
"adaptive framerate prediction + SIMD batching + v4", then the key insight: "the
Motion is fixed Rotation ... so it could easily prebuffer 270 frames for 90 FPS".
Scoping answers: framerate = BOTH (render-loop throttle now + codec P-frames as
the SPL4 motion track); PR scope = all of the above incl SPL4 in one push.

### Infra fact ("GitHub uses Cargo not Dockerfile?")
q2 CI = pure Cargo+npm (`cargo fmt`/`xtask lint`/`clippy -D warnings`/`nextest`,
wasm-pack/npm). The only `docker` in CI is `docker image prune` (free runner disk).
The root `/Dockerfile` is Railway-deploy ONLY (`q2-cockpit` embeds the Vite cockpit,
clones lance-graph/ndarray/neo4j-rs for the graph hot path). This splat feature does
not touch the Dockerfile.
- [x] **Dockerfile CPU baseline -> x86-64-v4** (user ask): `ENV
CARGO_BUILD_RUSTFLAGS="-C target-cpu=x86-64-v4"` before the cockpit-server
build. Flips `target_feature="avx512f"` so q2-ndarray's `simd.rs` picks the
native `simd_avx512` backend. BF16+AMX tile GEMM ride ndarray's runtime
autodetect polyfill (`simd_caps()` + AMX arch_prctl/model-detect) — not gated
by the flag, lit only when the host has them, AVX2/scalar fallback always
compiled. ⚠ v4 = AVX-512 REQUIRED at runtime (SIGILL otherwise, the PR #170
failure mode one level up); AMX needs Sapphire/Emerald/Granite Rapids at runtime
(autodetect skips it otherwise = agnostic working as intended). Documented the
`x86-64-v3` fallback in the Dockerfile for non-AVX-512 deploy targets.
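
A deploy-host preflight for the v4 floor could look like the sketch below. It is not part of this PR; the function names are hypothetical, and it assumes a Linux host where `/proc/cpuinfo` exposes a `flags` line. The flag spellings are the ones the Linux kernel reports for the five AVX-512 subsets named in the Dockerfile comment.

```python
# Hypothetical preflight helper (not in the repo): checks whether a host
# advertises the AVX-512 subsets that x86-64-v4 adds on top of v3.
V4_AVX512_FLAGS = {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}


def parse_cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_v4_flags(cpuinfo_text):
    """Flags the v4 baseline needs that this host does not report."""
    return V4_AVX512_FLAGS - parse_cpu_flags(cpuinfo_text)


def host_can_run_v4(cpuinfo_text):
    """True when no required AVX-512 flag is missing."""
    return not missing_v4_flags(cpuinfo_text)
```

On a real host, feed it `open("/proc/cpuinfo").read()`; a non-empty result from `missing_v4_flags` would indicate that the `x86-64-v3` fallback is the safer build.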

### Shading (the lit look) — DONE
- [x] Render driver (scratchpad, ndarray 1.95, OUT of q2 workspace): shade AT
RECONSTRUCTION from the per-vertex normal already in SPL2 — hemisphere ambient
(sky/ground) + key diffuse (n·L, L fixed in WORLD so camera orbits a still
light = consistent turntable) + soft fill. Shading MULTIPLIES the flat palette
colour, so the codec-free per-structure colour story is intact. 20-frame
shaded turntable rendered (9s/frame) → JPEG (67 KB/frame) →
cockpit/public/torso-frames/. Verified in-cockpit: volumetric depth, colours
preserved, no Warhol blob.
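
The lighting model described above can be sketched as follows. This is an illustration only, not the scratchpad render driver: every constant (light directions, weights, sky/ground intensities) and the +Y-up convention are assumptions. The point it shows is that the light term is a single scalar that multiplies the flat palette colour, so channel ratios (the per-structure hue) survive shading.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# All constants are illustrative assumptions, not values from the driver.
KEY_DIR = normalize((0.4, 0.8, 0.45))    # key light, fixed in WORLD space
FILL_DIR = normalize((-0.6, 0.2, -0.5))  # soft fill from the opposite side
SKY, GROUND = 1.0, 0.35                  # hemisphere ambient intensities
K_AMBIENT, K_KEY, K_FILL = 0.35, 0.55, 0.15


def light_intensity(normal):
    """Scalar light for a world-space normal: hemisphere + key + fill."""
    n = normalize(normal)
    up = 0.5 * (n[1] + 1.0)                 # 0 = facing ground, 1 = facing sky (+Y up assumed)
    ambient = GROUND + (SKY - GROUND) * up  # hemisphere ambient
    key = max(0.0, dot(n, KEY_DIR))         # key diffuse, n.L clamped at zero
    fill = 0.5 * (dot(n, FILL_DIR) + 1.0)   # wrapped diffuse, never fully black
    return min(1.0, K_AMBIENT * ambient + K_KEY * key + K_FILL * fill)


def shade(palette_rgb, normal):
    """Multiply the flat palette colour by the scalar light term."""
    k = light_intensity(normal)
    return tuple(c * k for c in palette_rgb)
```

Because `KEY_DIR` lives in world space and only the camera moves, a given gaussian receives the same intensity in every turntable frame.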

### Prebuffer = the answer to BOTH (A) and (B) [the user's insight]
The demo motion is a FIXED, periodic, deterministic camera rotation. So you neither
ADAPT the framerate nor PREDICT motion frame-by-frame — you PRECOMPUTE the closed
loop once and replay → every frame free → guaranteed 90 fps. This is exactly the
x265 GOP idea: a periodic camera path is a closed Group-of-Pictures; prebuffer the
GOP, replay forever. It is ALSO the honest SPL4 (B) motion source: the orbit is a
real known closed trajectory, so the 270 rotation steps ARE its P-frames — NO
synthetic breathing deformation needed (drop that demo).
- [ ] /torso turntable: bump FRAME_COUNT 20 → 270 over an exact 360° (frame
N wraps to frame 0 for a seamless loop), 90 fps playback = a 3 s loop. Re-bake at
the higher count (background). Ship-size lever: 67 KB/frame × 270 ≈ 18 MB JPEG →
offer WebM encode (~3 MB) as the compaction. Mandatory here because CPU EWA splat is
9s/frame, so live render is impossible; prebuffer is THE technique, not an optimisation.
- note: the live WebGL points view is already real-time; prebuffering full
framebuffers there is VRAM-prohibitive (270×810×1080×4 ≈ 945 MB) — so the
live-view win is lazylock + adaptive-FPS, and image-prebuffer stays on /torso.
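
The budget figures in this section can be rechecked with a few lines of arithmetic. The inputs are the numbers quoted above; the serial bake-time estimate assumes every frame costs the same 9 s, and `orbit_angle` is a hypothetical helper showing how an index past the last frame wraps onto frame 0.

```python
FPS = 90
FRAME_COUNT = 270
JPEG_KB_PER_FRAME = 67
BAKE_SECONDS_PER_FRAME = 9               # CPU EWA splat cost quoted above
WIDTH, HEIGHT, BYTES_PER_PIXEL = 810, 1080, 4

loop_seconds = FRAME_COUNT / FPS                      # length of one closed loop
step_degrees = 360.0 / FRAME_COUNT                    # camera rotation per frame
jpeg_total_mb = JPEG_KB_PER_FRAME * FRAME_COUNT / 1000
framebuffer_bytes = FRAME_COUNT * WIDTH * HEIGHT * BYTES_PER_PIXEL
bake_minutes = FRAME_COUNT * BAKE_SECONDS_PER_FRAME / 60  # assumes a serial bake


def orbit_angle(frame_index):
    """Camera angle in degrees; indices wrap so the loop is seamless."""
    return (frame_index % FRAME_COUNT) * step_degrees


print(f"loop {loop_seconds:.1f} s, step {step_degrees:.3f} deg/frame")
print(f"JPEG set {jpeg_total_mb:.2f} MB, raw framebuffers {framebuffer_bytes / 1e6:.0f} MB")
print(f"serial bake {bake_minutes:.1f} min")
```

Storing frames 0 to 269 and wrapping the index avoids baking a duplicate 360° frame.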

### Live views light up + lazylock + adaptive-FPS
- [ ] /torso-live (TorsoSplat) + /torso-map (TorsoMap): decode SPL2 `normal 3i8`
into an aNormal attribute (both skip it today); port hemisphere+diffuse+fill
into the FRAG. Same L → CPU frames and live WebGL agree.
- [ ] LazyLock build-once buffer: build geometry (pos+aColor+aNormal+aRow) ONCE;
mutate only via uniforms + draw-RANGE, never rebuild.
- [ ] Adaptive-FPS: EMA of rAF delta; over budget → shrink draw-range over the
Morton-ordered buffer (prefix = uniform spatial subsample) + drop pixelRatio;
recover when cheap; log active fraction (no silent decimation).
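
The adaptive-FPS controller above is planned for the TypeScript views; the control loop itself is language-neutral and can be sketched in Python. Class name, thresholds and step factors are assumptions, and the pixelRatio drop is left out. It relies on the buffer being Morton-ordered, so that drawing only a prefix acts as a roughly uniform spatial subsample.

```python
class AdaptiveDrawRange:
    """EMA of frame deltas drives how long a prefix of the buffer is drawn.

    All defaults are illustrative assumptions, not tuned values.
    """

    def __init__(self, total, budget_ms=1000.0 / 90, alpha=0.1,
                 min_fraction=0.1, shrink=0.85, grow=1.05, headroom=0.8):
        self.total = total
        self.budget_ms = budget_ms
        self.alpha = alpha
        self.min_fraction = min_fraction
        self.shrink = shrink
        self.grow = grow
        self.headroom = headroom
        self.ema_ms = None
        self.fraction = 1.0

    def update(self, delta_ms):
        """Feed one rAF delta; return the number of points to draw."""
        if self.ema_ms is None:
            self.ema_ms = delta_ms
        else:
            self.ema_ms = self.alpha * delta_ms + (1.0 - self.alpha) * self.ema_ms

        if self.ema_ms > self.budget_ms:
            # Over budget: draw a shorter prefix.
            self.fraction = max(self.min_fraction, self.fraction * self.shrink)
        elif self.ema_ms < self.budget_ms * self.headroom:
            # Comfortably under budget: recover gradually.
            self.fraction = min(1.0, self.fraction * self.grow)
        return max(1, int(self.total * self.fraction))

    def status(self):
        """Active fraction for logging, so decimation is never silent."""
        return f"drawing {self.fraction:.0%} of {self.total} (ema {self.ema_ms:.1f} ms)"
```

The gap between the shrink threshold (budget) and the grow threshold (budget × headroom) is a small hysteresis band that keeps the draw range from oscillating every frame.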

### SPL4 — ship the codec (static I-frame real, motion track reserved)
- [ ] `spl_codec.py`: WRITE a real `.spl4` (helix-Morton order, per-node anchor
I-frame, motion-from-anchor + zig-zag residual, anchor-predicted palette colour
= 0 per-gaussian bytes, normals). Header `motion_track_count` (0 static) reserves
the P-frame slot without a format bump (RESERVE-DON'T-RECLAIM).
- [ ] TS `decodeSpl4`: inverse — reconstruct pos/normal/rgb/row at load; all 3 views
switch to SPL4.
- [ ] Fold deferred #55 nits: `import math` → module top; fix "round-trips it"
docstring; TorsoMap `ray.params.Points` mutate-not-replace.
- [ ] (B) motion track = orbit-as-motion P-frames (above); ship the FORMAT slot +
decode contract; the camera trajectory is the demonstrator (honest, not faked).
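
The "motion-from-anchor + zig-zag residual" step can be illustrated with the standard zig-zag integer mapping applied to quantised offsets from a node anchor. This is a generic sketch, not the `.spl4` byte layout: the quantisation step, function names and list-of-ints output are assumptions, and the helix-Morton ordering is not modelled.

```python
def zigzag_encode(n):
    """Map signed ints to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def zigzag_decode(z):
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def encode_residuals(points, anchor, step):
    """Quantise each point's offset from its node anchor, then zig-zag it.

    points: list of (x, y, z); anchor: (x, y, z); step: quantisation step
    (hypothetical parameter). Returns a flat list of small unsigned ints.
    """
    out = []
    for p in points:
        for axis in range(3):
            q = round((p[axis] - anchor[axis]) / step)
            out.append(zigzag_encode(q))
    return out


def decode_residuals(codes, anchor, step):
    """Rebuild positions from the anchor plus dequantised residuals."""
    points = []
    for i in range(0, len(codes), 3):
        points.append(tuple(
            anchor[axis] + zigzag_decode(codes[i + axis]) * step
            for axis in range(3)
        ))
    return points
```

Residuals near an anchor are small in magnitude, so after zig-zag mapping they are small unsigned values that a variable-length or entropy coder can pack tightly. The round-trip error per axis is bounded by half the quantisation step.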

### Verify + ship
- [ ] `cd cockpit && npm run build` (tsc clean); inspect shaded turntable + live
view; codec round-trip RMSE unchanged. Commit incrementally on
claude/torso-shading; ASK before push (GIT PUSH POLICY).

## v4 — is_a-PRIMARY whole-body anatomical atlas (major pivot, 2026-06-24)

Operator-driven pivot, several corrections of my assumptions:
1. **Use is_a, not part-of, for classification + names.** part-of is REGIONAL
(walk up a muscle -> chest wall -> thorax, never "muscular system") and its
names aren't canonical. is_a is the TYPE tree: every structure resolves up to
its canonical type (`pectoralis minor` -> ... -> `muscle organ`); is_a ships
canonical names; is_a's mesh set is a SUPERSET of part-of (2234 vs 1258 FJ,
+976) with finer organ segmentation (no single "aorta"/"heart" — split into
ascending/arch/descending/abdominal, each its own mesh). Downloaded the 142 MB
is_a obj package + the small is_a relation/name txts.
2. **container:identity / DN->GUID addressing.** tissue = walk the is_a TYPE tree
to the first type keyword (O(1), cached) = the DistinguishedName path, which
MATERIALISES to a numeric container:identity GUID (container = tissue class).
Stored per node: `tissue`, `is_a` (DN path, upper-ontology stripped),
`container`, `identity`, `guid`.
3. **Whole body is the goal — NO spatial torso filter.** Region focus (torso, an
organ) is a future SELECT -> CAMERA-ZOOM feature on the full-body splat, driven
O(1) by each node's centroid+bbox in the SoA, not a bake-time clip.
4. **Performance is the point.** Whole body = 602,341 gaussians / 1658 is_a
structures / 12.6 MB (414 arteries, 382 muscles, 221 veins, 203 bones, 126
nerves, full viscera). The deliberate load that motivates lazylock +
adaptive-FPS (live views) and the prebuffered turntable (CPU EWA).
- bake = `bake_torso_splat.py` v4 (is_a-primary). Tissue atlas palette + depth-peel
opacity. Driver orientation fixed (+90° about X; head was landing down).
- [ ] re-render upright whole-body turntable -> /torso; live views already decode
the unchanged SPL2 (extra nodes.json fields are ignored) — light them +
lazylock + adaptive-FPS to show + mitigate the 602K load.
- research: `claude-notes/research/2026-06-24-torso-anatomy-coverage-gap.md`.
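
The tissue lookup in point 2 (walk the is_a tree upward to the first type keyword, cached) might be sketched as below. The parent table is toy data: only `pectoralis minor` and `muscle organ` come from the notes, the intermediate node is a labelled placeholder for the elided types, and the keyword list, the `/`-joined DN form and the function names are assumptions. GUID materialisation is not modelled.

```python
from functools import lru_cache

# Toy is_a parent table. The middle entry is a PLACEHOLDER for the
# intermediate types the notes elide; it is not real ontology data.
IS_A_PARENT = {
    "pectoralis minor": "(elided intermediate type)",
    "(elided intermediate type)": "muscle organ",
    "muscle organ": "organ",
}

# Assumed keyword list, checked in order against each ancestor's name.
TISSUE_KEYWORDS = ("muscle", "artery", "vein", "bone", "nerve")


def is_a_path(name):
    """Chain from a structure up to its root type (assumes a tree, no cycles)."""
    path = [name]
    while path[-1] in IS_A_PARENT:
        path.append(IS_A_PARENT[path[-1]])
    return path


@lru_cache(maxsize=None)
def tissue_of(name):
    """First tissue keyword met while walking up the is_a chain."""
    for node in is_a_path(name):
        for keyword in TISSUE_KEYWORDS:
            if keyword in node:
                return keyword
    return "unclassified"


def distinguished_name(name):
    """DN-style path, root first (the separator is an assumption)."""
    return "/".join(reversed(is_a_path(name)))
```

With `lru_cache`, the first lookup walks the chain and every repeat lookup for the same structure is a dictionary hit, which is the cached O(1) behaviour the notes describe.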