
flatten: heal degenerate cells so SLIM stops NaNing / crashing #988

Open
SuperOptimizer wants to merge 1 commit into ScrollPrize:main from SuperOptimizer:investigate-slim-flatten-nan

@SuperOptimizer

Contributor

Problem

Full-res SLIM flattening of some segments failed hard: -nan energy at iter 1 (and, once that was worked around, a PaStiX heap corruption / SIGSEGV during factorization). It happened at every decimation level, so decimation never rescued it.

Root cause is degenerate input geometry the tracer emits: grid-adjacent points collapsed onto the same (or sub-voxel) 3D location. A healthy sheet keeps adjacent cells ~1/scale voxels apart; the repro segment (20230702185753_v11, 3.9M verts) had ~6k coincident adjacent points scattered in ~2,500 tiny clusters. These make zero-area triangles → infinite symmetric-Dirichlet energy → singular system → NaN.
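
To make the zero-area → infinite-energy step concrete, here is a minimal sketch of the symmetric Dirichlet energy written in terms of the two singular values of a triangle's 2×2 Jacobian. The function name `symmetricDirichlet` is hypothetical and this is not the SLIM code itself; it only illustrates that the energy is symmetric in J and J⁻¹, so a singular value going to zero (a collapsed triangle) sends it to infinity.

```cpp
#include <cmath>
#include <limits>

// Symmetric Dirichlet energy of a 2x2 Jacobian with singular values s1, s2:
//   E = ||J||_F^2 + ||J^-1||_F^2 = s1^2 + s2^2 + 1/s1^2 + 1/s2^2
// The minimum is 4, reached at s1 = s2 = 1 (an isometry).
// Hypothetical helper for illustration only.
double symmetricDirichlet(double s1, double s2)
{
    // A degenerate (zero-area) triangle has a vanishing singular value,
    // so the inverse term is unbounded.
    if (s1 <= 0.0 || s2 <= 0.0)
        return std::numeric_limits<double>::infinity();
    return s1 * s1 + s2 * s2 + 1.0 / (s1 * s1) + 1.0 / (s2 * s2);
}
```

A triangle with `s2 = 1e-6` already contributes on the order of 1e12 to the energy, which is why a handful of collapsed cells is enough to poison the whole solve.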

Fix (three layers, all geometry-side)

  1. healDegenerateCells() (core, next to inpaintSurfaceHoles): detects collapsed grid cells, drops the connected cluster (dilated by a ring) so the inpaint gets a clean interior hole with healthy boundary, and refills via inpaintSurfaceHoles, iterating a few passes. Threshold is relative to 1/scale so it adapts across segments. Optional dropped_mask out-param lets callers clear the same cells in other channels (approval, etc.). Repairs ~70k cells on the repro.

  2. Scale-relative emit guard in vc_tifxyz2obj: a few fully-collapsed 2×2 blocks can't be separated locally, and even one zero-area triangle re-NaNs SLIM. The guard skips any triangle with a collapsed edge or a sliver altitude, so no zero-area triangles reach the OBJ (worst case: a sub-cell hole, which the flattener tolerates).

  3. Two-pass emit in vc_tifxyz2obj: skipping triangles can leave a vertex referenced by no surviving face. Such an orphan is an all-zero row/col in L = AᵀWA, which crashes the PaStiX Cholesky (the "double free / SIGSEGV" that looked like a solver bug). Now we collect surviving triangles first, then emit only the vertices they use — no orphans.
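
Layers 2 and 3 above could look roughly like the sketch below. It is not the code in vc_tifxyz2obj: the names `isDegenerate`, `emitTwoPass`, `Vec3`, `Tri`, `Mesh` and the `rel_tol` default are all hypothetical, and inputs are assumed finite. It shows the idea only: filter triangles against a threshold relative to the expected grid step, then emit just the vertices that surviving triangles reference.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
using Tri = std::array<int, 3>;
struct Mesh { std::vector<Vec3> verts; std::vector<Tri> tris; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double norm(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }
static Vec3 cross(const Vec3& a, const Vec3& b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// True if the triangle has a collapsed edge or a sliver altitude,
// both measured relative to the expected grid step (1/scale).
bool isDegenerate(const Vec3& a, const Vec3& b, const Vec3& c,
                  double expected_step, double rel_tol)
{
    const double min_len = rel_tol * expected_step;
    const double e0 = norm(sub(b, a)), e1 = norm(sub(c, b)), e2 = norm(sub(a, c));
    if (std::min({e0, e1, e2}) <= min_len) return true;   // collapsed edge
    const double twice_area = norm(cross(sub(b, a), sub(c, a)));
    const double longest = std::max({e0, e1, e2});        // > 0 here
    return twice_area / longest <= min_len;               // smallest altitude
}

// Pass 1 keeps healthy triangles; pass 2 emits only the vertices they use,
// remapping indices so no orphan vertex reaches the output.
Mesh emitTwoPass(const std::vector<Vec3>& verts, const std::vector<Tri>& tris,
                 double expected_step, double rel_tol = 1e-3)
{
    std::vector<Tri> kept;
    for (const Tri& t : tris)
        if (!isDegenerate(verts[t[0]], verts[t[1]], verts[t[2]], expected_step, rel_tol))
            kept.push_back(t);

    Mesh out;
    std::vector<int> remap(verts.size(), -1);
    for (const Tri& t : kept) {
        Tri r{};
        for (int k = 0; k < 3; ++k) {
            int& id = remap[static_cast<std::size_t>(t[k])];
            if (id < 0) {
                id = static_cast<int>(out.verts.size());
                out.verts.push_back(verts[static_cast<std::size_t>(t[k])]);
            }
            r[k] = id;
        }
        out.tris.push_back(r);
    }
    return out;
}
```

With a single-pass emit that writes every grid vertex up front, a vertex whose only triangles were skipped would still be written; the remap table is what prevents that.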

Plus a one-line latent fix in the Pastix6LLT wrapper: spmInit leaves spm.replicated = -1 ("uninitialized"); set it to 1 (single local matrix) so PaStiX doesn't take distributed-matrix branches.

Verification

20230702185753_v11: was -nan at iter 1 at every decimation. Now flattens cleanly via the VC3D SLIM panel (GUI, decimated path, new segment created) and via CLI at keep=2/6/25 — 0 orphans, 0 zero-area triangles, converged L2 ≈ 1.0, valid output OBJ.
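
The two mesh invariants quoted above (0 orphans, 0 zero-area triangles) can be checked independently of the exporter. The sketch below is a hypothetical standalone check, not the tooling used for this PR; `countOrphans`, `countZeroAreaTriangles`, `V3` and `Face` are illustrative names. A vertex that no face references contributes nothing to A, which is what makes its row and column of L = AᵀWA all zero.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct V3 { double x, y, z; };
using Face = std::array<int, 3>;

// Number of vertices referenced by no face (orphans).
std::size_t countOrphans(std::size_t num_verts, const std::vector<Face>& faces)
{
    std::vector<char> used(num_verts, 0);
    for (const Face& f : faces)
        for (int id : f)
            used[static_cast<std::size_t>(id)] = 1;
    std::size_t orphans = 0;
    for (char u : used)
        if (!u) ++orphans;
    return orphans;
}

// Number of faces whose area is exactly zero (cross product vanishes).
std::size_t countZeroAreaTriangles(const std::vector<V3>& v, const std::vector<Face>& faces)
{
    std::size_t n = 0;
    for (const Face& f : faces) {
        const V3& a = v[static_cast<std::size_t>(f[0])];
        const V3& b = v[static_cast<std::size_t>(f[1])];
        const V3& c = v[static_cast<std::size_t>(f[2])];
        const double ux = b.x - a.x, uy = b.y - a.y, uz = b.z - a.z;
        const double wx = c.x - a.x, wy = c.y - a.y, wz = c.z - a.z;
        const double cx = uy * wz - uz * wy;
        const double cy = uz * wx - ux * wz;
        const double cz = ux * wy - uy * wx;
        if (cx * cx + cy * cy + cz * cz == 0.0) ++n;
    }
    return n;
}
```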

Notes

  • The tracer is the upstream source of these collapsed cells (it initializes new points at a neighbor's position and can accept them without a min-spacing check). Fixing the tracer is a separate, larger change; this PR makes the flatten pipeline robust to the defects regardless.
  • Full-res (3.9M verts) may still OOM on low-RAM hosts — that's a genuine memory limit, not a crash; decimate as needed.

…IM stops NaNing

Full-res (and even lightly-decimated) SLIM flattening of some segments
failed: -nan at iter 1, or a PaStiX heap corruption / SIGSEGV. Root
cause was a chain of three issues, all stemming from degenerate input
geometry the tracer emits:

1. Collapsed grid cells. The grower can leave grid-ADJACENT points at
   the same (or sub-voxel) 3D location (~6k scattered cells in the repro
   segment). Healthy adjacent cells are ~1/scale voxels apart. These make
   zero-area triangles whose symmetric-Dirichlet energy is infinite ->
   singular system -> NaN at iter 1.
   Fix: healDegenerateCells() (core/InpaintSurface) drops connected
   clusters of collapsed cells (dilated by a ring) and refills them via
   inpaintSurfaceHoles, iterating a few passes. Threshold is relative to
   1/scale so it adapts across segments. Optional dropped_mask lets
   callers clear the same cells in other channels (approval, etc.).

2. Unrepairable stragglers. A few fully-collapsed 2x2 blocks can't be
   separated locally, and even one zero-area triangle re-NaNs SLIM.
   Fix: a scale-relative emit guard in vc_tifxyz2obj skips any triangle
   with a collapsed edge or sliver altitude -> guarantees 0 zero-area
   triangles reach the OBJ (worst case: a sub-cell hole).

3. Orphan vertices. Skipping triangles can leave a vertex referenced by
   no surviving face. An orphan is an all-zero row/col in L = AtWA, which
   crashes the PaStiX Cholesky (double free / SIGSEGV in factorization,
   surfacing as an Eigen SparseMatrix free). This is what looked like a
   solver bug.
   Fix: vc_tifxyz2obj now emits in two passes -- collect surviving
   triangles first, then emit only the vertices they use. No orphans.

Also set spm.replicated = 1 in the Pastix6LLT wrapper: spmInit leaves it
-1 ("uninitialized"), which steers PaStiX down distributed-matrix code
paths for our single local matrix. Latent correctness fix.

Verified on 20230702185753_v11 (3.9M verts, ~6k collapsed cells): was
-nan at iter 1 at every decimation; now flattens cleanly via the VC3D
GUI and via CLI at keep=2/6/25 (0 orphans, converged L2 ~1.0, valid OBJ).

@giorgioangel

Member

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7285d5fb5


Comment on lines +195 to +196
const double scale = (std::isfinite(sc[0]) && sc[0] > 0.f) ? sc[0] : 0.0;
const int healed = vc::core::util::healDegenerateCells(points, scale);


P2: Respect the y scale when healing cells

For surfaces whose metadata has non-square scale (scale[0] != scale[1]), this passes only the x scale into healDegenerateCells, whose 4-neighbor checks use that single expected spacing for both horizontal and vertical edges. When the y spacing is legitimately smaller than 1/scale[0], healthy vertical neighbors can be classified as collapsed and dropped/inpainted before export, potentially corrupting large anisotropic segments; the emit-time expected_step below repeats the same x-only threshold. Please carry both scale components through the degeneracy checks or use direction-specific thresholds.
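
One way to read the suggestion is a detection pass with one threshold per grid direction. The sketch below is an assumption about what that could look like, not the body of healDegenerateCells: `markCollapsed`, `P3`, the row-major layout, the mapping of scale[0] to columns and scale[1] to rows, and `rel_tol` are all hypothetical, and every grid point is assumed valid.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct P3 { float x, y, z; };

// Flags grid points that sit closer to a 4-neighbor than rel_tol times the
// expected spacing in that direction: 1/scale_x along a row (horizontal
// edges), 1/scale_y along a column (vertical edges).
std::vector<unsigned char> markCollapsed(const std::vector<P3>& grid, int rows, int cols,
                                         double scale_x, double scale_y, double rel_tol)
{
    std::vector<unsigned char> collapsed(grid.size(), 0);
    if (!(scale_x > 0.0) || !(scale_y > 0.0)) return collapsed;  // no usable scale

    const double min_dx = rel_tol / scale_x;  // threshold for horizontal edges
    const double min_dy = rel_tol / scale_y;  // threshold for vertical edges

    auto dist = [](const P3& a, const P3& b) {
        const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return std::sqrt(dx * dx + dy * dy + dz * dz);
    };

    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const std::size_t i = static_cast<std::size_t>(r) * cols + c;
            if (c + 1 < cols && dist(grid[i], grid[i + 1]) < min_dx)
                collapsed[i] = collapsed[i + 1] = 1;
            if (r + 1 < rows && dist(grid[i], grid[i + cols]) < min_dy)
                collapsed[i] = collapsed[i + cols] = 1;
        }
    }
    return collapsed;
}
```

On a grid whose rows are legitimately 0.25 voxels apart (scale_y = 4) while columns are 1 voxel apart (scale_x = 1), passing the x scale for both directions flags every point, whereas per-direction thresholds flag none.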
