Direct .insv dual-fisheye → pinhole rig ingestion (skip the equirectangular intermediate) by kfarr · Pull Request #1 · 3DStreet/vid2scene

kfarr · 2026-06-11T18:43:24Z

Why

360° scenes processed through the equirectangular (ER) path show weak/missing ground, and both suspected causes are real and compound each other:

The ER projection stretches the poles. By the time Insta360 Studio has stitched the dual fisheyes into a 2:1 image, the ground directly under the camera is smeared across the entire bottom pixel row.
The ER virtual rig never looks down. pano_sfm.py renders 12 views at pitches (−35°, 0°, +35°) with 90° FOV — nothing below −80° pitch is covered, so the nadir gets zero direct observations.
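
The −80° figure follows directly from the rig geometry. A minimal sketch of that arithmetic (the function name is illustrative and not part of pano_sfm.py):

```python
def lowest_covered_pitch(view_pitches_deg, fov_deg):
    """Lowest pitch reached by the bottom edge of any pinhole view.

    The bottom-edge centre of a view sits half a field of view below the
    view's own pitch, and that centre is the point of the edge closest
    to the nadir.
    """
    return min(view_pitches_deg) - fov_deg / 2.0


# ER rig values quoted above: pitches (-35, 0, +35) with a 90 degree FOV.
er_floor = lowest_covered_pitch((-35.0, 0.0, 35.0), 90.0)
print(er_floor)            # -80.0
print(-90.0 >= er_floor)   # False: the nadir (-90) lies below every view
```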

This PR adds a path that crops virtual pinhole views straight out of the two raw fisheye sensor streams in an .insv file (the approach used by LichtFeld's fisheye mode):

Native sensor pixels everywhere: an equidistant fisheye has roughly uniform angular resolution, and the nadir sits ~90° off each lens axis, inside the ~100° half-angle of a ~200° lens.
Default grid: 3×3 lens-local views at ±60° with 75° crops → 9 per lens, 18 per frame pair. The down-pitched views put the nadir ~30° from their axes, so the ground is covered by up to 6 views per frame pair instead of 0.
Insta360 Studio is no longer needed: upload the .insv directly.
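
The nadir-coverage count for the default grid can be checked with a few lines of trigonometry. This is a sketch under an assumed convention (horizontal lens axis, yaw applied before pitch); the function names are hypothetical and the real grid construction lives in fisheye_projection.py:

```python
import math


def angle_to_nadir_deg(pitch_deg):
    """Angle between a view axis and straight down.

    Assumed convention: the view axis is (cos p sin y, sin p, cos p cos y),
    so its dot product with the nadir (0, -1, 0) is -sin p, independent of yaw.
    """
    return math.degrees(math.acos(-math.sin(math.radians(pitch_deg))))


def views_covering_nadir(pitches_deg, yaws_deg, crop_fov_deg,
                         lens_fov_deg=200.0, lenses=2):
    """Count views per frame pair whose crop contains the nadir."""
    # The nadir is 90 degrees off a horizontal lens axis, so the lens
    # itself must have a half-angle of at least 90 degrees.
    if lens_fov_deg / 2.0 < 90.0:
        return 0
    # Conservative test: use the circle inscribed in the square crop.
    half_crop = crop_fov_deg / 2.0
    per_lens = sum(
        1
        for pitch in pitches_deg
        for _yaw in yaws_deg
        if angle_to_nadir_deg(pitch) < half_crop
    )
    return lenses * per_lens


grid = (-60.0, 0.0, 60.0)
print(round(angle_to_nadir_deg(-60.0), 6))       # 30.0
print(views_covering_nadir(grid, grid, 75.0))    # 6
```

Under these assumptions only the three down-pitched views per lens pass the test, which matches the "up to 6 views per frame pair" figure.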

What's in here

fisheye_projection.py — equidistant fisheye model (+ optional Kannala-Brandt k1–k4), MEI (unified omnidirectional) model for the factory calibration, lens-local view grids, rig rotations with per-lens mounting corrections, cv2.remap grid construction. Pure numpy/scipy, fully unit-tested.
insv_calibration.py — Insta360 per-unit factory calibration parsing: a minimal protobuf wire walker (no protobuf dependency) reads the trailer metadata record's offset_v3 (20 fields/lens, layout per telemetry-parser) and the .insv.pb sidecar's extended calibration (X5; 27 fields/lens incl. k4 + thin-prism, layout validated by insv-stitch against in-camera stitching), then rescales from the per-model reference resolution to the demuxed stream (window_crop_info-aware, centered aspect-fit fallback). Best-effort: any failure falls back to the idealized model.
insv_extract.py — ffmpeg demux of the three known .insv layouts (dual-stream single file; two-file _00_/_10_ pairs; side-by-side single stream with a dark-corner guard against mis-feeding ER video). Best-effort trailer parsing (ExifTool/Sub-Etha layout).
fisheye_sfm.py — renders the views with two mask sets (SfM masks = lens-validity ∩ closest-view partition, to avoid duplicate features in overlaps; training masks = validity only, so gsplat doesn't train on black corners), builds the 18-camera two-lens rig, runs the shared SfM. Lens intrinsics precedence: explicit --insv_calibration JSON > factory calibration > idealized model.
pano_sfm.py refactor — feature extraction → rig config → matching → mapping extracted into run_rig_sfm_pipeline(), shared by both pipelines. No behavior change to the ER path.
vid2scene.py — --insv_fisheye (auto-enabled for .insv inputs), --insv_lens_fov, --insv_calibration, --insv_no_factory_calibration. Image budget matches the ER path (~89 frame pairs × 18 ≈ 1,600 images at the 800 default).
Tests — 51 unit tests (vid2scene_core/tests/): projection round-trips with distortion, MEI projection math, nadir-coverage of the view grid, remap validity, trailer record walking, protobuf wire decoding, both factory-calibration layouts, X4/X5 reference-to-stream scaling, companion-file detection.
docs/insv_fisheye.md — usage, calibration resolution order, factory-calibration sources, JSON schema, limitations.
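
For readers unfamiliar with the protobuf wire format, the "minimal wire walker" idea can be sketched as below. This illustrates the standard encoding only and is not the code in insv_calibration.py; the function names are hypothetical.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")


def walk_fields(buf):
    """Yield (field_number, wire_type, value) without needing a schema."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 0x07
        if wire == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire == 1:                    # fixed 64-bit, returned raw
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire == 2:                    # length-delimited bytes
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire == 5:                    # fixed 32-bit, returned raw
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire}")
        if pos > len(buf):
            raise ValueError("truncated field")
        yield field, wire, value


# Field 1 = varint 150, field 2 = the bytes b"testing".
sample = b"\x08\x96\x01\x12\x07testing"
print(list(walk_fields(sample)))
# [(1, 0, 150), (2, 2, b'testing')]
```

Length-delimited values are returned as raw bytes, so a nested message can be walked by calling `walk_fields` again on the value.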

Validated

All 51 unit tests pass.
End-to-end smoke test on a synthetic dual-stream file with pycolmap 3.13.0 (the version the worker pins): demux → render → feature extraction → rig config → sequential matching → mapping all execute; 18 cameras / 1 rig / N frames land correctly in the COLMAP database.
Mask conventions verified empirically against pycolmap 3.13: masks are found under both name.png and name.png.png, and an image without a mask is skipped with MASK_ERROR once mask_path is set — the processor therefore writes a mask for every rendered image.
The refactored ER path was smoke-tested end-to-end (renders → SfM steps run unchanged).
Factory-calibration parsing is exercised against synthetic data matching the community-documented layouts (telemetry-parser's prost definitions; insv-stitch's X5 findings, which were validated against in-camera stitching output).

Needs validation on real footage (why this is a draft)

Factory calibration on real recordings. Parsing follows community-documented layouts but hasn't run against a real X4/X5 file yet. Specifically to confirm: the reference-resolution scaling on models other than X4/X5, and the sign convention of the sub-degree mounting corrections. --insv_no_factory_calibration gives an immediate A/B fallback to the idealized model (principal point at center, inscribed circle, 200° FOV).
Rear-lens mounting uses the factory yaw/pitch/roll corrections when available, otherwise exactly 180° yaw / 0° roll (overridable in the calibration JSON).
Lens baseline (~2–3 cm) is ignored, same as the ER path; the factory per-lens translation is parsed but not applied (observed as zeros in community dumps; a metric value would also pin reconstruction scale, which needs deliberate handling).
No IMU use yet (horizon leveling, rolling-shutter correction), no SAM3 ego-masking on the fisheye path, no GPS extraction from the trailer.
Server upload flow only exposes equirectangular; .insv uploads through the web UI need a form field + pass-through (auto-detection already works at the pipeline level). Same for the cog/Modal wrapper (separate repo).

Suggested A/B test

Same Bernal .insv clip three ways: (a) ER path as-is, (b) ER path with training_max_num_gaussians=3M, (c) this path. If (c) fills in the ground where (a)/(b) don't, the input geometry was the bottleneck, as suspected. With factory calibration now in, (c) can additionally be run with --insv_no_factory_calibration to isolate how much the calibration itself contributes.

🤖 Generated with Claude Code

Move feature extraction, rig configuration, sequential matching, and mapping into run_rig_sfm_pipeline so pipelines that render virtual perspective views from other sources (e.g. dual fisheye streams) can reuse the same SfM machinery. No behavior change for the equirectangular path. https://claude.ai/code/session_01MdiAmjGY3SEAQLHsxKBVac

Process Insta360 .insv recordings straight from their two raw fisheye sensor streams instead of requiring a pre-stitched equirectangular video. The equirectangular intermediate stretches the poles of the sphere, and the ER virtual rig's lowest pitch (-35 deg, 90 deg FOV) never sees below -80 deg, so the ground under the camera gets zero direct, full-resolution observations. Cropping pinhole views directly from each fisheye keeps native sensor pixels and the default view grid (3x3 at +/-60 deg per lens, 75 deg crops, 18 views per frame pair) covers the nadir with up to 6 views per frame pair.

- fisheye_projection.py: equidistant fisheye model with optional Kannala-Brandt distortion terms, lens-local view grids, and cv2.remap grid construction (pure numpy/scipy, unit-tested)
- insv_extract.py: ffmpeg demuxing of the three known .insv layouts (dual-stream, two-file _00_/_10_ pairs, side-by-side single stream) plus best-effort trailer metadata parsing for logging
- fisheye_sfm.py: renders the virtual views with validity and closest-view partition masks, builds the two-lens rig config, and runs the shared rig SfM pipeline
- vid2scene.py: --insv_fisheye / --insv_lens_fov / --insv_calibration flags, auto-enabled for .insv inputs
- tests: 29 unit tests for the projection math and container parsing

Lens intrinsics default to an idealized model; per-unit calibration can be supplied as JSON (docs/insv_fisheye.md). Factory calibration and IMU parsing from the trailer are follow-ups.

https://claude.ai/code/session_01MdiAmjGY3SEAQLHsxKBVac

…ings

Replace the idealized-lens assumption with Insta360's per-unit factory calibration whenever it can be read from the recording itself, removing the main registration risk flagged in the original PR. Every Insta360 camera embeds an MEI (unified omnidirectional) camera model per lens: mirror parameter xi, fx/fy/cx/cy, radial k1..k4, tangential p1/p2 and thin-prism s1..s4 distortion, plus sub-degree per-lens mounting corrections.

Two sources are read, in order of preference:

- the .insv.pb sidecar (X5; 27 fields per lens incl. k4 and thin-prism terms), layout validated by insv-stitch against in-camera stitching
- the trailer metadata record's offset_v3 string (20 fields per lens), layout per telemetry-parser's prost definitions

Implementation:

- insv_calibration.py: minimal protobuf wire-format walker (no protobuf dependency), offset_v3 / .pb sidecar parsers, and reference-resolution to stream-resolution conversion (window_crop_info-aware, centered aspect-fit fallback). Best-effort throughout: any failure falls back to the idealized model.
- fisheye_projection.MeiLensModel: the MEI forward projection with the same project_rays interface as the equidistant model; the FOV cone bounds validity since the projection itself accepts nearly all directions for xi >= 1. get_lens_from_rig_rotations moved here from fisheye_sfm and extended with per-lens mounting corrections (keeps tests pycolmap-free).
- fisheye_sfm.py: precedence explicit JSON > factory > idealized; factory mounting corrections feed the rig config.
- --insv_no_factory_calibration / --no_factory_calibration escape hatch in vid2scene.py and fisheye_sfm.py.
- 22 new unit tests (51 total): wire decoding, both calibration string layouts, full-frame cx normalization, X4/X5 scaling cases, MEI projection math, rig corrections.

Pending real-footage validation (documented in docs/insv_fisheye.md): reference scaling on models other than X4/X5, and mounting-correction sign conventions (sub-degree, so low risk either way).

https://claude.ai/code/session_01MdiAmjGY3SEAQLHsxKBVac
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
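
As a rough illustration of the MEI forward projection described in this commit, the sketch below projects a ray through the unified model with the distortion terms omitted. It is not MeiLensModel itself; the function name and the parameter values in the example are assumptions chosen for readability.

```python
import math


def mei_project(ray, xi, fx, fy, cx, cy, fov_deg=200.0):
    """Project a 3D ray (lens axis = +z) to pixels with the unified model.

    Distortion (k1..k4, p1/p2, s1..s4) is deliberately left out of this
    sketch. Returns None for rays outside the FOV cone, because for
    xi >= 1 the projection formula itself accepts almost any direction.
    """
    x, y, z = ray
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return None
    x, y, z = x / norm, y / norm, z / norm
    off_axis_deg = math.degrees(math.acos(max(-1.0, min(1.0, z))))
    denom = z + xi
    if off_axis_deg > fov_deg / 2.0 or denom <= 1e-9:
        return None
    return (fx * x / denom + cx, fy * y / denom + cy)


# Illustrative parameters only, not a real calibration.
params = dict(xi=1.0, fx=1000.0, fy=1000.0, cx=1920.0, cy=1920.0)
print(mei_project((0.0, 0.0, 1.0), **params))    # (1920.0, 1920.0)
print(mei_project((1.0, 0.0, 0.0), **params))    # (2920.0, 1920.0)
print(mei_project((0.0, 0.0, -1.0), **params))   # None
```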

kfarr · 2026-06-12T16:02:11Z

Downstream wiring to make this testable through the 3DStreet app (generator#splat → "360 → Splat (.insv)") is now in draft PRs: vid2scene-cog#1 (insv_fisheye pass-through + image pin cog-phase2 = this PR's head) and 3dstreet#1673 (model entry + splat-tab UI). Deploy order and the real-footage test plan are in both PR descriptions.

First validation against real X5 footage (fw v1.9.6_build1) found two gaps that silently dropped factory calibration to the idealized fallback:

- The trailer chains an id-0 record at the top whose payload is an index table of (uint16 id, uint32 size, uint32 offset) entries, offsets relative to the trailer data start, small ids = legacy >> 8. Records below it no longer follow the strict payload+descriptor chain, so the walk now resolves everything through the table instead of treating id 0 as a terminator.
- offset_v3 writes 19 fields per lens (no per-lens flag) plus trailing file-level values, vs the X4-era 20. parse_offset_v3 now tries both layouts and validates the reference-dimension slots, where a misaligned block lands lens_type/flag-scale values.

With both fixes the real recording loads end-to-end from the trailer: principal points land within 5 px of the 3840x3840 stream center (full-frame cx shift + 5376->5312 window-crop scaling both verified), fx scales 4280->3094, and mount corrections surface the ~90 deg portrait-sensor roll confirmed by inspecting the demuxed frames - sub-degree-only assumptions would have broken this camera.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
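
The index-table entry shape described above, (uint16 id, uint32 size, uint32 offset), maps naturally onto Python's struct module. The sketch below assumes little-endian packing with no padding, which the commit message does not state, and uses hypothetical names; it does not attempt the legacy id mapping.

```python
import struct

# Assumption: little-endian, unpadded 10-byte entries.
INDEX_ENTRY = struct.Struct("<HII")


def parse_index_table(payload):
    """Return a list of (record_id, size, offset) tuples from an id-0 payload.

    Offsets are left relative to the trailer data start, as described in
    the commit message. Trailing bytes that do not form a whole entry
    are ignored.
    """
    usable = len(payload) - len(payload) % INDEX_ENTRY.size
    return list(INDEX_ENTRY.iter_unpack(payload[:usable]))


# Two synthetic entries followed by one stray byte.
payload = INDEX_ENTRY.pack(0x0101, 64, 0) + INDEX_ENTRY.pack(0x0300, 128, 64) + b"\x00"
print(parse_index_table(payload))
# [(257, 64, 0), (768, 128, 64)]
```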

Validated against a real X4 recording (fw v1.9.21_build5): newer X4 firmware also writes the v3 index trailer and 19-field offset_v3 blocks, with a per-lens landscape 8000x6000 reference (no halving) and lens 1 cx in 16000-wide full-frame coordinates. Principal points land within ~5 px of the 2880x2880 stream center after normalization, and the demuxed frames confirm the same ~90 deg portrait-sensor roll as the X5.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
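
The X5 figures reported earlier in this thread (5376 reference, 5312 window crop, 3840 stream, fx 4280 -> 3094) are consistent with a simple crop-then-scale conversion. The sketch below assumes a centered window crop and uniform scaling; the actual logic in insv_calibration.py may differ, and the function names are hypothetical.

```python
def scale_focal(f_ref, crop_width, stream_width):
    """Scale a focal length from window-crop pixels to stream pixels."""
    return f_ref * stream_width / crop_width


def scale_principal_point(c_ref, ref_width, crop_width, stream_width):
    """Shift by an assumed centered crop offset, then scale to the stream."""
    crop_offset = (ref_width - crop_width) / 2.0
    return (c_ref - crop_offset) * stream_width / crop_width


# X5 numbers quoted in the commit message above.
print(round(scale_focal(4280.0, 5312, 3840)))                   # 3094
# A principal point at the reference center maps to the stream center.
print(scale_principal_point(5376 / 2.0, 5376, 5312, 3840))      # 1920.0
```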

The idealized-lens fallback is no longer silent: real X4/X5 sensors are portrait-mounted (~90 deg roll), which the idealized model doesn't know, so a job that silently degraded would burn a full SfM+training run on a rig that can't register. run_insv_sfm now raises FactoryCalibrationError before any heavy work (frame extraction, render, SfM, training) when no calibration parses; the idealized model remains available as an explicit opt-in (--insv_no_factory_calibration) or via a calibration JSON.

load_factory_calibration now logs which rung of the ladder broke (no trailer / no file_info record / no offset_v3 / unparseable offset_v3) and, for the unparseable case, the raw offset_v3 string verbatim - that string is everything needed to add support for an unknown layout, as the X4-20-field vs X5-fw1.9-19-field split demonstrated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
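
The resulting precedence and fail-fast behaviour can be summarized in a short sketch. Only the FactoryCalibrationError name and the ordering come from this PR; the resolver function, its arguments, and the placeholder dictionaries are hypothetical.

```python
class FactoryCalibrationError(RuntimeError):
    """Raised when no calibration parses and the idealized model was not requested."""


# Placeholder for the idealized model (center principal point, 200 degree FOV).
IDEALIZED = {"source": "idealized", "lens_fov_deg": 200.0}


def resolve_calibration(explicit_json=None, factory=None, no_factory_calibration=False):
    """Pick a calibration: explicit JSON > factory > (opt-in only) idealized."""
    if explicit_json is not None:
        return explicit_json
    if no_factory_calibration:
        # Explicit opt-in to the idealized model, e.g. for an A/B run.
        return IDEALIZED
    if factory is not None:
        return factory
    # Fail before frame extraction, rendering, SfM, or training start.
    raise FactoryCalibrationError(
        "no factory calibration could be parsed; pass a calibration JSON "
        "or opt in to the idealized model"
    )


print(resolve_calibration(factory={"source": "factory"})["source"])       # factory
print(resolve_calibration(no_factory_calibration=True)["source"])         # idealized
try:
    resolve_calibration()
except FactoryCalibrationError as exc:
    print("failed fast:", exc)
```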

claude and others added 3 commits June 11, 2026 11:41

This was referenced Jun 12, 2026

Generator: 360 → Splat from raw .insv (direct dual-fisheye vid2scene path) 3DStreet/3dstreet#1673 (Draft)

insv_fisheye pass-through + upstream pin cog-phase2 (direct .insv dual-fisheye) 3DStreet/vid2scene-cog#1 (Draft)

kfarr and others added 3 commits June 12, 2026 12:35

kfarr wants to merge 6 commits into main from claude/insv-dual-fisheye-rig


2 participants
