Direct .insv dual-fisheye → pinhole rig ingestion (skip the equirectangular intermediate)#1
Draft
kfarr wants to merge 6 commits into
Move feature extraction, rig configuration, sequential matching, and mapping into run_rig_sfm_pipeline so pipelines that render virtual perspective views from other sources (e.g. dual fisheye streams) can reuse the same SfM machinery. No behavior change for the equirectangular path. https://claude.ai/code/session_01MdiAmjGY3SEAQLHsxKBVac
Process Insta360 .insv recordings straight from their two raw fisheye sensor streams instead of requiring a pre-stitched equirectangular video. The equirectangular intermediate stretches the poles of the sphere, and the ER virtual rig's lowest pitch (-35 deg, 90 deg FOV) never sees below -80 deg, so the ground under the camera gets zero direct, full-resolution observations. Cropping pinhole views directly from each fisheye keeps native sensor pixels, and the default view grid (3x3 at +/-60 deg per lens, 75 deg crops, 18 views per frame pair) covers the nadir with up to 6 views per frame pair.

- fisheye_projection.py: equidistant fisheye model with optional Kannala-Brandt distortion terms, lens-local view grids, and cv2.remap grid construction (pure numpy/scipy, unit-tested)
- insv_extract.py: ffmpeg demuxing of the three known .insv layouts (dual-stream, two-file _00_/_10_ pairs, side-by-side single stream) plus best-effort trailer metadata parsing for logging
- fisheye_sfm.py: renders the virtual views with validity and closest-view partition masks, builds the two-lens rig config, and runs the shared rig SfM pipeline
- vid2scene.py: --insv_fisheye / --insv_lens_fov / --insv_calibration flags, auto-enabled for .insv inputs
- tests: 29 unit tests for the projection math and container parsing

Lens intrinsics default to an idealized model; per-unit calibration can be supplied as JSON (docs/insv_fisheye.md). Factory calibration and IMU parsing from the trailer are follow-ups.

https://claude.ai/code/session_01MdiAmjGY3SEAQLHsxKBVac
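As a rough illustration of the idealized equidistant model this commit describes, the sketch below projects a single ray into fisheye pixel coordinates. It is not the code in fisheye_projection.py: the function names, the +z optical-axis convention, and the choice of deriving the focal length from the image-circle radius are assumptions made for the example, and Kannala-Brandt distortion terms are left out.

```python
import math


def idealized_focal(image_circle_radius_px, lens_fov_deg=200.0):
    """Focal length (px/rad) so the edge of the FOV lands on the image circle.

    Assumption for this sketch: equidistant mapping r = f * theta with the
    image circle inscribed in the frame.
    """
    return image_circle_radius_px / (math.radians(lens_fov_deg) / 2.0)


def equidistant_project(ray, focal_px, cx, cy, lens_fov_deg=200.0):
    """Project a 3D ray (lens frame, +z along the optical axis) to pixels.

    Returns (u, v), or None when the ray falls outside the lens FOV cone.
    """
    x, y, z = ray
    norm = math.sqrt(x * x + y * y + z * z)
    # Angle between the ray and the optical axis.
    theta = math.acos(max(-1.0, min(1.0, z / norm)))
    if theta > math.radians(lens_fov_deg) / 2.0:
        return None
    # Azimuth around the optical axis.
    phi = math.atan2(y, x)
    r = focal_px * theta  # equidistant: radius grows linearly with angle
    return cx + r * math.cos(phi), cy + r * math.sin(phi)
```

Evaluating this for every output pixel of a pinhole crop (after rotating the pixel's ray into the lens frame) yields the two coordinate maps that cv2.remap consumes; rays that return None correspond to the invalid region covered by the validity mask.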
…ings

Replace the idealized-lens assumption with Insta360's per-unit factory calibration whenever it can be read from the recording itself, removing the main registration risk flagged in the original PR.

Every Insta360 camera embeds an MEI (unified omnidirectional) camera model per lens: mirror parameter xi, fx/fy/cx/cy, radial k1..k4, tangential p1/p2 and thin-prism s1..s4 distortion, plus sub-degree per-lens mounting corrections. Two sources are read, in order of preference:

- the .insv.pb sidecar (X5; 27 fields per lens incl. k4 and thin-prism terms), layout validated by insv-stitch against in-camera stitching
- the trailer metadata record's offset_v3 string (20 fields per lens), layout per telemetry-parser's prost definitions

Implementation:

- insv_calibration.py: minimal protobuf wire-format walker (no protobuf dependency), offset_v3 / .pb sidecar parsers, and reference-resolution to stream-resolution conversion (window_crop_info-aware, centered aspect-fit fallback). Best-effort throughout: any failure falls back to the idealized model.
- fisheye_projection.MeiLensModel: the MEI forward projection with the same project_rays interface as the equidistant model; the FOV cone bounds validity since the projection itself accepts nearly all directions for xi >= 1. get_lens_from_rig_rotations moved here from fisheye_sfm and extended with per-lens mounting corrections (keeps tests pycolmap-free).
- fisheye_sfm.py: precedence explicit JSON > factory > idealized; factory mounting corrections feed the rig config.
- --insv_no_factory_calibration / --no_factory_calibration escape hatch in vid2scene.py and fisheye_sfm.py.
- 22 new unit tests (51 total): wire decoding, both calibration string layouts, full-frame cx normalization, X4/X5 scaling cases, MEI projection math, rig corrections.
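For readers unfamiliar with the MEI (unified omnidirectional) model named above, here is a minimal single-ray sketch of its forward projection in the standard textbook form. It is not MeiLensModel itself: the function name and signature are hypothetical, only k1, k2, p1 and p2 are applied, and the k3/k4 and thin-prism terms are omitted because this page does not spell out how Insta360 applies them.

```python
import math


def mei_project(ray, xi, fx, fy, cx, cy,
                k1=0.0, k2=0.0, p1=0.0, p2=0.0, lens_fov_deg=200.0):
    """Standard MEI forward projection of one ray (+z = optical axis).

    Returns (u, v), or None outside the FOV cone. The explicit cone test is
    needed because for xi >= 1 the projection accepts almost any direction.
    """
    x, y, z = ray
    n = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n  # point on the unit sphere
    theta = math.acos(max(-1.0, min(1.0, z)))
    if theta > math.radians(lens_fov_deg) / 2.0:
        return None
    denom = z + xi  # shift the projection centre by the mirror parameter
    if denom <= 1e-9:
        return None
    mx, my = x / denom, y / denom
    r2 = mx * mx + my * my
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    # Tangential distortion (plumb-bob form).
    dx = 2.0 * p1 * mx * my + p2 * (r2 + 2.0 * mx * mx)
    dy = p1 * (r2 + 2.0 * my * my) + 2.0 * p2 * mx * my
    return fx * (mx * radial + dx) + cx, fy * (my * radial + dy) + cy
```

With xi = 0 and zero distortion this reduces to an ordinary pinhole projection, which is a convenient sanity check when wiring up parsed calibration values.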
Pending real-footage validation (documented in docs/insv_fisheye.md): reference scaling on models other than X4/X5, and mounting-correction sign conventions (sub-degree, so low risk either way). https://claude.ai/code/session_01MdiAmjGY3SEAQLHsxKBVac Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
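The "minimal protobuf wire-format walker" in this commit can be pictured with the sketch below, which decodes top-level fields using only the public protobuf encoding rules. The function names are hypothetical and this is not the code in insv_calibration.py; it handles wire types 0, 1, 2 and 5 and rejects anything else.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")


def walk_fields(buf):
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 0x07
        if wire == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire == 1:  # fixed 64-bit, returned as raw bytes
            value = buf[pos:pos + 8]
            pos += 8
        elif wire == 2:  # length-delimited: string, bytes, nested message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire == 5:  # fixed 32-bit, returned as raw bytes
            value = buf[pos:pos + 4]
            pos += 4
        else:
            raise ValueError("unsupported wire type %d" % wire)
        if pos > len(buf):
            raise ValueError("truncated field")
        yield field, wire, value
```

Nested messages are read by calling walk_fields again on a length-delimited value, which is enough to reach a string field such as offset_v3 without a schema.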
Downstream wiring to make this testable through the 3DStreet app (generator#splat → "360 → Splat (.insv)") is now in draft PRs: vid2scene-cog#1 (insv_fisheye pass-through + image pin |
First validation against real X5 footage (fw v1.9.6_build1) found two gaps that silently dropped factory calibration to the idealized fallback:

- The trailer chains an id-0 record at the top whose payload is an index table of (uint16 id, uint32 size, uint32 offset) entries, offsets relative to the trailer data start, small ids = legacy >> 8. Records below it no longer follow the strict payload+descriptor chain, so the walk now resolves everything through the table instead of treating id 0 as a terminator.
- offset_v3 writes 19 fields per lens (no per-lens flag) plus trailing file-level values, vs the X4-era 20. parse_offset_v3 now tries both layouts and validates the reference-dimension slots, where a misaligned block lands lens_type/flag-scale values.

With both fixes the real recording loads end-to-end from the trailer: principal points land within 5 px of the 3840x3840 stream center (full-frame cx shift + 5376->5312 window-crop scaling both verified), fx scales 4280->3094, and mount corrections surface the ~90 deg portrait-sensor roll confirmed by inspecting the demuxed frames. Sub-degree-only assumptions would have broken this camera.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
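The index table described in the first fix can be sketched as follows. The entry layout (uint16 id, uint32 size, uint32 offset) comes from the commit message; the little-endian byte order, the function names, and the choice to ignore a trailing partial entry are assumptions made for this example, and the legacy-id mapping (small ids = legacy >> 8) is not reproduced.

```python
import struct

# One table entry: uint16 id, uint32 size, uint32 offset (10 bytes, no padding).
# ASSUMPTION: little-endian; the commit message does not state the byte order.
ENTRY = struct.Struct("<HII")


def parse_index_table(payload):
    """Map record id -> (offset, size) from the id-0 record's payload."""
    entries = {}
    usable = len(payload) - len(payload) % ENTRY.size  # drop a partial tail
    for pos in range(0, usable, ENTRY.size):
        rec_id, size, offset = ENTRY.unpack_from(payload, pos)
        entries[rec_id] = (offset, size)
    return entries


def read_record(trailer_data, entries, rec_id):
    """Slice one record out of the trailer; offsets are trailer-relative."""
    offset, size = entries[rec_id]
    if offset + size > len(trailer_data):
        raise ValueError("record %d runs past the trailer" % rec_id)
    return trailer_data[offset:offset + size]
```

Resolving records through the table rather than walking descriptors backwards means one malformed record no longer hides the records behind it.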
Validated against a real X4 recording (fw v1.9.21_build5): newer X4 firmware also writes the v3 index trailer and 19-field offset_v3 blocks, with a per-lens landscape 8000x6000 reference (no halving) and lens 1 cx in 16000-wide full-frame coordinates. Principal points land within ~5 px of the 2880x2880 stream center after normalization, and the demuxed frames confirm the same ~90 deg portrait-sensor roll as the X5. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
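The reference-to-stream conversion that these validation notes exercise can be sketched as a crop followed by a uniform scale. The function below is a hypothetical illustration, not the repository's implementation: it assumes a centered window crop and a square X5 reference (5376 wide cropped to 5312, then scaled to the 3840 stream), and the full-frame cx shift for the second lens is left out.

```python
def reference_to_stream(fx, fy, cx, cy, ref_size, crop_size, stream_size):
    """Rescale intrinsics from reference resolution to the demuxed stream.

    ASSUMPTION: the window crop is centered in the reference frame.
    Sizes are (width, height) tuples in pixels.
    """
    ref_w, ref_h = ref_size
    crop_w, crop_h = crop_size
    stream_w, stream_h = stream_size
    # Offset of the crop window inside the reference frame.
    x0 = (ref_w - crop_w) / 2.0
    y0 = (ref_h - crop_h) / 2.0
    # Scale from cropped reference pixels to stream pixels.
    sx = stream_w / crop_w
    sy = stream_h / crop_h
    return fx * sx, fy * sy, (cx - x0) * sx, (cy - y0) * sy


# X5 figures quoted in the validation note: fx 4280 at reference resolution.
fx_s, fy_s, cx_s, cy_s = reference_to_stream(
    4280.0, 4280.0, 2688.0, 2688.0,
    ref_size=(5376, 5376), crop_size=(5312, 5312), stream_size=(3840, 3840),
)
```

With those inputs fx comes out at roughly 3094 and a principal point at the reference center maps to the stream center, matching the 4280->3094 figure reported for the X5.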
The idealized-lens fallback is no longer silent: real X4/X5 sensors are portrait-mounted (~90 deg roll), which the idealized model doesn't know, so a job that silently degraded would burn a full SfM+training run on a rig that can't register. run_insv_sfm now raises FactoryCalibrationError before any heavy work (frame extraction, render, SfM, training) when no calibration parses; the idealized model remains available as an explicit opt-in (--insv_no_factory_calibration) or via a calibration JSON. load_factory_calibration now logs which rung of the ladder broke (no trailer / no file_info record / no offset_v3 / unparseable offset_v3) and, for the unparseable case, the raw offset_v3 string verbatim - that string is everything needed to add support for an unknown layout, as the X4-20-field vs X5-fw1.9-19-field split demonstrated. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
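The fail-fast behavior and the precedence order (explicit JSON > factory > idealized) can be summarized in a small sketch. FactoryCalibrationError is the exception named above; the function name, its arguments, and the returned tuple are hypothetical and only illustrate the decision order, not the actual run_insv_sfm code.

```python
class FactoryCalibrationError(RuntimeError):
    """Raised when no factory calibration parses and no fallback was chosen."""


def choose_lens_model(explicit_json=None, factory=None, allow_idealized=False):
    """Pick the lens-intrinsics source before any heavy work starts.

    Returns (source_name, parameters). Hypothetical helper for illustration.
    """
    if explicit_json is not None:
        return "explicit", explicit_json  # user-supplied calibration JSON wins
    if factory is not None:
        return "factory", factory  # parsed from the sidecar or the trailer
    if allow_idealized:
        # Explicit opt-in only (the --insv_no_factory_calibration case).
        return "idealized", {"lens_fov_deg": 200.0}
    raise FactoryCalibrationError(
        "no factory calibration could be parsed; supply a calibration JSON "
        "or opt in to the idealized model"
    )
```

Calling this before frame extraction keeps a calibration failure cheap: the job stops immediately instead of spending an SfM and training run on a rig that cannot register.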
Why
360° scenes processed through the equirectangular (ER) path show weak/missing ground, and both suspected causes are real and compound each other:
pano_sfm.py renders 12 views at pitches (−35°, 0°, +35°) with 90° FOV; nothing below −80° pitch is covered, so the nadir gets zero direct observations.
This PR adds a path that crops virtual pinhole views straight out of the two raw fisheye sensor streams in an .insv file (the approach used by LichtFeld's fisheye mode): .insv directly.
What's in here
- fisheye_projection.py: equidistant fisheye model (+ optional Kannala-Brandt k1–k4), MEI (unified omnidirectional) model for the factory calibration, lens-local view grids, rig rotations with per-lens mounting corrections, cv2.remap grid construction. Pure numpy/scipy, fully unit-tested.
- insv_calibration.py: Insta360 per-unit factory calibration parsing. A minimal protobuf wire walker (no protobuf dependency) reads the trailer metadata record's offset_v3 (20 fields/lens, layout per telemetry-parser) and the .insv.pb sidecar's extended calibration (X5; 27 fields/lens incl. k4 + thin-prism, layout validated by insv-stitch against in-camera stitching), then rescales from the per-model reference resolution to the demuxed stream (window_crop_info-aware, centered aspect-fit fallback). Best-effort: any failure falls back to the idealized model.
- insv_extract.py: ffmpeg demux of the three known .insv layouts (dual-stream single file; two-file _00_/_10_ pairs; side-by-side single stream with a dark-corner guard against mis-feeding ER video). Best-effort trailer parsing (ExifTool/Sub-Etha layout).
- fisheye_sfm.py: renders the views with two mask sets (SfM masks = lens-validity ∩ closest-view partition, to avoid duplicate features in overlaps; training masks = validity only, so gsplat doesn't train on black corners), builds the 18-camera two-lens rig, runs the shared SfM. Lens intrinsics precedence: explicit --insv_calibration JSON > factory calibration > idealized model.
- pano_sfm.py refactor: feature extraction → rig config → matching → mapping extracted into run_rig_sfm_pipeline(), shared by both pipelines. No behavior change to the ER path.
- vid2scene.py: --insv_fisheye (auto-enabled for .insv inputs), --insv_lens_fov, --insv_calibration, --insv_no_factory_calibration. Image budget matches the ER path (~89 frame pairs × 18 ≈ 1,600 images at the 800 default).
- Tests (vid2scene_core/tests/): projection round-trips with distortion, MEI projection math, nadir-coverage of the view grid, remap validity, trailer record walking, protobuf wire decoding, both factory-calibration layouts, X4/X5 reference-to-stream scaling, companion-file detection.
- docs/insv_fisheye.md: usage, calibration resolution order, factory-calibration sources, JSON schema, limitations.
Validated
name.png and name.png.png, and an image without a mask is skipped with MASK_ERROR once mask_path is set; the processor therefore writes a mask for every rendered image.
Needs validation on real footage (why this is a draft)
- --insv_no_factory_calibration gives an immediate A/B fallback to the idealized model (principal point at center, inscribed circle, 200° FOV).
- equirectangular; .insv uploads through the web UI need a form field + pass-through (auto-detection already works at the pipeline level). Same for the cog/Modal wrapper (separate repo).
Suggested A/B test
Same Bernal .insv clip three ways: (a) ER path as-is, (b) ER path with training_max_num_gaussians=3M, (c) this path. If (c) fills in the ground where (a)/(b) don't, the input geometry was the bottleneck, as suspected. With factory calibration now in, (c) can additionally be run with --insv_no_factory_calibration to isolate how much the calibration itself contributes.
🤖 Generated with Claude Code