Skip to content

DRAFT: POC reset-time variations#692

Draft
alexmillane wants to merge 25 commits into
mainfrom
alex/poc/sensitivity_analysis
Draft

DRAFT: POC reset-time variations#692
alexmillane wants to merge 25 commits into
mainfrom
alex/poc/sensitivity_analysis

Conversation

@alexmillane
Copy link
Copy Markdown
Collaborator

Summary

Proof of concept of reset-time variations.

Detailed description

  • Variations are a structured way introducing differences between simulation runs.
  • Reset time variations randomize some parameters per environment at reset time.

Copy link
Copy Markdown
Contributor

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📋 Code Review Summary

This DRAFT PR introduces a comprehensive reset-time variations system as a POC for sensitivity analysis. The implementation is well-architected with clear separation of concerns.

🏗️ Architecture Overview

The design follows a clean modular pattern:

  • Samplers (sampler.py): Stateless distribution abstractions with listener support for recording
  • Variations (variation_base.py): Knobs attached to assets, emitting EventTermCfg
  • Registry (variation_registry.py): Global name → class table for CLI resolution
  • Ledger (ledger.py): Sample recording layer for downstream sensitivity analysis
  • Hydra Integration: Dynamic schema generation for CLI overrides

✅ Strengths

  1. Clean abstraction boundaries: The variation-cfg split avoids circular references that trip configclass._validate
  2. Listener pattern: Samplers support observers, enabling the ledger to record values without tight coupling
  3. Hydra integration: Dynamic schema building via make_dataclass is elegant and avoids hardcoded field names
  4. Comprehensive documentation: The design docs (2026_04_13_sensitivity_analysis.md, etc.) provide excellent context
  5. Test coverage: Unit tests for split_hydra_overrides cover both happy and error paths

🔍 Suggestions for Improvement

1. Debug Print Statement

File: isaaclab_arena/variations/object_color.py (line ~302)

print("random_colors", random_colors)

This debug print should be removed before merging to main.

2. Commented-Out Code Cleanup

File: isaaclab_arena/variations/object_color.py (lines ~255-290)

Large blocks of commented-out code (make_variation_event, make_wrapper, wrap_callable_class) should either be converted to proper documentation explaining the alternatives considered, or removed. This creates noise for future maintainers.

3. Type Annotation Consistency

File: isaaclab_arena/variations/sampler.py

The _sample method lacks return type narrowing in the ABC:

@abstractmethod
def _sample(self, num_samples: int) -> torch.Tensor:
    ...

Consider adding a @property for event_shape to the ABC to enforce the interface.

4. Error Message Clarity

File: isaaclab_arena/variations/variation_base.py (line ~244)

# Not type-checked at runtime; assigning an incompatible cfg will only fail later

Consider adding a runtime type check since this is a public API boundary.

5. Missing ledger.py in Diff

The code references isaaclab_arena/variations/ledger.py and imports VariationLedger, but I don't see this file in the diff. Ensure it's included.

6. Device Handling

File: isaaclab_arena/variations/sampler.py

torch.rand(shape) creates tensors on CPU by default. Consider adding device parameter support:

def _sample(self, num_samples: int, device: str | torch.device = "cpu") -> torch.Tensor:

This will matter when running on GPU.

📝 Minor Nits

  1. Inconsistent rotation format: In compile_env_notebook.py, one pose uses rotation_wxyz and another rotation_xyzw — verify this is intentional
  2. Import organization: Some files have late imports inside functions (marked with # noqa: PLC0415) — consider consolidating where possible
  3. Magic strings: Event names like "{self.asset_name}_color_variation" could be centralized

🔒 Safety Considerations

  • ✅ Scene replication check is enforced (replicate_physics=False)
  • ✅ Hydra validation catches unknown override paths
  • split_hydra_overrides properly rejects non-Hydra tokens

📊 Test Recommendations

Before promoting from DRAFT:

  1. Add integration tests verifying per-env color divergence in simulation
  2. Add tests for the variation ledger recording flow
  3. Consider property-based tests for sampler bounds validation

Overall, this is a well-designed foundation for the sensitivity analysis feature. The separation between declarative config and runtime state is particularly clean. Addressing the debug print and commented code will be important for merge readiness.


Update (commit 34b4817):

Changes since previous review:

  • Suggestion #5 (Missing ledger.py): Fixed! isaaclab_arena/variations/ledger.py is now included in the diff.
  • 📝 Comprehensive docstring cleanup: Docstrings across the codebase have been trimmed to be more concise, aligning with the new "Prefer short docstrings" coding style documented in AGENTS.md.
  • 📝 Error handling style change: Several raise KeyError/ValueError/TypeError patterns replaced with assert statements, per the new coding style guidance.
  • 🗑️ Removed design docs: Planning docs (2026_04_21_*.md, 2026_05_11_hydra_variation_plan.md) and example files (hydra_cli_example.py, hydra_dynamic_schema_example.py, tint_events.py) have been cleaned up.

Remaining items from original review:

  • Suggestions #1-4 and #6 (debug print, commented-out code, type annotations, runtime type check, device handling) were not addressed in this commit — these remain as pre-merge considerations.

Update (commit 16cb604):

Good refactoring commit — extracts Hydra-related logic into a dedicated module.

Changes:

  • 🆕 New module variations/variations_hydra.py: Extracted build_schema(), load_cfg_from_flags(), and apply_overrides() from ArenaEnvBuilder
  • 🏗️ ArenaEnvBuilder simplified: Removed _iter_scene_variations, _populate_variation_ledger, _build_variations_schema, _asset_class_name; now uses thin wrappers around the new module
  • 🔄 Method renamed: compose_variations_cfgload_variations_cfg_from_flags
  • 📦 Scene.get_asset_variations(): New method returns (asset_name, variation) pairs, consolidating asset iteration
  • 📦 VariationLedger.attach_from_scene(): New convenience method for attaching variations from a scene
  • 🧹 Cleanup: Removed unused tint_events import from compile_env_notebook.py

No new issues introduced. Clean extraction with proper separation of concerns. The new module has comprehensive docstrings.

Remaining items from original review (unchanged):

  • Suggestions #1-4 and #6 still apply to object_color.py and sampler.py

Update (commit 1dd82d5):

Naming refactor commit — simplifies the variation system architecture.

Changes:

  • 🔄 Renamed VariationLedgerVariationRecorder: More descriptive name; ledger.pyvariation_recorder.py
  • 🗑️ Removed variation_registry.py: The global registry pattern has been removed entirely
  • 🗑️ Removed @register_variation decorator: No longer used on ObjectColorVariation
  • 📦 Updated __init__.py exports: Reflects the removed registry and renamed recorder

No new issues introduced. This is a clean naming/cleanup refactor that simplifies the architecture by removing the unused registry pattern.

Remaining items from original review (unchanged):

  • Suggestions #1-4 and #6 still apply to object_color.py and sampler.py

Update (commit 68abea0):

API polish commit — improves VariationRecorder usability and initialization flow.

Changes:

  • 🆕 summary() and __str__() methods: Added to both VariationRecord and VariationRecorder for human-readable output
  • 🔄 Renamed attach_from_scene()attach_to_scene(): More natural naming
  • 🏗️ Recorder initialization moved earlier: Now created and attached before IsaacLabArenaManagerBasedRLEnvCfg construction, passed as constructor arg rather than post-hoc assignment
  • 📝 Notebook simplified: Verbose loop replaced with print(env.unwrapped.cfg.variation_recorder) — leverages the new __str__ methods
  • 🧹 Docstrings trimmed: Removed "Thin wrapper around..." boilerplate comments

No new issues introduced. Clean improvements to API ergonomics.

Remaining items from original review (unchanged):

  • Suggestions #1-4 and #6 still apply to object_color.py and sampler.py

Update (commit fe45e7e):

Good refactoring commit — improves naming consistency and code organization.

Changes:

  • 🔄 Renamed SamplerSamplerBase: More explicit naming clarifies this is the abstract base class
  • 🔄 Renamed SamplerCfgSamplerBaseCfg: Consistent with the class rename
  • 📂 File split: sampler.py extracted into sampler_base.py (ABC) + uniform_sampler.py (concrete implementation) — cleaner separation
  • 🔄 Renamed variation_recorder.pyvariations_recorder.py: Plural form for consistency with variations_hydra.py
  • 🔄 Renamed config attr variation_recordervariations_recorder
  • 🔄 API change: Scene.get_asset_variations() now returns dict[str, list[VariationBase]] instead of list[tuple[str, VariationBase]] — more natural grouping by asset
  • 🗑️ Removed Scene.get_variations(): Superseded by the dict-based API

From original review:

  • Suggestion #1 (Debug print): Fixed! The print("random_colors", ...) line has been removed
  • Suggestion #2 (Commented-out code): Fixed! The large blocks of commented-out code in object_color.py have been cleaned up
  • 📝 Suggestion #3 (Type annotations): event_shape property still not in the ABC — remains optional improvement
  • 📝 Suggestion #6 (Device handling): Still uses CPU-only torch.rand() — remains optional improvement

No new issues introduced. This is a well-organized cleanup that improves code structure and naming consistency.


Update (commit 0ff7de9):

Major feature commit — introduces build-time variations and several new variation types.

New Architecture:

  • 🏗️ Two-flavor variation system: VariationBase now has two concrete flavors:
    • RunTimeVariationBase: Event-driven variations (e.g. color, camera decalibration) via build_event_cfg()
    • BuildTimeVariationBase: Pre-build sampling (e.g. HDR selection) via apply()
  • 🆕 CategoricalSampler: New sampler for discrete choices; returns actual items (not indices) for meaningful recorder output
  • 🆕 HDRImageVariation: Build-time variation that samples an HDR environment map for a DomeLight
  • 🆕 CameraDecalibrationVariation: Run-time variation adding calibration drift to cameras
  • 🧪 Comprehensive tests: test_build_time_variations.py with unit and simulation-app tests

API Changes:

  • build_event_cfg() no longer takes a scene parameter (cleaner interface)
  • SamplerBase.sample() return type loosened to Any (supports list from categorical sampler)
  • VariationRecorder now handles both tensor and non-tensor samples
  • ArenaEnvBuilder gains _apply_build_time_variations() method

Initialization Order:
The recorder is now attached before build-time variations are applied, ensuring HDR samples are captured correctly.

⚠️ New minor issue:

  • Notebook debug settings uncommented: In compile_env_notebook.py, headless=False and visualizer="kit" are now enabled — should be reverted (headless=True or commented) for CI compatibility.

From original review (unchanged):

  • 📝 Suggestion #3 (Type annotations): event_shape property still not in ABC
  • 📝 Suggestion #6 (Device handling): Still uses CPU-only torch.rand()

Update (commit b3e7086):

Architecture refinement — elevates variation support to the Asset base class.

Key Changes:

  • 🏗️ Variation methods moved to Asset: add_variation(), get_variation(), get_variations() lifted from ObjectBase to Asset base class. Embodiments now host variations natively since they inherit from Asset.
  • 🆕 ArenaEnvBuilder.get_all_variations(): New method aggregates variations from both scene assets and the embodiment into a unified dict.
  • 🔄 VariationRecorder.attach() refactored: Now accepts a dict[str, list[VariationBase]] directly; attach_to_scene() is now a thin wrapper.
  • 🧪 Notebook expanded: Demonstrates embodiment-level camera decalibration variation on droid_differential_ik.

Code Quality:

  • ⚠️ Commented-out code: droid.py has randomize_franka_joint_state = None with the original EventTerm block commented out. Consider removing the commented block for cleaner code.
  • 📝 Previous inline comment still applies: headless=False in compile_env_notebook.py should be True for CI.

No new blocking issues. This is a clean refactor enabling embodiment-level variations.

Remaining items from original review (unchanged):

  • 📝 Suggestion #3 (Type annotations): event_shape property still not in ABC
  • 📝 Suggestion #6 (Device handling): Still uses CPU-only torch.rand()

simulation_app = AppLauncher()
# headless=False enables the Kit viewer window so we can visually verify per-env
# randomizations (e.g. object tints). Set headless=True for CI / non-GUI runs.
args_cli = get_isaaclab_arena_cli_parser().parse_args([])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Note (0ff7de9): These debug settings (headless=False, visualizer="kit") are now uncommented. For CI compatibility, consider reverting to headless=True or commenting out before merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant