DRAFT: POC reset-time variations#692
Conversation
…d to convert to a variation.
There was a problem hiding this comment.
📋 Code Review Summary
This DRAFT PR introduces a comprehensive reset-time variations system as a POC for sensitivity analysis. The implementation is well-architected with clear separation of concerns.
🏗️ Architecture Overview
The design follows a clean modular pattern:
- Samplers (
sampler.py): Stateless distribution abstractions with listener support for recording - Variations (
variation_base.py): Knobs attached to assets, emittingEventTermCfg - Registry (
variation_registry.py): Global name → class table for CLI resolution - Ledger (
ledger.py): Sample recording layer for downstream sensitivity analysis - Hydra Integration: Dynamic schema generation for CLI overrides
✅ Strengths
- Clean abstraction boundaries: The variation-cfg split avoids circular references that trip
configclass._validate - Listener pattern: Samplers support observers, enabling the ledger to record values without tight coupling
- Hydra integration: Dynamic schema building via
make_dataclassis elegant and avoids hardcoded field names - Comprehensive documentation: The design docs (
2026_04_13_sensitivity_analysis.md, etc.) provide excellent context - Test coverage: Unit tests for
split_hydra_overridescover both happy and error paths
🔍 Suggestions for Improvement
1. Debug Print Statement
File: isaaclab_arena/variations/object_color.py (line ~302)
print("random_colors", random_colors)This debug print should be removed before merging to main.
2. Commented-Out Code Cleanup
File: isaaclab_arena/variations/object_color.py (lines ~255-290)
Large blocks of commented-out code (make_variation_event, make_wrapper, wrap_callable_class) should either be converted to proper documentation explaining the alternatives considered, or removed. This creates noise for future maintainers.
3. Type Annotation Consistency
File: isaaclab_arena/variations/sampler.py
The _sample method lacks return type narrowing in the ABC:
@abstractmethod
def _sample(self, num_samples: int) -> torch.Tensor:
...Consider adding a @property for event_shape to the ABC to enforce the interface.
4. Error Message Clarity
File: isaaclab_arena/variations/variation_base.py (line ~244)
# Not type-checked at runtime; assigning an incompatible cfg will only fail laterConsider adding a runtime type check since this is a public API boundary.
5. Missing ledger.py in Diff
The code references isaaclab_arena/variations/ledger.py and imports VariationLedger, but I don't see this file in the diff. Ensure it's included.
6. Device Handling
File: isaaclab_arena/variations/sampler.py
torch.rand(shape) creates tensors on CPU by default. Consider adding device parameter support:
def _sample(self, num_samples: int, device: str | torch.device = "cpu") -> torch.Tensor:This will matter when running on GPU.
📝 Minor Nits
- Inconsistent rotation format: In
compile_env_notebook.py, one pose usesrotation_wxyzand anotherrotation_xyzw— verify this is intentional - Import organization: Some files have late imports inside functions (marked with
# noqa: PLC0415) — consider consolidating where possible - Magic strings: Event names like
"{self.asset_name}_color_variation"could be centralized
🔒 Safety Considerations
- ✅ Scene replication check is enforced (
replicate_physics=False) - ✅ Hydra validation catches unknown override paths
- ✅
split_hydra_overridesproperly rejects non-Hydra tokens
📊 Test Recommendations
Before promoting from DRAFT:
- Add integration tests verifying per-env color divergence in simulation
- Add tests for the variation ledger recording flow
- Consider property-based tests for sampler bounds validation
Overall, this is a well-designed foundation for the sensitivity analysis feature. The separation between declarative config and runtime state is particularly clean. Addressing the debug print and commented code will be important for merge readiness.
Update (commit 34b4817):
Changes since previous review:
- ✅ Suggestion #5 (Missing ledger.py): Fixed!
isaaclab_arena/variations/ledger.pyis now included in the diff. - 📝 Comprehensive docstring cleanup: Docstrings across the codebase have been trimmed to be more concise, aligning with the new "Prefer short docstrings" coding style documented in AGENTS.md.
- 📝 Error handling style change: Several
raise KeyError/ValueError/TypeErrorpatterns replaced withassertstatements, per the new coding style guidance. - 🗑️ Removed design docs: Planning docs (
2026_04_21_*.md,2026_05_11_hydra_variation_plan.md) and example files (hydra_cli_example.py,hydra_dynamic_schema_example.py,tint_events.py) have been cleaned up.
Remaining items from original review:
- Suggestions #1-4 and #6 (debug print, commented-out code, type annotations, runtime type check, device handling) were not addressed in this commit — these remain as pre-merge considerations.
Update (commit 16cb604):
Good refactoring commit — extracts Hydra-related logic into a dedicated module.
Changes:
- 🆕 New module
variations/variations_hydra.py: Extractedbuild_schema(),load_cfg_from_flags(), andapply_overrides()fromArenaEnvBuilder - 🏗️
ArenaEnvBuildersimplified: Removed_iter_scene_variations,_populate_variation_ledger,_build_variations_schema,_asset_class_name; now uses thin wrappers around the new module - 🔄 Method renamed:
compose_variations_cfg→load_variations_cfg_from_flags - 📦
Scene.get_asset_variations(): New method returns(asset_name, variation)pairs, consolidating asset iteration - 📦
VariationLedger.attach_from_scene(): New convenience method for attaching variations from a scene - 🧹 Cleanup: Removed unused
tint_eventsimport fromcompile_env_notebook.py
✅ No new issues introduced. Clean extraction with proper separation of concerns. The new module has comprehensive docstrings.
Remaining items from original review (unchanged):
Update (commit 1dd82d5):
Naming refactor commit — simplifies the variation system architecture.
Changes:
- 🔄 Renamed
VariationLedger→VariationRecorder: More descriptive name;ledger.py→variation_recorder.py - 🗑️ Removed
variation_registry.py: The global registry pattern has been removed entirely - 🗑️ Removed
@register_variationdecorator: No longer used onObjectColorVariation - 📦 Updated
__init__.pyexports: Reflects the removed registry and renamed recorder
✅ No new issues introduced. This is a clean naming/cleanup refactor that simplifies the architecture by removing the unused registry pattern.
Remaining items from original review (unchanged):
Update (commit 68abea0):
API polish commit — improves VariationRecorder usability and initialization flow.
Changes:
- 🆕
summary()and__str__()methods: Added to bothVariationRecordandVariationRecorderfor human-readable output - 🔄 Renamed
attach_from_scene()→attach_to_scene(): More natural naming - 🏗️ Recorder initialization moved earlier: Now created and attached before
IsaacLabArenaManagerBasedRLEnvCfgconstruction, passed as constructor arg rather than post-hoc assignment - 📝 Notebook simplified: Verbose loop replaced with
print(env.unwrapped.cfg.variation_recorder)— leverages the new__str__methods - 🧹 Docstrings trimmed: Removed "Thin wrapper around..." boilerplate comments
✅ No new issues introduced. Clean improvements to API ergonomics.
Remaining items from original review (unchanged):
Update (commit fe45e7e):
Good refactoring commit — improves naming consistency and code organization.
Changes:
- 🔄 Renamed
Sampler→SamplerBase: More explicit naming clarifies this is the abstract base class - 🔄 Renamed
SamplerCfg→SamplerBaseCfg: Consistent with the class rename - 📂 File split:
sampler.pyextracted intosampler_base.py(ABC) +uniform_sampler.py(concrete implementation) — cleaner separation - 🔄 Renamed
variation_recorder.py→variations_recorder.py: Plural form for consistency withvariations_hydra.py - 🔄 Renamed config attr
variation_recorder→variations_recorder - 🔄 API change:
Scene.get_asset_variations()now returnsdict[str, list[VariationBase]]instead oflist[tuple[str, VariationBase]]— more natural grouping by asset - 🗑️ Removed
Scene.get_variations(): Superseded by the dict-based API
From original review:
- ✅ Suggestion #1 (Debug print): Fixed! The
print("random_colors", ...)line has been removed - ✅ Suggestion #2 (Commented-out code): Fixed! The large blocks of commented-out code in
object_color.pyhave been cleaned up - 📝 Suggestion #3 (Type annotations):
event_shapeproperty still not in the ABC — remains optional improvement - 📝 Suggestion #6 (Device handling): Still uses CPU-only
torch.rand()— remains optional improvement
✅ No new issues introduced. This is a well-organized cleanup that improves code structure and naming consistency.
Update (commit 0ff7de9):
Major feature commit — introduces build-time variations and several new variation types.
New Architecture:
- 🏗️ Two-flavor variation system:
VariationBasenow has two concrete flavors:RunTimeVariationBase: Event-driven variations (e.g. color, camera decalibration) viabuild_event_cfg()BuildTimeVariationBase: Pre-build sampling (e.g. HDR selection) viaapply()
- 🆕
CategoricalSampler: New sampler for discrete choices; returns actual items (not indices) for meaningful recorder output - 🆕
HDRImageVariation: Build-time variation that samples an HDR environment map for aDomeLight - 🆕
CameraDecalibrationVariation: Run-time variation adding calibration drift to cameras - 🧪 Comprehensive tests:
test_build_time_variations.pywith unit and simulation-app tests
API Changes:
build_event_cfg()no longer takes asceneparameter (cleaner interface)SamplerBase.sample()return type loosened toAny(supports list from categorical sampler)VariationRecordernow handles both tensor and non-tensor samplesArenaEnvBuildergains_apply_build_time_variations()method
Initialization Order:
The recorder is now attached before build-time variations are applied, ensuring HDR samples are captured correctly.
- Notebook debug settings uncommented: In
compile_env_notebook.py,headless=Falseandvisualizer="kit"are now enabled — should be reverted (headless=Trueor commented) for CI compatibility.
From original review (unchanged):
- 📝 Suggestion #3 (Type annotations):
event_shapeproperty still not in ABC - 📝 Suggestion #6 (Device handling): Still uses CPU-only
torch.rand()
Update (commit b3e7086):
Architecture refinement — elevates variation support to the Asset base class.
Key Changes:
- 🏗️ Variation methods moved to
Asset:add_variation(),get_variation(),get_variations()lifted fromObjectBasetoAssetbase class. Embodiments now host variations natively since they inherit fromAsset. - 🆕
ArenaEnvBuilder.get_all_variations(): New method aggregates variations from both scene assets and the embodiment into a unified dict. - 🔄
VariationRecorder.attach()refactored: Now accepts adict[str, list[VariationBase]]directly;attach_to_scene()is now a thin wrapper. - 🧪 Notebook expanded: Demonstrates embodiment-level camera decalibration variation on
droid_differential_ik.
Code Quality:
⚠️ Commented-out code:droid.pyhasrandomize_franka_joint_state = Nonewith the originalEventTermblock commented out. Consider removing the commented block for cleaner code.- 📝 Previous inline comment still applies:
headless=Falseincompile_env_notebook.pyshould beTruefor CI.
✅ No new blocking issues. This is a clean refactor enabling embodiment-level variations.
Remaining items from original review (unchanged):
| simulation_app = AppLauncher() | ||
| # headless=False enables the Kit viewer window so we can visually verify per-env | ||
| # randomizations (e.g. object tints). Set headless=True for CI / non-GUI runs. | ||
| args_cli = get_isaaclab_arena_cli_parser().parse_args([]) |
There was a problem hiding this comment.
🟡 Note (0ff7de9): These debug settings (headless=False, visualizer="kit") are now uncommented. For CI compatibility, consider reverting to headless=True or commenting out before merge.
Summary
Proof of concept of reset-time variations.
Detailed description
Variations are a structured way introducing differences between simulation runs.