Skip to content

Refactor: Metrics in IsaacLabArenaManagerBasedRLEnvCfg#733

Open
alexmillane wants to merge 5 commits into
mainfrom
alex/refactor/metrics_cfg
Open

Refactor: Metrics in IsaacLabArenaManagerBasedRLEnvCfg#733
alexmillane wants to merge 5 commits into
mainfrom
alex/refactor/metrics_cfg

Conversation

@alexmillane
Copy link
Copy Markdown
Collaborator

@alexmillane alexmillane commented May 28, 2026

Summary

Refactors how metrics enter our IsaacLabArenaManagerBasedRLEnvCfg to match Isaac Lab convention.

Detailed description

  • We were attaching metrics runtime objects to the IsaacLabArenaManagerBasedRLEnvCfg
  • Isaac Lab expects only configclasses in *Cfg objects.
  • This changes our design to match more closely with Isaac Lab.
    • Metrics now enter the env through MetricsCfgs
    • Run-time metrics are then built at env build time by IsaacLabArenaManagerBasedRLEnv
    • This matches how things are done in IsaacLab - Cfgs are spcified by the user, and then run-time objects are built by the system based on those configs.

Copy link
Copy Markdown
Contributor

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Metrics Refactor to Isaac Lab Conventions

Summary

This PR refactors the metrics system to align with Isaac Lab conventions, moving from runtime objects attached to the config to a configclass-based approach with a dedicated MetricsManager. This is a well-structured architectural improvement.


✅ What Works Well

  1. Consistent Isaac Lab Pattern: The new MetricsManager mirrors how RewardManager/TerminationManager work - parsing a configclass container and iterating fields. This improves maintainability.

  2. Clean Separation: Extracting module-level compute_* functions from the metric classes enables better testability and follows functional programming principles.

  3. Proper Type Annotations: MetricTermCfg is well-documented with clear docstrings explaining the expected function signature.

  4. Test Updates: Tests are updated to use the new env.unwrapped.compute_metrics() API consistently.


⚠️ Issues & Suggestions

1. Incomplete compute_metrics Method (Medium Priority)

File: isaaclab_arena/environments/isaaclab_arena_manager_based_env.py (L113-119)

The compute_metrics() method docstring says it returns metrics, but the method body appears truncated in the diff. Ensure the implementation calls self.metrics_manager.compute():

def compute_metrics(self) -> dict[str, Any]:
    """Compute all registered metrics from the recorded HDF5 dataset."""
    return self.metrics_manager.compute()

2. Mutable Default Argument (Low Priority)

File: isaaclab_arena/metrics/metric_term_cfg.py (L33)

params: dict[str, Any] = dict()

Using dict() as a default is fine in dataclasses with @configclass, but double-check that Isaac Lab's configclass decorator handles this correctly to avoid shared mutable state.

3. Debug Print Statement (Low Priority)

File: isaaclab_arena/environments/isaaclab_arena_manager_based_env.py (L106)

print("[INFO] Metrics Manager: ", self.metrics_manager)

Consider using a proper logger or removing this before merge. Debug prints can clutter production logs.

4. Type Annotation for metrics Field (Low Priority)

File: isaaclab_arena/environments/isaaclab_arena_manager_based_env.py (L70)

metrics: object | None = None

Consider a more specific type hint. Even a type alias like MetricsCfg = object with documentation would improve clarity. Alternatively, define a protocol or use Any explicitly.

5. Missing __repr__ for MetricsManager (Low Priority)

The print statement in load_managers will show <MetricsManager object at 0x...>. Consider adding a __repr__ method to show active terms:

def __repr__(self) -> str:
    return f"MetricsManager(terms={self.active_terms})"

🔍 Edge Cases to Consider

  1. Empty Metrics Container: If cfg.metrics is an empty configclass (no fields), _prepare_terms will correctly produce an empty manager - this is handled well.

  2. None Term Values: The check if term_cfg is None: continue handles optional metrics gracefully.

  3. Dataset Not Found: If get_metric_recorder_dataset_path() returns a path that doesn't exist, get_recorded_metric_data() will fail. This is pre-existing behavior but worth noting.


📊 Test Coverage

  • test_revolute_joint_moved_rate_metric.py updated
  • test_success_rate_metric.py updated
  • ⚠️ No direct unit tests for MetricsManager class itself
  • ⚠️ No test for _metrics_to_metrics_cfg in ArenaEnvBuilder

Consider adding:

  1. Unit test for MetricsManager with mock configclass
  2. Test for edge case where metrics=None (NoTask scenario)

🔧 Unrelated Change

File: isaaclab_arena/examples/compile_env_notebook.py (L34)

-cracker_box.set_initial_pose(Pose(position_xyz=(0.4, 0.0, 0.1), rotation_wxyz=(1.0, 0.0, 0.0, 0.0)))
+cracker_box.set_initial_pose(Pose(position_xyz=(0.4, 0.0, 0.1), rotation_xyzw=(0.0, 0.0, 0.0, 1.0)))

This changes rotation_wxyz to rotation_xyzw - a quaternion convention change. Is this intentional as part of this PR, or should it be a separate commit? Both represent identity rotation, but mixing concerns in one PR can make git bisect harder.


Verdict

Approve with minor suggestions. The architectural change is sound and aligns well with Isaac Lab conventions. The minor issues noted above are non-blocking but should be addressed before or shortly after merge.

Great work on improving the codebase consistency! 🎉


Update (5a89f0c): Reviewed incremental changes from 1d98931 to 5a89f0c.

Changes Reviewed

  1. Debug print removed ✅ - The print("[INFO] Metrics Manager: ...") statement noted in my original review has been removed.

  2. Module split - Configs (IsaacLabArenaManagerBasedRLEnvCfg, IsaacArenaManagerBasedMimicEnvCfg, ArenaPhysicsCfg) moved to new file isaaclab_arena_manager_based_env_cfg.py. This is good separation of concerns - runtime env stays lean.

  3. API rename: funccompute_metric_func - More descriptive naming in MetricTermCfg. All metric classes updated accordingly (success_rate.py, object_moved.py, revolute_joint_moved_rate.py, sequential_task_base.py).

  4. Removed compute_metric_from_recording abstract method - The now-redundant abstract method in MetricBase and its implementations have been removed. The computation is handled via compute_metric_func in the term config.

  5. Simplified MetricsManager._prepare_terms - Removed dict handling branch (configclass only) and explicit type check. Cleaner code, but ⚠️ minor consideration: if a non-MetricTermCfg field exists on the configclass, it will silently fail when accessed. The original type check provided a clearer error message.

  6. Removed unused import - import numpy as np removed from metric_base.py.

  7. Docstrings condensed - Various docstrings trimmed for brevity.

Minor Note

The removal of the explicit MetricTermCfg type check in _prepare_terms means invalid config types will now fail later (e.g., AttributeError: 'str' object has no attribute 'recorder_term_name') rather than with a descriptive TypeError. This is a minor ergonomic tradeoff for cleaner code - acceptable given that configs are typically generated programmatically by _metrics_to_metrics_cfg.

Verdict

All changes look good. The refactoring addressed the debug print issue and further improves the codebase structure. Approve stands.


Update (c723fe0): Reviewed new file isaaclab_arena_manager_based_env_cfg.py. This is the config file mentioned in the previous update - now properly committed. Contains ArenaPhysicsCfg (PhysX/Newton presets), IsaacLabArenaManagerBasedRLEnvCfg, and IsaacArenaManagerBasedMimicEnvCfg. Clean implementation with proper docstrings and appropriate defaults. No issues found. ✅

Copy link
Copy Markdown
Collaborator Author

@alexmillane alexmillane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self review.

Comment on lines +85 to +90
"""Build a configclass container with one ``MetricTermCfg`` field per metric.

Mirrors how Isaac Lab shapes ``cfg.rewards`` / ``cfg.terminations`` so that
the runtime :class:`MetricsManager` can iterate the container's fields the
same way ``RewardManager`` / ``TerminationManager`` do.
"""
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shorten to single line comment.

Comment on lines +70 to +71
# Metrics: a configclass container with one ``MetricTermCfg`` field per metric (or ``None``).
# Mirrors how ``rewards`` / ``terminations`` are shaped on ``ManagerBasedRLEnvCfg``.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove comment.

Comment on lines +97 to +103
"""Arena run-time env that adds an offline :class:`MetricsManager` to the base class.

Mirrors how Isaac Lab attaches a :class:`isaaclab.managers.RewardManager` /
:class:`isaaclab.managers.TerminationManager` in :meth:`load_managers`. The metrics
manager itself is lightweight and only runs after rollout (see
:class:`isaaclab_arena.metrics.metrics_manager.MetricsManager`).
"""
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorten comment.

Should say "Arena extension to ManagerBasedRLEnv that adds metrics.

print("[INFO] Metrics Manager: ", self.metrics_manager)

def compute_metrics(self) -> dict[str, Any]:
"""Compute all registered metrics from the recorded HDF5 dataset.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove "from the recorded HDF5 dataset"

def load_managers(self) -> None:
super().load_managers()
self.metrics_manager = MetricsManager(self.cfg.metrics, self)
print("[INFO] Metrics Manager: ", self.metrics_manager)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

Comment on lines +56 to +60
if not isinstance(term_cfg, MetricTermCfg):
raise TypeError(
f"Configuration for the metric term '{term_name}' is not of type MetricTermCfg."
f" Received: '{type(term_cfg)}'."
)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know this is the case, so we can leave this check out.

"""Names of the metric terms registered on this manager."""
return list(self._term_names)

def compute(self, dataset_path: pathlib.Path | None = None) -> dict[str, Any]:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the argument for dataset_path: pathlib.Path | None = None and only rely on the recorder manager.

Comment thread isaaclab_arena/metrics/object_moved.py Outdated
object_moved_rate = np.mean(object_moved_per_demo)
return object_moved_rate
def compute_metric_from_recording(self, recorded_metric_data: list[np.ndarray]) -> float:
return compute_object_moved_rate(recorded_metric_data, self.object_velocity_threshold)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment in the base class, let's get rid of this function: compute_object_moved_rate

revolute_joint_moved_per_demo.append(moved_open or moved_closed)
revolute_joint_moved_rate = np.mean(revolute_joint_moved_per_demo)
return revolute_joint_moved_rate
def compute_metric_from_recording(self, recorded_metric_data: list[np.ndarray]) -> float:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment in the base class, let's get rid of this function: compute_metric_from_recording

)

return subtask_success_rates.tolist()
def compute_metric_from_recording(self, recorded_metric_data: list[np.ndarray]) -> list:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment in base class, we can get rid of: compute_metric_from_recording

Copy link
Copy Markdown
Collaborator Author

@alexmillane alexmillane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More

Comment on lines +26 to +29
The returned config carries a pointer to the module-level offline compute
function (``compute_metric_func``) together with the parameters needed to call
it. Used by the runtime ``MetricsManager`` to compute the metric value once a
rollout has finished.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to:

"The returned config carries a pointer to the metric computation
function together with the parameters needed to call
it. Used by the runtime MetricsManager to compute the metric value once a
rollout has finished."

Comment on lines +20 to +23
Mirrors the Isaac Lab manager pattern (:class:`isaaclab.managers.RewardManager`,
:class:`isaaclab.managers.TerminationManager`, ...) but is intentionally lightweight:
metric computation runs *after* a rollout has finished, so there is no need for
per-step hooks or the full :class:`isaaclab.managers.ManagerBase` machinery.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove:

"Mirrors the Isaac Lab manager pattern (:class:isaaclab.managers.RewardManager,
:class:isaaclab.managers.TerminationManager, ...) but is intentionally lightweight:
metric computation runs after a rollout has finished, so there is no need for
per-step hooks or the full :class:isaaclab.managers.ManagerBase machinery."

@alexmillane alexmillane marked this pull request as ready for review May 28, 2026 14:51
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Greptile Summary

Refactors metrics from runtime objects stored directly on IsaacLabArenaManagerBasedRLEnvCfg to Isaac Lab–style configclass containers, introducing MetricTermCfg, MetricsManager, and IsaacLabArenaManagerBasedRLEnv as the new default entry point.

  • Cfg / runtime split: the three cfg classes (ArenaPhysicsCfg, IsaacLabArenaManagerBasedRLEnvCfg, IsaacArenaManagerBasedMimicEnvCfg) move to a dedicated _cfg.py file; the _env.py file now holds only the new IsaacLabArenaManagerBasedRLEnv subclass, which wires MetricsManager in load_managers().
  • Metric compute functions extracted: each metric (SuccessRate, ObjectMovedRate, RevoluteJointMovedRate, SubtaskSuccessRate) now has a standalone module-level compute function; get_metric_term_cfg() replaces compute_metric_from_recording() on every MetricBase subclass, returning a MetricTermCfg that bundles the function pointer + params + recorder name.
  • Entry point change: _get_env_entry_point now returns IsaacLabArenaManagerBasedRLEnv instead of the base Isaac Lab ManagerBasedRLEnv, ensuring MetricsManager is always initialized via the standard load_managers() hook.

Confidence Score: 4/5

The refactoring is structurally sound and follows Isaac Lab conventions; all call sites are updated consistently and the new MetricsManager initializes cleanly when no metrics are configured.

The _prepare_terms method iterates over cfg.dict while the rest of the codebase uses dataclasses.fields() for configclass traversal — a pattern mismatch that would produce a confusing AttributeError at metric-compute time if any non-declared instance attribute ever appears. The policy_runner metric check also guards on the cfg but not on the presence of compute_metrics, leaving a narrow gap if an env without the method carries a metrics cfg. Neither issue affects the happy path for the current codebase, but both are worth addressing before the code drifts further.

isaaclab_arena/metrics/metrics_manager.py (_prepare_terms field iteration) and isaaclab_arena/evaluation/policy_runner.py (missing compute_metrics guard)

Important Files Changed

Filename Overview
isaaclab_arena/environments/isaaclab_arena_manager_based_env.py Slimmed down to only the new IsaacLabArenaManagerBasedRLEnv runtime class; CFG classes moved to _cfg.py. load_managers() hooks MetricsManager, compute_metrics() delegates to it.
isaaclab_arena/environments/isaaclab_arena_manager_based_env_cfg.py New home for ArenaPhysicsCfg, IsaacLabArenaManagerBasedRLEnvCfg, and IsaacArenaManagerBasedMimicEnvCfg; metrics field typed as object
isaaclab_arena/metrics/metrics_manager.py New MetricsManager; iterates fields via cfg.dict rather than dataclasses.fields(), which is inconsistent with the rest of the codebase and fragile if configclass ever adds non-declared instance attributes.
isaaclab_arena/metrics/metric_term_cfg.py New MetricTermCfg configclass; params uses a bare dict() mutable default rather than dataclasses.field(default_factory=dict), relying on @configclass to wrap it.
isaaclab_arena/environments/arena_env_builder.py Adds _metrics_to_metrics_cfg helper to convert MetricBase list to a dynamic MetricsCfg configclass; default entry point changed from ManagerBasedRLEnv to IsaacLabArenaManagerBasedRLEnv.
isaaclab_arena/evaluation/policy_runner.py Replaces lazy-imported compute_metrics() free function with env.unwrapped.compute_metrics() method call; check on cfg.metrics presence is retained but no guard for missing compute_metrics method.

Sequence Diagram

sequenceDiagram
    participant Builder as ArenaEnvBuilder
    participant Env as IsaacLabArenaManagerBasedRLEnv
    participant MM as MetricsManager
    participant HDF5 as HDF5 Dataset

    Builder->>Builder: _metrics_to_metrics_cfg(metrics)
    Builder->>Env: "cfg.metrics = MetricsCfg instance"
    Env->>Env: load_managers()
    Env->>MM: MetricsManager(cfg.metrics, self)
    MM->>MM: _prepare_terms(cfg)

    Note over Env,HDF5: At eval / rollout end
    Env->>MM: compute()
    MM->>HDF5: get_metric_recorder_dataset_path
    loop for each MetricTermCfg
        MM->>HDF5: get_recorded_metric_data(recorder_term_name)
        HDF5-->>MM: list[np.ndarray]
        MM->>MM: "compute_metric_func(data, **params)"
    end
    MM->>HDF5: get_num_episodes()
    MM-->>Env: dict[str, Any]
Loading

Reviews (1): Last reviewed commit: "Moar cleanup." | Re-trigger Greptile

Comment on lines +39 to +44
def _prepare_terms(self, cfg: object | None) -> None:
if cfg is None:
return
for term_name, term_cfg in cfg.__dict__.items():
self._term_names.append(term_name)
self._term_cfgs.append(term_cfg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The rest of the codebase (e.g. combine_configclass_instances, transform_configclass_instance) iterates configclass fields using dataclasses.fields(), not instance.__dict__. Using __dict__ is fragile: any non-declared instance attribute added by a configclass __post_init__ hook or future subclass would be silently treated as a metric term, causing a confusing AttributeError at compute() time when recorder_term_name doesn't exist on the stray value.

Suggested change
def _prepare_terms(self, cfg: object | None) -> None:
if cfg is None:
return
for term_name, term_cfg in cfg.__dict__.items():
self._term_names.append(term_name)
self._term_cfgs.append(term_cfg)
def _prepare_terms(self, cfg: object | None) -> None:
if cfg is None:
return
import dataclasses
for field in dataclasses.fields(cfg):
self._term_names.append(field.name)
self._term_cfgs.append(getattr(cfg, field.name))

Comment on lines +29 to +30
params: dict[str, Any] = dict()
"""The keyword arguments forwarded to :attr:`compute_metric_func`. Defaults to an empty dict."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Using a bare dict() call as a field default is unconventional for dataclass-based configs. While Isaac Lab's @configclass wraps mutable defaults in field(default_factory=...) automatically, making this explicit is consistent with standard Python dataclass practice and avoids surprises if the class is ever used outside the configclass machinery.

Suggested change
params: dict[str, Any] = dict()
"""The keyword arguments forwarded to :attr:`compute_metric_func`. Defaults to an empty dict."""
params: dict[str, Any] = dataclasses.field(default_factory=dict)
"""The keyword arguments forwarded to :attr:`compute_metric_func`. Defaults to an empty dict."""

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines 104 to +105
if hasattr(env.unwrapped.cfg, "metrics") and env.unwrapped.cfg.metrics is not None:
from isaaclab_arena.metrics.metrics import compute_metrics

metrics = compute_metrics(env.unwrapped)
metrics = env.unwrapped.compute_metrics()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The guard checks that cfg.metrics is non-None, but compute_metrics() is defined only on IsaacLabArenaManagerBasedRLEnv. If the unwrapped env is a plain ManagerBasedRLEnv carrying a metrics-bearing cfg (e.g. an env registered before this refactor), the call raises AttributeError. Adding a hasattr guard mirrors the defensive intent of the existing cfg check.

Suggested change
if hasattr(env.unwrapped.cfg, "metrics") and env.unwrapped.cfg.metrics is not None:
from isaaclab_arena.metrics.metrics import compute_metrics
metrics = compute_metrics(env.unwrapped)
metrics = env.unwrapped.compute_metrics()
if (
hasattr(env.unwrapped.cfg, "metrics")
and env.unwrapped.cfg.metrics is not None
and hasattr(env.unwrapped, "compute_metrics")
):
metrics = env.unwrapped.compute_metrics()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant