Skip to content

Refactor Relation Solver#681

Closed
zhx06 wants to merge 7 commits into
mainfrom
zxiao/refactor_relation_solver
Closed

Refactor Relation Solver#681
zhx06 wants to merge 7 commits into
mainfrom
zxiao/refactor_relation_solver

Conversation

@zhx06
Copy link
Copy Markdown
Collaborator

@zhx06 zhx06 commented May 14, 2026

Summary

Refactor Relation Solver

Detailed description

  • Extract relation placement from ArenaEnvBuilder.
  • Add reusable object and robot solver abstractions.
  • Preserve current behavior while enabling future extensions.

Copy link
Copy Markdown
Contributor

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR cleanly extracts relation placement logic from ArenaEnvBuilder into dedicated solver abstractions. The architecture enables future robot/IK-aware validation while preserving existing behavior.

✅ CI Issues Fixed

Pre-commit was failing with two issues:

  1. Black formattingrelation_placement.py needs reformatting
  2. Flake8 R501 — Lines 115 and 162 have explicit return None

✅ Fixed in cb6d966 — The unnecessary return None statements have been removed and pre-commit should now pass.

✅ Strengths

  1. Good separation of concerns — The three-tier solver hierarchy (ObjectRelationSolverRobotRelationSolverArenaRelationSolver) provides clear extension points for future IK validation.

  2. Explicit error handling — Missing positions in layouts now raise RuntimeError instead of silently failing (lines 380-381, 395-396).

  3. Excellent test coverage — 477 lines of comprehensive tests covering normal flows, edge cases, and error conditions. The fake pooled object placers and mock solvers are well-designed.

  4. Correct __deepcopy__ handlingValidatedPlacementPool preserves object identity while copying solver state, which is essential for EventTermCfg deep-copy semantics.

  5. Clean API surface — The prepare_relation_placement() entry point provides a simple interface while allowing power users to inject custom solvers.

💭 Questions / Minor Items

  1. One-time warning visibility (line 163-165):
    The _warned_base_validation_skip flag ensures the "skipping IK validation" warning prints only once per RobotRelationSolver instance. In long-running sessions with many resets, this could be logged early and missed. Consider whether logging.warning with rate-limiting would be more appropriate for production visibility.

  2. Embodiment reconciliation flow (lines 287-321):
    When neither embodiment nor robot_solver is supplied at construction, but embodiment is provided later via prepare(), the internal robot_solver gets updated correctly. However, the flow through _reconcile_robot_solver_embodiment is subtle—a brief docstring note about the intended lifecycle would help future maintainers.

  3. Suggested test addition: Consider adding a test that calls RobotRelationSolver.validate_layout twice with embodiment to verify the warning is suppressed on subsequent calls. This would exercise the _warned_base_validation_skip flag.

Verdict

Clean refactor with solid test coverage. The architecture is extensible for the planned IK validation. Ready to merge


Update (808c1bf): ✅ Good fix — placement_events.py now uses obj.name as dictionary keys instead of object instances, which correctly handles deep-copied objects from EventTermCfg. The new test test_solve_and_place_objects_handles_copied_event_objects verifies this edge case. LGTM.


Update (a64cf76): Major simplification — the RobotRelationSolver class has been removed entirely, streamlining the architecture to focus on object-only relation solving. This resolves all three items from my "Questions / Minor Items" section (they're now N/A). The simplified PlacementSolution dataclass and ArenaRelationSolver are cleaner and easier to maintain. Tests updated accordingly. No new issues. Still LGTM


Update (a5cd187): Import reorganization — moved ManagerBasedEnv and EventTermCfg imports into TYPE_CHECKING blocks with lazy runtime import where needed. This is a good practice to avoid potential circular import issues. No functional changes. Previous inline comments about validation scope and check_objects_valid remain open for author consideration.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 14, 2026

Greptile Summary

This PR extracts relation-placement logic from ArenaEnvBuilder into a dedicated relation_placement.py module, introducing ObjectRelationSolver, ValidatedPlacementPool, and ArenaRelationSolver as composable abstractions. It also fixes a latent deepcopy-identity bug in solve_and_place_objects by switching anchor detection from object identity to name comparison.

  • New relation_placement.py: Three-layer class hierarchy (ObjectRelationSolverValidatedPlacementPoolArenaRelationSolver) with a prepare_relation_placement facade; silent None/break skips in the old static-pose path are replaced with explicit RuntimeErrors, and missing-position cases now fail loudly.
  • placement_events.py deepcopy fix: Anchor detection and rotation-lookup are now name-keyed instead of identity-keyed, so solve_and_place_objects works correctly after EventTermCfg deep-copies the objects list.
  • CI secret hardening: Container registry credential moved from env.NGC_API_KEY to secrets.ARENA_NGC_API_KEY so the value is masked in workflow logs.

Confidence Score: 4/5

Safe to merge for greenfield callers; existing callers using resolve_on_reset=False with non-anchor objects that have event_cfg set will now hit a RuntimeError that was previously silent.

The refactor is well-structured and the new test suite is thorough. The main concern is that _validate_no_explicit_non_anchor_pose_events is now called unconditionally in prepare(), where the old code only invoked it under the resolve_on_reset=True branch — any caller that relied on the silent pass-through for resolve_on_reset=False will break at runtime without a config change.

isaaclab_arena/relations/relation_placement.py — specifically the unconditional _validate_no_explicit_non_anchor_pose_events call in prepare() and the double-pass of embodiment through both the ArenaRelationSolver constructor and prepare().

Important Files Changed

Filename Overview
isaaclab_arena/relations/relation_placement.py New module extracting relation placement logic; introduces ObjectRelationSolver, ValidatedPlacementPool, and ArenaRelationSolver. The _validate_no_explicit_non_anchor_pose_events guard is now unconditional (previously only under resolve_on_reset=True), which is a silent breaking change for resolve_on_reset=False callers with event_cfg set on non-anchor objects.
isaaclab_arena/relations/placement_events.py Moved ManagerBasedEnv import to TYPE_CHECKING block and switched anchor detection from object identity to name-based comparison — correctly fixing a latent KeyError bug when EventTermCfg deep-copies the objects list.
isaaclab_arena/environments/arena_env_builder.py Replaced ~90 lines of inline relation-placement logic with a single call to prepare_relation_placement; wires the returned PlacementSolution into _placement_event_cfg cleanly.
isaaclab_arena/tests/test_relation_placement.py Comprehensive new test file covering the full prepare_relation_placement surface, including deepcopy safety, missing-position errors, subclass override hooks, and anchor-only scenes. Contains one duplicate assertion on lines 723–724.
isaaclab_arena/relations/pooled_object_placer.py Adds pool_size property to expose the requested pool size; used by ValidatedPlacementPool and tests.
isaaclab_arena/tests/test_placement_events.py Adds test verifying solve_and_place_objects works correctly when EventTermCfg deep-copies the objects list, validating the name-based anchor detection fix.
.github/workflows/ci.yml Switches container registry credential from env.NGC_API_KEY to secrets.ARENA_NGC_API_KEY — a security improvement (secrets are masked in logs).

Class Diagram

%%{init: {'theme': 'neutral'}}%%
classDiagram
    class ObjectRelationSolver {
        +num_envs: int
        +placement_seed: int|None
        +resolve_on_reset: bool|None
        +solve(objects) ObjectRelationSolveResult
        +validate_object_placement_pool(pool) None
        +validate_layout(layout) None
        +check_objects_valid(layout) bool
    }

    class ValidatedPlacementPool {
        -_placement_pool: PooledObjectPlacer
        -_object_solver: ObjectRelationSolver
        +sample_without_replacement(count) list
        +sample_with_replacement(count) list
        +remaining: int
        +pool_size: int
        +__deepcopy__(memo) ValidatedPlacementPool
    }

    class ArenaRelationSolver {
        +num_envs: int
        +embodiment: EmbodimentBase|None
        +object_solver: ObjectRelationSolver
        +objects: list[ObjectBase]
        +prepare(objects, embodiment) PlacementSolution|None
        +solve_relations() ObjectRelationSolveResult
        +validate_layout(layout) None
    }

    class PlacementSolution {
        +placement_event_cfg: EventTermCfg|None
    }

    class ArenaEnvBuilder {
        +_solve_relations()
    }

    ArenaRelationSolver --> ObjectRelationSolver : owns
    ArenaRelationSolver --> ValidatedPlacementPool : creates
    ArenaRelationSolver --> PlacementSolution : returns
    ValidatedPlacementPool --> ObjectRelationSolver : delegates validate_layout
    ArenaEnvBuilder ..> ArenaRelationSolver : via prepare_relation_placement()
Loading

Reviews (6): Last reviewed commit: "fix simapp loading" | Re-trigger Greptile


relation_result = self.solve_relations()
anchor_objects_set = set(get_anchor_objects(self.objects))
_validate_no_explicit_non_anchor_pose_events(self.objects, anchor_objects_set)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Validation scope change breaks resolve_on_reset=False callers

_validate_no_explicit_non_anchor_pose_events is now called unconditionally inside prepare(), but in the old code it was only called inside the if placer_params.resolve_on_reset: branch. Any existing caller using resolve_on_reset=False with a non-anchor object that has an event_cfg will now receive a RuntimeError where the old code silently allowed it. The PR states "Preserve current behavior," but this is a breaking behavioral change that contradicts that goal. Is expanding the validation scope to cover resolve_on_reset=False intentional? If so, the PR description should note this as an explicit behavior tightening rather than pure refactoring.

Comment on lines +420 to +426
placement_solver = solver or ArenaRelationSolver(
num_envs=num_envs,
placement_seed=placement_seed,
resolve_on_reset=resolve_on_reset,
embodiment=embodiment,
)
return placement_solver.prepare(objects, embodiment=embodiment)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 prepare_relation_placement passes embodiment to the ArenaRelationSolver constructor and then passes the same value again via prepare(embodiment=embodiment). The second call is redundant when solver is None: the constructor already wires the embodiment into self.robot_solver and self.embodiment, so _set_embodiment in prepare() is a no-op for an internally-created solver. Consider passing embodiment=None to prepare() in that path (or documenting why the double-pass is intentional).

Suggested change
placement_solver = solver or ArenaRelationSolver(
num_envs=num_envs,
placement_seed=placement_seed,
resolve_on_reset=resolve_on_reset,
embodiment=embodiment,
)
return placement_solver.prepare(objects, embodiment=embodiment)
if solver is None:
placement_solver = ArenaRelationSolver(
num_envs=num_envs,
placement_seed=placement_seed,
resolve_on_reset=resolve_on_reset,
embodiment=embodiment,
)
return placement_solver.prepare(objects)
return placement_solver.prepare(objects, embodiment=embodiment)

Comment on lines +311 to +316
def _reconcile_robot_solver_embodiment(self) -> None:
if not self._robot_solver_supplied and self.robot_solver.embodiment is None:
self.robot_solver.embodiment = self.embodiment
return
if self.robot_solver.embodiment is not self.embodiment:
raise ValueError("embodiment must match robot_solver.embodiment when both are provided.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 _reconcile_robot_solver_embodiment raises a confusing ValueError when an ArenaRelationSolver that was created without an explicit robot_solver is reused across multiple prepare() calls with different non-None embodiments. The first prepare(embodiment=emb_a) call passes the robot_solver.embodiment is None guard and sets the default solver's embodiment to emb_a. A subsequent prepare(embodiment=emb_b) call finds robot_solver.embodiment is emb_a (not None), skips the first branch entirely, and then the second check robot_solver.embodiment is not self.embodiment fires, raising ValueError: "embodiment must match robot_solver.embodiment when both are provided" — a message that references a robot_solver the user never explicitly provided. Removing the is None guard makes the default solver always mirror self.embodiment.

Suggested change
def _reconcile_robot_solver_embodiment(self) -> None:
if not self._robot_solver_supplied and self.robot_solver.embodiment is None:
self.robot_solver.embodiment = self.embodiment
return
if self.robot_solver.embodiment is not self.embodiment:
raise ValueError("embodiment must match robot_solver.embodiment when both are provided.")
def _reconcile_robot_solver_embodiment(self) -> None:
if not self._robot_solver_supplied:
self.robot_solver.embodiment = self.embodiment
return
if self.robot_solver.embodiment is not self.embodiment:
raise ValueError("embodiment must match robot_solver.embodiment when both are provided.")

return layout.success


class RobotRelationSolver:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's take out this class from this MR. This MR is meant to be refactoring only. We can start the class design in a doc with high level class diagram first.

Comment on lines +95 to +99
def validate_layout(self, layout: PlacementResult) -> None:
"""Validate one object-relation placement result."""

def check_objects_valid(self, layout: PlacementResult) -> bool:
return layout.success
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 check_objects_valid is defined but never called anywhere in the codebase. ValidatedPlacementPool._validate_layouts exclusively calls validate_layout on the solver. A subclass author who overrides check_objects_valid expecting it to run during sampling will get no error and no effect — a silent no-op that is hard to diagnose. Either wire it into validate_layout's default implementation, or remove it to keep the extension API unambiguous.

Suggested change
def validate_layout(self, layout: PlacementResult) -> None:
"""Validate one object-relation placement result."""
def check_objects_valid(self, layout: PlacementResult) -> bool:
return layout.success
def validate_layout(self, layout: PlacementResult) -> None:
"""Validate one object-relation placement result."""
if not self.check_objects_valid(layout):
raise RuntimeError("Placement layout failed object validation.")
def check_objects_valid(self, layout: PlacementResult) -> bool:
return layout.success

Comment thread .github/workflows/ci.yml
credentials:
username: $oauthtoken
password: ${{ env.NGC_API_KEY }}
password: ${{ secrets.ARENA_NGC_API_KEY }}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by accident?

@xyao-nv
Copy link
Copy Markdown
Collaborator

xyao-nv commented May 27, 2026

Close it as #707 addressed the same request

@xyao-nv xyao-nv closed this May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants