Add Lightwheel fetching asset retry handling#731
Conversation
Greptile SummaryThis PR introduces a shared
Confidence Score: 5/5The retry logic and timeout restoration are correct across all code paths; the change is safe to merge. The finally block correctly restores client.base_timeout on success, on retried timeouts, and on final propagation. The loop bounds and retry-vs-raise branching are accurate. No functional defects were found in the new code paths introduced by this PR. No files require special attention beyond the open feedback from previous review threads on lightwheel_utils.py. Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller
participant acquire_lightwheel_asset
participant client
participant acquire_fn
Caller->>acquire_lightwheel_asset: "call(loader, acquire_fn, attempts=3, timeout_sec=60, ...)"
acquire_lightwheel_asset->>client: read base_timeout to old_timeout
loop for attempt in 1..attempts
acquire_lightwheel_asset->>client: "set base_timeout = timeout_sec"
acquire_lightwheel_asset->>acquire_fn: "acquire_fn(**kwargs)"
alt success
acquire_fn-->>acquire_lightwheel_asset: result
acquire_lightwheel_asset->>client: "restore base_timeout = old_timeout"
acquire_lightwheel_asset-->>Caller: return result
else TimeoutError not last attempt
acquire_fn-->>acquire_lightwheel_asset: raise TimeoutError
Note over acquire_lightwheel_asset: _looks_like_timeout True
acquire_lightwheel_asset->>client: "restore base_timeout = old_timeout finally"
acquire_lightwheel_asset->>acquire_lightwheel_asset: sleep(delay_sec)
else TimeoutError last attempt or non-timeout error
acquire_fn-->>acquire_lightwheel_asset: raise Exception
acquire_lightwheel_asset->>client: "restore base_timeout = old_timeout finally"
acquire_lightwheel_asset-->>Caller: re-raise
end
end
Reviews (2): Last reviewed commit: "lint" | Re-trigger Greptile |
| assert attempts > 0, "attempts must be positive" | ||
| assert delay_sec >= 0, "delay_sec must be non-negative" |
There was a problem hiding this comment.
Using
assert for runtime parameter validation is fragile — asserts are silently stripped when Python runs with -O or -OO (common in optimized/production environments). If attempts=0 is passed, the loop body never executes and execution falls through to raise AssertionError("unreachable"), which is a confusing error. Use ValueError instead so invalid arguments are always caught.
| assert attempts > 0, "attempts must be positive" | |
| assert delay_sec >= 0, "delay_sec must be non-negative" | |
| if attempts <= 0: | |
| raise ValueError("attempts must be positive") | |
| if delay_sec < 0: | |
| raise ValueError("delay_sec must be non-negative") |
| text = f"{type(current).__name__}: {current}".lower() | ||
| if "timeout" in text or "timed out" in text: | ||
| return True | ||
| current = current.__cause__ or current.__context__ |
There was a problem hiding this comment.
The
__cause__ or __context__ expression only follows one branch of the exception chain. When an exception has both __cause__ set (explicit raise X from Y) and a __context__ (implicit context), only __cause__ is traversed and the __context__ subtree is never visited. A timeout buried exclusively in the __context__ chain would be missed, causing the exception to propagate without retrying.
| current = current.__cause__ or current.__context__ | |
| next_exc = current.__cause__ if current.__cause__ is not None else current.__context__ | |
| current = next_exc |
There was a problem hiding this comment.
PR Review: Add Lightwheel fetching asset retry handling
Summary: This PR adds robust retry handling for transient Lightwheel SDK API timeouts by introducing a shared acquire_lightwheel_asset helper function. The implementation is clean and well-tested.
Strengths
- Clean abstraction: The
acquire_lightwheel_assethelper encapsulates timeout configuration and retry logic in one place, making it easy to apply consistently across all Lightwheel asset acquisitions. - Proper timeout restoration: Uses a sentinel
_MISSINGto correctly handle cases where the client may not have abase_timeoutattribute. - Exception chain walking:
_looks_like_timeoutcorrectly inspects__cause__and__context__to catch wrapped timeout exceptions. - Good test coverage: Both integration and unit tests verify the retry behavior and timeout restoration.
Suggestions
1. Consider moving finally block outside the loop (minor)
The current structure restores old_timeout after every attempt:
for attempt in range(1, attempts + 1):
try:
...
except Exception as exc:
...
finally:
if old_timeout is not _MISSING:
client.base_timeout = old_timeout # restored each iterationWhile this works because the timeout is re-set at the start of each iteration, using a try/finally around the entire loop would be clearer:
try:
for attempt in range(1, attempts + 1):
try:
if timeout_sec is not None and old_timeout is not _MISSING:
client.base_timeout = timeout_sec
return acquire_fn(**kwargs)
except Exception as exc:
if not _looks_like_timeout(exc) or attempt == attempts:
raise
print(f"[isaaclab-arena] {description} timed out; retrying {attempt + 1}/{attempts}...")
time.sleep(delay_sec)
finally:
if old_timeout is not _MISSING:
client.base_timeout = old_timeout2. Log message consistency (nitpick)
The log prefix [isaaclab-arena] is clear. Consider using Python logging instead of print for consistency with other Isaac Lab modules, but this is optional.
Overall
Clean implementation that properly handles a real operational concern. The code is defensive, well-structured, and includes appropriate tests.
Verdict: Looks good to merge ✅
Summary
Add Lightwheel fetching asset retry handling
Detailed description
acquire_lightwheel_assethelper with scoped timeout override and retry handling.