Add LLMAgent to EnvGraphSpec generation by qianl-nv · Pull Request #718 · isaac-sim/IsaacLab-Arena

qianl-nv · 2026-05-26T14:40:43Z

Summary

First part of the LLM env gen pipeline: LLMAgent going from prompt to LLMEnvSchema (JSON)

Detailed description

Reason: LLM infers task, robot & object names from prompt
Changed: Add new LLMAgent class & output shema (defines subset of supported task/constraints ect).
Impact: Supports the first step in the user journey for agentic env gen

To run: python -m isaaclab_arena.llm_env_gen.try_schema

Resolver matches object/robot proposal from LLM (based on name/tag) to the exact entries in the arena regestry. It outputs an ArenaEnvGraphSpec with the nodes, tasks and initial state scene graph filled in based on LLM output + resolver match.

… always called when loading from yaml/dict

isaaclab-review-bot

Review Summary

Thanks for adding the LLM-based environment generation pipeline! The modular design with clear separation between schema definition, LLM interaction, and deterministic resolution is well thought out. The trace-based debugging will be valuable for understanding resolution decisions.

Suggestions

1. Replace assertions with explicit exceptions for runtime validation

File: isaaclab_arena/llm_env_gen/llm_agent.py (lines 57-58)

The API key validation uses assert, which can be stripped with Python's -O flag:

assert self.api_key, "API key required: set NV_API_KEY or pass api_key."

Consider using an explicit exception:

if not self.api_key:
    raise ValueError("API key required: set NV_API_KEY or pass api_key.")

2. Improve JSON extraction error handling

File: isaaclab_arena/llm_env_gen/llm_agent.py (lines 148-161)

The _extract_json method uses contextlib.suppress(json.JSONDecodeError) before attempting manual parsing. If both paths fail, it raises AssertionError. Consider:

Raising a more descriptive custom exception (e.g., LLMResponseParseError)
Including the raw response in error messages for debugging

class LLMResponseParseError(ValueError):
    """Raised when the LLM response cannot be parsed as JSON."""

@staticmethod
def _extract_json(content: str) -> dict:
    content = content.strip()
    # ... extraction logic ...
    raise LLMResponseParseError(f"Could not extract JSON from LLM response: {content[:200]}...")

3. Consider adding retry logic for transient API failures

File: isaaclab_arena/llm_env_gen/llm_agent.py

LLM API calls can fail transiently due to rate limits or network issues. Consider adding retry logic with exponential backoff, or at minimum documenting that callers should implement retries:

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def generate_spec(self, ...):
    ...

4. Add test coverage for LLMAgent

File: isaaclab_arena/tests/test_resolver.py

The resolver tests are thorough with the FakeAssetRegistry pattern. Consider adding similar tests for LLMAgent:

Test _extract_json with various edge cases (markdown fences, nested JSON, malformed responses)
Test schema validation rejections
Mock the OpenAI client to test the overall flow without making real API calls

5. Document the silent failure behavior in resolver

File: isaaclab_arena/llm_env_gen/resolver.py

The design choice to never raise on LLM mistakes (recording trace events instead) is documented in the class docstring, which is good. Consider adding a helper method or property to check if resolution was "clean":

@property
def has_resolution_errors(self) -> bool:
    """Returns True if any trace events indicate failed resolution."""
    error_stages = {"item.miss", "name.miss", "task.unknown_subject", "task.unknown_target"}
    return any(e.stage in error_stages for e in self.trace)

Minor Notes

The try_schema.py script is useful for manual testing but consider adding it to .gitignore or documenting it as a development-only tool
Good use of __post_init__ in ArenaEnvGraphSpec to centralize invariant checking

Overall this is a solid foundation for the LLM environment generation feature.

Update (commit 3336f26): The PR has been substantially refactored since my initial review.

Status of Previous Suggestions

✅ Replace assertions with explicit exceptions — Fixed. The new EnvGenAgent uses explicit raise ValueError instead of assert for the API key check.
✅ Improve JSON extraction error handling — Fixed. The new AgentResponseParseError(ValueError) custom exception is exactly what I suggested, including raw response preview (capped at 500 chars for readability).
ℹ️ Retry logic — Not implemented, but this was a nice-to-have suggestion. Callers can wrap as needed.
✅ Add test coverage for LLMAgent/EnvGenAgent — Fixed. The new test_env_gen_agent.py (399 lines) comprehensively covers:
- __init__ argument/env-var precedence and missing-key guard
- _extract_json edge cases (fenced, prose, malformed, unbalanced braces, truncated error messages)
- generate_spec / ping with mocked OpenAI client
- System prompt coverage for RelationKind/TaskKind literals
⚪ Resolver silent failure documentation — N/A. The entire resolver module was removed in this refactor. The PR now focuses on the agent layer; resolution is presumably handled elsewhere.

New Changes Observed

Module renamed: llm_env_gen → agentic_env_gen
Schema renamed: SceneSpec → EnvIntentSpec (with new reasoning field for chain-of-thought)
Added CI job test_agent_remote_e2e with proper secret validation
Added pydantic>=2.0 as hard dependency in setup.py
Removed to_dict() / to_yaml() from ArenaEnvGraphSpec — validation now runs only in from_dict()
Added at_pose to RelationKind literal

No new issues found in the incremental changes. The refactor improves the codebase significantly.

Update (commit fb42596): Major refactor to OpenAI-compatible structured outputs.

Changes Observed

Structured outputs migration: The agent now uses response_format={"type": "json_schema", ...} with strict mode, eliminating the need for prose-parsing fallback. This is a significant reliability improvement — the wire guarantees a valid JSON envelope.
New structured_output_utils.py module: Well-designed extraction of reusable utilities:
- build_strict_schema() / apply_strict_constraints() — munges pydantic schemas for OpenAI/Bedrock strict mode (adds additionalProperties: false, populates required, strips default keys)
- extract_response_text() — handles NVIDIA DeepSeek's reasoning_content channel quirk
- check_structured_output_support() — deployment validator probe with rich diagnostics
Removed code: _extract_json() and AgentResponseParseError removed — no longer needed since structured outputs guarantee the JSON envelope. Clean deletion.
Constructor-time fail-fast: Two-stage validation on EnvGenAgent.__init__:
- First: cheap ping() to verify connectivity/auth
- Then: full check_structured_output_support() probe with the real schema
This surfaces capability failures at construction time rather than mid-pipeline.
Test coverage: Comprehensive — test_structured_output_utils.py (421 lines) covers:
- Schema munging for strict mode (``, nested objects, idempotency)
- Channel extraction fallback (content → reasoning_content → empty)
- Probe failure modes (4xx API error, empty envelope, parse error, validation error)
- Live endpoint test gated by @pytest.mark.agent_remote_e2e

Notes

The json.loads(text, strict=False) call in generate_spec is a good workaround for DeepSeek-v4-flash's unescaped control characters — documented via inline comment.
Good defensive caching of self._spec_schema to avoid re-walking the schema on every call.
The system prompt was correctly trimmed to remove the now-redundant "emit ONLY JSON" instruction.

No new issues found. This is a clean, well-tested migration to structured outputs.

Update (commit f92fa73): Two defensive improvements.

Allow empty tasks list in EnvIntentSpec: Removed the @model_validator that enforced non-empty tasks. Empty tasks now legitimately maps to NoTask for static sandbox environments. The field description clearly documents this behavior.
Handle empty choices array in structured output validation: Added defensive check in check_structured_output_support() for when providers (Azure content-filter, Bedrock guardrails) return HTTP 200 with an empty choices list — preventing IndexError and surfacing it cleanly as a parse_error with response_route="empty".

Both changes include test updates. No issues found.

greptile-apps · 2026-05-26T14:46:11Z

Greptile Summary

This PR introduces the first stage of the agentic environment-generation pipeline: an EnvGenAgent class that calls an OpenAI-compatible structured-outputs endpoint (NVIDIA hosted by default) and parses the response into a validated EnvIntentSpec Pydantic model.

EnvGenAgent — constructor runs two fail-fast wire checks (cheap ping then a structured-output probe with the full EnvIntentSpec schema) so misconfigured models are rejected before generate_spec is ever called; generate_spec uses response_format=json_schema exclusively, eliminating the prose-extraction fallback from the predecessor module.
structured_output_utils — schema munging, the NVIDIA DeepSeek reasoning_content channel fallback, and a deployment-validator probe that correctly guards against empty choices lists — though the sibling ping() function has the same gap and is not yet guarded.
CI & deps — a new agent_remote_e2e job gates live tests behind a secret-presence check; openai and pydantic>=2.0 are added to RUNTIME_DEPS.

Confidence Score: 4/5

Safe to merge with one small fix: ping() needs the same empty-choices guard that was already applied to check_structured_output_support in this PR.

ping() accesses resp.choices[0] unconditionally. The same empty-choices path that check_structured_output_support was explicitly hardened against in this PR can also hit ping, causing an IndexError to propagate through EnvGenAgent.__init__() as an opaque crash rather than a diagnosable wire failure. The rest of the pipeline is well-reasoned and well-tested.

isaaclab_arena/environments/agentic_env_gen/structured_output_utils.py — the ping() function at line 154.

Important Files Changed

Filename	Overview
isaaclab_arena/environments/agentic_env_gen/structured_output_utils.py	New utility module for structured-output probing; `check_structured_output_support` correctly guards against empty `choices`, but `ping()` does not and will raise `IndexError` on the same empty-choices API paths.
isaaclab_arena/environments/agentic_env_gen/env_intent_spec.py	New Pydantic schema for LLM output; `Relation.params: dict` (bare untyped dict) generates an object schema without `additionalProperties: false`, which may fail strict OpenAI-compatible validators.
isaaclab_arena/environments/agentic_env_gen/env_gen_agent.py	New EnvGenAgent with constructor-time ping + structured-output validation and `generate_spec` using `response_format=json_schema`; clean refactor from prose-parsing predecessor.
isaaclab_arena/environments/agentic_env_gen/try_schema.py	Demo/debug script; background override now correctly rewrites both `rel.subject` and `rel.target`, fixing the previously-noted unary-relation drop.
isaaclab_arena/tests/test_env_gen_agent.py	Comprehensive unit tests for EnvGenAgent covering constructor probes, request shape, channel fallback, error propagation, and an opt-in live E2E test.
isaaclab_arena/tests/test_structured_output_utils.py	Thorough tests for schema munging, channel routing, and all failure modes of `check_structured_output_support`; the empty-choices test confirms the fix but the parallel gap in `ping` is untested.
.github/workflows/ci.yml	Adds `agent_remote_e2e` CI job with explicit secret-presence guard; also excludes the new marker from the default PhysX test run.
setup.py	Adds `openai` and `pydantic>=2.0` to `RUNTIME_DEPS`, addressing the previously-noted missing `pydantic` dependency.

Sequence Diagram

sequenceDiagram
    participant U as Caller
    participant A as EnvGenAgent init
    participant P as ping
    participant C as check_support
    participant G as generate_spec
    participant LLM as Inference Endpoint

    U->>A: EnvGenAgent(model, base_url)
    A->>P: ping(client, model)
    P->>LLM: completions.create max_tokens 8
    LLM-->>P: response text
    P-->>A: text
    A->>C: check_structured_output_support(client, model, EnvIntentSpec)
    C->>LLM: completions.create response_format json_schema
    LLM-->>C: EnvIntentSpec JSON sample
    C-->>A: StructuredOutputSupport supported True
    A-->>U: EnvGenAgent instance

    U->>G: generate_spec(prompt, catalog_text)
    G->>LLM: completions.create response_format json_schema
    LLM-->>G: JSON envelope
    G->>G: extract_response_text
    G->>G: json loads strict False
    G->>G: EnvIntentSpec model_validate
    G-->>U: EnvIntentSpec plus raw text

_{Reviews (9): Last reviewed commit: "Allow empty tasks list in EnvIntentSpec" | Re-trigger Greptile}

xyao-nv · 2026-05-27T19:51:22Z

+    registry = AssetRegistry()
+    backgrounds: list[str] = []
+    objects: list[dict] = []
+    embodiments: list[str] = []


How about optional ones, like lightings, hdr image?

xyao-nv · 2026-05-27T22:16:19Z

Much thorough than the last one. Most comments are to make it more robust and less prone to hallucinations. Feel free to leave some as TODOs.

And one major ask is to remove LLM, and call it Agent. Classes could be called as EnvGenAgent, EnvIntentSpec

xyao-nv flagged that the env-gen prompt and LLMEnvSpec docstrings mixed "atomic action" and "atomic task" interchangeably when referring to the entries of the ``tasks`` field. Unify on "atomic task(s)" everywhere since the field is ``tasks`` and the class is ``Task`` — saying "action" introduces a third noun for the same concept. Also aligns with the existing ``isaaclab_arena/tasks/sequential_task_base.py`` wording. Addresses thread #3313780564 on PR #718. Signed-off-by: Qian Lin <qianl@nvidia.com>

@zhx06

Two PR threads converged on the same 3-line comment, so addressing them together: * "Uniform spawn scale" misled by implying a 1.0 baseline — some object_library assets already carry intrinsic scaling, so the override is not strictly uniform. Drop the qualifier. * "auto-fit ... against the tabletop bbox" leaks implementation details of the current placement proposer; @zhx06 is expanding the proposer beyond bbox-based fitting, so the comment would go stale. Describe the effect (auto-fit) without the mechanism. Addresses threads #3313785736 and #3313791294 on PR #718. Signed-off-by: Qian Lin <qianl@nvidia.com>

The Resolver was moved out of this MR into a follow-up branch, so the three docstrings / comments in llm_schema.py that name-checked it either point at a class that is not in this PR (the Resolver class itself, Resolver._build_spatial_constraint) or duplicate contract text already covered by the surrounding class docstring. Removed: * Module docstring — drop "by the Resolver" from the second-step description; the architectural intent (deterministic second pass) reads fine without the class reference. * Relation class docstring — drop the trailing sentence cross-ref to Resolver._build_spatial_constraint; the binary-vs-unary contract is fully stated in the preceding sentences. * Relation.target inline comment — drop entirely; it restated the binary-vs-unary contract that the class docstring already documents authoritatively. Addresses thread #3313795576 on PR #718. Signed-off-by: Qian Lin <qianl@nvidia.com>

xyao-nv asked why the system prompt restated information that the schema could express structurally, and suggested moving rules into ``Field(description=...)`` so they live in one place instead of being duplicated across prompt prose and pydantic types. Rebalance: * Per-field semantics (what each field means, defaults, optionality) now live as ``Field(description=...)`` strings on every member of Item / Relation / Task / LLMEnvSpec. pydantic embeds these in ``model_json_schema()`` automatically, so the existing SCHEMA block at the end of the prompt now carries the per-field guidance the RULES section used to spell out. * Enum members no longer need explicit ``∈ {kinds}`` lines — they are already enforced by the ``Literal[...]`` types and show up as ``enum:`` arrays inside the schema JSON. * The prompt's RULES section shrinks to GUIDANCE: only cross-cutting invariants (articulated objects need spatial ``on(...)`` anchors; distractors need ``on(distractor, background)``), few-shot task examples, and the output-format directive remain — these either span multiple fields or change global LLM behaviour and don't fit a single-field description. * New anti-hallucination guidance (also from xyao-nv's review): the GUIDANCE block now tells the LLM to output null for unspecified optional fields instead of guessing, and the same nudge is echoed in the optional-field descriptions on ``instance_name``, ``scale``, and ``Task.target`` so the LLM sees it both globally and per-field. LLMAgent no longer imports RelationKind / TaskKind / get_args; the schema dump is the only mechanism by which enum members reach the prompt now. Net line count grows (+59) because Field descriptions are verbose, but the single source of truth for any given rule is now the schema — xyao-nv's explicit ask. Token cost in the actual API request stays roughly flat: what left the prompt mostly reappeared inside the JSON schema dump. Addresses thread #3313636935 on PR #718. Signed-off-by: Qian Lin <qianl@nvidia.com>

Add a required ``reasoning: str`` field as the FIRST entry of LLMEnvSpec. Instruction-tuned models respect schema field order, so listing it first forces the LLM to write a step-by-step analysis before committing to any structured field (the "think then commit" pattern, which measurably improves structured-output quality). As a bonus, the reasoning trace makes downstream failures debuggable: when the resolver drops an item or picks the wrong asset, the trace shows which step the model got wrong instead of leaving the malformed spec as the only signal. The Field description prescribes a 4-step analysis (task / foreground / background / distractors) and asks the model not to duplicate the analysis into ``task_description``. Verified end-to-end against the live endpoint: the model produces a self-contained 200-500 char reasoning block that walks all four steps, and ``task_description`` stays a one-sentence summary. Surface the field on its own line in ``try_schema.py`` so it's easy to spot when iterating on prompts. Signed-off-by: Qian Lin <qianl@nvidia.com>

Pure relocation requested in PR review: the package conceptually sits alongside the other env-construction code in isaaclab_arena/environments/, and the directory name shifts from llm_env_gen to agentic_env_gen to align with the broader "agent/agentic" terminology direction (the LLM is one possible backend; a VLM-driven agent could follow later). Files moved: * isaaclab_arena/llm_env_gen/{__init__,llm_agent,llm_schema,try_schema}.py -> isaaclab_arena/environments/agentic_env_gen/... Updated absolute import paths in try_schema.py (CLI module path in docstring + 2 imports) and test_llm_agent.py (docstring + 2 imports), plus comment-only path references in setup.py and .github/workflows/ci.yml. Strictly a directory move; class names (LLMAgent, LLMEnvSpec, LLMResponseParseError), file names within the package, the llm_remote_e2e pytest marker, and the test file name are all unchanged. Those renames follow in subsequent commits to keep each diff reviewable. Signed-off-by: Qian Lin <qianl@nvidia.com>

Adopt the name xyao-nv proposed in PR review: "Intent means it's a blueprint, comparing with raw response from agent. LLM is too restricted, AgenticEnvSpec sounds too broad." The class is the structured "env intent" the agent commits to after parsing a natural-language prompt — calling it an *intent* spec (rather than an *LLM* spec) keeps the name accurate when the backend is later swapped for a VLM or a hand-written rule-based agent. Renames: * class LLMEnvSpec -> EnvIntentSpec * file llm_schema.py -> env_intent_spec.py * test_embeds_llm_env_spec_schema -> test_embeds_env_intent_spec_schema Updates 24 references across llm_agent.py, try_schema.py, test_llm_agent.py, setup.py, and the renamed schema file itself (class definition, validator return annotation, docstrings, prompt templates, import statements, console labels, comment paths). Refreshes the historical comment that explained why the field initial_scene_graph was kept after the previous SceneSpec -> LLMEnvSpec rename; now reads SceneSpec -> LLMEnvSpec -> EnvIntentSpec so the rationale survives this rename too. LLMAgent, LLMResponseParseError, llm_agent.py, the llm_remote_e2e pytest marker, and test_llm_agent.py file name are deliberately NOT touched here — they follow in F8c (agent rename) and F8d (marker / test file / prose sweep) to keep each diff reviewable. Signed-off-by: Qian Lin <qianl@nvidia.com>

…ckage Continues the agentic terminology shift requested in PR review: "Let's just use agent/agentic instead of LLM all over the code." With this commit, the env-gen package surface no longer mentions "LLM" in any name — the backend stays LLM-driven today, but the contract is generic enough that a VLM- or rule-based agent could slot in without touching consumers. Class / file / marker / test-file renames: * class LLMAgent -> EnvGenAgent * class LLMResponseParseError -> AgentResponseParseError * file llm_agent.py -> env_gen_agent.py * file test_llm_agent.py -> test_env_gen_agent.py * mark llm_remote_e2e -> agent_remote_e2e * job test_llm_remote_e2e -> test_agent_remote_e2e AgentResponseParseError docstring rewritten to explicitly contrast with json.JSONDecodeError (addresses xyao's review question "AgenticResponseParseJSON error? Shall it be non overlapping with JSON's errors?"). Now spells out that: * JSONDecodeError = we found JSON but it's malformed * AgentResponseParseError = we couldn't even find a JSON payload …so callers can attribute failures correctly (retry vs. fix schema) without string-matching error messages. Prose sweep across the package (~60 sites): LLM -> agent in docstrings, comments, console banners, error messages, and CI step names. Three references in EnvGenAgent that describe the OpenAI chat-completions wire (constructor docstring, generate_spec docstring, temperature/max_tokens args) became "model" rather than "agent" because that's the layer the OpenAI client targets — the agent is the wrapper class. pytest.ini marker description tightened: "live OpenAI-compatible LLM endpoint" -> "live OpenAI-compatible chat-completions endpoint" (model-neutral, matches the wire contract). Signed-off-by: Qian Lin <qianl@nvidia.com>

Switch EnvGenAgent from prose-extraction to OpenAI-compatible structured outputs (response_format=json_schema). The wire now guarantees a JSON envelope matching EnvIntentSpec, so the agent drops: * _extract_json (markdown-fence stripping + brace-counting heuristic) * AgentResponseParseError (the "no JSON found" / "unbalanced braces" exception class is unreachable under structured outputs) * the "Emit ONLY the JSON object" instruction in the prompt * the prompt's embedded JSON schema dump (the wire carries it) Addresses xyao's review question "Can we leverage the structured outputs API?". Two endpoint quirks surfaced during validation against the requested models (nvidia/deepseek-ai/deepseek-v4-flash and aws/anthropic/bedrock-claude-opus-4-6) and are handled cleanly: * NVIDIA DeepSeek-v4-flash routes structured outputs into message.reasoning_content instead of message.content. The new extract_response_text() reads either channel transparently. * AWS Bedrock requires additionalProperties:false on every object node, which pydantic's default schema doesn't emit. The new build_strict_schema() deep-walks the schema and applies the strict-mode constraints (additionalProperties:false, every property required, drop default keys). DeepSeek-v4-flash also emits literal control chars (e.g. \t) inside JSON strings despite the structured-outputs contract. json.loads with strict=False accepts them; pydantic's stricter model_validate_json would reject. Documented inline. A new structured_output_utils module owns all wire-level diagnostics (pydantic-model-agnostic, BaseModel-typed): * ping(client, model) — cheap liveness probe (no response_format) * check_structured_output_support(client, model, spec_class) — full capability probe; returns a StructuredOutputSupport dataclass that separates api_error (4xx at the wire) from parse_error (JSON / schema validation), so deployment validators can attribute failures correctly without string-matching error messages. * build_strict_schema / apply_strict_constraints — schema munging * extract_response_text — channel fallback EnvGenAgent.__init__ now runs both probes in order: 1. ping (cheap, fails fast on dead wire / bad key) 2. check_structured_output_support (heavier; carries the EnvIntentSpec schema) A model that can't honour response_format raises RuntimeError at construction with the diagnostic payload in the message so the operator can attribute the cause (api_error vs parse_error vs route='empty'). No silently-broken agent instances. Tests: * 28 new unit tests in test_structured_output_utils.py covering schema munging (incl. an EnvIntentSpec-walk that confirms every object node is strict-mode-compatible), channel fallback, ping, and check_structured_output_support failure modes. * 16 unit tests in test_env_gen_agent.py rewritten for the new __init__ + generate_spec pipeline. Locks in: call order (ping then probe), early exit when ping fails (probe never fires), construction failure when probe returns unsupported (RuntimeError carries the diagnostic fields). * 2 live tests guarded by the agent_remote_e2e marker: test_default_model_supports_structured_output pins the capability contract against the default model; test_generate_spec_against_live_endpoint exercises the full pipeline. Verified end-to-end against the live nvidia/deepseek-ai/deepseek-v4-flash endpoint (parses the EnvIntentSpec out of reasoning_content) and the schema munging was independently confirmed against aws/anthropic/bedrock-claude-opus-4-6 (got past Bedrock's additionalProperties validator; geo-restricted from our test environment, but the wire path is sound). Signed-off-by: Qian Lin <qianl@nvidia.com>

Some providers (Azure content-filter trips, Bedrock guardrail rejections) succeed at the HTTP level but return an empty choices list. The naive ``resp.choices[0]`` access raised ``IndexError`` and broke the function's contract of always returning a ``StructuredOutputSupport`` value. Surface the condition as a ``parse_error`` with ``response_route="empty"`` so callers route it identically to the existing "empty envelope" case, but with a distinct message ("no choices") so operators can tell the two scenarios apart in logs. Signed-off-by: Qian Lin <qianl@nvidia.com>

Remove the ``_tasks_must_be_non_empty`` model validator so the agent can legitimately emit an empty ``tasks`` list when the user prompt describes a task-less scene (static playground / sandbox env). The arena layer treats empty tasks as a ``NoTask`` null object, so rejecting the empty list at the schema layer was a redundant guardrail that forced the agent to invent placeholder tasks for task-less prompts. Expand the ``tasks`` field description to teach the agent the new contract — empty list is valid, prefer it over a placeholder task — so behaviour doesn't regress now that the validator no longer enforces non-emptiness. Signed-off-by: Qian Lin <qianl@nvidia.com>

greptile-apps · 2026-05-28T10:15:23Z

+    resp = client.chat.completions.create(
+        model=model,
+        messages=[{"role": "user", "content": "Respond with exactly: OK"}],
+        temperature=0,
+        max_tokens=8,
+    )
+    return resp.choices[0].message.content or ""


ping() accesses resp.choices[0] unconditionally. Some API error paths (e.g. Azure content-filter trips, Bedrock guardrail rejections, and certain rate-limit responses) return HTTP 200 with an empty choices list, making this line raise IndexError — not an openai-client exception. That breaks the documented contract ("Raises: Any exception raised by the underlying openai client") and surfaces as an opaque crash from EnvGenAgent.__init__() rather than a diagnosable ping failure. check_structured_output_support in the same file was explicitly fixed in this PR to guard against this case; ping needs the same treatment.

Suggested change

resp = client.chat.completions.create(

model=model,

messages=[{"role": "user", "content": "Respond with exactly: OK"}],

temperature=0,

max_tokens=8,

)

return resp.choices[0].message.content or ""

resp = client.chat.completions.create(

model=model,

messages=[{"role": "user", "content": "Respond with exactly: OK"}],

temperature=0,

max_tokens=8,

)

choices = getattr(resp, "choices", None) or []

if not choices:

return ""

return choices[0].message.content or ""

qianl-nv added 7 commits May 26, 2026 18:39

Add LLMAgent code

0edda59

Add asset resolver

d4bdaa6

Resolver matches object/robot proposal from LLM (based on name/tag) to the exact entries in the arena regestry. It outputs an ArenaEnvGraphSpec with the nodes, tasks and initial state scene graph filled in based on LLM output + resolver match.

Move assert_unique_ids + assert_references_exist to post_init so it's…

3162272

… always called when loading from yaml/dict

Update schema to better support unary relation

06d4b0e

Simplify relation types in resolver

087c498

Fix copyright years

cf38ec3

Add unit test

7e5f58e

qianl-nv requested review from alexmillane, cvolkcvolk, peterd-NV, viiik-inside, xyao-nv and zhx06 as code owners May 26, 2026 14:40

isaaclab-review-bot Bot reviewed May 26, 2026

View reviewed changes

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

Comment thread setup.py Outdated

Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/environments/agentic_env_gen/env_gen_agent.py

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/llm_env_gen/schema.py Outdated

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/llm_env_gen/resolver.py Outdated

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/environments/agentic_env_gen/env_gen_agent.py

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/environments/agentic_env_gen/env_gen_agent.py

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/environments/agentic_env_gen/env_gen_agent.py

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/environments/agentic_env_gen/env_gen_agent.py

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated

xyao-nv reviewed May 26, 2026

View reviewed changes

Comment thread isaaclab_arena/llm_env_gen/llm_agent.py Outdated