MVP-1: per-episode sensitivity recording + NPE analyzer by cvolkcvolk · Pull Request #729 · isaac-sim/IsaacLab-Arena

cvolkcvolk · 2026-05-27T13:07:33Z

Summary

Per-episode JSONL writer in eval_runner + offline NPE analyzer; MVP-1 of the sensitivity-analysis workstream described in [robolab Analysis Tooling MNPE].

Detailed description

Why: turn eval results into actionable sensitivity insights. For a given outcome (e.g. success_rate) and an input factor (e.g. light_intensity), recover the posterior over the factor consistent with the observed outcomes. This lays the data-layer and analysis-layer foundations for the wider workstream; Alex's variation system will later replace this PR's hand-authored upstream (see comments in episode_writer.py and the inline factors.yaml).
What's changed: (a) opt-in per-episode writer (--factor_keys + --episode_summary) in eval_runner; (b) Job.arena_env_args_dict preserves the original args dict for in-process lookups; (c) isaaclab_arena.analysis.sensitivity package with episode_writer, dataset, analyzer, synthetic_data; (d) isaaclab_arena/scripts/analyze_sensitivity.py CLI driver; (e) sbi added to DEV_DEPS so the docker image picks it up on rebuild; (f) hand-authored jobs configs + factors.yaml for the light_intensity MVP sweep on the droid + pi0 setup.
Impact: existing eval flows are unchanged when the new flags are absent. With the flags, eval_runner appends one JSONL row per recorded episode (factor values from arena_env_args + outcomes from registered task metrics). The analyzer side is entirely offline.
Scope limits: MVP-1 = one 1D continuous factor; categorical and vector (dim > 1) branches raise NotImplementedError as reserved extension points. Binary outcomes trigger sbi's 1D-Gaussian fallback — surfaced as a runtime [WARN] so users see the caveat.
Validation: synthetic smoke test (Gaussian competence band centered on 500 → recovered peak ~550) plus 6-episode real-pi0 run (writer + JSONL + analyzer end-to-end, zero pxr-import regressions vs. main).
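The binary-outcome caveat can be illustrated without sbi or sim. The sketch below is not the PR's synthetic_data.py: the function names, the band width, and the sampling range [300, 1500] are assumptions chosen for illustration. It samples a factor uniformly, draws a binary success from a Gaussian competence band centered on 500, and computes the empirical mean of the successful factor values, which is the quantity a 1D-Gaussian fit would center on.

```python
import math
import random


def success_probability(light_intensity, center=500.0, width=150.0):
    """Gaussian competence band: success is most likely at `center` (assumed shape)."""
    return math.exp(-0.5 * ((light_intensity - center) / width) ** 2)


def simulate_episodes(n, low, high, seed=0):
    """Draw factor values uniformly from [low, high] and sample a binary outcome for each."""
    rng = random.Random(seed)
    episodes = []
    for _ in range(n):
        theta = rng.uniform(low, high)
        success = rng.random() < success_probability(theta)
        episodes.append((theta, success))
    return episodes


def mean_successful_theta(episodes):
    """Empirical mean of the factor over successful episodes only."""
    wins = [theta for theta, success in episodes if success]
    return sum(wins) / len(wins)


# Hypothetical sweep range; it is asymmetric around the band center on purpose.
episodes = simulate_episodes(n=5000, low=300.0, high=1500.0, seed=0)
print(f"band center: 500.0, mean of successful theta: {mean_successful_theta(episodes):.1f}")
```

Because the assumed range cuts off the left tail of the band more than the right, the mean of the successful values sits above the band center. This is one way a recovered peak can be offset from the true mode; it does not claim to reproduce the ~550 figure reported above.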

Opt-in writer (--factor_keys + --episode_summary) records the values of the listed arena_env_args keys plus per-episode outcomes (from registered task metrics) to a JSONL during eval_runner. Existing behavior is unchanged when either flag is absent.

- Job.arena_env_args_dict preserves the original dict form alongside the existing CLI-args list so the writer can look up factor values by name without re-parsing the args.
- The writer's import is deferred inside the per-job try block, matching the policy_runner.py:107 pattern for pxr-touching modules (the writer pulls isaaclab_arena.metrics.metrics, which loads pxr at module top).
- Hand-authored factors.yaml + jobs configs are checked in alongside; --factor_keys on the CLI must match the factors.yaml the analyzer consumes (the analyzer validates the pairing on load).

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
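A minimal sketch of the append-one-row-per-episode idea, for readers who want a concrete picture. It is not the code in episode_writer.py: the function name append_episode_row, the row layout with "factors" and "outcomes" keys, and the example arena_env_args entries are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def append_episode_row(summary_path, arena_env_args, factor_keys, outcomes):
    """Append one JSON line holding the selected factor values and the episode outcomes.

    The {"factors": ..., "outcomes": ...} layout is a hypothetical schema.
    """
    # Fail before touching the file if a requested factor is not in the args dict.
    missing = [key for key in factor_keys if key not in arena_env_args]
    if missing:
        raise KeyError(f"factor keys not found in arena_env_args: {missing}")

    row = {
        "factors": {key: arena_env_args[key] for key in factor_keys},
        "outcomes": dict(outcomes),
    }
    path = Path(summary_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier episodes; each episode is exactly one line.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(row) + "\n")
    return row


# Usage: two episodes at different light intensities, read back from disk.
with tempfile.TemporaryDirectory() as tmp:
    summary = Path(tmp) / "episode_summary.jsonl"
    append_episode_row(
        summary,
        {"light_intensity": 300.0, "embodiment": "droid"},  # hypothetical args dict
        ["light_intensity"],
        {"success_rate": 1.0},
    )
    append_episode_row(
        summary,
        {"light_intensity": 700.0, "embodiment": "droid"},
        ["light_intensity"],
        {"success_rate": 0.0},
    )
    rows = [json.loads(line) for line in summary.read_text(encoding="utf-8").splitlines()]
```

Looking factors up by name in a dict, rather than re-parsing a CLI-args list, is the reason the description gives for keeping the original dict form on the job.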

Reads paired factors.yaml + episode_summary.jsonl into the (theta, x, prior, factor_columns) quadruple sbi consumes, trains NPE on a chosen outcome, plots the 1D posterior marginal for a continuous factor. CLI driver at isaaclab_arena/scripts/analyze_sensitivity.py.

- MVP-1 scope: one continuous 1D factor; categorical and vector (dim > 1) branches raise NotImplementedError so the extension point is reserved.
- Runtime [WARN] when fitting on a binary outcome surfaces sbi's 1D-Gaussian fallback caveat: the recovered peak reflects the empirical mean of successful theta values, not the true mode of the success curve.
- synthetic_data.py generates a paired JSONL + factors.yaml from a known competence band, letting the analyzer smoke-test end-to-end without sim.
- sbi added to DEV_DEPS so the docker dev install picks it up on rebuild.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
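The loading step described above can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's dataset module: the function name load_dataset, the factor-spec layout (an already-parsed mapping with "type", "dim", "low", "high"), the JSONL row schema, and representing the prior as plain (low, high) bounds are all hypothetical. Turning the bounds into an sbi prior and training NPE are left out.

```python
import json


def load_dataset(jsonl_lines, factors_spec, outcome):
    """Build (theta, x, prior_bounds, factor_columns) from per-episode JSON lines.

    factors_spec is assumed to look like:
        {"light_intensity": {"type": "continuous", "low": 100.0, "high": 1500.0}}
    """
    factor_columns, prior_bounds = [], []
    for name, spec in factors_spec.items():
        # Reserved extension points: only 1D continuous factors are handled.
        if spec.get("type") != "continuous":
            raise NotImplementedError(f"factor {name!r}: only continuous factors are supported")
        if spec.get("dim", 1) != 1:
            raise NotImplementedError(f"factor {name!r}: vector factors (dim > 1) are not supported")
        factor_columns.append(name)
        prior_bounds.append((float(spec["low"]), float(spec["high"])))

    theta, x = [], []
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        # Validate the pairing: recorded factor keys must match the spec.
        if set(row["factors"]) != set(factor_columns):
            raise ValueError(
                f"row factors {sorted(row['factors'])} do not match spec {sorted(factor_columns)}"
            )
        theta.append([float(row["factors"][name]) for name in factor_columns])
        x.append([float(row["outcomes"][outcome])])
    return theta, x, prior_bounds, factor_columns


# Usage with two hand-written rows (hypothetical values).
lines = [
    '{"factors": {"light_intensity": 300.0}, "outcomes": {"success_rate": 1.0}}',
    '{"factors": {"light_intensity": 700.0}, "outcomes": {"success_rate": 0.0}}',
]
spec = {"light_intensity": {"type": "continuous", "low": 100.0, "high": 1500.0}}
theta, x, prior_bounds, factor_columns = load_dataset(lines, spec, outcome="success_rate")
```

Keeping theta and x as row-aligned nested lists makes the later conversion to tensors a one-liner, and raising early on a key mismatch mirrors the "validates the pairing on load" behavior the commit message describes.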

cvolkcvolk added 2 commits May 27, 2026 15:00

cvolkcvolk wants to merge 2 commits into main from cvolk/feature/sensitivity_analysis_mvp1
