MVP-1: per-episode sensitivity recording + NPE analyzer #729

Draft
cvolk wants to merge 2 commits into main from cvolk/feature/sensitivity_analysis_mvp1

@cvolk (Collaborator)

Summary

Adds a per-episode JSONL writer to eval_runner and an offline NPE (neural posterior estimation) analyzer. This is MVP-1 of the sensitivity-analysis workstream described in [robolab Analysis Tooling MNPE].

Detailed description

  • Why: turn eval results into actionable sensitivity insights. For a given outcome (e.g. success_rate) and an input factor (e.g. light_intensity), recover the posterior over the factor that is consistent with the observed outcomes. This lays the data-layer and analysis-layer foundations for the wider workstream; Alex's variation system will later replace the hand-authored upstream inputs in this PR (see the comments in episode_writer.py and the inline factors.yaml).
  • What's changed: (a) an opt-in per-episode writer (--factor_keys + --episode_summary) in eval_runner; (b) Job.arena_env_args_dict, which preserves the original args dict for in-process lookups; (c) the isaaclab_arena.analysis.sensitivity package with episode_writer, dataset, analyzer, and synthetic_data; (d) the isaaclab_arena/scripts/analyze_sensitivity.py CLI driver; (e) sbi added to DEV_DEPS so the docker image picks it up on rebuild; (f) hand-authored jobs configs + factors.yaml for the light_intensity MVP sweep on the droid + pi0 setup.
  • Impact: existing eval flows are unchanged when the new flags are absent. With the flags, eval_runner appends one JSONL row per recorded demo (factor values from arena_env_args + outcomes from registered task metrics). The analyzer side is entirely offline.
  • Scope limits: MVP-1 covers one 1D continuous factor; the categorical and vector (dim > 1) branches raise NotImplementedError as reserved extension points. Binary outcomes trigger sbi's 1D-Gaussian fallback, which is surfaced as a runtime [WARN] so users see the caveat.
  • Validation: a synthetic smoke test (Gaussian competence band centered on 500, recovered peak ~550) plus a 6-episode real pi0 run (writer + JSONL + analyzer end to end, zero pxr-import regressions vs. main).
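
The binary-outcome caveat in the scope limits can be illustrated numerically. The commit notes for the analyzer state that, under the fallback, the recovered peak reflects the empirical mean of the successful theta values rather than the mode of the success curve. The sketch below is a stdlib-only illustration of how those two quantities can differ; the band width, sweep range, and all function names are assumptions made for the example, and it does not attempt to reproduce the ~550 figure reported above.

```python
import math

# Hypothetical success curve: a Gaussian competence band centered on 500.
CENTER, WIDTH = 500.0, 300.0
# Hypothetical sweep range for the factor; deliberately not symmetric around CENTER.
THETA_GRID = [float(t) for t in range(0, 2001, 10)]


def success_probability(theta: float) -> float:
    """Probability of success at a given factor value (assumed shape)."""
    return math.exp(-0.5 * ((theta - CENTER) / WIDTH) ** 2)


def expected_mean_of_successes(grid):
    """Expected mean of theta over successful episodes for a uniform sweep of the grid."""
    weights = [success_probability(t) for t in grid]
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def mode_of_success_curve(grid):
    """Grid point where the success probability is highest."""
    return max(grid, key=success_probability)


mean_success = expected_mean_of_successes(THETA_GRID)
mode = mode_of_success_curve(THETA_GRID)
print(f"mode of success curve: {mode:.0f}")
print(f"expected mean of successful theta: {mean_success:.1f}")
```

Because the assumed sweep range cuts off the lower tail of the band at 0, the mean of the successful values lands above the mode. A summary based on that mean would therefore sit to the right of the true peak in this setup.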

The opt-in writer (--factor_keys + --episode_summary) records the values of
the listed arena_env_args keys plus per-episode outcomes (from registered
task metrics) to a JSONL file during eval_runner runs. Existing behavior is
unchanged when either flag is absent.

- Job.arena_env_args_dict preserves the original dict form alongside the
  existing CLI-args list so the writer can look up factor values by name
  without re-parsing the args.
- The writer's import is deferred inside the per-job try block, matching
  the policy_runner.py:107 pattern for pxr-touching modules (the writer
  pulls isaaclab_arena.metrics.metrics, which loads pxr at module top).
- Hand-authored factors.yaml + jobs configs are checked in alongside; the
  --factor_keys passed on the CLI must match the factors.yaml the analyzer
  consumes (the analyzer validates the pairing on load).

Signed-off-by: Clemens Volk <cvolk@nvidia.com>

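
The writer behavior described in this commit (look up the listed factor keys in the args dict, then append one row per episode) could look roughly like the sketch below. It is not the code in episode_writer.py: the function name `append_episode_row`, the `factors`/`outcomes` row layout, and the `embodiment` key are assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def append_episode_row(summary_path, arena_env_args, factor_keys, outcomes):
    """Append one JSONL row with the selected factor values and the episode outcomes.

    The row layout ({"factors": ..., "outcomes": ...}) is a hypothetical schema.
    """
    missing = [key for key in factor_keys if key not in arena_env_args]
    if missing:
        raise KeyError(f"factor keys not found in arena_env_args: {missing}")
    row = {
        "factors": {key: arena_env_args[key] for key in factor_keys},
        "outcomes": dict(outcomes),
    }
    # Append mode keeps rows from earlier episodes and jobs intact.
    with Path(summary_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(row) + "\n")
    return row


# Usage: two episodes with different light_intensity values.
with tempfile.TemporaryDirectory() as tmp_dir:
    summary_path = Path(tmp_dir) / "episode_summary.jsonl"
    args = {"light_intensity": 500.0, "embodiment": "droid"}  # hypothetical args dict
    append_episode_row(summary_path, args, ["light_intensity"], {"success_rate": 1.0})
    append_episode_row(
        summary_path,
        {**args, "light_intensity": 800.0},
        ["light_intensity"],
        {"success_rate": 0.0},
    )
    lines = summary_path.read_text(encoding="utf-8").splitlines()
    written_rows = [json.loads(line) for line in lines]

print(written_rows)
```

Looking factor values up by name in a dict is what makes keeping the dict form useful here: the same lookup against a flattened CLI-args list would require re-parsing it.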
The analyzer reads the paired factors.yaml + episode_summary.jsonl into the
(theta, x, prior, factor_columns) quadruple that sbi consumes, trains NPE on a
chosen outcome, and plots the 1D posterior marginal for a continuous factor.
The CLI driver is at isaaclab_arena/scripts/analyze_sensitivity.py.

- MVP-1 scope: one continuous 1D factor; categorical and vector (dim > 1)
  branches raise NotImplementedError so the extension point is reserved.
- Runtime [WARN] when fitting on a binary outcome surfaces sbi's 1D-Gaussian
  fallback caveat: the recovered peak reflects the empirical mean of
  successful theta values, not the true mode of the success curve.
- synthetic_data.py generates a paired JSONL + factors.yaml from a known
  competence band, letting the analyzer smoke-test end-to-end without sim.
- sbi added to DEV_DEPS so the docker dev install picks it up on rebuild.

Signed-off-by: Clemens Volk <cvolk@nvidia.com>
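
As a rough illustration of the loading step in this commit, the sketch below turns JSONL rows plus a factor spec into theta, x, prior bounds, and factor columns, checks that each row's factors match the spec, and rejects anything other than a 1D continuous factor. It is a stdlib-only sketch, not the code in dataset.py: the row layout, the spec fields (`type`, `dim`, `low`, `high`), and the function name `load_dataset` are assumptions, and a plain dict stands in for the parsed factors.yaml.

```python
import json


def load_dataset(jsonl_lines, factor_specs, outcome_key):
    """Build (theta, x, prior_bounds, factor_columns) from JSONL rows and a factor spec."""
    factor_columns = list(factor_specs)
    for name, spec in factor_specs.items():
        # Mirror the MVP-1 scope: only one-dimensional continuous factors.
        if spec.get("type") != "continuous" or spec.get("dim", 1) != 1:
            raise NotImplementedError(f"factor {name!r}: only 1D continuous factors are sketched")

    theta, x = [], []
    for line_no, line in enumerate(jsonl_lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        # Validate the pairing between the recorded factors and the spec.
        if sorted(row["factors"]) != sorted(factor_columns):
            raise ValueError(f"row {line_no}: factors {sorted(row['factors'])} do not match the spec")
        theta.append([float(row["factors"][col]) for col in factor_columns])
        x.append([float(row["outcomes"][outcome_key])])

    # Bounds for a uniform prior over each factor; plain tuples here, not an sbi object.
    prior_bounds = {
        col: (float(factor_specs[col]["low"]), float(factor_specs[col]["high"]))
        for col in factor_columns
    }
    return theta, x, prior_bounds, factor_columns


# Usage with hypothetical rows and a hypothetical spec.
sample_lines = [
    '{"factors": {"light_intensity": 200.0}, "outcomes": {"success_rate": 0.0}}',
    '{"factors": {"light_intensity": 500.0}, "outcomes": {"success_rate": 1.0}}',
    '{"factors": {"light_intensity": 900.0}, "outcomes": {"success_rate": 0.0}}',
]
sample_specs = {"light_intensity": {"type": "continuous", "dim": 1, "low": 0.0, "high": 1000.0}}
theta, x, prior_bounds, factor_columns = load_dataset(sample_lines, sample_specs, "success_rate")
print(theta, x, prior_bounds, factor_columns)
```

In a real pipeline the nested lists would presumably be converted to tensors and the bounds to a prior object before training; that step is left out here to keep the example free of third-party dependencies.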