Summary
We'd like to contribute RoboGate, an adversarial pick-and-place benchmark suite, to the Isaac Lab-Arena ecosystem. The benchmark is designed to answer one question: "Is this learned policy safe to deploy on a real production line?"
Pull Request: #506
What RoboGate Adds
68 Adversarial Scenarios (4 Difficulty Tiers)
| Category | Count | Target SR | Description |
|---|---|---|---|
| Nominal | 20 | 95-100% | Standard objects, lighting, centered placement |
| Edge Cases | 15 | 70-85% | Small/heavy/edge/occluded/transparent objects |
| Adversarial | 10 | 40-60% | Low light, clutter, slippery, disturbances |
| Domain Randomization | 23 | 85-95% | Lighting/color/position/camera variations |
5 Safety Metrics + Deployment Confidence Score (0-100)
| Metric | Weight | Threshold |
|---|---|---|
| Grasp Success Rate | 0.30 | >= 92% |
| Collision Count | 0.25 | == 0 |
| Cycle Time | 0.20 | <= baseline × 1.1 |
| Drop Rate | 0.15 | <= 3% |
| Grasp Miss Rate | 0.10 | <= baseline × 1.2 |
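To make the table concrete, here is a minimal sketch of checking measured values against the five thresholds. The weights and thresholds come from the table; everything else is an assumption for illustration: the metric key names, the `MetricRule` helper, and in particular `weighted_pass_share`, which simply adds up the weights of the metrics that pass. This post does not state how the Deployment Confidence Score is actually computed, so this is not the RoboGate scoring formula.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class MetricRule:
    """Hypothetical helper: a weight plus a pass/fail check for one metric."""
    weight: float
    passes: Callable[[float, float], bool]  # (measured value, baseline value) -> bool


# Weights and thresholds copied from the table; the key names are illustrative.
RULES: Dict[str, MetricRule] = {
    "grasp_success_rate": MetricRule(0.30, lambda v, b: v >= 0.92),
    "collision_count": MetricRule(0.25, lambda v, b: v == 0),
    "cycle_time": MetricRule(0.20, lambda v, b: v <= b * 1.1),
    "drop_rate": MetricRule(0.15, lambda v, b: v <= 0.03),
    "grasp_miss_rate": MetricRule(0.10, lambda v, b: v <= b * 1.2),
}


def check_thresholds(measured: Dict[str, float],
                     baseline: Dict[str, float]) -> Dict[str, bool]:
    """Return pass/fail per metric; metrics without a baseline ignore it."""
    return {
        name: rule.passes(measured[name], baseline.get(name, 0.0))
        for name, rule in RULES.items()
    }


def weighted_pass_share(results: Dict[str, bool]) -> float:
    """Assumed aggregation (not the real score): weight share of passed metrics, 0-100."""
    return 100.0 * sum(RULES[name].weight for name, ok in results.items() if ok)


# Example: every threshold is met except cycle time (12.0 s > 10.0 s * 1.1).
baseline = {"cycle_time": 10.0, "grasp_miss_rate": 0.05}
measured = {
    "grasp_success_rate": 0.95,
    "collision_count": 0,
    "cycle_time": 12.0,
    "drop_rate": 0.01,
    "grasp_miss_rate": 0.05,
}
results = check_thresholds(measured, baseline)
share = weighted_pass_share(results)
```

In this example only the cycle-time check fails, so the assumed aggregation loses that metric's 0.20 weight.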
4-Model VLA Leaderboard
| Model | Params | SR | Confidence | Failure Pattern |
|---|---|---|---|---|
| Scripted Controller (IK) | n/a | 100% (68/68) | 76/100 | n/a |
| GR00T N1.6 (NVIDIA) | 3B | 0% (0/68) | 1/100 | grasp_miss + collision |
| OpenVLA (Stanford + TRI) | 7B | 0% (0/68) | 27/100 | grasp_miss dominant, 0 collision |
| Octo-Base (UC Berkeley) | 93M | 0% (0/68) | 1/100 | grasp_miss 79%, collision 21% |
| Octo-Small (UC Berkeley) | 27M | 0% (0/68) | 1/100 | grasp_miss 79.4%, collision 20.6% |
Key finding: All 4 VLA models, including NVIDIA's official GR00T N1.6 (3B), score 0% SR on scenarios that a scripted IK controller solves at 100%. This 100-point SR gap (49 to 75 points in Deployment Confidence Score) holds across models from 27M to 7B parameters, so we attribute it to training-deployment distribution mismatch rather than model capacity.
30,000-Experiment Failure Dictionary
- Two-stage adaptive sampling (LHS + boundary-focused) across 8-dimensional parameter space
- Franka Panda (7-DOF) + UR5e (6-DOF), 30K total experiments
- Risk model AUC: 0.780, closed-form failure boundary equation
- 4 universal danger zones identified across both robot platforms
- Dataset: liveplex/robogate-failure-dictionary
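As a rough illustration of the two-stage idea, the sketch below draws a Latin hypercube sample over a unit cube and then proposes follow-up points near the observed failure boundary. It is not the sampler used to build the dataset. The 8 parameters are not named in this post, so the dimensions are left anonymous, and the boundary step (midpoints between each failed point and its nearest successful point) is an assumed heuristic standing in for whatever boundary-focused rule RoboGate actually uses.

```python
import math
import random
from typing import List


def latin_hypercube(n_samples: int, n_dims: int, rng: random.Random) -> List[List[float]]:
    """Stage 1: one point per stratum along every axis of the unit cube."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent stratum order per dimension
        # Place each point uniformly inside its stratum [s/n, (s+1)/n).
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]


def boundary_candidates(points: List[List[float]], failed: List[bool]) -> List[List[float]]:
    """Stage 2 (assumed heuristic): midpoint between each failure and its nearest success."""
    successes = [p for p, f in zip(points, failed) if not f]
    failures = [p for p, f in zip(points, failed) if f]
    if not successes:
        return []
    candidates = []
    for bad in failures:
        nearest = min(successes, key=lambda p: math.dist(p, bad))
        candidates.append([(a + b) / 2 for a, b in zip(bad, nearest)])
    return candidates


rng = random.Random(0)
stage_one = latin_hypercube(n_samples=10, n_dims=8, rng=rng)

# Toy 2-D example of the boundary step with hand-written outcomes.
toy_points = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.2]]
toy_failed = [True, False, False]
stage_two = boundary_candidates(toy_points, toy_failed)
```

Each axis of `stage_one` has exactly one sample per tenth of its range, which is the property that distinguishes Latin hypercube sampling from plain uniform sampling. In a real run the `failed` labels would come from simulator outcomes rather than being written by hand.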
Integration with Isaac Lab-Arena
The benchmark integrates with the existing Arena environment builder:
from robogate_benchmark.environments import RoboGateBenchmarkEnvironment
env_def = RoboGateBenchmarkEnvironment()
arena_env = env_def.get_env(args_cli)
It also supports a --mock mode for CI/CD testing without a GPU.
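For readers who want to picture the entry point, here is a hypothetical sketch of a runner that parses `--mock` and only touches the simulator when the flag is absent. The import and the `get_env(args_cli)` call are taken from the snippet above; `build_parser`, `make_env`, and the choice to return `None` in mock mode are assumptions for illustration, and the PR may wire the flag differently.

```python
import argparse
from typing import Any, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: only the --mock flag is mentioned in this post."""
    parser = argparse.ArgumentParser(description="RoboGate benchmark runner (illustrative)")
    parser.add_argument(
        "--mock",
        action="store_true",
        help="run without a GPU or simulator, e.g. in CI",
    )
    return parser


def make_env(args_cli: argparse.Namespace) -> Optional[Any]:
    """Build the Arena environment, or return None in mock mode (assumed behavior)."""
    if args_cli.mock:
        return None
    # Imported lazily so that mock runs do not need the simulator stack installed.
    from robogate_benchmark.environments import RoboGateBenchmarkEnvironment

    env_def = RoboGateBenchmarkEnvironment()
    return env_def.get_env(args_cli)


def parse_cli(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse arguments; pass a list explicitly in tests, or None to read sys.argv."""
    return build_parser().parse_args(argv)


# Example: a CI job would pass --mock and never import the simulator.
mock_args = parse_cli(["--mock"])
mock_env = make_env(mock_args)
```

Keeping the import inside `make_env` is one way to let the same script run on CI machines that have neither a GPU nor the simulator packages.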
We welcome feedback on the benchmark design, scenario coverage, or integration approach. Happy to adjust the PR based on maintainer guidance.
— AgentAI Co., Ltd.