A hands-on deep dive into the robotics intelligence stack, from physics simulation and closed-loop control through RL policy training, learned world models, and VLA (Vision-Language-Action) fine-tuning.
Robot: Franka Emika Panda (7-DOF arm) in MuJoCo simulation
Goal: Build real systems intuition across the full pipeline. Every phase includes deliberate failure analysis — understanding why things break is the point.
control/ IK + Jacobian controller, failure mode analysis
rl/ SAC policy training (100% success)
worldmodels/ Learned dynamics model, MBPO attempt, rollout evaluation
vla/ Octo VLA: zero-shot eval, demo collection, fine-tuning
shared/ Camera renderer, common utilities
Each directory has its own README with setup instructions and learnings.
| method | params | input | training | success |
|---|---|---|---|---|
| SAC (model-free RL) | 78K | state vector (20 floats) | 300K env steps | 100% |
| Octo zero-shot | 93M | image (overhead) | 0 (pre-trained) | 0% |
| Octo fine-tuned | 93M | image (overhead + wrist) | 25K ft steps on 300 demos | 90% |
Fine-tuning had to bridge three simultaneous domain gaps relative to Octo's pre-training data: MuJoCo sim renders instead of real camera images, joint-angle delta actions instead of end-effector actions (the action head was replaced entirely), and RL-generated demos instead of teleoperated ones. The fine-tuned model still reached 90% success, which suggests that the pre-trained spatial reasoning in the backbone transferred despite all three gaps.
Fine-tuned checkpoint: HuggingFace | Blog post: What Transfers When Nothing Matches
# Clone with MuJoCo model zoo
git clone https://github.com/AryanMadhavVerma/full-stack-robot-learning.git
cd full-stack-robot-learning
git clone https://github.com/google-deepmind/mujoco_menagerie.git
# Main environment (Python 3.13, for control/ rl/ worldmodels/)
python -m venv .venv
source .venv/bin/activate
pip install mujoco "stable-baselines3[extra]" torch gymnasium numpy
# VLA environment (Python 3.10 REQUIRED for Octo)
python3.10 -m venv .venv-octo
source .venv-octo/bin/activate
pip install --no-deps git+https://github.com/octo-models/octo.git
# See vla/README.md for full Octo dependency setup
macOS note: Use mjpython instead of python for any script that opens the MuJoCo viewer.
control/ --> rl/ --> worldmodels/ (parallel with vla/) --> vla/
- control/ — build the IK controller, understand why it fails
- rl/ — train SAC to solve the task the controller couldn't
- worldmodels/ — learn a dynamics model, measure compounding error, attempt MBPO
- vla/ — fine-tune a pre-trained VLA (Octo) on the same task, compare against RL
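The compounding error mentioned for worldmodels/ can be illustrated with a toy example. This is a minimal sketch and not the repository's evaluation code: the 2D rotation dynamics, the biased "learned" model, and the function names are all hypothetical stand-ins for the MuJoCo environment and the trained dynamics network. It compares one-step error (the model always predicts from the true state) against open-loop rollout error (the model is fed its own predictions).

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix, used here as norm-preserving toy dynamics."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def one_step_errors(A_true, A_model, x0, horizon):
    """Teacher-forced error: the model always predicts from the true state."""
    x = x0.copy()
    errors = []
    for _ in range(horizon):
        x_next = A_true @ x
        errors.append(float(np.linalg.norm(A_model @ x - x_next)))
        x = x_next
    return errors

def rollout_errors(A_true, A_model, x0, horizon):
    """Open-loop error: the model is fed its own predictions."""
    x_true, x_pred = x0.copy(), x0.copy()
    errors = []
    for _ in range(horizon):
        x_true = A_true @ x_true
        x_pred = A_model @ x_pred
        errors.append(float(np.linalg.norm(x_pred - x_true)))
    return errors

A_true = rotation(0.10)
A_model = rotation(0.11)  # hypothetical learned model with a small bias
x0 = np.array([1.0, 0.0])

single = one_step_errors(A_true, A_model, x0, horizon=50)
multi = rollout_errors(A_true, A_model, x0, horizon=50)
print(f"one-step error at step 50: {single[-1]:.3f}")
print(f"rollout error at step 50:  {multi[-1]:.3f}")
```

The one-step error stays flat while the rollout error grows with the horizon, which is the pattern a rollout evaluation is meant to expose. With a real learned model, the two matrices would be replaced by the simulator step and the network's prediction.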
Octo's dependency chain is fragile. Key version pins that work together:
Python 3.10 (not 3.12+)
jax==0.4.20, jaxlib==0.4.20+cuda12.cudnn89
flax==0.7.5, optax==0.1.7, chex==0.1.85
tensorflow==2.15.0, tensorflow-probability==0.23.0
transformers==4.36.2 (v5 dropped Flax support)
scipy==1.11.4, numpy==1.26.4
nvidia-cudnn-cu12==8.9.7.29 (must match jaxlib's cudnn89)
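Since these pins only work as a set, it can help to verify an environment before debugging anything else. The following is a minimal sketch using the standard library's importlib.metadata; the PINS dictionary is copied from the list above, and the helper names are hypothetical. It compares only the public part of each version, so a jaxlib build reported as 0.4.20+cuda12.cudnn89 matches the 0.4.20 pin.

```python
from importlib import metadata

# Pins copied from the list above.
PINS = {
    "jax": "0.4.20",
    "jaxlib": "0.4.20",
    "flax": "0.7.5",
    "optax": "0.1.7",
    "chex": "0.1.85",
    "tensorflow": "2.15.0",
    "tensorflow-probability": "0.23.0",
    "transformers": "4.36.2",
    "scipy": "1.11.4",
    "numpy": "1.26.4",
    "nvidia-cudnn-cu12": "8.9.7.29",
}

def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

def find_mismatches(pins, lookup=installed_version):
    """Map each package that is missing or off-pin to (wanted, found)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # Drop any local suffix such as "+cuda12.cudnn89" before comparing.
        public = found.split("+")[0] if found else None
        if public != wanted:
            mismatches[name] = (wanted, found)
    return mismatches

if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINS).items():
        print(f"{name}: want {wanted}, found {found or 'not installed'}")
```

Run it inside .venv-octo; no output means every pin matched.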
GPU compatibility: JAX 0.4.20 supports up to compute capability 8.9 (RTX 4090). It does NOT work on RTX 50-series GPUs (compute capability 12.0+).
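The compute capability ceiling can be checked before installing anything. This sketch assumes nvidia-smi is on the PATH and that the installed driver supports the compute_cap query field (older drivers may not); the 8.9 ceiling is taken from the note above, and the function names are hypothetical.

```python
import subprocess

MAX_SUPPORTED = (8, 9)  # ceiling quoted above for JAX 0.4.20

def parse_compute_cap(text):
    """Turn a string like '8.9' into a comparable (major, minor) tuple."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)

def is_supported(cap, ceiling=MAX_SUPPORTED):
    """Tuples compare element-wise, so (12, 0) > (8, 9)."""
    return cap <= ceiling

def query_compute_caps():
    """Ask nvidia-smi for the compute capability of each visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_compute_cap(line) for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    try:
        for cap in query_compute_caps():
            verdict = "ok" if is_supported(cap) else "too new for these pins"
            print(f"compute capability {cap[0]}.{cap[1]}: {verdict}")
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError):
        print("Could not query nvidia-smi; check the GPU model manually.")
```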
pip install octo installs the WRONG package (a plugin manager). Install Octo and dlimp from GitHub instead:
pip install --no-deps git+https://github.com/octo-models/octo.git
pip install --no-deps git+https://github.com/kvablack/dlimp.git
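Because the PyPI name collides with the robotics package, a quick import check can confirm which octo ended up in the environment. This is a sketch under one assumption: that the GitHub package exposes the module path octo.model.octo_model, which the unrelated PyPI package would not. The helper name is hypothetical.

```python
import importlib.util

def has_module(dotted_name):
    """Return True if the module can be located, without importing the leaf module."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or cannot be imported.
        return False

if __name__ == "__main__":
    # Assumed module path for the robotics Octo package; adjust if the repo layout differs.
    if has_module("octo.model.octo_model"):
        print("Octo (robotics) appears to be installed.")
    else:
        print("octo.model.octo_model not found; the wrong 'octo' may be installed.")
    print("dlimp found" if has_module("dlimp") else "dlimp missing")
```

Note that find_spec imports parent packages while resolving a dotted name, so a broken dependency inside the octo package also shows up as "not found" here.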