
Full-Stack Robot Learning

A hands-on deep dive into the robotics intelligence stack, from physics simulation and closed-loop control through RL policy training, learned world models, and VLA (Vision-Language-Action) fine-tuning.

Robot: Franka Emika Panda (7-DOF arm) in MuJoCo simulation

Goal: Build real systems intuition across the full pipeline. Every phase includes deliberate failure analysis — understanding why things break is the point.

Repository Structure

control/           IK + Jacobian controller, failure mode analysis
rl/                SAC policy training (100% success)
worldmodels/       Learned dynamics model, MBPO attempt, rollout evaluation
vla/               Octo VLA: zero-shot eval, demo collection, fine-tuning
shared/            Camera renderer, common utilities

Each directory has its own README with setup instructions and learnings.

Results

| Method | Params | Input | Training | Success |
|---|---|---|---|---|
| SAC (model-free RL) | 78K | state vector (20 floats) | 300K env steps | 100% |
| Octo zero-shot | 93M | image (overhead) | 0 (pre-trained) | 0% |
| Octo fine-tuned | 93M | image (overhead + wrist) | 25K fine-tuning steps on 300 demos | 90% |

Fine-tuning Octo had to bridge three simultaneous domain gaps relative to its pre-training data: MuJoCo sim renders instead of real camera images, joint-angle delta actions instead of end-effector actions (the action head was replaced entirely), and RL-generated demos instead of teleoperated ones. The pre-trained spatial reasoning in the backbone transferred despite all three gaps.

Fine-tuned checkpoint: HuggingFace | Blog post: What Transfers When Nothing Matches

Setup

# Clone with MuJoCo model zoo
git clone https://github.com/AryanMadhavVerma/full-stack-robot-learning.git
cd full-stack-robot-learning
git clone https://github.com/google-deepmind/mujoco_menagerie.git

# Main environment (Python 3.13, for control/ rl/ worldmodels/)
python -m venv .venv
source .venv/bin/activate
# Quote the extras spec so zsh (the macOS default shell) does not expand the brackets
pip install mujoco "stable-baselines3[extra]" torch gymnasium numpy

# VLA environment (Python 3.10 REQUIRED for Octo)
python3.10 -m venv .venv-octo
source .venv-octo/bin/activate
pip install --no-deps git+https://github.com/octo-models/octo.git
# See vla/README.md for full Octo dependency setup

macOS note: Use mjpython instead of python for any script that opens the MuJoCo viewer.
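For wrapper scripts that launch viewer code on more than one platform, the choice of launcher can be made programmatically. The helper below is a minimal sketch; the function name `viewer_launcher` is hypothetical and not part of this repository.

```python
import sys


def viewer_launcher(platform: str = sys.platform) -> str:
    """Return the command to use for scripts that open the MuJoCo viewer."""
    # sys.platform is "darwin" on macOS, where the note above asks for mjpython.
    return "mjpython" if platform == "darwin" else "python"


if __name__ == "__main__":
    print(f"Run viewer scripts with: {viewer_launcher()} <script.py>")
```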

Execution Order

control/ --> rl/ --> worldmodels/ and vla/ (these two can be done in parallel)
  1. control/ — build the IK controller, understand why it fails
  2. rl/ — train SAC to solve the task the controller couldn't
  3. worldmodels/ — learn a dynamics model, measure compounding error, attempt MBPO
  4. vla/ — fine-tune a pre-trained VLA (Octo) on the same task, compare against RL
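The "compounding error" measured in step 3 can be illustrated on a toy problem. The sketch below is not the repository's dynamics model: it assumes a made-up 1-D system and a "learned" model with a 2% gain error, then rolls both out open-loop on the same actions and records how far the prediction drifts at each step.

```python
def rollout(step_fn, x0, actions):
    """Apply step_fn repeatedly, feeding each output back in as the next state."""
    states = [x0]
    for u in actions:
        states.append(step_fn(states[-1], u))
    return states


def true_step(x, u):
    # Toy ground-truth dynamics (assumed for illustration only).
    return x + 0.1 * u


def learned_step(x, u):
    # Stand-in for a learned model: same structure, 2% error on the state gain.
    return 1.02 * x + 0.1 * u


def compounding_error(x0, actions):
    """Absolute prediction error at every step of an open-loop rollout."""
    real = rollout(true_step, x0, actions)
    pred = rollout(learned_step, x0, actions)
    return [abs(p - r) for p, r in zip(pred, real)]


if __name__ == "__main__":
    errors = compounding_error(1.0, [1.0] * 20)
    for t in (1, 5, 10, 20):
        print(f"step {t:2d}: error = {errors[t]:.3f}")
```

Because the model consumes its own predictions, a small one-step error keeps growing with the horizon, which is why long model rollouts are harder to trust than short ones.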

Octo / VLA Setup Notes

Octo's dependency chain is fragile. Key version pins that work together:

Python 3.10 (not 3.12+)
jax==0.4.20, jaxlib==0.4.20+cuda12.cudnn89
flax==0.7.5, optax==0.1.7, chex==0.1.85
tensorflow==2.15.0, tensorflow-probability==0.23.0
transformers==4.36.2 (v5 dropped Flax support)
scipy==1.11.4, numpy==1.26.4
nvidia-cudnn-cu12==8.9.7.29 (must match jaxlib's cudnn89)
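One way to catch a drifted environment early is to compare the installed versions against the pins above. This is a minimal sketch using the standard library's `importlib.metadata`; the names `OCTO_PINS` and `find_mismatches` are hypothetical, and the exact-string comparison assumes the installed wheels report the same version strings as the pins (including the `+cuda12.cudnn89` local tag on jaxlib).

```python
from importlib import metadata

# Pins copied from the list above.
OCTO_PINS = {
    "jax": "0.4.20",
    "jaxlib": "0.4.20+cuda12.cudnn89",
    "flax": "0.7.5",
    "optax": "0.1.7",
    "chex": "0.1.85",
    "tensorflow": "2.15.0",
    "tensorflow-probability": "0.23.0",
    "transformers": "4.36.2",
    "scipy": "1.11.4",
    "numpy": "1.26.4",
    "nvidia-cudnn-cu12": "8.9.7.29",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose version differs from its pin to (wanted, found)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(OCTO_PINS).items():
        print(f"{name}: want {wanted}, found {found or 'not installed'}")
```

Run it inside `.venv-octo`; no output means every pin matched.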

GPU compatibility: JAX 0.4.20 supports GPUs up to compute capability 8.9 (RTX 4090). It does NOT work on RTX 50-series GPUs (compute capability 12.0+).
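The limit above can be turned into a quick pre-flight check. The sketch below simply encodes the 8.9 ceiling stated in this README; the helper names are hypothetical, and the compute capability string is assumed to come from a command such as `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` (available on recent NVIDIA drivers).

```python
# Highest compute capability this README reports as working with JAX 0.4.20.
MAX_SUPPORTED_CC = (8, 9)


def parse_compute_cap(text):
    """Turn a string like '8.9' into a comparable (major, minor) tuple."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_supported(compute_cap, max_cc=MAX_SUPPORTED_CC):
    """True if the GPU's compute capability is at or below the stated ceiling."""
    return compute_cap <= max_cc


if __name__ == "__main__":
    for cap in ("8.6", "8.9", "12.0"):
        status = "ok" if is_supported(parse_compute_cap(cap)) else "unsupported"
        print(f"compute capability {cap}: {status}")
```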

Running pip install octo installs the WRONG package (an unrelated plugin manager). Install Octo from GitHub instead:

pip install --no-deps git+https://github.com/octo-models/octo.git
pip install --no-deps git+https://github.com/kvablack/dlimp.git
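Since both packages are importable as `octo`, a plain `import octo` does not tell you which one you have. A rough check, sketched below, is to look for the `octo.model.octo_model` module that the robotics package provides (it is where `OctoModel` lives); the helper names are hypothetical.

```python
import importlib.util


def has_module(dotted_name):
    """True if the module can be located; parent packages are imported to look."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package is missing or fails to import.
        return False


def is_robot_octo_installed():
    """Heuristic: the robotics Octo ships octo.model.octo_model."""
    return has_module("octo.model.octo_model")


if __name__ == "__main__":
    if is_robot_octo_installed():
        print("Octo (robot foundation model) found.")
    else:
        print("octo.model.octo_model not found; reinstall Octo from GitHub.")
```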

About

Training a Franka Panda arm to reach a target using RL, and testing positive transfer of actions by fine-tuning Octo (a VLA).
