Framework for generating synthetic IMU based on Human Poses

POSE2IMU-Framework generates synthetic Inertial Measurement Unit (IMU) data from human motion. It bridges two domains that are rarely connected: pose estimation and motion generation on one side, and wearable sensor simulation on the other. Given a video of a person moving, or a plain-text description of a movement, the framework produces synthetic accelerometer and gyroscope readings as if physical sensors had been attached to the body.

The primary motivation is the scarcity of labelled IMU datasets. Collecting real IMU data requires physical hardware, synchronized recordings, and careful placement of sensors, a process that is slow, expensive, and hard to scale. The framework addresses this by deriving IMU signals from 3D skeletal trajectories, which can themselves be obtained either from video (via pose estimation) or generated from text prompts (via language-conditioned motion models). This makes it possible to produce large, diverse, and controllable datasets for training and evaluating activity recognition models without any physical sensor.

The framework integrates two main pipelines. The first takes a monocular video, lifts the detected 2D keypoints to a metric 3D skeleton using MotionBERT, and feeds the resulting trajectory into IMUSim to synthesize the sensor signals. The second pipeline uses a Vision-Language Model (Qwen3-VL) to automatically describe short windows of real motion and feeds those descriptions — together with real joint positions as anchors — into Kimodo, a SMPL-X motion generator, to produce novel but physically grounded synthetic motions.

This project was developed in a partnership between the Cognitive Architectures research line of the Hub for Artificial Intelligence and Cognitive Architectures (H.IAAC) at the State University of Campinas (UNICAMP), Brazil, and the Robotics and Artificial Intelligence Lab (AIRLab) at Politecnico di Milano (POLIMI), Italy.

Repository Structure

pose_module/ — core pipeline: 2D/3D pose estimation, virtual IMU synthesis, MotionBERT lifting. Runs in the pose_module conda env.
robot_emotions_vlm/ — Qwen3-VL video description, anchor catalog construction, and Kimodo batch generation. Runs in the kimodo conda env.
kimodo/ — git submodule: SMPL-X motion generator CLI (kimodo_gen, kimodo_textencoder).
imusim/ — IMU physics simulation library used to synthesize accelerometer/gyroscope readings from 3D skeleton trajectories.
data/ — input datasets (e.g. data/RobotEmotions/).
output/ — per-clip outputs organized by experiment; manifests (JSONL) index all artifacts.
evaluation/ — notebooks and scripts for classifier experiments and IMU quality assessment.
scripts/ — utility scripts.
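
Because the manifests are JSONL files (one JSON object per line), they can be inspected with the Python standard library alone. The sketch below is a minimal reader under that assumption. The keys used in the demo records (`clip_id`, `path`) are placeholders for illustration and are not taken from the framework's actual manifest schema.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-empty line of a JSONL file."""
    records = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records


# Demo on a throwaway file; the keys below are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_manifest.jsonl"
    demo_path.write_text(
        '{"clip_id": "a", "path": "x/a.npz"}\n\n{"clip_id": "b", "path": "x/b.npz"}\n',
        encoding="utf-8",
    )
    demo_records = read_jsonl(demo_path)

print(len(demo_records), sorted(demo_records[0]))
```

Printing the sorted keys of the first record is a quick way to discover the real field names before writing any code that depends on them.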

Dependencies / Requirements

Requirements: Linux, CUDA-capable GPU (≥ 21 GB VRAM recommended).

This project uses Miniconda to manage Python environments. We do not recommend python-env or venv, because the project requires multiple envs with different Python versions.

Make sure Miniconda is installed for your Linux distribution. If it is not, follow the official Miniconda installation instructions.

Also, install ffmpeg for your distro. For Ubuntu/Debian, use:

sudo apt update && sudo apt install ffmpeg -y

Now, clone the project's repository. To fetch the required third-party code, use the --recurse-submodules flag:

git clone --recurse-submodules git@github.com:H-IAAC/POSE2IMU-Framework.git
cd POSE2IMU-Framework

Installation / Usage

All environments (pose_module, openmmlab, kimodo) are configured by a single script:

sudo chmod +x config_envs.sh
bash config_envs.sh

The script creates and installs each conda environment in order, printing progress for each step. If any step fails it stops immediately.
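
After the script finishes, it can be useful to confirm that the three environments actually exist. The sketch below is one possible check, not part of the repository: it parses the output of `conda env list --json`, whose `envs` entry lists environment directories. The helper names and the sample paths are made up for illustration.

```python
import json
import os
import subprocess


def env_names_from_conda_json(conda_json_text):
    """Extract environment names from the output of `conda env list --json`."""
    payload = json.loads(conda_json_text)
    # "envs" is a list of environment prefixes (directory paths).
    return {os.path.basename(prefix.rstrip("/")) for prefix in payload.get("envs", [])}


def missing_envs(conda_json_text, required=("pose_module", "openmmlab", "kimodo")):
    """Return the required environment names that are not installed."""
    found = env_names_from_conda_json(conda_json_text)
    return [name for name in required if name not in found]


def query_conda():
    """Run conda and return its JSON output; requires `conda` on PATH."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Offline demo with a hand-written payload (paths are invented).
sample = json.dumps({"envs": [
    "/home/user/miniconda3",
    "/home/user/miniconda3/envs/pose_module",
    "/home/user/miniconda3/envs/kimodo",
]})
print(missing_envs(sample))  # ['openmmlab']
```

On a real machine, `missing_envs(query_conda())` should return an empty list once `config_envs.sh` has completed.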

Qwen3-VL weights (Qwen/Qwen3-VL-8B-Instruct) are downloaded automatically from Hugging Face on first use.

The project supports two main pipelines. All commands are run from the repository root.

Pipeline 1 — Video → Virtual IMU

Converts real video recordings into synthetic IMU data. The pose_module drives the full chain: 2D detection (OpenMMlab/ViTPose), 3D lifting (MotionBERT), metric normalization, root estimation, and physics-based IMU synthesis (IMUSim).
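
For intuition about the last stage, the sketch below shows the basic idea of turning a joint trajectory into an accelerometer-like signal. It is a deliberately simplified stand-in and not how IMUSim works internally: it takes a central second difference of the joint positions and subtracts gravity, and it assumes a z-up world frame and a sensor whose axes stay aligned with the world axes (no rotation into a sensor frame). The function name and parameters are hypothetical.

```python
import numpy as np


def world_frame_specific_force(positions, fps, gravity=(0.0, 0.0, -9.81)):
    """Approximate accelerometer-like signal for one joint.

    positions: array of shape (T, 3), metres, world frame (z up is assumed).
    fps: sampling rate in Hz.
    Returns an array of shape (T - 2, 3): linear acceleration minus gravity,
    expressed in the world frame (no rotation into a sensor frame).
    """
    positions = np.asarray(positions, dtype=float)
    dt = 1.0 / fps
    # Central second difference: (p[t+1] - 2 p[t] + p[t-1]) / dt^2
    accel = (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2
    return accel - np.asarray(gravity, dtype=float)


# Demo: a joint moving with a constant 2 m/s^2 acceleration along x.
fps = 100.0
t = np.arange(50) / fps
demo_positions = np.stack([0.5 * 2.0 * t**2, np.zeros_like(t), np.ones_like(t)], axis=1)
demo_signal = world_frame_specific_force(demo_positions, fps)
print(demo_signal.mean(axis=0))
```

In the demo the x channel recovers the 2 m/s^2 motion and the z channel reads roughly 9.81, which is what an accelerometer at rest reports along the vertical axis.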

# Step 1 — Export 3D poses from video
conda run -n pose_module python -m pose_module.robot_emotions export-pose3d \
  --dataset-root data/RobotEmotions \
  --domains 10ms 30ms \
  --output-dir output/robot_emotions_pose3d \
  --env-name openmmlab \
  --no-debug-2d --no-debug-3d

# Step 2 — Synthesize virtual IMU signals, reusing the poses computed above
conda run -n pose_module python -m pose_module.robot_emotions export-virtual-imu \
  --dataset-root data/RobotEmotions \
  --domains 10ms 30ms \
  --output-dir output/robot_emotions_virtual_imu \
  --pose3d-manifest-path output/robot_emotions_pose3d/pose3d_manifest.jsonl \
  --no-debug-2d --no-debug-3d

Passing --pose3d-manifest-path skips the pose estimation stage for every clip found in the manifest and loads the existing pose3d.npz directly, going straight to IK + IMUSim. Clips not found in the manifest fall back to the full pipeline. Omitting the flag runs the full pipeline for all clips (original behaviour).
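
The reuse-or-fallback behaviour described above amounts to a lookup per clip. The sketch below illustrates that decision only. The `clip_id` key and the function name are assumptions for the example, and the actual CLI may identify clips differently.

```python
def split_by_manifest(clip_ids, manifest_records, key="clip_id"):
    """Partition clips into those with a cached pose and those needing the full pipeline."""
    cached = {record[key] for record in manifest_records if key in record}
    reuse = [clip for clip in clip_ids if clip in cached]
    recompute = [clip for clip in clip_ids if clip not in cached]
    return reuse, recompute


# Hypothetical manifest contents and clip names.
demo_manifest = [{"clip_id": "clip_001"}, {"clip_id": "clip_003"}]
reuse, recompute = split_by_manifest(["clip_001", "clip_002", "clip_003"], demo_manifest)
print(reuse, recompute)  # ['clip_001', 'clip_003'] ['clip_002']
```

With an empty manifest every clip lands in `recompute`, which matches the behaviour of omitting the flag.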

The pipeline exports raw (uncalibrated) signals. Calibration follows a rank-transform method that maps the virtual signal distribution onto the real IMU distribution of the same clip. There are two use cases:

Per fold during evaluation — evaluation/classifiers_pose_experiments.ipynb recalibrates each fold using only training-set subjects, avoiding data leakage.
Batch via CLI — calibrate-virtual-imu reads the manifest and calibrates each clip against its own imu.npz, writing virtual_imu_calibrated.npz alongside the original:

conda run -n pose_module python -m pose_module.robot_emotions calibrate-virtual-imu \
  --manifest-path output/robot_emotions_virtual_imu/virtual_imu_manifest.jsonl \
  --calibration-fraction 0.5

| Flag | Default | Description |
| --- | --- | --- |
| `--manifest-path` | required | `virtual_imu_manifest.jsonl` from `export-virtual-imu`. |
| `--calibration-fraction` | `1.0` | Fraction of each clip's real `imu.npz` to use as reference (e.g. `0.5` = first 50%). |
| `--activity-label-key` | `None` | Manifest label field for per-class calibration (e.g. `action`). |
| `--signal-mode` | `acc` | Channels to calibrate: `acc`, `gyro`, or `both`. |
| `--in-place` | off | Overwrite `virtual_imu.npz` instead of writing `virtual_imu_calibrated.npz`. |
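
To make the rank-transform idea concrete, the sketch below maps each channel of a virtual signal onto the value distribution of a real reference signal: every virtual sample is replaced by the reference value at the same quantile level. This is a generic illustration of the technique, not the code behind `calibrate-virtual-imu`. The function name is hypothetical, and the `fraction` argument only mirrors the description of `--calibration-fraction` given in the table.

```python
import numpy as np


def rank_transform_calibrate(virtual, reference, fraction=1.0):
    """Map each channel of `virtual` onto the value distribution of `reference`.

    virtual: array (T_v, C); reference: array (T_r, C). Lengths may differ.
    fraction: use only the first `fraction` of the reference samples.
    Sample order in time is preserved; only the amplitude distribution changes.
    """
    virtual = np.asarray(virtual, dtype=float)
    reference = np.asarray(reference, dtype=float)
    keep = max(1, int(round(fraction * reference.shape[0])))
    reference = reference[:keep]
    calibrated = np.empty_like(virtual)
    n = virtual.shape[0]
    for c in range(virtual.shape[1]):
        # Rank of every sample within its channel (0 = smallest).
        ranks = np.argsort(np.argsort(virtual[:, c], kind="stable"), kind="stable")
        # Convert ranks to quantile levels strictly inside (0, 1).
        levels = (ranks + 0.5) / n
        # Read the reference distribution at those quantile levels.
        calibrated[:, c] = np.quantile(reference[:, c], levels)
    return calibrated


# Demo with random data standing in for virtual and real accelerometer channels.
rng = np.random.default_rng(0)
demo_virtual = rng.normal(0.0, 1.0, size=(500, 3))
demo_real = rng.normal(5.0, 2.0, size=(300, 3))
demo_calibrated = rank_transform_calibrate(demo_virtual, demo_real)
print(demo_calibrated.mean(axis=0), demo_real.mean(axis=0))
```

Because only ranks are used, the calibrated signal keeps the temporal shape of the virtual signal while taking on the amplitude range of the real one. This is also why the reference data must come from training subjects only when the result feeds a classifier evaluation.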

Outputs per clip under output/<experiment>/<clip_id>/:

pose/pose3d/pose3d.npz — 3D skeleton trajectory
imu/virtual_imu.npz — synthetic accelerometer + gyroscope (uncalibrated)
imu/virtual_imu_calibrated.npz — calibrated signal (when calibrate-virtual-imu is run without --in-place)
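
The array names stored inside these `.npz` files are not listed here, so a safe first step is to enumerate them rather than guess. The sketch below does that with `numpy.load`. The helper name is hypothetical, and the demo archive with its `acc` and `timestamps` arrays is invented purely for the example.

```python
import io

import numpy as np


def summarize_npz(source):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz archive."""
    with np.load(source) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


# Demo on an in-memory archive; the array names here are invented.
buffer = io.BytesIO()
np.savez(buffer, acc=np.zeros((100, 3), dtype=np.float32), timestamps=np.arange(100.0))
buffer.seek(0)
demo_summary = summarize_npz(buffer)
print(demo_summary)
```

The same function accepts a file path, for example `summarize_npz("output/<experiment>/<clip_id>/imu/virtual_imu.npz")` with the placeholders filled in.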

Pipeline 2 — Window-Anchored Kimodo Generation

Uses real video windows as anchors: Qwen3-VL describes each 5-second segment; Kimodo generates new SMPL-X motion conditioned on the text and real joint positions. The resulting synthetic motions are then converted to virtual IMU signals using the same simulation stages as Pipeline 1.

Step 1 — Export real pose3d (same as Pipeline 1, pose_module env):

conda run -n pose_module python -m pose_module.robot_emotions export-pose3d \
  --dataset-root data/RobotEmotions --domains 10ms 30ms \
  --output-dir output/robot_emotions_pose3d \
  --env-name openmmlab \
  --no-debug-2d --no-debug-3d

Step 2 — Describe windows with Qwen3-VL (kimodo env):

conda run -n kimodo python -m robot_emotions_vlm describe-windows \
  --pose3d-manifest-path output/robot_emotions_pose3d/pose3d_manifest.jsonl \
  --output-dir output/robot_emotions_qwen_windows \
  --window-sec 5.0 --window-hop-sec 2.5 --num-video-frames 48
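
The window and hop values above imply overlapping segments: with a 5.0 s window and a 2.5 s hop, consecutive windows share half their content. The sketch below computes the resulting boundaries for a clip of a given duration. It assumes that trailing partial windows are dropped, which may differ from what `describe-windows` actually does, and the function name is hypothetical.

```python
def window_bounds(duration_sec, window_sec=5.0, hop_sec=2.5):
    """Return (start, end) pairs of full-length windows that fit in the clip."""
    if window_sec <= 0 or hop_sec <= 0:
        raise ValueError("window_sec and hop_sec must be positive")
    bounds = []
    index = 0
    # Small tolerance so a window ending exactly at the clip end is kept.
    while index * hop_sec + window_sec <= duration_sec + 1e-9:
        start = index * hop_sec
        bounds.append((start, start + window_sec))
        index += 1
    return bounds


print(window_bounds(12.0))
# [(0.0, 5.0), (2.5, 7.5), (5.0, 10.0)]
```

A 12-second clip therefore yields three windows under these assumptions, and a clip shorter than 5 seconds yields none.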

Step 3 — Build anchor catalog (kimodo env):

Extracts ground trajectory and optional end-effector keyframes from the real pose3d to spatially constrain Kimodo generation.

conda run -n kimodo python -m robot_emotions_vlm build-anchor-catalog \
  --pose3d-manifest-path output/robot_emotions_pose3d/pose3d_manifest.jsonl \
  --qwen-window-catalog-path output/robot_emotions_qwen_windows/kimodo_window_prompt_catalog.jsonl \
  --output-dir output/robot_emotions_kimodo_anchors \
  --model Kimodo-SMPLX-RP-v1 \
  --effector-keyframes 5
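
As a rough picture of what such anchors could look like, the sketch below derives a 2D ground path from a root-joint trajectory and picks a fixed number of evenly spaced frame indices. Both helpers are hypothetical: the vertical axis, the even spacing, and the function names are assumptions for illustration, and `build-anchor-catalog` may select keyframes differently.

```python
import numpy as np


def ground_trajectory(root_positions, up_axis=1):
    """Drop the vertical coordinate of the root joint: (T, 3) -> (T, 2)."""
    root_positions = np.asarray(root_positions, dtype=float)
    keep = [axis for axis in range(3) if axis != up_axis]
    return root_positions[:, keep]


def evenly_spaced_keyframes(num_frames, num_keyframes):
    """Pick frame indices spread uniformly over the window, endpoints included."""
    if num_keyframes <= 0 or num_frames <= 0:
        return []
    count = min(num_keyframes, num_frames)
    indices = np.linspace(0, num_frames - 1, count)
    # Round to whole frames and remove duplicates that rounding may create.
    return sorted(set(int(round(float(i))) for i in indices))


demo_root = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(ground_trajectory(demo_root).tolist())  # [[1.0, 3.0], [4.0, 6.0]]
print(evenly_spaced_keyframes(101, 5))        # [0, 25, 50, 75, 100]
```

The `up_axis` argument is exposed because the vertical axis convention of the exported skeletons is not stated here.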

Step 4 — Generate with Kimodo (kimodo env):

conda run -n kimodo python -m robot_emotions_vlm generate-kimodo \
  --model Kimodo-SMPLX-RP-v1 \
  --catalog-path output/robot_emotions_kimodo_anchors/kimodo_anchor_catalog.jsonl \
  --output-dir output/robot_emotions_kimodo

Each generated clip produces motion.npz + motion_amass.npz (SMPL-X) under output/robot_emotions_kimodo/<prompt_id>/.

Step 5 — Export virtual IMU (kimodo env):

Applies metric normalization → root estimation → IMUSim → geometric alignment. Percentile calibration is not applied here — it is deferred to evaluation time (per fold, restricted to training subjects) to avoid data leakage.

conda run -n kimodo python -m robot_emotions_vlm export-kimodo-virtual-imu \
  --kimodo-manifest output/robot_emotions_kimodo/kimodo_generation_manifest.jsonl \
  --output-dir output/robot_emotions_kimodo_imu \
  --real-imu-root output/robot_emotions_virtual_imu

Each clip produces virtual_imu.npz (geometrically aligned, pre-calibration) under output/robot_emotions_kimodo_imu/<prompt_id>/virtual_imu/.

Step 6 — Merge real + synthetic (kimodo env):

Combines the real-video manifest and the Kimodo manifest into a single JSONL for mixed training experiments.

conda run -n kimodo python -m robot_emotions_vlm export-mixed-virtual-imu \
  --real-manifest output/robot_emotions_virtual_imu/virtual_imu_manifest.jsonl \
  --synthetic-manifest output/robot_emotions_kimodo_imu/virtual_imu_manifest.jsonl \
  --output-dir output/robot_emotions_mixed_imu
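
Conceptually, merging two JSONL manifests is a concatenation in which each record remembers its origin. The sketch below illustrates that idea only and is not the implementation of `export-mixed-virtual-imu`. The `source` field, the `clip_id` and `prompt_id` demo keys, and the function names are assumptions made for the example.

```python
import json


def merge_manifests(real_records, synthetic_records):
    """Concatenate two manifests, tagging each record with where it came from."""
    merged = []
    for source, records in (("real", real_records), ("synthetic", synthetic_records)):
        for record in records:
            tagged = dict(record)      # copy so the inputs are left untouched
            tagged["source"] = source  # hypothetical field name
            merged.append(tagged)
    return merged


def to_jsonl(records):
    """Serialize records as JSONL text, one object per line."""
    return "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)


# Demo records with invented keys.
demo_real = [{"clip_id": "clip_001"}]
demo_synthetic = [{"prompt_id": "prompt_007"}]
demo_merged = merge_manifests(demo_real, demo_synthetic)
print(to_jsonl(demo_merged), end="")
```

Keeping an origin tag on every record makes it straightforward to vary the real-to-synthetic ratio in mixed training experiments.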

Direct Kimodo generation (optional, `kimodo` env)

conda run -n kimodo kimodo_gen "A person sits down and stands up" \
  --model Kimodo-SMPLX-RP-v1 --duration 10.0 --output output/kimodo_direct/

Citation

@software{POSE2IMU,
author = {Parede, Henrique and Bonarini, Andrea and Dornhofer Paro Costa, Paula},
title = {POSE2IMU-Framework},
url = {https://github.com/H-IAAC/POSE2IMU-Framework}
}

Authors

(2026-) Henrique Parede: Computer Engineering student, FEEC-UNICAMP
(Advisor, 2026-) Andrea Bonarini: Professor, DEIB-POLIMI
(Advisor, 2026-) Paula Dornhofer Paro Costa: Professor, FEEC-UNICAMP

Acknowledgements

This study was financed by the São Paulo Research Foundation (FAPESP), Brasil. Process Number 2025/21964-5.

This codebase was developed starting from a fork of IMUGPT by Leng et al. We are grateful for their foundational work, which made this project possible.

We also gratefully acknowledge the authors of the following open-source projects, which are integrated as submodules:

Kimodo (NVIDIA) - SMPL-X motion generation conditioned on text and pose anchors.
ST-GCN (Sijie Yan et al.) - Spatial Temporal Graph Convolutional Networks for skeleton-based action recognition.
TS2Vec (Zhihan Yue et al.) - Unsupervised time-series representation learning.
