Real-Time 3D Motion Prediction from 2D/3D Pose Landmarks
Transform BlazePose keypoints into smooth, accurate 3D skeletal motion using deep learning.
Blaze2Cap is a PyTorch-based motion prediction system that converts 2D/3D pose landmarks (from MediaPipe BlazePose) into full 3D skeletal motion with 22 joints. The model uses a Transformer architecture optimized for temporal consistency and motion smoothing.
Key Features:
- ✅ Temporal Transformer - Causal self-attention for sequential motion prediction
- ✅ Motion Smoothing - High smoothness loss weight for natural, jitter-free output
- ✅ Mixed Precision Training - FP16 support for faster training on modern GPUs
- ✅ L4 GPU Optimized - Configured for NVIDIA L4 (24GB VRAM)
- ✅ Comprehensive Testing - Full test suite for models, data loaders, and loss functions
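The causal self-attention mentioned above can be illustrated with a standard upper-triangular mask; this is a minimal sketch using PyTorch's built-in `nn.TransformerEncoder` with the hyperparameters from this README, not the repository's actual model code:

```python
import torch
import torch.nn as nn

def causal_mask(seq_len: int) -> torch.Tensor:
    # True entries are masked out (PyTorch bool-mask convention),
    # so frame t can only attend to frames <= t.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=4)

x = torch.randn(2, 64, 256)             # [batch, seq, d_model]
out = encoder(x, mask=causal_mask(64))  # frame t never sees frame t+1
print(out.shape)                        # torch.Size([2, 64, 256])
```

Because the mask hides future frames, the same model can run in a streaming (real-time) setting without retraining.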
TotalCapture Dataset (Augmented)
- Input: BlazePose landmarks (25 keypoints × 7 channels → 18 features)
- Output: 3D skeletal motion (22 joints × 6D rotation representation)
- Samples: 5,164 total (2,775 train / 2,068 test)
- Augmentation: Temporal subsampling at multiple strides (60fps, 30fps, 20fps)
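Temporal subsampling of this kind can be sketched as follows (illustrative only; strides 1, 2, and 3 give 60/30/20 fps from a 60 fps source, and the function name is hypothetical):

```python
import numpy as np

def subsample_strides(seq: np.ndarray, strides=(1, 2, 3)) -> list:
    """Augment one motion sequence [T, J, C] by temporal subsampling.

    stride 1 keeps 60 fps, stride 2 -> 30 fps, stride 3 -> 20 fps.
    """
    return [seq[::s] for s in strides]

seq = np.zeros((120, 25, 18))        # 120 frames, 25 keypoints, 18 features
clips = subsample_strides(seq)
print([c.shape[0] for c in clips])   # [120, 60, 40]
```

Each stride yields a separate training sample, which is how the sample count grows well beyond the raw recording count.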
```bash
# Clone the repository
git clone https://github.com/BlazeWild/Blaze2Cap.git
cd Blaze2Cap

# Install dependencies
pip install -e .

# Or install manually
pip install torch torchvision torchaudio mediapipe==0.10.14 numpy tqdm
```

Download from Hugging Face:
The dataset (6.89GB) is hosted on Hugging Face for easier access:
```bash
# Install huggingface-hub
pip install huggingface-hub

# Download the dataset
huggingface-cli download Blazewild/Totalcap-blazepose --repo-type dataset --local-dir blaze2cap/dataset/Totalcapture_blazepose_preprocessed/Dataset
```

Expected structure:
```
blaze2cap/dataset/Totalcapture_blazepose_preprocessed/Dataset/
├── blaze_augmented/     # Input: BlazePose keypoints (5,164 samples)
├── gt_augmented/        # Output: ground-truth motion (5,164 samples)
└── dataset_map.json     # Train/test split mapping
```
Generate dataset mapping (if not included in download):
```bash
python blaze2cap/data/generate_json.py
```

```bash
# Run tests first
python -m test.test_model
python -m test.test_loss

# Start training
python -m tools.train
```

```bash
# Evaluate best model
python -m test.evaluate --checkpoint ./checkpoints/best_model.pth

# Evaluate on a specific split
python -m test.evaluate --checkpoint ./checkpoints/best_model.pth --split test
```

```
Input: [Batch, Seq, 25, 18]     # 25 keypoints × 18 features
        ↓
Flatten & Project: [B, S, 256]
        ↓
Positional Encoding
        ↓
Transformer Encoder (4 layers)
  - Causal Self-Attention
  - Feed-Forward Network
        ↓
Split-Head Decoder:
  - Root Head: [B, S, 2, 6]     # Position + rotation deltas
  - Body Head: [B, S, 20, 6]    # 20 joint local rotations
        ↓
Output: [B, S, 22, 6]           # Combined 3D skeletal motion
```

Parameters: ~2.5M trainable
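The 6D rotation representation used at the output (Zhou et al.'s continuous representation) maps to a rotation matrix via Gram-Schmidt orthogonalization; a minimal sketch, not the repository's exact implementation:

```python
import torch
import torch.nn.functional as F

def rot6d_to_matrix(d6: torch.Tensor) -> torch.Tensor:
    """Map [..., 6] 6D rotations to [..., 3, 3] rotation matrices."""
    a1, a2 = d6[..., :3], d6[..., 3:]
    b1 = F.normalize(a1, dim=-1)                                   # first basis vector
    b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1,
                     dim=-1)                                       # orthogonalize second
    b3 = torch.cross(b1, b2, dim=-1)                               # right-handed frame
    return torch.stack([b1, b2, b3], dim=-2)

R = rot6d_to_matrix(torch.tensor([1., 0., 0., 0., 1., 0.]))
print(R)  # 3x3 identity matrix
```

Unlike quaternions or Euler angles, this representation has no discontinuities, which makes it easier for the network to regress.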
```
L_total = λ_rot × L_rotation + λ_smooth × L_smoothness

# L_rotation: MSE between predicted and GT 6D rotations
# L_smoothness: MSE of velocity differences (penalizes jitter)
```

L4 GPU configuration:
- λ_rot = 1.0 (keeps the geometry grounded)
- λ_smooth = 5.0 (high weight for smooth motion)
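A loss of this shape can be sketched as follows (an illustrative stand-in, not the repository's `MotionCorrectionLoss`):

```python
import torch
import torch.nn.functional as F

def motion_loss(pred, gt, lambda_rot=1.0, lambda_smooth=5.0):
    """pred, gt: [B, S, 22, 6] predicted / ground-truth 6D rotation sequences."""
    l_rot = F.mse_loss(pred, gt)                # geometry term
    # Velocity = frame-to-frame difference; matching velocities suppresses jitter.
    vel_pred = pred[:, 1:] - pred[:, :-1]
    vel_gt = gt[:, 1:] - gt[:, :-1]
    l_smooth = F.mse_loss(vel_pred, vel_gt)
    return lambda_rot * l_rot + lambda_smooth * l_smooth

loss = motion_loss(torch.randn(2, 64, 22, 6), torch.randn(2, 64, 22, 6))
```

The 5:1 weighting deliberately trades a little per-frame accuracy for temporally coherent output.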
| Parameter | Value | Purpose |
|---|---|---|
| `batch_size` | 512 | Maximize L4 throughput |
| `num_workers` | 8 | Parallel data loading |
| `window_size` | 64 | Larger temporal context |
| `d_model` | 256 | Transformer hidden size |
| `num_layers` | 4 | Transformer depth |
| `n_head` | 4 | Multi-head attention heads |
| `lr` | 1e-4 | Learning rate |
| `epochs` | 100 | Training epochs |
| `use_amp` | True | Mixed precision (FP16) |
Edit tools/train.py to modify these settings.
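The settings above would typically live in a single config dict; this is a hypothetical sketch mirroring the table, so check `tools/train.py` for the actual key names:

```python
# Hypothetical training config; keys follow the table above,
# not necessarily the exact names used in tools/train.py.
CONFIG = {
    "batch_size": 512,   # maximize L4 throughput
    "num_workers": 8,    # parallel data loading
    "window_size": 64,   # temporal context, in frames
    "d_model": 256,      # Transformer hidden size
    "num_layers": 4,     # encoder depth
    "n_head": 4,         # attention heads
    "lr": 1e-4,          # learning rate
    "epochs": 100,       # training epochs
    "use_amp": True,     # FP16 mixed precision
}
```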
```
Blaze2Cap/
├── blaze2cap/
│   ├── __init__.py                # Package exports
│   ├── data/
│   │   ├── data_loader.py         # PoseSequenceDataset
│   │   └── generate_json.py       # Dataset map generator
│   ├── modules/
│   │   └── models.py              # MotionTransformer
│   ├── modeling/
│   │   ├── loss.py                # MotionCorrectionLoss
│   │   ├── eval_motion.py         # MPJPE/MARE metrics
│   │   └── optimization.py        # Optimizer configs
│   ├── utils/
│   │   ├── checkpoint.py          # Save/load checkpoints
│   │   ├── train_utils.py         # Timer, CudaPreFetcher
│   │   ├── logging.py             # Setup logging
│   │   └── visualization.py       # Render pose videos
│   └── dataset/
│       └── Totalcapture_blazepose_preprocessed/
│           └── Dataset/           # Training data
├── tools/
│   └── train.py                   # Main training script
├── test/
│   ├── test_model.py              # Model architecture tests
│   ├── test_loss.py               # Loss function tests
│   ├── test_dataloader.py         # Data loader tests
│   ├── evaluate.py                # Evaluation script
│   └── run_all.py                 # Run all tests
├── pyproject.toml                 # Project metadata
└── README.md                      # This file
```
```bash
# Test model architecture
python -m test.test_model

# Test loss functions
python -m test.test_loss

# Test data loader
python -m test.test_dataloader

# Run all tests
python -m test.run_all
```

The model is evaluated using:
- MPJPE (Mean Per Joint Position Error) - 3D position accuracy in mm
- MARE (Mean Absolute Rotation Error) - rotation accuracy in radians
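MPJPE is simply the mean Euclidean distance between predicted and ground-truth joint positions; a minimal sketch (illustrative, not the repository's `eval_motion.py`):

```python
import torch

def mpjpe(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Mean Per Joint Position Error for [B, S, J, 3] joint positions (mm)."""
    return torch.linalg.norm(pred - gt, dim=-1).mean()

pred = torch.zeros(1, 2, 22, 3)
gt = torch.full((1, 2, 22, 3), 1.0)  # every joint offset by (1, 1, 1)
print(mpjpe(pred, gt))               # sqrt(3) ~= 1.7321
```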
Reduce the batch size in `tools/train.py`:

```python
"batch_size": 256,  # Reduce from 512
```

Make sure to activate your virtual environment:

```bash
source venv/bin/activate  # Linux/Mac
# or
conda activate your_env
```

Verify the dataset path and regenerate the mapping:

```bash
ls -la blaze2cap/dataset/Totalcapture_blazepose_preprocessed/Dataset/
python blaze2cap/data/generate_json.py
```

If you use this code in your research, please cite:
```bibtex
@software{blaze2cap2026,
  author = {BlazeWild},
  title = {Blaze2Cap: Real-Time 3D Motion Prediction from Pose Landmarks},
  year = {2026},
  url = {https://github.com/BlazeWild/Blaze2Cap}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
- TotalCapture Dataset - For providing high-quality motion capture data
- MediaPipe BlazePose - For real-time pose estimation
- PyTorch Team - For the deep learning framework
For questions or issues, please open an issue on GitHub or contact the maintainer.
Repository: https://github.com/BlazeWild/Blaze2Cap
Built with ❤️ for smooth, natural motion prediction