
Blaze2Cap

Real-Time 3D Motion Prediction from 2D/3D Pose Landmarks

Transform BlazePose keypoints into smooth, accurate 3D skeletal motion using deep learning.


🎯 Overview

Blaze2Cap is a PyTorch-based motion prediction system that converts 2D/3D pose landmarks (from MediaPipe BlazePose) into full 3D skeletal motion with 22 joints. The model uses a Transformer architecture optimized for temporal consistency and motion smoothing.

Key Features:

  • ✅ Temporal Transformer - Causal self-attention for sequential motion prediction
  • ✅ Motion Smoothing - High smoothness loss weight for natural, jitter-free output
  • ✅ Mixed Precision Training - FP16 support for faster training on modern GPUs
  • ✅ L4 GPU Optimized - Configured for NVIDIA L4 (24GB VRAM)
  • ✅ Comprehensive Testing - Full test suite for models, data loaders, and loss functions

📊 Dataset

TotalCapture Dataset (Augmented)

  • Input: BlazePose landmarks (25 keypoints × 7 channels → 18 features)
  • Output: 3D skeletal motion (22 joints × 6D rotation representation)
  • Samples: 5,164 total (2,775 train / 2,068 test)
  • Augmentation: Temporal subsampling at multiple strides (60fps, 30fps, 20fps)
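For reference, a single training window under these shapes can be sketched with NumPy (the array names and batch size here are illustrative, not the repository's data-loader API):

```python
import numpy as np

# Illustrative shapes only -- variable names are assumptions, not the repo's API.
window_size = 64
x = np.zeros((window_size, 25, 18), dtype=np.float32)  # BlazePose features per frame
y = np.zeros((window_size, 22, 6), dtype=np.float32)   # 6D rotations for 22 joints

# A DataLoader batch of 4 such windows would stack to (4, 64, 25, 18):
batch_x = np.stack([x, x, x, x])
print(batch_x.shape)  # (4, 64, 25, 18)
```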

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/BlazeWild/Blaze2Cap.git
cd Blaze2Cap

# Install dependencies
pip install -e .

# Or install manually
pip install torch torchvision torchaudio mediapipe==0.10.14 numpy tqdm

Dataset Setup

Download from Hugging Face:

The dataset (6.89GB) is hosted on Hugging Face for easier access:

# Install huggingface-hub
pip install huggingface-hub

# Download the dataset
huggingface-cli download Blazewild/Totalcap-blazepose --repo-type dataset --local-dir blaze2cap/dataset/Totalcapture_blazepose_preprocessed/Dataset

Expected structure:

blaze2cap/dataset/Totalcapture_blazepose_preprocessed/Dataset/
├── blaze_augmented/     # Input: BlazePose keypoints (5,164 samples)
├── gt_augmented/        # Output: Ground truth motion (5,164 samples)
└── dataset_map.json     # Train/test split mapping

Generate dataset mapping (if not included in download):

python blaze2cap/data/generate_json.py

Training

# Run tests first
python -m test.test_model
python -m test.test_loss

# Start training
python -m tools.train

Evaluation

# Evaluate best model
python -m test.evaluate --checkpoint ./checkpoints/best_model.pth

# Evaluate on specific split
python -m test.evaluate --checkpoint ./checkpoints/best_model.pth --split test

πŸ—οΈ Architecture

Model: MotionTransformer

Input: [Batch, Seq, 25, 18]  # 25 keypoints × 18 features
  ↓
Flatten & Project: [B, S, 256]
  ↓
Positional Encoding
  ↓
Transformer Encoder (4 layers)
  - Causal Self-Attention
  - Feed-Forward Network
  ↓
Split-Head Decoder:
  - Root Head: [B, S, 2, 6]   # Position + Rotation deltas
  - Body Head: [B, S, 20, 6]  # 20 joint local rotations
  ↓
Output: [B, S, 22, 6]  # Combined 3D skeletal motion
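The data flow above can be sketched as a minimal PyTorch module (the layer names, learned positional encoding, and maximum sequence length are assumptions for illustration, not the repository's exact implementation):

```python
import torch
import torch.nn as nn

class MotionTransformerSketch(nn.Module):
    """Minimal sketch of the architecture diagram; details are assumed."""

    def __init__(self, n_kp=25, feat=18, d_model=256, n_layers=4, n_head=4):
        super().__init__()
        self.proj = nn.Linear(n_kp * feat, d_model)            # flatten & project
        self.pos = nn.Parameter(torch.zeros(1, 512, d_model))  # learned positions (assumed)
        layer = nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.root_head = nn.Linear(d_model, 2 * 6)             # position + rotation deltas
        self.body_head = nn.Linear(d_model, 20 * 6)            # 20 joint local rotations

    def forward(self, x):                        # x: [B, S, 25, 18]
        b, s = x.shape[:2]
        h = self.proj(x.flatten(2)) + self.pos[:, :s]
        mask = nn.Transformer.generate_square_subsequent_mask(s)
        h = self.encoder(h, mask=mask)           # causal self-attention
        root = self.root_head(h).view(b, s, 2, 6)
        body = self.body_head(h).view(b, s, 20, 6)
        return torch.cat([root, body], dim=2)    # [B, S, 22, 6]

out = MotionTransformerSketch()(torch.randn(2, 16, 25, 18))
print(out.shape)  # torch.Size([2, 16, 22, 6])
```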

Parameters: ~2.5M trainable parameters

Loss Function

L_total = λ_rot × L_rotation + λ_smooth × L_smoothness

# L_rotation: MSE between predicted and GT 6D rotations
# L_smoothness: MSE of velocity differences (penalizes jitter)

L4 GPU Configuration:

  • λ_rot = 1.0 - Keep geometry grounded
  • λ_smooth = 5.0 - High weight for smooth motion
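The combined loss can be sketched in a few lines (the exact reduction and velocity definition in the repository's MotionCorrectionLoss may differ):

```python
import torch

def motion_loss(pred, gt, lam_rot=1.0, lam_smooth=5.0):
    """Sketch of the combined loss; weighting follows the config above."""
    l_rot = torch.mean((pred - gt) ** 2)             # MSE on 6D rotations
    vel_pred = pred[:, 1:] - pred[:, :-1]            # frame-to-frame velocity
    vel_gt = gt[:, 1:] - gt[:, :-1]
    l_smooth = torch.mean((vel_pred - vel_gt) ** 2)  # penalizes jitter
    return lam_rot * l_rot + lam_smooth * l_smooth

# pred/gt shaped [B, S, 22, 6], matching the model output
loss = motion_loss(torch.randn(2, 16, 22, 6), torch.randn(2, 16, 22, 6))
print(float(loss) >= 0.0)
```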

βš™οΈ Configuration

Hyperparameters (L4 GPU Optimized)

Parameter    Value  Purpose
batch_size   512    Maximize L4 throughput
num_workers  8      Parallel data loading
window_size  64     Larger temporal context
d_model      256    Transformer hidden size
num_layers   4      Transformer depth
n_head       4      Multi-head attention
lr           1e-4   Learning rate
epochs       100    Training epochs
use_amp      True   Mixed precision (FP16)

Edit tools/train.py to modify these settings.
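As a sketch, the table maps onto a plain config dict (the key names mirror the table but are assumptions about how tools/train.py stores them):

```python
# Hypothetical config dict mirroring the hyperparameter table above;
# key names are assumptions, not verified against tools/train.py.
config = {
    "batch_size": 512,
    "num_workers": 8,
    "window_size": 64,
    "d_model": 256,
    "num_layers": 4,
    "n_head": 4,
    "lr": 1e-4,
    "epochs": 100,
    "use_amp": True,
}
print(config["d_model"])  # 256
```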


πŸ“ Project Structure

Blaze2Cap/
├── blaze2cap/
│   ├── __init__.py              # Package exports
│   ├── data/
│   │   ├── data_loader.py       # PoseSequenceDataset
│   │   └── generate_json.py     # Dataset map generator
│   ├── modules/
│   │   └── models.py            # MotionTransformer
│   ├── modeling/
│   │   ├── loss.py              # MotionCorrectionLoss
│   │   ├── eval_motion.py       # MPJPE/MARE metrics
│   │   └── optimization.py      # Optimizer configs
│   ├── utils/
│   │   ├── checkpoint.py        # Save/load checkpoints
│   │   ├── train_utils.py       # Timer, CudaPreFetcher
│   │   ├── logging.py           # Setup logging
│   │   └── visualization.py     # Render pose videos
│   └── dataset/
│       └── Totalcapture_blazepose_preprocessed/
│           └── Dataset/         # Training data
├── tools/
│   └── train.py                 # Main training script
├── test/
│   ├── test_model.py            # Model architecture tests
│   ├── test_loss.py             # Loss function tests
│   ├── test_dataloader.py       # Data loader tests
│   ├── evaluate.py              # Evaluation script
│   └── run_all.py               # Run all tests
├── pyproject.toml               # Project metadata
└── README.md                    # This file

🧪 Testing

# Test model architecture
python -m test.test_model

# Test loss functions
python -m test.test_loss

# Test data loader
python -m test.test_dataloader

# Run all tests
python -m test.run_all

📈 Metrics

The model is evaluated using:

  • MPJPE (Mean Per Joint Position Error) - 3D position accuracy in mm
  • MARE (Mean Absolute Rotation Error) - Rotation accuracy in radians
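Both metrics can be sketched in a few lines of NumPy (simplified forms; the repository's eval_motion.py may compute MARE from rotation matrices rather than elementwise angle differences):

```python
import numpy as np

def mpjpe(pred_xyz, gt_xyz):
    """Mean per-joint position error: average Euclidean distance per joint
    (in mm if the inputs are in mm)."""
    return np.linalg.norm(pred_xyz - gt_xyz, axis=-1).mean()

def mare(pred_rad, gt_rad):
    """Mean absolute rotation error in radians (simplified elementwise form)."""
    return np.abs(pred_rad - gt_rad).mean()

gt = np.zeros((16, 22, 3))
pred = gt + np.array([3.0, 0.0, 4.0])  # every joint offset by a 3-4-5 triangle
print(mpjpe(pred, gt))  # 5.0
```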

🔧 Troubleshooting

CUDA Out of Memory

Reduce batch size in tools/train.py:

"batch_size": 256,  # Reduce from 512

Import Errors

Make sure to activate your virtual environment:

source venv/bin/activate  # Linux/Mac
# or
conda activate your_env

Dataset Not Found

Verify dataset path and regenerate mapping:

ls -la blaze2cap/dataset/Totalcapture_blazepose_preprocessed/Dataset/
python blaze2cap/data/generate_json.py

πŸ“ Citation

If you use this code in your research, please cite:

@software{blaze2cap2026,
  author = {BlazeWild},
  title = {Blaze2Cap: Real-Time 3D Motion Prediction from Pose Landmarks},
  year = {2026},
  url = {https://github.com/BlazeWild/Blaze2Cap}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

  • TotalCapture Dataset - For providing high-quality motion capture data
  • MediaPipe BlazePose - For real-time pose estimation
  • PyTorch Team - For the deep learning framework

📧 Contact

For questions or issues, please open an issue on GitHub or contact the maintainer.

Repository: https://github.com/BlazeWild/Blaze2Cap


Built with ❤️ for smooth, natural motion prediction
