Reconstructing dynamic humans interacting with real-world environments from monocular videos is an important yet challenging task. Despite considerable progress in 4D neural rendering, existing approaches either model dynamic scenes holistically or model the scene and the human separately with parametric human priors. The former neglects the distinct motion characteristics of the scene's components, especially the human, leading to incomplete reconstructions; the latter ignores the information exchange between the separately modeled components, resulting in spatial inconsistencies and visual artifacts at human-scene boundaries. To address this, we propose the **Separate-then-Map** (StM) strategy, which introduces a dedicated information mapping mechanism to bridge separately defined and optimized models. Our method employs a shared transformation function for each Gaussian attribute to unify the separately modeled components, improving computational efficiency by avoiding exhaustive pairwise interactions while ensuring spatial and visual coherence between humans and their surroundings. Extensive experiments on monocular video datasets demonstrate that StM significantly outperforms existing state-of-the-art methods in both visual quality and rendering accuracy, particularly at challenging human-scene interaction boundaries.
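Conceptually, the mapping step applies one shared transformation per Gaussian attribute to both separately optimized components. The sketch below illustrates this idea with plain linear maps; all names and shapes are illustrative assumptions, not the actual StM implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Separately modeled components: each keeps its own Gaussian parameters
# (illustrative attribute subset and sizes).
human = {"xyz": rng.standard_normal((100, 3)), "opacity": rng.standard_normal((100, 1))}
scene = {"xyz": rng.standard_normal((500, 3)), "opacity": rng.standard_normal((500, 1))}

# One shared transformation per attribute: the same weights act on both
# components, coupling them through a common mapping instead of exhaustive
# pairwise interactions between human and scene Gaussians.
shared = {"xyz": rng.standard_normal((3, 3)), "opacity": rng.standard_normal((1, 1))}

def map_attributes(gaussians):
    # Apply the shared per-attribute map to one component's Gaussians.
    return {k: v @ shared[k] for k, v in gaussians.items()}

human_mapped = map_attributes(human)
scene_mapped = map_attributes(scene)
print(human_mapped["xyz"].shape, scene_mapped["xyz"].shape)  # (100, 3) (500, 3)
```

Because the transformation weights are shared, a joint rendering loss propagates gradients from both components through the same parameters, which is how the mapping keeps the separately optimized models coherent.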
This repository uses two submodules. Clone recursively to fetch them:

```bash
git clone --recursive <repository-url>
```

If you already cloned without submodules:

```bash
git submodule update --init --recursive
```

Submodules:

- `submodules/depth-diff-gaussian-rasterization` — differentiable Gaussian rasterization with depth forward/backward pass
- `submodules/simple-knn` — k-nearest neighbors for Gaussian splatting
```bash
source scripts/conda_setup.sh
```

This creates a conda environment and installs PyTorch, PyTorch3D, the submodules, and other dependencies.
Data setup follows apple/ml-hugs. Prepare the following:
- Register at the SMPL website
- Download v1.1.0 and the SMPL UV obj file from the download page
- Extract and rename `basicModel_neutral_lbs_10_207_0_v1.0.0.pkl` to `SMPL_NEUTRAL.pkl`
- Place the files in `./data/smpl/`:

  ```
  data/smpl/
  ├── SMPL_NEUTRAL.pkl
  └── smpl_uv.obj
  ```

- Data: download
Or run the setup script from the project root:

```bash
source scripts/prepare_data_models.sh
```

Mocap data, used for rendering novel poses: download the SFU (SMPL+H G) and MPI_mosh (SMPL+H G) subsets from AMASS, and place them in `./data/`.
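AMASS sequences ship as `.npz` archives. A quick way to inspect one before wiring it into the animation pipeline (array names vary by subset, so this just reports whatever the file contains):

```python
import numpy as np

def summarize_motion(npz_path):
    """Map each array name in an AMASS-style .npz archive to its shape."""
    with np.load(npz_path, allow_pickle=True) as data:
        return {name: np.asarray(data[name]).shape for name in data.files}
```

For example, point it at any `*_poses.npz` under `data/SFU/` to see the pose, translation, and shape arrays available for animation.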
After setup, your `data/` folder should look like:

```
data/
├── smpl/
│   ├── SMPL_NEUTRAL.pkl
│   └── smpl_uv.obj
├── neuman/
│   └── dataset/
│       ├── bike
│       ├── citron
│       ├── jogging
│       ├── lab
│       ├── parkinglot
│       └── seattle
├── MPI_mosh/   # optional, for animation
│   ├── 00008
│   ├── 00031
│   └── ...
└── SFU/        # optional, for animation
    ├── 0005
    ├── 0007
    └── ...
```
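A small sanity check that the required files are in place before training (paths taken from the tree above; `check_layout` is a hypothetical helper, not part of the repository):

```python
from pathlib import Path

# Required entries from the data tree above. The mocap folders (MPI_mosh, SFU)
# are optional and only needed for novel-pose animation, so they are not checked.
REQUIRED = [
    "smpl/SMPL_NEUTRAL.pkl",
    "smpl/smpl_uv.obj",
    "neuman/dataset",
]

def check_layout(root="data"):
    """Return the required entries missing under `root` (empty list if complete)."""
    base = Path(root)
    return [rel for rel in REQUIRED if not (base / rel).exists()]

if __name__ == "__main__":
    missing = check_layout()
    print("missing:", missing if missing else "none")
```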
Joint human and scene training:

```bash
python main.py --cfg_file cfg_files/stm_human_scene.yaml human.loss.depth_w=0.03 output_path=output_stm_human_scene_depth_w0.03_seed10086
```

Evaluation:

```bash
python scripts/evaluate.py -o <path-to-output-directory>
```

Prints PSNR, SSIM, and LPIPS metrics for a given pretrained model.
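The trailing `key=value` arguments are dotted config overrides merged into the settings loaded from the YAML file. A generic sketch of how such overrides resolve (`apply_override` is a hypothetical helper, not the project's actual config code):

```python
def apply_override(cfg, dotted_key, value):
    """Set a nested dict value given a dotted path like "human.loss.depth_w"."""
    keys = dotted_key.split(".")
    node = cfg
    for k in keys[:-1]:
        node = node.setdefault(k, {})  # create intermediate levels as needed
    node[keys[-1]] = value
    return cfg

# Mirroring the overrides in the training command above
# (default values here are made up for illustration):
cfg = {"human": {"loss": {"depth_w": 0.1}}, "output_path": "output"}
apply_override(cfg, "human.loss.depth_w", 0.03)
apply_override(cfg, "output_path", "output_stm_human_scene_depth_w0.03_seed10086")
print(cfg["human"]["loss"]["depth_w"])  # 0.03
```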
If you use this code, please cite the *Dynamic Avatar-Scene Rendering from Human-centric Context* paper:

```bibtex
@article{wang2025stm,
  title={Dynamic Avatar-Scene Rendering from Human-centric Context},
  author={Wang, Wenqing and Yang, Haosen and Kittler, Josef and Zhu, Xiatian},
  journal={arXiv preprint arXiv:2511.10539},
  year={2025},
  url={https://arxiv.org/abs/2511.10539}
}
```