Reconstructing dynamic humans interacting with real-world environments from monocular videos is an important yet challenging task. Despite considerable progress in 4D neural rendering, existing approaches either model dynamic scenes holistically or model the scene and the human separately with parametric human priors. The former neglects the distinct motion characteristics of the scene's components, especially the human, leading to incomplete reconstructions; the latter ignores the information exchange between the separately modeled components, resulting in spatial inconsistencies and visual artifacts at human-scene boundaries. To address this, we propose the **Separate-then-Map** (StM) strategy, which introduces a dedicated information mapping mechanism to bridge separately defined and optimized models. Our method employs a shared transformation function for each Gaussian attribute to unify the separately modeled components, improving computational efficiency by avoiding exhaustive pairwise interactions while ensuring spatial and visual coherence between humans and their surroundings. Extensive experiments on monocular video datasets demonstrate that StM significantly outperforms existing state-of-the-art methods in both visual quality and rendering accuracy, particularly at challenging human-scene interaction boundaries.
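Conceptually, the mapping step applies one shared transformation per Gaussian attribute to both separately optimized components. The sketch below illustrates this idea with plain linear maps; all names and shapes are illustrative assumptions, not the actual StM implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Separately modeled components: each keeps its own Gaussian parameters
# (illustrative attribute subset and sizes).
human = {"xyz": rng.standard_normal((100, 3)), "opacity": rng.standard_normal((100, 1))}
scene = {"xyz": rng.standard_normal((500, 3)), "opacity": rng.standard_normal((500, 1))}

# One shared transformation per attribute: the same weights act on both
# components, coupling them through a common mapping instead of exhaustive
# pairwise interactions between human and scene Gaussians.
shared = {"xyz": rng.standard_normal((3, 3)), "opacity": rng.standard_normal((1, 1))}

def map_attributes(gaussians):
    # Apply the shared per-attribute map to one component's Gaussians.
    return {k: v @ shared[k] for k, v in gaussians.items()}

human_mapped = map_attributes(human)
scene_mapped = map_attributes(scene)
print(human_mapped["xyz"].shape, scene_mapped["xyz"].shape)  # (100, 3) (500, 3)
```

Because the transformation weights are shared, a joint rendering loss propagates gradients from both components through the same parameters, which is how the mapping keeps the separately optimized models coherent.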
This repository uses two submodules. Clone recursively to fetch them:

```bash
git clone --recursive <repository-url>
```

If you already cloned without submodules:

```bash
git submodule update --init --recursive
```

Submodules:

- `submodules/depth-diff-gaussian-rasterization` — differentiable Gaussian rasterization with depth forward/backward pass
- `submodules/simple-knn` — k-nearest neighbors for Gaussian splatting
```bash
source scripts/conda_setup.sh
```

This creates a conda environment and installs PyTorch, PyTorch3D, the submodules, and other dependencies.
Data setup follows apple/ml-hugs. Prepare the following:
- Register at the SMPL website
- Download v1.1.0 and the SMPL UV obj file from the download page
- Extract and rename `basicModel_neutral_lbs_10_207_0_v1.0.0.pkl` to `SMPL_NEUTRAL.pkl`
- Place the files in `./data/smpl/`:

  ```
  data/smpl/
  ├── SMPL_NEUTRAL.pkl
  └── smpl_uv.obj
  ```

- Data: download
Or run the setup script from the project root:

```bash
source scripts/prepare_data_models.sh
```

Mocap data, used for rendering novel poses: download the SFU (SMPL+H G) and MPI_mosh (SMPL+H G) subsets from AMASS, and place them in `./data/`.
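AMASS sequences ship as `.npz` archives. A quick way to inspect one before wiring it into the animation pipeline (array names vary by subset, so this just reports whatever the file contains):

```python
import numpy as np

def summarize_motion(npz_path):
    """Map each array name in an AMASS-style .npz archive to its shape."""
    with np.load(npz_path, allow_pickle=True) as data:
        return {name: np.asarray(data[name]).shape for name in data.files}
```

For example, point it at any `*_poses.npz` under `data/SFU/` to see the pose, translation, and shape arrays available for animation.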
After setup, your `data/` folder should look like:

```
data/
├── smpl/
│   ├── SMPL_NEUTRAL.pkl
│   └── smpl_uv.obj
├── neuman/
│   └── dataset/
│       ├── bike
│       ├── citron
│       ├── jogging
│       ├── lab
│       ├── parkinglot
│       └── seattle
├── MPI_mosh/   # optional, for animation
│   ├── 00008
│   ├── 00031
│   └── ...
└── SFU/        # optional, for animation
    ├── 0005
    ├── 0007
    └── ...
```
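A small sanity check that the required files are in place before training (paths taken from the tree above; `check_layout` is a hypothetical helper, not part of the repository):

```python
from pathlib import Path

# Required entries from the data tree above. The mocap folders (MPI_mosh, SFU)
# are optional and only needed for novel-pose animation, so they are not checked.
REQUIRED = [
    "smpl/SMPL_NEUTRAL.pkl",
    "smpl/smpl_uv.obj",
    "neuman/dataset",
]

def check_layout(root="data"):
    """Return the required entries missing under `root` (empty list if complete)."""
    base = Path(root)
    return [rel for rel in REQUIRED if not (base / rel).exists()]

if __name__ == "__main__":
    missing = check_layout()
    print("missing:", missing if missing else "none")
```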
Joint human and scene training:

```bash
python main.py --cfg_file cfg_files/stm_human_scene.yaml human.loss.depth_w=0.03 output_path=output_stm_human_scene_depth_w0.03_seed10086
```

Evaluation:

```bash
python scripts/evaluate.py -o <path-to-output-directory>
```

Prints PSNR, SSIM, and LPIPS metrics for a given pretrained model.
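The trailing `key=value` arguments are dotted config overrides merged into the settings loaded from the YAML file. A generic sketch of how such overrides resolve (`apply_override` is a hypothetical helper, not the project's actual config code):

```python
def apply_override(cfg, dotted_key, value):
    """Set a nested dict value given a dotted path like "human.loss.depth_w"."""
    keys = dotted_key.split(".")
    node = cfg
    for k in keys[:-1]:
        node = node.setdefault(k, {})  # create intermediate levels as needed
    node[keys[-1]] = value
    return cfg

# Mirroring the overrides in the training command above
# (default values here are made up for illustration):
cfg = {"human": {"loss": {"depth_w": 0.1}}, "output_path": "output"}
apply_override(cfg, "human.loss.depth_w", 0.03)
apply_override(cfg, "output_path", "output_stm_human_scene_depth_w0.03_seed10086")
print(cfg["human"]["loss"]["depth_w"])  # 0.03
```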
If you use this code, please cite the *Dynamic Avatar-Scene Rendering from Human-centric Context* paper:

```bibtex
@article{wang2025stm,
  title={Dynamic Avatar-Scene Rendering from Human-centric Context},
  author={Wang, Wenqing and Yang, Haosen and Kittler, Josef and Zhu, Xiatian},
  journal={arXiv preprint arXiv:2511.10539},
  year={2025},
  url={https://arxiv.org/abs/2511.10539}
}
```