kaist-shim/RethinkingDPO_Diffusion_Models

Intro

This is the official implementation of the paper Rethinking Direct Preference Optimization in Diffusion Models. This repository is adapted from the Diffusion-DPO official implementation.

Setup

pip install -r requirements.txt

Model Training

bash launchers/run_sd.sh

Algorithm Hyperparameters

  • --beta_dpo: DPO beta, the (implicit) KL-divergence regularization parameter
  • REF_UPDATE_STEP: Update period for the reference model
  • MONITOR_THRESHOLD: Threshold on the monitored KL divergence between the reference model and the pre-trained model
  • --timestep_gamma: Parameter $\gamma$ of the timestep sampling distribution
  • --reward_scale_scheduling: Enables reward scale scheduling
  • --alpha: Hyperparameter for reward scale scheduling
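
To make the role of --beta_dpo concrete, here is a minimal sketch of the per-pair loss in the form used by the Diffusion-DPO objective that this repository is adapted from. It is an illustration under stated assumptions, not this repository's training code: the function names and scalar inputs are hypothetical, and the paper's additions (reference model updates, timestep sampling with $\gamma$, reward scale scheduling) are not reflected.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def diffusion_dpo_loss(model_err_w, model_err_l, ref_err_w, ref_err_l, beta_dpo):
    """Per-pair loss in the Diffusion-DPO form (hypothetical helper).

    Each *_err argument is a mean squared denoising error for the preferred (w)
    or rejected (l) image, under the trained model or the frozen reference.
    """
    model_diff = model_err_w - model_err_l
    ref_diff = ref_err_w - ref_err_l
    # A larger beta_dpo scales the preference margin more aggressively.
    inside = -0.5 * beta_dpo * (model_diff - ref_diff)
    # -log(sigmoid(inside)) equals softplus(-inside).
    return softplus(-inside)


if __name__ == "__main__":
    # Model identical to the reference: the loss is log(2).
    print(diffusion_dpo_loss(0.5, 0.5, 0.5, 0.5, beta_dpo=10.0))
    # Model denoises the preferred image better than the rejected one: lower loss.
    print(diffusion_dpo_loss(0.4, 0.6, 0.5, 0.5, beta_dpo=10.0))
```

When the model and the reference have identical errors, the loss equals log(2) regardless of beta_dpo. The loss falls below that value once the model's error gap between preferred and rejected images improves on the reference's gap.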

Citation

@misc{kang2025rethinkingdirectpreferenceoptimization,
      title={Rethinking Direct Preference Optimization in Diffusion Models}, 
      author={Junyong Kang and Seohyun Lim and Kyungjune Baek and Hyunjung Shim},
      year={2025},
      eprint={2505.18736},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.18736}, 
}

About

[AAAI 2026] Official Implementation of the paper "Rethinking Direct Preference Optimization in Diffusion Models"
