Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
¹City University of Hong Kong ²The Hong Kong Polytechnic University ³OPPO Research Institute
No perceptual backbone, no discriminator, no auxiliary networks, and no additional ground-truth images.
⚙️ Installation
We recommend using Conda to manage dependencies. Follow these steps to set up the training environment:
conda create -n dpdmd python=3.10.16
conda activate dpdmd
pip install -e .
During training, your model will be evaluated using DINOv2, CLIP, ImageReward, and PickScore metrics, all of which are available in the installed dpdmd environment above.
📌 Attention 1 [new env name: test_div]: DINOv3 requires transformers >= 4.57.0, which is incompatible with the ImageReward metric. Therefore, it is recommended to use DINOv2 during training. If you need to evaluate with DINOv3 after training, please create a separate conda environment and upgrade the transformers version accordingly.
📌 Attention 2 [new env name: vq]: For visual quality evaluation, please follow VisualQuality-R1. After setup, install timm via pip install timm to enable the MANIQA metric. Creating a new environment for this step is simple and recommended.
📌 Overall, three separate environments may be required: one for training and human preference evaluation (ImageReward, PickScore, DINOv2, and CLIP), one for visual quality evaluation (VisualQuality-R1 and MANIQA), and one for diversity evaluation (DINOv3 and CLIP). If DINOv3 is not used, only two environments are needed: one for training (including human preference evaluation) and one for visual quality evaluation.
⚡ Quick Inference
Run the following code to generate an image (the Hugging Face model is a trained SD3.5-Medium transformer).
import torch
from diffusers import StableDiffusion3Pipeline

base_sd35_weight_path = "stable-diffusion-3.5-medium"  # SD3.5-Medium weight path
transformer_weight_path = "DPDMD-SD35M-4NFE-natural.pt"  # distilled transformer weight path

pipe = StableDiffusion3Pipeline.from_pretrained(base_sd35_weight_path, torch_dtype=torch.bfloat16)

# Load the distilled transformer weights into the pipeline.
state_dict = torch.load(transformer_weight_path, map_location="cpu")
missing, unexpected = pipe.transformer.load_state_dict(state_dict, strict=True)
pipe = pipe.to("cuda:0")

# Fix the seed for reproducible sampling.
g_init = torch.Generator(device="cuda:0").manual_seed(5)
image = pipe(
    "a dog",
    num_inference_steps=4,  # 4 NFEs, matching the distilled model
    guidance_scale=1.0,
    height=1024,
    width=1024,
    generator=g_init,
).images[0]

save_path = "./demo.png"
image.save(save_path)
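The seeded torch.Generator above is what makes sampling reproducible: the same seed yields the same latent noise and hence the same image. A minimal sketch of this behavior with plain tensors (assuming only that torch is installed; the function below is illustrative, not part of the repo):

```python
import torch

def sample_noise(seed: int, shape=(4, 128, 128)) -> torch.Tensor:
    """Draw the same latent noise every time for a fixed seed."""
    g = torch.Generator(device="cpu").manual_seed(seed)
    return torch.randn(shape, generator=g)

# Identical seeds give bit-identical latents, so the 4-step sampler
# produces the same image; a different seed gives a different sample.
a = sample_noise(5)
b = sample_noise(5)
c = sample_noise(6)
assert torch.equal(a, b)
assert not torch.equal(a, c)
```

To generate several distinct samples from one prompt, pass a generator with a different seed on each call.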
🚀 Training
Starting the training process is straightforward. Please follow the three steps below.
Data Preparation
We only use text prompts for training. Example prompts can be found in the data/ folder (one text prompt per line). All prompts are stored in .txt format.
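A minimal loader for such a prompt file is sketched below (the repo's actual data-loading code may differ); it simply reads one prompt per line and skips blanks:

```python
import os
import tempfile

def load_prompts(path: str) -> list[str]:
    """Read one text prompt per line from a .txt file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Example: a file with two prompts and a blank line in between.
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8")
tmp.write("a dog\n\na cat playing piano\n")
tmp.close()
prompts = load_prompts(tmp.name)
os.unlink(tmp.name)
print(prompts)  # ['a dog', 'a cat playing piano']
```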
Pretrained Model Preparation
Before starting training, you should first download the required files:
- [SD3.5 Medium] stable-diffusion-3.5-medium
- [PickScore processor] CLIP-ViT-H-14-laion2B-s32B-b79K
- [PickScore] PickScore_v1
- [ImageReward] ImageReward
- [DINOv2] dinov2-base
- [CLIP] clip-vit-large-patch14
Then modify the weight paths in the training script, located at scripts/run_train_sd35.sh:
--teacher_id weights/stabilityai/stable-diffusion-3.5-medium \
--student_id weights/stabilityai/stable-diffusion-3.5-medium \
--fake_id weights/stabilityai/stable-diffusion-3.5-medium \
--pick_processor_path weights/CLIP-ViT-H-14-laion2B-s32B-b79K \
--pick_model_path weights/PickScore_v1 \
--ir_model_path weights/ImageReward/ImageReward.pt \
--ir_med_config weights/ImageReward/med_config.json \
--dino_path weights/dinov2-base \
--clip_path weights/clip-vit-large-patch14 \
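These flags are presumably consumed by an argparse parser inside the training entry point. A minimal sketch of how the path flags above could be parsed (flag names are taken from the script; everything else here is hypothetical):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Only the weight-path flags shown in scripts/run_train_sd35.sh.
    p = argparse.ArgumentParser(description="DPDMD training (path flags only)")
    for flag in [
        "--teacher_id", "--student_id", "--fake_id",
        "--pick_processor_path", "--pick_model_path",
        "--ir_model_path", "--ir_med_config",
        "--dino_path", "--clip_path",
    ]:
        p.add_argument(flag, type=str, required=True)
    return p

args = build_parser().parse_args([
    "--teacher_id", "weights/stabilityai/stable-diffusion-3.5-medium",
    "--student_id", "weights/stabilityai/stable-diffusion-3.5-medium",
    "--fake_id", "weights/stabilityai/stable-diffusion-3.5-medium",
    "--pick_processor_path", "weights/CLIP-ViT-H-14-laion2B-s32B-b79K",
    "--pick_model_path", "weights/PickScore_v1",
    "--ir_model_path", "weights/ImageReward/ImageReward.pt",
    "--ir_med_config", "weights/ImageReward/med_config.json",
    "--dino_path", "weights/dinov2-base",
    "--clip_path", "weights/clip-vit-large-patch14",
])
print(args.dino_path)  # weights/dinov2-base
```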
Start Training
📌 Attention: When starting a training experiment, keep the experiment name sd35_dpdmd/sd35m_t30_1024_lr1e5_4nfe_anchor5 (example) consistent across the following arguments so that all generated files are stored under the same root folder.
--log_path outputs/sd35_dpdmd/sd35m_t30_1024_lr1e5_4nfe_anchor5/log \
--ckpt_dir outputs/sd35_dpdmd/sd35m_t30_1024_lr1e5_4nfe_anchor5/ckpts \
--eval_dir outputs/sd35_dpdmd/sd35m_t30_1024_lr1e5_4nfe_anchor5/eval_images \
--process_folder_name outputs/sd35_dpdmd/sd35m_t30_1024_lr1e5_4nfe_anchor5/process_vis \
--diversity_folder_name outputs/sd35_dpdmd/sd35m_t30_1024_lr1e5_4nfe_anchor5/div_vis \
- log_path: stores training log information.
- ckpt_dir: stores checkpoint weights.
- eval_dir: stores generated images used for human preference evaluation during training (overwritten at each evaluation step).
- process_folder_name: stores student model output images during training (overwritten at each iteration).
- diversity_folder_name: stores images used for diversity evaluation during training (overwritten at each evaluation step).
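One way to guarantee that all five arguments share the same root is to derive them from a single experiment name. The helper below is illustrative only, not part of the repo:

```python
def output_paths(exp_name: str, root: str = "outputs") -> dict[str, str]:
    """Derive all training output folders from one experiment name."""
    base = f"{root}/{exp_name}"
    return {
        "log_path": f"{base}/log",
        "ckpt_dir": f"{base}/ckpts",
        "eval_dir": f"{base}/eval_images",
        "process_folder_name": f"{base}/process_vis",
        "diversity_folder_name": f"{base}/div_vis",
    }

paths = output_paths("sd35_dpdmd/sd35m_t30_1024_lr1e5_4nfe_anchor5")
print(paths["ckpt_dir"])
# outputs/sd35_dpdmd/sd35m_t30_1024_lr1e5_4nfe_anchor5/ckpts
```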
After completing all the preparations, run the following command to start training.
bash scripts/run_train_sd35.sh
🛠️ Testing
We provide the testing files for diversity evaluation (test_diversity.py), human preference evaluation (test_preference.py), and visual quality evaluation (test_quality.py). Please ensure that the required environments for each evaluation are installed beforehand.
Instructions for modifying paths or loading model weights are included within each file.
- Human Preference:
accelerate launch --main_process_port 29512 test_preference.py
- Visual Quality:
python test_quality.py
  - VisualQuality-R1 weight
  - MANIQA weight
- Diversity:
CUDA_VISIBLE_DEVICES=0 accelerate launch --main_process_port 29519 --num_processes 1 test_diversity.py
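The exact diversity protocol lives in test_diversity.py. As a rough illustration (not the repo's implementation), diversity over a set of images generated from one prompt can be scored as the average pairwise cosine distance between their backbone features (e.g. DINO embeddings); here random vectors stand in for real features:

```python
import numpy as np

def diversity_score(features: np.ndarray) -> float:
    """Average pairwise cosine distance; higher = more diverse.

    features: (N, D) array, one embedding per generated image.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T  # (N, N) cosine similarities
    n = len(f)
    off_diag = sim.sum() - np.trace(sim)  # sum over the N*(N-1) ordered pairs
    return float(1.0 - off_diag / (n * (n - 1)))

rng = np.random.default_rng(0)
distinct = rng.normal(size=(8, 64))                     # unrelated embeddings
collapsed = np.tile(rng.normal(size=(1, 64)), (8, 1))   # mode collapse: all identical
assert diversity_score(distinct) > diversity_score(collapsed)
assert abs(diversity_score(collapsed)) < 1e-6           # identical images score ~0
```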
💪 Acknowledgement
I would like to sincerely thank Gongye Liu, Ke Lei (Tsinghua University), and Zhuoyan Luo for their generous support of this project and their invaluable guidance in the field of generative modeling.
📧 Contact
If you have any questions, please email tianhewu-c@my.cityu.edu.hk.
📚 BibTeX
@article{wu2026diversity,
  title={Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis},
  author={Wu, Tianhe and Li, Ruibin and Zhang, Lei and Ma, Kede},
  journal={arXiv preprint arXiv:2602.03139},
  year={2026}
}