---
language: en
license: mit
tags:
- pose-estimation
- computer-vision
- keypoint-detection
- diffusion-models
- stable-diffusion
- out-of-distribution
- human-pose
- top-down-pose-estimation
- coco
- mmpose
library_name: pytorch
---

# SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation (WholeBody - 133 Keypoints)

[![Paper](https://img.shields.io/badge/arXiv-Paper-b31b1b?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2509.24980) [![Project Page](https://img.shields.io/badge/Project-Website-pink?logo=googlechrome&logoColor=white)](https://t-s-liang.github.io/SDPose) [![HuggingFace Demo](https://img.shields.io/badge/🤗%20HuggingFace-Demo-yellow)](https://huggingface.co/spaces/teemosliang/SDPose-Body) [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

## Model Description

**SDPose** is a human pose estimation model that leverages the visual priors of **Stable Diffusion** to stay robust in out-of-distribution (OOD) scenarios such as paintings, anime, and sketches. This model variant estimates **133 whole-body keypoints**, covering the body, feet, face, and hands.

### Model Architecture

SDPose employs a **U-Net backbone** initialized with Stable Diffusion v2 weights, combined with a specialized heatmap head for keypoint prediction. The model operates in a top-down manner:

1. **Person Detection**: Detect human bounding boxes using an object detector (e.g., YOLO11-x)
2. **Pose Estimation**: Crop each detected person and estimate its 133 whole-body keypoints
3. **Heatmap Generation**: Produce one confidence heatmap per keypoint, from which the keypoint coordinates are decoded

**Model Specifications:**

- **Backbone**: Stable Diffusion v2 U-Net (fine-tuned; minimal architectural changes)
- **Head**: Custom heatmap prediction head
- **Input Resolution**: 1024×768 (H×W)
- **Output**: 133 keypoint heatmaps + coordinates with confidence scores
- **Framework**: MMPose

## Supported Keypoints (COCO-WholeBody Format)

The model predicts 133 whole-body keypoints following the COCO-WholeBody keypoint format.
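
As a minimal sketch of how the 133 keypoints are typically grouped, the snippet below splits a prediction into the standard COCO-WholeBody parts (17 body, 6 foot, 68 face, and 21 keypoints per hand). The index ranges follow the public COCO-WholeBody convention; the names `WHOLEBODY_GROUPS` and `split_wholebody` are hypothetical helpers for illustration and are not part of the SDPose code base.

```python
# Index ranges of the COCO-WholeBody layout (0-based, end-exclusive).
# Assumption: the model output follows the standard COCO-WholeBody ordering.
WHOLEBODY_GROUPS = {
    "body": (0, 17),         # 17 COCO body keypoints
    "feet": (17, 23),        # 6 foot keypoints (3 per foot)
    "face": (23, 91),        # 68 facial landmarks
    "left_hand": (91, 112),  # 21 left-hand keypoints
    "right_hand": (112, 133) # 21 right-hand keypoints
}


def split_wholebody(keypoints):
    """Split a sequence of 133 keypoints into named body-part groups."""
    if len(keypoints) != 133:
        raise ValueError(f"expected 133 keypoints, got {len(keypoints)}")
    return {
        name: keypoints[start:end]
        for name, (start, end) in WHOLEBODY_GROUPS.items()
    }


# Usage with dummy (x, y, score) triples standing in for a real prediction.
dummy = [(float(i), float(i), 1.0) for i in range(133)]
groups = split_wholebody(dummy)
sizes = {name: len(points) for name, points in groups.items()}
print(sizes)
```

Keeping the ranges in one table makes it easy to visualize or evaluate a single part (for example, hands only) without hard-coding indices elsewhere.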

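The cropping step of the top-down pipeline described under Model Architecture can be sketched as follows: a detector box is expanded around its center so that it matches the 768:1024 (W:H) aspect ratio of the model input before it is resized. This is an illustrative sketch only; the function name `bbox_to_crop` and the default `padding` of 1.25 are assumptions (1.25 is a common choice in top-down pipelines), not values confirmed for SDPose.

```python
def bbox_to_crop(x1, y1, x2, y2, input_w=768, input_h=1024, padding=1.25):
    """Expand a detection box to the model's aspect ratio, keeping its center.

    Returns (x1, y1, x2, y2) of the crop region in image coordinates.
    The region may extend past the image border; callers would pad or clip.
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = float(x2 - x1), float(y2 - y1)
    aspect = input_w / input_h  # 0.75 for a 1024x768 (HxW) input

    # Grow the shorter side so the box matches the target aspect ratio.
    if w > aspect * h:
        h = w / aspect
    else:
        w = h * aspect

    # Hypothetical padding factor to keep some context around the person.
    w, h = w * padding, h * padding
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


# Usage: a tall 100x400 person box becomes a 3:4 crop around the same center.
crop = bbox_to_crop(100, 100, 200, 500, padding=1.0)
print(crop)
```

Expanding the box rather than stretching the image avoids distorting body proportions when the crop is resized to the network input.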
## Intended Use

- Human pose estimation in natural images
- Pose estimation in artistic and stylized domains (paintings, anime, sketches)
- Animation and video pose tracking
- Cross-domain pose analysis and research
- Applications requiring robust pose estimation under distribution shifts

## How to Use

### Installation

```bash
# Clone the repository
git clone https://github.com/t-s-liang/SDPose-OOD.git
cd SDPose-OOD

# Install dependencies
pip install -r requirements.txt

# Download YOLO11-x for human detection
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/

# Launch Gradio interface
cd gradio_app
bash launch_gradio.sh
```

## Training Data

### Datasets

Trained exclusively on COCO-2017 `train2017` (no extra data).

- **COCO-WholeBody (Common Objects in Context)**: 200K+ images with 133 whole-body keypoints

### Preprocessing

- Images are resized and cropped to 1024×768 (H×W) resolution
- Augmentation: random horizontal flip, half-body and bbox transforms, UDP affine, and Albumentations (Gaussian/median blur, coarse dropout)
- Heatmaps: UDP codec (MMPose style)

### Comparison with Baselines

SDPose significantly outperforms prior pose estimation models (e.g., Sapiens) on out-of-distribution benchmarks while maintaining competitive performance on in-domain data. See our [paper](https://arxiv.org/abs/2509.24980) for comprehensive evaluation results.

## Citation

If you use SDPose in your research, please cite our paper:

```bibtex
@misc{liang2025sdposeexploitingdiffusionpriors,
  title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation},
  author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan},
  year={2025},
  eprint={2509.24980},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.24980},
}
```

## License

This model is released under the [MIT License](https://opensource.org/licenses/MIT).

## Additional Resources

- 🌐 **Project Website**: [https://t-s-liang.github.io/SDPose](https://t-s-liang.github.io/SDPose)
- 📄 **Paper**: [arXiv:2509.24980](https://arxiv.org/abs/2509.24980)
- 💻 **Code Repository**: [GitHub](https://github.com/t-s-liang/SDPose-OOD)
- 🤗 **Demo**: [HuggingFace Space](https://huggingface.co/spaces/teemosliang/SDPose-Body)
- 📧 **Contact**: tsliang2001@gmail.com

---

**⭐ Star us on GitHub — it motivates us a lot!**