---
language: en
license: mit
tags:
- pose-estimation
- computer-vision
- keypoint-detection
- diffusion-models
- stable-diffusion
- out-of-distribution
- human-pose
- top-down-pose-estimation
- coco
- mmpose
library_name: pytorch
---
# SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation (Body - 17 Keypoints)
<div align="center">
[Paper](https://arxiv.org/abs/2509.24980)
[Project Page](https://t-s-liang.github.io/SDPose)
[Demo](https://huggingface.co/spaces/teemosliang/SDPose-Body)
[License: MIT](https://opensource.org/licenses/MIT)
</div>
## Model Description
**SDPose** is a human pose estimation model that uses the visual priors of **Stable Diffusion** to improve robustness on out-of-distribution (OOD) inputs such as paintings, anime, and sketches. This model variant estimates the **17 COCO body keypoints**: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.
### Model Architecture
SDPose employs a **U-Net backbone** initialized with Stable Diffusion v2 weights, combined with a specialized heatmap head for keypoint prediction. The model operates in a top-down manner:
1. **Person Detection**: Detect human bounding boxes using an object detector (e.g., YOLO11-x)
2. **Pose Estimation**: Crop each detected person and run the model on the crop
3. **Heatmap Generation**: Produce one confidence heatmap per keypoint, from which the 17 keypoint coordinates are decoded
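
The cropping step in a top-down pipeline usually grows each detector box to the model's input aspect ratio before resizing. The following is a minimal sketch of that idea, not the repository's implementation: the function name `fit_box_to_aspect` and the `padding` factor of 1.25 are assumptions made for illustration, and the 768/1024 ratio comes from the input resolution stated in this card.

```python
def fit_box_to_aspect(x1, y1, x2, y2, aspect_w_over_h=768 / 1024, padding=1.25):
    """Grow an xyxy detector box to a target width/height ratio around its center.

    The padding factor is an assumed value, not taken from SDPose.
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    # Enlarge whichever side is too short so that w / h matches the target ratio.
    if w > h * aspect_w_over_h:
        h = w / aspect_w_over_h
    else:
        w = h * aspect_w_over_h
    # Add context around the person.
    w, h = w * padding, h * padding
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
```

The returned box may extend past the image border; a real pipeline would typically pad the image or use an affine warp when extracting the crop.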
**Model Specifications:**
- **Backbone**: Stable Diffusion v2 U-Net (fine-tuned; minimal architectural changes)
- **Head**: Custom heatmap prediction head
- **Input Resolution**: 1024×768 (H×W)
- **Output**: 17 keypoint heatmaps + coordinates with confidence scores
- **Framework**: MMPose
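
To illustrate how heatmaps turn into coordinates with confidence scores, here is a simple argmax decoder written with NumPy. It is a sketch under stated assumptions: the heatmap resolution used in the usage line (256×192) is hypothetical, the function name `decode_heatmaps` is made up for this example, and the actual model decodes with MMPose's UDP codec, which adds sub-pixel refinement that this sketch omits.

```python
import numpy as np

def decode_heatmaps(heatmaps, input_size=(1024, 768)):
    """Decode (K, h, w) heatmaps into (K, 2) xy input-pixel coords and (K,) scores.

    Requires h > 1 and w > 1. input_size is (height, width).
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    idx = flat.argmax(axis=1)          # location of each heatmap's peak
    scores = flat.max(axis=1)          # peak value serves as the confidence
    ys, xs = np.unravel_index(idx, (h, w))
    in_h, in_w = input_size
    # Scale heatmap indices to input pixels so that corners map to corners.
    coords = np.stack(
        [xs * (in_w - 1) / (w - 1), ys * (in_h - 1) / (h - 1)], axis=1
    )
    return coords, scores

# Usage with a hypothetical 256x192 heatmap resolution:
# coords, scores = decode_heatmaps(np.random.rand(17, 256, 192))
```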
## Supported Keypoints (COCO Format)
The model predicts 17 body keypoints following the COCO keypoint format:
```
0: nose
1: left_eye
2: right_eye
3: left_ear
4: right_ear
5: left_shoulder
6: right_shoulder
7: left_elbow
8: right_elbow
9: left_wrist
10: right_wrist
11: left_hip
12: right_hip
13: left_knee
14: right_knee
15: left_ankle
16: right_ankle
```
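
When consuming predictions, it can help to attach these names to the raw arrays. The sketch below assumes the model output has already been converted to a list of `(x, y)` pairs and a list of scores in the index order above; the helper name `name_keypoints` and the 0.3 confidence threshold are illustrative choices, not values from SDPose.

```python
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def name_keypoints(coords, scores, min_score=0.3):
    """Map keypoint names to (x, y, score), dropping low-confidence points.

    min_score is an assumed threshold chosen for illustration.
    """
    return {
        name: (x, y, s)
        for name, (x, y), s in zip(COCO_KEYPOINTS, coords, scores)
        if s >= min_score
    }
```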
## Intended Use
### Primary Use Cases
- Human pose estimation in natural images
- Pose estimation in artistic and stylized domains (paintings, anime, sketches)
- Animation and video pose tracking
- Cross-domain pose analysis and research
- Applications requiring robust pose estimation under distribution shifts
## How to Use
### Installation
```bash
# Clone the repository
git clone https://github.com/t-s-liang/SDPose-OOD.git
cd SDPose-OOD
# Install dependencies
pip install -r requirements.txt
# Download YOLO11-x for human detection
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/
# Launch Gradio interface
cd gradio_app
bash launch_gradio.sh
```
## Training Data
### Datasets
Trained exclusively on COCO-2017 train2017 (no extra data).
- **COCO (Common Objects in Context)**: 200K+ images with 17 body keypoints
### Preprocessing
- Images are resized and cropped to 1024×768 resolution
- Augmentation: random horizontal flip, half-body & bbox transforms, UDP affine; Albumentations (Gaussian/Median blur, coarse dropout).
- Heatmaps: UDP codec (MMPose style).
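
Random horizontal flipping of pose data has to do more than mirror the image: left and right keypoint labels must be swapped as well. The following sketch shows the keypoint side of that operation in plain Python. It is a generic illustration rather than the training code; `FLIP_PAIRS` is derived from the COCO keypoint order listed in this card, and `hflip_keypoints` is a hypothetical helper name.

```python
# Left/right index pairs, following the COCO keypoint order listed in this card.
FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16)]

def hflip_keypoints(keypoints, image_width):
    """Mirror a list of 17 (x, y) keypoints horizontally and swap left/right labels."""
    # Pixel x becomes (width - 1 - x) so that column 0 maps to the last column.
    flipped = [(image_width - 1 - x, y) for x, y in keypoints]
    # After mirroring, the anatomical left side appears at the "right" indices.
    for left, right in FLIP_PAIRS:
        flipped[left], flipped[right] = flipped[right], flipped[left]
    return flipped
```

Applying the function twice returns the original keypoints, which is a quick sanity check for any flip implementation.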
### Comparison with Baselines
According to the paper, SDPose outperforms prior pose estimation models (e.g., Sapiens, ViTPose++) on out-of-distribution benchmarks while remaining competitive on in-domain data.
See our [paper](https://arxiv.org/abs/2509.24980) for comprehensive evaluation results.
## Citation
If you use SDPose in your research, please cite our paper:
```bibtex
@misc{liang2025sdposeexploitingdiffusionpriors,
title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation},
author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan},
year={2025},
eprint={2509.24980},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.24980},
}
```
## License
This model is released under the [MIT License](https://opensource.org/licenses/MIT).
## Additional Resources
- 🌐 **Project Website**: [https://t-s-liang.github.io/SDPose](https://t-s-liang.github.io/SDPose)
- 📄 **Paper**: [arXiv:2509.24980](https://arxiv.org/abs/2509.24980)
- 💻 **Code Repository**: [GitHub](https://github.com/t-s-liang/SDPose-OOD)
- 🤗 **Demo**: [HuggingFace Space](https://huggingface.co/spaces/teemosliang/SDPose-Body)
- 📧 **Contact**: tsliang2001@gmail.com
---
<div align="center">
**⭐ Star us on GitHub — it motivates us a lot!**
</div>