---
language: en
license: mit
tags:
- pose-estimation
- computer-vision
- keypoint-detection
- diffusion-models
- stable-diffusion
- out-of-distribution
- human-pose
- top-down-pose-estimation
- coco
- mmpose
library_name: pytorch
---

# SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation (Body - 17 Keypoints)

<div align="center">

[Paper (arXiv)](https://arxiv.org/abs/2509.24980) · [Project Page](https://t-s-liang.github.io/SDPose) · [Demo (Hugging Face Space)](https://huggingface.co/spaces/teemosliang/SDPose-Body) · [License: MIT](https://opensource.org/licenses/MIT)

</div>

## Model Description

**SDPose** is a top-down human pose estimation model that reuses the pre-trained U-Net of **Stable Diffusion** as its backbone. Its diffusion-learned visual priors make it robust on out-of-distribution (OOD) inputs, such as paintings, anime, and sketches, where it achieves state-of-the-art results. This variant predicts the **17 COCO body keypoints**: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.

### Model Architecture

SDPose uses a **U-Net backbone** initialized with Stable Diffusion v2 weights, followed by a dedicated heatmap head for keypoint prediction. Inference is top-down:

1. **Person Detection**: Detect human bounding boxes with an object detector (e.g., YOLO11-x).
2. **Pose Estimation**: Crop each detected person, resize the crop to the model input resolution, and run the network to estimate its 17 body keypoints.
3. **Heatmap Generation**: The head outputs one confidence heatmap per keypoint; each heatmap is decoded into a precise keypoint location and a confidence score.

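The crop step in this pipeline can be sketched as below. This is a hypothetical, plain-Python illustration rather than code from the SDPose repository: it assumes each detected box is enlarged about its center by a padding factor (1.25 here, an assumed value) and then widened or heightened to the 3:4 (W:H) aspect ratio of the 1024×768 input, so the crop can be resized without distortion. `box_to_center_scale` and `crop_to_image` are illustrative names.

```python
from typing import Tuple

INPUT_H, INPUT_W = 1024, 768  # model input size (H x W) from the model card


def box_to_center_scale(
    box: Tuple[float, float, float, float],
    padding: float = 1.25,  # assumed padding factor, not read from the SDPose config
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Turn an (x1, y1, x2, y2) box into a crop center and a crop size (w, h)
    whose aspect ratio matches the 768x1024 model input."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * padding, (y2 - y1) * padding
    aspect = INPUT_W / INPUT_H  # 0.75
    # Grow the relatively shorter side so the crop keeps the input aspect ratio.
    if w > h * aspect:
        h = w / aspect
    else:
        w = h * aspect
    return (cx, cy), (w, h)


def crop_to_image(
    point: Tuple[float, float],
    center: Tuple[float, float],
    crop_size: Tuple[float, float],
) -> Tuple[float, float]:
    """Map a point from model-input pixels (the resized crop) back to the original image.

    Uses the plain size-based pixel convention (no UDP (size - 1) correction)."""
    (cx, cy), (w, h) = center, crop_size
    x = cx + (point[0] / INPUT_W - 0.5) * w
    y = cy + (point[1] / INPUT_H - 0.5) * h
    return x, y
```

Because the crop keeps the input aspect ratio, a keypoint predicted at crop pixel `(px, py)` maps back to the image with one scale factor per axis. The UDP transforms listed under Preprocessing use a unit-length `(size - 1)` convention instead, so their results differ slightly near the crop borders.
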
**Model Specifications:**

- **Backbone**: Stable Diffusion v2 U-Net (fine-tuned; minimal architectural changes)
- **Head**: Custom heatmap prediction head
- **Input Resolution**: 1024×768 (H×W)
- **Output**: 17 keypoint heatmaps + coordinates with confidence scores
- **Framework**: MMPose

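Heatmap heads are typically trained against per-keypoint Gaussian targets and decoded by locating each heatmap's peak, whose value serves as the confidence score. The plain-Python sketch below shows that idea in its simplest form (Gaussian target, integer argmax). It is a simplification, not the UDP codec SDPose uses, which adds sub-pixel refinement and a unit-length coordinate convention; the heatmap sizes, `sigma`, and the helper names are illustrative assumptions.

```python
import math
from typing import List, Tuple

Heatmap = List[List[float]]  # indexed as heatmap[y][x]


def gaussian_heatmap(x0: float, y0: float, width: int, height: int, sigma: float = 2.0) -> Heatmap:
    """Training target: a 2D Gaussian with peak value 1.0 at (x0, y0), in heatmap pixels."""
    return [
        [math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]


def decode_argmax(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (x, y, score) at the heatmap maximum; the peak value is the confidence."""
    best_x, best_y, best = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best:
                best_x, best_y, best = x, y, value
    return best_x, best_y, best


def to_input_coords(
    x: int,
    y: int,
    input_size: Tuple[int, int],    # (W, H) of the model input
    heatmap_size: Tuple[int, int],  # (W, H) of the heatmap; assumed, not documented here
) -> Tuple[float, float]:
    """Rescale heatmap pixels to model-input pixels (size-based convention)."""
    return x * input_size[0] / heatmap_size[0], y * input_size[1] / heatmap_size[1]
```

For example, with an assumed 192×256 heatmap for a 768×1024 input, a peak at heatmap pixel (10, 20) corresponds to input pixel (40, 80); the crop-to-image mapping then places it on the original image.
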
## Supported Keypoints (COCO Format)

The model predicts 17 body keypoints following the COCO keypoint format:

```
0: nose
1: left_eye
2: right_eye
3: left_ear
4: right_ear
5: left_shoulder
6: right_shoulder
7: left_elbow
8: right_elbow
9: left_wrist
10: right_wrist
11: left_hip
12: right_hip
13: left_knee
14: right_knee
15: left_ankle
16: right_ankle
```

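Scripts that post-process the model's output usually need the table above in code. The plain-Python sketch below mirrors it and derives, from the `left_`/`right_` name prefixes, the index of each keypoint's mirror partner, which horizontal flipping needs in order to swap left and right labels. `COCO_KEYPOINTS`, `KEYPOINT_ID`, and `flip_index` are illustrative names, not identifiers from the SDPose code.

```python
from typing import Dict, List

# COCO body keypoints in index order (0 = nose ... 16 = right_ankle).
COCO_KEYPOINTS: List[str] = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
KEYPOINT_ID: Dict[str, int] = {name: i for i, name in enumerate(COCO_KEYPOINTS)}


def flip_index(names: List[str]) -> List[int]:
    """For each keypoint, return the index of its left/right mirror (itself if unpaired)."""
    lookup = {n: i for i, n in enumerate(names)}
    out = []
    for name in names:
        if name.startswith("left_"):
            partner = "right_" + name[len("left_"):]
        elif name.startswith("right_"):
            partner = "left_" + name[len("right_"):]
        else:
            partner = name  # e.g. "nose" maps to itself
        out.append(lookup[partner])
    return out


FLIP_INDEX = flip_index(COCO_KEYPOINTS)
```

Deriving the flip table from the names rather than hard-coding it keeps the two consistent if the list is ever edited.
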
## Intended Use

### Primary Use Cases

- Human pose estimation in natural images
- Pose estimation in artistic and stylized domains (paintings, anime, sketches)
- Animation and video pose tracking
- Cross-domain pose analysis and research
- Applications requiring robust pose estimation under distribution shifts

## How to Use

### Installation

```bash
# Clone the repository
git clone https://github.com/t-s-liang/SDPose-OOD.git
cd SDPose-OOD

# Install dependencies
pip install -r requirements.txt

# Download YOLO11-x weights for person detection
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/

# Launch the Gradio interface
cd gradio_app
bash launch_gradio.sh
```

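Before pose estimation, the detector output is typically reduced to confident person boxes. The helper below is a hypothetical sketch, not the repository's detection code: it assumes detections arrive as `(x1, y1, x2, y2, score, class_id)` tuples with COCO class 0 meaning *person*, the convention of COCO-trained Ultralytics models, where these values can be read from `results[0].boxes.xyxy`, `.conf`, and `.cls`. Both thresholds are placeholder values.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]
Detection = Tuple[float, float, float, float, float, int]  # x1, y1, x2, y2, score, class_id

PERSON_CLASS_ID = 0  # "person" in the COCO class list used by COCO-trained YOLO models


def person_boxes(
    detections: Sequence[Detection],
    min_score: float = 0.5,     # placeholder confidence threshold
    min_area: float = 32 * 32,  # placeholder: skip boxes too small to pose reliably
) -> List[Box]:
    """Keep confident, reasonably sized person detections, highest score first."""
    kept = []
    for x1, y1, x2, y2, score, cls in detections:
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        if cls == PERSON_CLASS_ID and score >= min_score and area >= min_area:
            kept.append((score, (x1, y1, x2, y2)))
    kept.sort(key=lambda item: item[0], reverse=True)
    return [box for _, box in kept]
```

Each returned box would then go through the crop, inference, and heatmap-decoding steps described under Model Architecture.
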
## Training Data

### Datasets

Trained exclusively on the COCO-2017 `train2017` split (no extra data).

- **COCO (Common Objects in Context)**: `train2017` provides about 57K images containing about 150K person instances annotated with 17 body keypoints.

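For reference when preparing or inspecting training data: in the COCO annotation files each person instance stores its keypoints as a flat list of 51 numbers, `[x1, y1, v1, ..., x17, y17, v17]`, in the order listed under Supported Keypoints, where `v` is 0 (not labeled), 1 (labeled but not visible), or 2 (labeled and visible). The plain-Python parser below is an illustrative sketch; `Keypoint`, `parse_coco_keypoints`, and `num_labeled` are hypothetical names, not part of the SDPose code.

```python
from typing import List, NamedTuple

NUM_KEYPOINTS = 17


class Keypoint(NamedTuple):
    x: float
    y: float
    visibility: int  # 0 = not labeled, 1 = labeled but not visible, 2 = labeled and visible


def parse_coco_keypoints(flat: List[float]) -> List[Keypoint]:
    """Split a COCO annotation's 'keypoints' field into 17 (x, y, v) triplets."""
    if len(flat) != 3 * NUM_KEYPOINTS:
        raise ValueError(f"expected {3 * NUM_KEYPOINTS} values, got {len(flat)}")
    return [
        Keypoint(float(flat[i]), float(flat[i + 1]), int(flat[i + 2]))
        for i in range(0, len(flat), 3)
    ]


def num_labeled(keypoints: List[Keypoint]) -> int:
    """Count labeled keypoints (v > 0), which COCO also stores as 'num_keypoints'."""
    return sum(1 for kp in keypoints if kp.visibility > 0)
```
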
### Preprocessing

- Each detected person is affine-warped (cropped and resized) to the 1024×768 (H×W) input resolution.
- Augmentation: random horizontal flip, half-body and bounding-box transforms, and UDP affine warping, plus Albumentations (Gaussian/median blur, coarse dropout).
- Heatmaps: UDP codec (MMPose style).

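Horizontal flipping must mirror the coordinates and also swap left/right labels; otherwise a flipped left wrist would still be annotated as `left_wrist`. The plain-Python sketch below illustrates this for one person's keypoints. It mirrors with the pixel-index rule `x' = (W - 1) - x`, in the spirit of the unit-length convention behind UDP; it is a simplified illustration, not the MMPose implementation, and `hflip_keypoints` and `COCO_FLIP_INDEX` are illustrative names (the flip table is the standard COCO left/right pairing).

```python
from typing import List, Tuple

# Mirror partner of each COCO keypoint index (0 = nose maps to itself).
COCO_FLIP_INDEX = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]


def hflip_keypoints(
    keypoints: List[Tuple[float, float, int]],
    image_width: int,
) -> List[Tuple[float, float, int]]:
    """Mirror (x, y, visibility) keypoints horizontally and swap left/right slots.

    Visibility is carried over unchanged, so unlabeled points (v = 0) stay unlabeled."""
    mirrored = [((image_width - 1) - x, y, v) for x, y, v in keypoints]
    # Slot i of the flipped person takes the mirrored point of its left/right partner.
    return [mirrored[COCO_FLIP_INDEX[i]] for i in range(len(mirrored))]
```

Flipping twice returns the original keypoints, which is a convenient sanity check for any flip implementation.
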
## Evaluation

### Comparison with Baselines

SDPose significantly outperforms strong pose estimation models such as Sapiens and ViTPose++ on out-of-distribution benchmarks while remaining competitive on in-domain data.

See our [paper](https://arxiv.org/abs/2509.24980) for the full evaluation results.

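Accuracy on COCO-style keypoint benchmarks is commonly reported as AP computed from Object Keypoint Similarity (OKS). As a hedged illustration of that metric, not code from the SDPose evaluation, the plain-Python sketch below computes OKS for one prediction/ground-truth pair following the formula of the official COCO evaluation: each labeled keypoint contributes `exp(-d^2 / (2 * s^2 * k^2))`, where `d` is the pixel distance, `s^2` is the object's area, and `k = 2 * sigma` uses the per-keypoint COCO constants. `oks` is an illustrative name, and the edge case with no labeled keypoints is simplified to 0.

```python
import math
from typing import List, Tuple

# Per-keypoint COCO sigmas (nose ... right_ankle) used by the COCO keypoint evaluation.
COCO_SIGMAS = [s / 10.0 for s in (
    0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72,
    0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89,
)]


def oks(
    pred: List[Tuple[float, float]],
    gt: List[Tuple[float, float, int]],
    area: float,
) -> float:
    """Object Keypoint Similarity between predicted (x, y) and ground-truth (x, y, v) keypoints."""
    total, labeled = 0.0, 0
    for (px, py), (gx, gy, v), sigma in zip(pred, gt, COCO_SIGMAS):
        if v <= 0:
            continue  # unlabeled ground-truth keypoints do not count
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2 * sigma) ** 2
        total += math.exp(-d2 / (2 * area * k2))
        labeled += 1
    return total / labeled if labeled else 0.0  # simplification for the no-label case
```

COCO AP then thresholds OKS at 0.50, 0.55, ..., 0.95 and averages precision over those thresholds; that ranking and matching step is not shown here.
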
## Citation

If you use SDPose in your research, please cite our paper:

```bibtex
@misc{liang2025sdposeexploitingdiffusionpriors,
  title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation},
  author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan},
  year={2025},
  eprint={2509.24980},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.24980},
}
```

## License

This model is released under the [MIT License](https://opensource.org/licenses/MIT).

## Additional Resources

- 🌐 **Project Website**: [https://t-s-liang.github.io/SDPose](https://t-s-liang.github.io/SDPose)
- 📄 **Paper**: [arXiv:2509.24980](https://arxiv.org/abs/2509.24980)
- 💻 **Code Repository**: [GitHub](https://github.com/t-s-liang/SDPose-OOD)
- 🤗 **Demo**: [Hugging Face Space](https://huggingface.co/spaces/teemosliang/SDPose-Body)
- 📧 **Contact**: tsliang2001@gmail.com

---

<div align="center">

**⭐ Star us on GitHub: it motivates us a lot!**

</div>