| --- |
| language: en |
| license: mit |
| tags: |
| - pose-estimation |
| - computer-vision |
| - keypoint-detection |
| - diffusion-models |
| - stable-diffusion |
| - out-of-distribution |
| - human-pose |
| - top-down-pose-estimation |
| - coco |
| - mmpose |
| library_name: pytorch |
| --- |
| |
| # SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation (Body - 17 Keypoints) |
|
|
| <div align="center"> |
|
|
| [Paper (arXiv)](https://arxiv.org/abs/2509.24980) |
| [Project Page](https://t-s-liang.github.io/SDPose) |
| [Demo (Hugging Face Space)](https://huggingface.co/spaces/teemosliang/SDPose-Body) |
| [License: MIT](https://opensource.org/licenses/MIT) |
|
|
| </div> |
|
|
| ## Model Description |
|
|
| **SDPose** is a human pose estimation model that uses the visual priors of **Stable Diffusion** to stay robust in out-of-distribution (OOD) scenarios such as paintings, anime, and sketches. This variant estimates the **17 COCO body keypoints**: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles. |
|
|
| ### Model Architecture |
|
|
| SDPose employs a **U-Net backbone** initialized with Stable Diffusion v2 weights, combined with a specialized heatmap head for keypoint prediction. The model operates in a top-down manner: |
|
|
| 1. **Person Detection**: Detect human bounding boxes with an object detector (e.g., YOLO11-x) |
| 2. **Pose Estimation**: Crop each detected person and run the model on the crop to estimate 17 body keypoints |
| 3. **Heatmap Generation**: Predict one confidence heatmap per keypoint and decode it into keypoint coordinates |
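
The crop step of a top-down pipeline can be sketched as follows. This is a minimal illustration, not the SDPose implementation: the helper names `bbox_to_crop_box` and `crop_to_image_coords`, the `(x1, y1, x2, y2)` box format, and the 1.25 padding factor are assumptions made for the example. Only the 1024×768 (H×W) input size comes from this card.

```python
def bbox_to_crop_box(bbox, input_hw=(1024, 768), padding=1.25):
    """Expand a detector box to the model's aspect ratio.

    bbox is (x1, y1, x2, y2) in image pixels (assumed format).
    Returns the crop as (center_x, center_y, width, height).
    """
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    target_ratio = input_hw[1] / input_hw[0]  # W / H = 768 / 1024 = 0.75
    # Grow the shorter side so the box matches the target aspect ratio.
    if w > h * target_ratio:
        h = w / target_ratio
    else:
        w = h * target_ratio
    # Padding (hypothetical default) keeps some context around the person.
    return cx, cy, w * padding, h * padding


def crop_to_image_coords(kpt_xy, crop_box, input_hw=(1024, 768)):
    """Map a keypoint from model-input pixels back to image pixels."""
    cx, cy, w, h = crop_box
    x, y = kpt_xy
    in_h, in_w = input_hw
    return (cx - w / 2.0 + x * w / in_w, cy - h / 2.0 + y * h / in_h)
```

Matching the crop to the input aspect ratio before resizing avoids stretching the person, and the second helper undoes the crop so that predictions can be reported in original image coordinates.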
|
|
| **Model Specifications:** |
| - **Backbone**: Stable Diffusion v2 U-Net (fine-tuned; minimal architectural changes) |
| - **Head**: Custom heatmap prediction head |
| - **Input Resolution**: 1024×768 (H×W) |
| - **Output**: 17 keypoint heatmaps + coordinates with confidence scores |
| - **Framework**: MMPose |
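
To illustrate how heatmaps relate to the reported coordinates and confidence scores, here is a plain argmax decoder in pure Python. It is a simplified sketch under stated assumptions: the function name `decode_heatmaps` is hypothetical, heatmaps are represented as nested lists, and the result is in heatmap pixels. Production decoders typically add sub-pixel refinement, which is omitted here.

```python
def decode_heatmaps(heatmaps):
    """Return one (x, y, score) tuple per keypoint heatmap.

    heatmaps: list of K 2D grids (lists of rows) of float confidences.
    Coordinates are in heatmap pixels; the score is the peak value.
    """
    results = []
    for hm in heatmaps:
        best_x, best_y, best_score = 0, 0, float("-inf")
        for y, row in enumerate(hm):
            for x, value in enumerate(row):
                if value > best_score:
                    best_x, best_y, best_score = x, y, value
        results.append((best_x, best_y, best_score))
    return results
```

For this model the input list would hold 17 heatmaps, one per COCO body keypoint.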
|
|
| ## Supported Keypoints (COCO Format) |
|
|
| The model predicts 17 body keypoints following the COCO keypoint format: |
|
|
| ``` |
| 0: nose |
| 1: left_eye |
| 2: right_eye |
| 3: left_ear |
| 4: right_ear |
| 5: left_shoulder |
| 6: right_shoulder |
| 7: left_elbow |
| 8: right_elbow |
| 9: left_wrist |
| 10: right_wrist |
| 11: left_hip |
| 12: right_hip |
| 13: left_knee |
| 14: right_knee |
| 15: left_ankle |
| 16: right_ankle |
| ``` |
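
When post-processing predictions, it can help to keep this index order in code. The snippet below mirrors the list above; the names `COCO_KEYPOINTS`, `KEYPOINT_INDEX`, and `named_keypoints` are illustrative and not part of the SDPose repository.

```python
# Keypoint names in the COCO index order listed above.
COCO_KEYPOINTS = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]

# Reverse lookup: keypoint name -> index.
KEYPOINT_INDEX = {name: i for i, name in enumerate(COCO_KEYPOINTS)}


def named_keypoints(predictions):
    """Pair each per-keypoint prediction with its COCO keypoint name."""
    if len(predictions) != len(COCO_KEYPOINTS):
        raise ValueError("expected 17 predictions, got %d" % len(predictions))
    return dict(zip(COCO_KEYPOINTS, predictions))
```

For example, `KEYPOINT_INDEX["left_wrist"]` gives the row to read from a 17-entry prediction array.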
|
|
| ## Intended Use |
|
|
| ### Primary Use Cases |
|
|
| - Human pose estimation in natural images |
| - Pose estimation in artistic and stylized domains (paintings, anime, sketches) |
| - Animation and video pose tracking |
| - Cross-domain pose analysis and research |
| - Applications requiring robust pose estimation under distribution shifts |
|
|
| ## How to Use |
|
|
| ### Installation |
|
|
| ```bash |
| # Clone the repository |
| git clone https://github.com/t-s-liang/SDPose-OOD.git |
| cd SDPose-OOD |
| |
| # Install dependencies |
| pip install -r requirements.txt |
| |
| # Download YOLO11-x for human detection |
| wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/ |
| |
| # Launch Gradio interface |
| cd gradio_app |
| bash launch_gradio.sh |
| ``` |
|
|
| ## Training Data |
|
|
| ### Datasets |
|
|
| Trained exclusively on COCO-2017 train2017 (no extra data). |
|
|
| - **COCO (Common Objects in Context)**: 200K+ images with 17 body keypoints |
|
|
| ### Preprocessing |
|
|
| - **Resolution**: images are cropped and resized to 1024×768 (H×W) |
| - **Augmentation**: random horizontal flip, half-body and bbox transforms, UDP affine, plus Albumentations (Gaussian/median blur, coarse dropout) |
| - **Heatmaps**: UDP codec (MMPose style) |
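
Random horizontal flip needs more than mirroring the image: left and right keypoints must also swap indices, otherwise a flipped left wrist would still be labeled as left. The sketch below shows the idea for COCO's 17 keypoints. The function name `flip_keypoints` and the pixel-index mirror convention `x' = width - 1 - x` are assumptions for this example and may differ from the training code.

```python
# Left/right index pairs in the COCO 17-keypoint layout
# (eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8),
              (9, 10), (11, 12), (13, 14), (15, 16)]


def flip_keypoints(keypoints, image_width):
    """Mirror (x, y) keypoints horizontally and swap left/right indices."""
    # Mirror x around the vertical image axis (assumed pixel-index convention).
    flipped = [(image_width - 1 - x, y) for x, y in keypoints]
    # Swap each left/right pair so the labels stay anatomically correct.
    for left, right in FLIP_PAIRS:
        flipped[left], flipped[right] = flipped[right], flipped[left]
    return flipped
```

Applying the function twice with the same width returns the original keypoints, which is a quick sanity check for any flip implementation.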
|
|
| ### Comparison with Baselines |
|
|
| SDPose significantly outperforms traditional pose estimation models (e.g., Sapiens, ViTPose++) on out-of-distribution benchmarks while maintaining competitive performance on in-domain data. |
|
|
| See our [paper](https://arxiv.org/abs/2509.24980) for comprehensive evaluation results. |
|
|
| ## Citation |
|
|
| If you use SDPose in your research, please cite our paper: |
|
|
| ```bibtex |
| @misc{liang2025sdposeexploitingdiffusionpriors, |
| title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation}, |
| author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan}, |
| year={2025}, |
| eprint={2509.24980}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CV}, |
| url={https://arxiv.org/abs/2509.24980}, |
| } |
| ``` |
|
|
| ## License |
|
|
| This model is released under the [MIT License](https://opensource.org/licenses/MIT). |
|
|
| ## Additional Resources |
|
|
| - 🌐 **Project Website**: [https://t-s-liang.github.io/SDPose](https://t-s-liang.github.io/SDPose) |
| - 📄 **Paper**: [arXiv:2509.24980](https://arxiv.org/abs/2509.24980) |
| - 💻 **Code Repository**: [GitHub](https://github.com/t-s-liang/SDPose-OOD) |
| - 🤗 **Demo**: [HuggingFace Space](https://huggingface.co/spaces/teemosliang/SDPose-Body) |
| - 📧 **Contact**: tsliang2001@gmail.com |
|
|
| --- |
|
|
| <div align="center"> |
|
|
| **⭐ Star us on GitHub — it motivates us a lot!** |
|
|
| </div> |
|
|