---
language: en
license: mit
tags:
- pose-estimation
- computer-vision
- keypoint-detection
- diffusion-models
- stable-diffusion
- out-of-distribution
- human-pose
- top-down-pose-estimation
- coco
- mmpose
library_name: pytorch
---

# SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation (Body - 17 Keypoints)

<div align="center">

[![Paper](https://img.shields.io/badge/arXiv-Paper-b31b1b?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2509.24980)
[![Project Page](https://img.shields.io/badge/Project-Website-pink?logo=googlechrome&logoColor=white)](https://t-s-liang.github.io/SDPose)
[![HuggingFace Demo](https://img.shields.io/badge/🤗%20HuggingFace-Demo-yellow)](https://huggingface.co/spaces/teemosliang/SDPose-Body)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

</div>

## Model Description

**SDPose** is a human pose estimation model that reuses the visual priors of **Stable Diffusion** to stay robust in out-of-distribution (OOD) scenarios such as paintings, anime, and sketches. This variant estimates the **17 COCO body keypoints**: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.

### Model Architecture

SDPose employs a **U-Net backbone** initialized with Stable Diffusion v2 weights, combined with a specialized heatmap head for keypoint prediction. The model operates in a top-down manner:

1. **Person Detection**: Detect human bounding boxes using an object detector (e.g., YOLO11-x)
2. **Pose Estimation**: Crop each detected person and estimate 17 body keypoints per crop
3. **Heatmap Generation**: Produce one confidence heatmap per keypoint, from which the keypoint coordinates are read
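
As a rough illustration of the cropping step in a top-down pipeline, the sketch below expands a detector box to the model's 768×1024 (W×H) aspect ratio before resizing. The function name `bbox_to_crop` and the `padding` factor of 1.25 are assumptions made for this example and are not taken from the SDPose code.

```python
import numpy as np

def bbox_to_crop(bbox_xyxy, input_w=768, input_h=1024, padding=1.25):
    """Expand a detector box (x1, y1, x2, y2) to the model's aspect ratio.

    Returns (center, size): center is (cx, cy) and size is (w, h) of the
    crop region in original-image pixels. `padding` is an illustrative
    assumption, not a value confirmed by the SDPose repository.
    """
    x1, y1, x2, y2 = map(float, bbox_xyxy)
    center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
    w, h = x2 - x1, y2 - y1

    # Grow the shorter side so that w / h matches input_w / input_h.
    aspect = input_w / input_h
    if w > h * aspect:
        h = w / aspect
    else:
        w = h * aspect

    # Add some context around the person.
    size = np.array([w, h]) * padding
    return center, size
```

The returned center and size describe the region that would then be resized to the network input; the actual affine warp is left out here.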

**Model Specifications:**
- **Backbone**: Stable Diffusion v2 U-Net (fine-tuned; minimal architectural changes)
- **Head**: Custom heatmap prediction head
- **Input Resolution**: 1024×768 (H×W)
- **Output**: 17 keypoint heatmaps + coordinates with confidence scores
- **Framework**: MMPose
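
To make the output format concrete, here is a minimal sketch of reading coordinates and confidence scores from a stack of keypoint heatmaps by taking the per-channel argmax. It assumes a `(K, H, W)` NumPy array; `decode_heatmaps` is a hypothetical helper, and the sub-pixel refinement performed by a codec such as UDP is deliberately omitted.

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Decode (K, H, W) heatmaps into coordinates and scores.

    Returns:
        coords: (K, 2) array of (x, y) peak locations in heatmap pixels.
        scores: (K,) array with the heatmap value at each peak.
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)

    # Index of the strongest response in each keypoint channel.
    idx = flat.argmax(axis=1)
    scores = flat[np.arange(k), idx]

    # Convert flat indices back to (row, col) and store as (x, y).
    ys, xs = np.unravel_index(idx, (h, w))
    coords = np.stack([xs, ys], axis=1).astype(np.float32)
    return coords, scores
```

Coordinates returned this way are in heatmap space and would still need to be mapped back to the original image through the inverse of the crop transform.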

## Supported Keypoints (COCO Format)

The model predicts 17 body keypoints following the COCO keypoint format:

```
0: nose
1: left_eye
2: right_eye
3: left_ear
4: right_ear
5: left_shoulder
6: right_shoulder
7: left_elbow
8: right_elbow
9: left_wrist
10: right_wrist
11: left_hip
12: right_hip
13: left_knee
14: right_knee
15: left_ankle
16: right_ankle
```
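
For downstream code it can be convenient to address keypoints by name rather than by index. The sketch below mirrors the index order listed above; the helper `name_keypoints` and its `min_score` threshold of 0.3 are hypothetical and only illustrate one way to filter low-confidence predictions.

```python
# Keypoint names in COCO index order (0..16).
COCO_KEYPOINTS = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]

def name_keypoints(coords, scores, min_score=0.3):
    """Map 17 (x, y) coordinates and scores to {name: (x, y, score)}.

    Keypoints scoring below `min_score` (an illustrative threshold)
    are dropped from the result.
    """
    named = {}
    for name, (x, y), score in zip(COCO_KEYPOINTS, coords, scores):
        if score >= min_score:
            named[name] = (float(x), float(y), float(score))
    return named
```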

## Intended Use

### Primary Use Cases

- Human pose estimation in natural images
- Pose estimation in artistic and stylized domains (paintings, anime, sketches)
- Animation and video pose tracking
- Cross-domain pose analysis and research
- Applications requiring robust pose estimation under distribution shifts

## How to Use

### Installation

```bash
# Clone the repository
git clone https://github.com/t-s-liang/SDPose-OOD.git
cd SDPose-OOD

# Install dependencies
pip install -r requirements.txt

# Download YOLO11-x for human detection
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/

# Launch Gradio interface
cd gradio_app
bash launch_gradio.sh
```

## Training Data

### Datasets

Trained exclusively on COCO-2017 train2017 (no extra data).

- **COCO (Common Objects in Context)**: 200K+ images with 17 body keypoints

### Preprocessing

- Images are resized and cropped to 1024×768 (H×W) resolution
- Augmentation: random horizontal flip, half-body & bbox transforms, UDP affine; Albumentations (Gaussian/Median blur, coarse dropout).
- Heatmaps: UDP codec (MMPose style).
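
Random horizontal flip needs one extra step for pose data: after mirroring the x coordinates, left and right keypoints must swap indices so that, for example, index 5 still refers to the person's left shoulder. The sketch below shows that idea with NumPy under the COCO index order listed under Supported Keypoints; `hflip_keypoints` is a hypothetical helper and not the MMPose transform used for training.

```python
import numpy as np

# (left, right) index pairs in the 17-keypoint COCO layout.
FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8),
              (9, 10), (11, 12), (13, 14), (15, 16)]

def hflip_keypoints(keypoints, image_width):
    """Mirror (17, 2) keypoints horizontally and swap left/right indices.

    keypoints: array of (x, y) pixel coordinates.
    image_width: width in pixels of the image being flipped.
    """
    flipped = np.array(keypoints, dtype=np.float32, copy=True)

    # Mirror x around the vertical image centre (pixel-index convention).
    flipped[:, 0] = image_width - 1 - flipped[:, 0]

    # Swap each left/right pair so that semantic labels stay correct.
    for left, right in FLIP_PAIRS:
        flipped[[left, right]] = flipped[[right, left]]
    return flipped
```

Applying the function twice returns the original keypoints, which is a quick sanity check for any flip implementation.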

## Comparison with Baselines

SDPose outperforms prior pose estimation models (e.g., Sapiens, ViTPose++) on out-of-distribution benchmarks while remaining competitive on in-domain data.

See our [paper](https://arxiv.org/abs/2509.24980) for comprehensive evaluation results.

## Citation

If you use SDPose in your research, please cite our paper:

```bibtex
@misc{liang2025sdposeexploitingdiffusionpriors,
      title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation}, 
      author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan},
      year={2025},
      eprint={2509.24980},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.24980}, 
}
```

## License

This model is released under the [MIT License](https://opensource.org/licenses/MIT).

## Additional Resources

- 🌐 **Project Website**: [https://t-s-liang.github.io/SDPose](https://t-s-liang.github.io/SDPose)
- 📄 **Paper**: [arXiv:2509.24980](https://arxiv.org/abs/2509.24980)
- 💻 **Code Repository**: [GitHub](https://github.com/t-s-liang/SDPose-OOD)
- 🤗 **Demo**: [HuggingFace Space](https://huggingface.co/spaces/teemosliang/SDPose-Body)
- 📧 **Contact**: tsliang2001@gmail.com

---

<div align="center">

**⭐ Star us on GitHub — it motivates us a lot!**

</div>