teemosliang committed
Commit 0aed605 · verified · 1 Parent(s): 26bf332

Update README.md

Files changed (1)
  1. README.md +155 -3
README.md CHANGED
@@ -1,3 +1,155 @@
- ---
- license: mit
- ---
+ ---
+ language: en
+ license: mit
+ tags:
+ - pose-estimation
+ - computer-vision
+ - keypoint-detection
+ - diffusion-models
+ - stable-diffusion
+ - out-of-distribution
+ - human-pose
+ - top-down-pose-estimation
+ - coco
+ - mmpose
+ library_name: pytorch
+ ---
+
+ # SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation (Body - 17 Keypoints)
+
+ <div align="center">
+
+ [![Paper](https://img.shields.io/badge/arXiv-Paper-b31b1b?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2509.24980)
+ [![Project Page](https://img.shields.io/badge/Project-Website-pink?logo=googlechrome&logoColor=white)](https://t-s-liang.github.io/SDPose)
+ [![HuggingFace Demo](https://img.shields.io/badge/🤗%20HuggingFace-Demo-yellow)](https://huggingface.co/spaces/teemosliang/SDPose-Body)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
+
+ </div>
+
+ ## Model Description
+
+ **SDPose** is a human pose estimation model that uses the visual priors of **Stable Diffusion** to achieve strong performance in out-of-distribution (OOD) scenarios. This model variant estimates the **17 COCO body keypoints**: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.
+
+ ### Model Architecture
+
+ SDPose uses a **U-Net backbone** initialized with Stable Diffusion v2 weights, combined with a heatmap head for keypoint prediction. The model operates in a top-down manner:
+
+ 1. **Person Detection**: Detect human bounding boxes with an object detector (e.g., YOLO11-x)
+ 2. **Pose Estimation**: Crop each detected person and estimate 17 body keypoints
+ 3. **Heatmap Generation**: Produce confidence heatmaps for precise keypoint localization
+
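
The crop in step 2 has to match the model's 1024×768 (H×W) input. A minimal sketch of one common way to prepare it, assuming the detector returns `(x1, y1, x2, y2)` pixel boxes: grow the box to the input aspect ratio, then pad it. The function name `expand_bbox_to_aspect` and the `padding=1.25` default are illustrative assumptions, not values taken from the SDPose code.

```python
def expand_bbox_to_aspect(x1, y1, x2, y2, input_w=768, input_h=1024, padding=1.25):
    """Return (cx, cy, w, h) of a crop box that contains the detection box
    and has the same aspect ratio as the model input (hypothetical helper)."""
    cx = (x1 + x2) / 2.0
    cy = (y1 + y2) / 2.0
    w = float(x2 - x1)
    h = float(y2 - y1)

    aspect = input_w / input_h  # 768 / 1024 = 0.75
    if w > h * aspect:
        # Box is too wide for the target ratio: grow the height.
        h = w / aspect
    else:
        # Box is too tall (or exact): grow the width.
        w = h * aspect

    # Pad so limbs near the box border are not cut off (assumed factor).
    return cx, cy, w * padding, h * padding


cx, cy, w, h = expand_bbox_to_aspect(0, 0, 30, 100, padding=1.0)
```

The resulting box can then be resized to 768×1024 without distorting the person, which is why the aspect ratio is fixed before padding.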
+ **Model Specifications:**
+ - **Backbone**: Stable Diffusion v2 U-Net (fine-tuned; minimal architectural changes)
+ - **Head**: Custom heatmap prediction head
+ - **Input Resolution**: 1024×768 (H×W)
+ - **Output**: 17 keypoint heatmaps + coordinates with confidence scores
+ - **Framework**: MMPose
+
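
To illustrate how one keypoint heatmap becomes a coordinate with a confidence score, here is a minimal sketch that uses plain argmax decoding. It is deliberately simpler than the UDP codec the model card lists, and the heatmap size, the pixel-center mapping back to input pixels, and the name `decode_heatmap` are all assumptions made for this example.

```python
def decode_heatmap(heatmap, input_w=768, input_h=1024):
    """Decode one keypoint heatmap (list of rows) with plain argmax.

    Returns (x, y, confidence) in input-image pixels. This is an
    illustrative stand-in, not the UDP decoding used by MMPose.
    """
    hm_h = len(heatmap)
    hm_w = len(heatmap[0])

    best_score = float("-inf")
    best_x = best_y = 0
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_score, best_x, best_y = score, x, y

    # Map the center of the winning heatmap cell back to input pixels
    # (assumed pixel-center convention).
    x_in = (best_x + 0.5) * input_w / hm_w - 0.5
    y_in = (best_y + 0.5) * input_h / hm_h - 0.5
    return x_in, y_in, best_score


toy_heatmap = [
    [0.0, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
]
x, y, conf = decode_heatmap(toy_heatmap, input_w=6, input_h=8)
```

Running this once per heatmap channel yields the 17 `(x, y, confidence)` triples; the peak value serves as the confidence score.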
+ ## Supported Keypoints (COCO Format)
+
+ The model predicts 17 body keypoints following the COCO keypoint format:
+
+ ```
+ 0: nose
+ 1: left_eye
+ 2: right_eye
+ 3: left_ear
+ 4: right_ear
+ 5: left_shoulder
+ 6: right_shoulder
+ 7: left_elbow
+ 8: right_elbow
+ 9: left_wrist
+ 10: right_wrist
+ 11: left_hip
+ 12: right_hip
+ 13: left_knee
+ 14: right_knee
+ 15: left_ankle
+ 16: right_ankle
+ ```
+
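
When working with the model's output in code, the index table above is convenient as a lookup structure. The sketch below encodes the same 17 names in order and derives the left/right index pairs from them; the constant names (`COCO_KEYPOINTS`, `KEYPOINT_INDEX`, `FLIP_PAIRS`) are chosen for this example and are not part of the SDPose or MMPose API.

```python
# The 17 COCO body keypoints, in the index order listed above.
COCO_KEYPOINTS = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]

# Name -> index lookup, e.g. KEYPOINT_INDEX["left_wrist"].
KEYPOINT_INDEX = {name: i for i, name in enumerate(COCO_KEYPOINTS)}

# (left, right) index pairs, derived from the names rather than hard-coded.
FLIP_PAIRS = [
    (KEYPOINT_INDEX[name], KEYPOINT_INDEX["right_" + name[len("left_"):]])
    for name in COCO_KEYPOINTS
    if name.startswith("left_")
]
```

The pairs are useful wherever left and right must be exchanged, for example when mirroring an image.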
+ ## Intended Use
+
+ ### Primary Use Cases
+
+ - Human pose estimation in natural images
+ - Pose estimation in artistic and stylized domains (paintings, anime, sketches)
+ - Animation and video pose tracking
+ - Cross-domain pose analysis and research
+ - Applications requiring robust pose estimation under distribution shifts
+
+ ## How to Use
+
+ ### Installation
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/t-s-liang/SDPose-OOD.git
+ cd SDPose-OOD
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Download YOLO11-x for human detection
+ wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/
+
+ # Launch Gradio interface
+ cd gradio_app
+ bash launch_gradio.sh
+ ```
+
+ ## Training Data
+
+ ### Datasets
+
+ SDPose is trained exclusively on the COCO-2017 `train2017` split (no extra data).
+
+ - **COCO (Common Objects in Context)**: 200K+ images with 17 body keypoints
+
+ ### Preprocessing
+
+ - Images are resized and cropped to 1024×768 (H×W) resolution.
+ - Augmentation: random horizontal flip, half-body and bbox transforms, and UDP affine, plus Albumentations (Gaussian/median blur, coarse dropout).
+ - Heatmaps: UDP codec (MMPose style).
+
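
Random horizontal flip needs care for keypoints: mirroring the image turns a left wrist into a right wrist, so the labels must be swapped as well as the x coordinates mirrored. The following is a minimal sketch of that step, assuming keypoints are `(x, y)` tuples in COCO index order and pixel indices run from `0` to `image_width - 1`; the function name `hflip_keypoints` is hypothetical and this is not the MMPose transform itself.

```python
# (left, right) index pairs of the 17 COCO body keypoints.
FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16)]


def hflip_keypoints(keypoints, image_width):
    """Mirror keypoints horizontally and swap left/right labels.

    keypoints: list of 17 (x, y) tuples in COCO order (assumed layout).
    """
    # Mirror x around the vertical center line of the image.
    flipped = [(image_width - 1 - x, y) for x, y in keypoints]

    # After mirroring, the point stored as "left_*" is anatomically
    # the right one, so exchange each left/right pair.
    for left, right in FLIP_PAIRS:
        flipped[left], flipped[right] = flipped[right], flipped[left]
    return flipped


keypoints = [(float(i), 0.0) for i in range(17)]
mirrored = hflip_keypoints(keypoints, image_width=100)
```

Applying the function twice returns the original keypoints, which is a quick sanity check for any flip implementation.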
+ ### Comparison with Baselines
+
+ SDPose significantly outperforms prior pose estimation models (e.g., Sapiens, ViTPose++) on out-of-distribution benchmarks while maintaining competitive performance on in-domain data.
+
+ See our [paper](https://arxiv.org/abs/2509.24980) for the full evaluation results.
+
+ ## Citation
+
+ If you use SDPose in your research, please cite our paper:
+
+ ```bibtex
+ @misc{liang2025sdposeexploitingdiffusionpriors,
+   title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation},
+   author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan},
+   year={2025},
+   eprint={2509.24980},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2509.24980},
+ }
+ ```
+
+ ## License
+
+ This model is released under the [MIT License](https://opensource.org/licenses/MIT).
+
+ ## Additional Resources
+
+ - 🌐 **Project Website**: [https://t-s-liang.github.io/SDPose](https://t-s-liang.github.io/SDPose)
+ - 📄 **Paper**: [arXiv:2509.24980](https://arxiv.org/abs/2509.24980)
+ - 💻 **Code Repository**: [GitHub](https://github.com/t-s-liang/SDPose-OOD)
+ - 🤗 **Demo**: [HuggingFace Space](https://huggingface.co/spaces/teemosliang/SDPose-Body)
+ - 📧 **Contact**: tsliang2001@gmail.com
+
+ ---
+
+ <div align="center">
+
+ **⭐ Star us on GitHub: it motivates us a lot!**
+
+ </div>