Update README.md

README.md +155 -3

@@ -1,3 +1,155 @@
----
-license: mit
----

+---
+language: en
+license: mit
+tags:
+- pose-estimation
+- computer-vision
+- keypoint-detection
+- diffusion-models
+- stable-diffusion
+- out-of-distribution
+- human-pose
+- top-down-pose-estimation
+- coco
+- mmpose
+library_name: pytorch
+---
+# SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation (Body - 17 Keypoints)
+<div align="center">
+[![Paper](https://img.shields.io/badge/arXiv-Paper-b31b1b?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2509.24980)
+[![Project Page](https://img.shields.io/badge/Project-Website-pink?logo=googlechrome&logoColor=white)](https://t-s-liang.github.io/SDPose)
+[![HuggingFace Demo](https://img.shields.io/badge/🤗%20HuggingFace-Demo-yellow)](https://huggingface.co/spaces/teemosliang/SDPose-Body)
+[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
+</div>
+## Model Description
+**SDPose** is a state-of-the-art human pose estimation model that uses the visual priors of **Stable Diffusion** to stay robust in out-of-distribution (OOD) scenarios. This model variant estimates the **17 COCO body keypoints**: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, and ankles.
+### Model Architecture
+SDPose employs a **U-Net backbone** initialized with Stable Diffusion v2 weights, combined with a specialized heatmap head for keypoint prediction. The model operates in a top-down manner:
+1. **Person Detection**: Detect human bounding boxes using an object detector (e.g., YOLO11-x)
+2. **Pose Estimation**: Crop each detected person and run the model on the crop
+3. **Heatmap Generation**: Produce one confidence heatmap per keypoint, from which the 17 keypoint coordinates and confidence scores are decoded
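
The cropping in step 2 can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the detector returns a box as `(x1, y1, x2, y2)`, and the helper name `fit_box_to_aspect` and the `padding` value are hypothetical. It grows the box around its center until it matches the 768:1024 (W:H) aspect ratio of the model input, so the crop can be resized without distortion.

```python
def fit_box_to_aspect(x1, y1, x2, y2, target_w=768, target_h=1024, padding=1.25):
    """Expand a person box so it matches the model's W:H aspect ratio.

    The box is grown (never shrunk) around its center, then padded.
    `padding=1.25` is an illustrative value, not taken from SDPose.
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = float(x2 - x1), float(y2 - y1)
    aspect = target_w / target_h

    if w > h * aspect:
        # Box is too wide for the target ratio: grow the height.
        h = w / aspect
    else:
        # Box is too tall for the target ratio: grow the width.
        w = h * aspect

    w, h = w * padding, h * padding
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


# A square 100x100 detection becomes a 3:4 box before padding is applied.
print(fit_box_to_aspect(0, 0, 100, 100, padding=1.0))
```

The returned box may extend past the image border; a real pipeline would typically handle that with an affine warp that fills the outside with a constant value.
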
+**Model Specifications:**
+- **Backbone**: Stable Diffusion v2 U-Net (fine-tuned; minimal architectural changes)
+- **Head**: Custom heatmap prediction head
+- **Input Resolution**: 1024×768 (H×W)
+- **Output**: 17 keypoint heatmaps + coordinates with confidence scores
+- **Framework**: MMPose
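
To illustrate how heatmaps become coordinates with confidence scores, here is a minimal sketch that takes the peak of each heatmap and rescales it to the 1024×768 (H×W) input. It is a plain argmax decoder written for illustration only; the function name `decode_heatmaps` is hypothetical, and the released model may use a more refined decoding scheme with sub-pixel accuracy.

```python
def decode_heatmaps(heatmaps, input_w=768, input_h=1024):
    """Decode keypoints from heatmaps by taking each map's peak.

    heatmaps: list of K heatmaps, each a list of rows of floats.
    Returns a list of K (x, y, score) tuples in input-image pixels.
    """
    results = []
    for hm in heatmaps:
        hm_h, hm_w = len(hm), len(hm[0])
        # Find the highest-scoring cell and its (x, y) position.
        score, x, y = max(
            (value, col, row_idx)
            for row_idx, row in enumerate(hm)
            for col, value in enumerate(row)
        )
        # Rescale heatmap coordinates to input-image coordinates.
        results.append((x * (input_w / hm_w), y * (input_h / hm_h), score))
    return results


# Toy 4x3 heatmap whose peak sits at column 2, row 1.
toy = [
    [0.0, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
]
print(decode_heatmaps([toy]))
```
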
+## Supported Keypoints (COCO Format)
+The model predicts 17 body keypoints following the COCO keypoint format:
+```
+0: nose
+1: left_eye
+2: right_eye
+3: left_ear
+4: right_ear
+5: left_shoulder
+6: right_shoulder
+7: left_elbow
+8: right_elbow
+9: left_wrist
+10: right_wrist
+11: left_hip
+12: right_hip
+13: left_knee
+14: right_knee
+15: left_ankle
+16: right_ankle
+```
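
When consuming predictions, it can help to attach these names to the raw output. The sketch below assumes the model output for one person is a sequence of 17 entries in the index order listed above; the helper `name_keypoints` is a hypothetical convenience, not part of the SDPose code.

```python
COCO_KEYPOINT_NAMES = [
    "nose",
    "left_eye", "right_eye",
    "left_ear", "right_ear",
    "left_shoulder", "right_shoulder",
    "left_elbow", "right_elbow",
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_knee", "right_knee",
    "left_ankle", "right_ankle",
]


def name_keypoints(keypoints):
    """Map a sequence of 17 keypoint entries to a dict keyed by COCO name."""
    if len(keypoints) != len(COCO_KEYPOINT_NAMES):
        raise ValueError(
            f"expected {len(COCO_KEYPOINT_NAMES)} keypoints, got {len(keypoints)}"
        )
    return dict(zip(COCO_KEYPOINT_NAMES, keypoints))


# Dummy (x, y, score) entries, one per keypoint index.
person = name_keypoints([(float(i), float(i), 1.0) for i in range(17)])
print(person["left_wrist"])
```
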
+## Intended Use
+### Primary Use Cases
+- Human pose estimation in natural images
+- Pose estimation in artistic and stylized domains (paintings, anime, sketches)
+- Animation and video pose tracking
+- Cross-domain pose analysis and research
+- Applications requiring robust pose estimation under distribution shifts
+## How to Use
+### Installation
+```bash
+# Clone the repository
+git clone https://github.com/t-s-liang/SDPose-OOD.git
+cd SDPose-OOD
+# Install dependencies
+pip install -r requirements.txt
+# Download YOLO11-x for human detection
+wget https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11x.pt -P models/
+# Launch Gradio interface
+cd gradio_app
+bash launch_gradio.sh
+```
+## Training Data
+### Datasets
+Trained exclusively on the COCO-2017 `train2017` split (no extra data).
+- **COCO (Common Objects in Context)**: 200K+ labeled images in the full dataset, with person instances annotated with 17 body keypoints
+### Preprocessing
+- Images are cropped and resized to 1024×768 (H×W)
+- Augmentation: random horizontal flip, half-body and bounding-box transforms, and UDP affine warping, plus Albumentations transforms (Gaussian/median blur, coarse dropout)
+- Heatmaps: encoded and decoded with the UDP codec (MMPose style)
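
One detail of the horizontal-flip augmentation is easy to get wrong: mirroring the image also turns left-side joints into right-side joints, so the keypoint labels must be swapped. The sketch below illustrates this for the 17 COCO keypoints. It is not the MMPose implementation; `FLIP_PAIRS` and `flip_keypoints` are illustrative names, and pixel coordinates are assumed to run from `0` to `image_width - 1`.

```python
# Index pairs (left, right) in the COCO 17-keypoint order; nose (0) has no pair.
FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16)]


def flip_keypoints(keypoints, image_width):
    """Mirror (x, y) keypoints horizontally and swap left/right labels."""
    # Mirror every x coordinate around the vertical center line.
    flipped = [(image_width - 1 - x, y) for x, y in keypoints]
    # After mirroring, the joint stored as "left" is anatomically the right one.
    for left, right in FLIP_PAIRS:
        flipped[left], flipped[right] = flipped[right], flipped[left]
    return flipped


# Flipping twice should give back the original keypoints.
original = [(i, 2 * i) for i in range(17)]
assert flip_keypoints(flip_keypoints(original, 100), 100) == original
```
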
+### Comparison with Baselines
+SDPose outperforms prior pose estimation models (e.g., Sapiens, ViTPose++) on out-of-distribution benchmarks while remaining competitive on in-domain data.
+See our [paper](https://arxiv.org/abs/2509.24980) for comprehensive evaluation results.
+## Citation
+If you use SDPose in your research, please cite our paper:
+```bibtex
+@misc{liang2025sdposeexploitingdiffusionpriors,
+      title={SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation},
+      author={Shuang Liang and Jing He and Chuanmeizhi Wang and Lejun Liao and Guo Zhang and Yingcong Chen and Yuan Yuan},
+      year={2025},
+      eprint={2509.24980},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2509.24980},
+}
+```
+## License
+This model is released under the [MIT License](https://opensource.org/licenses/MIT).
+## Additional Resources
+- 🌐 **Project Website**: [https://t-s-liang.github.io/SDPose](https://t-s-liang.github.io/SDPose)
+- 📄 **Paper**: [arXiv:2509.24980](https://arxiv.org/abs/2509.24980)
+- 💻 **Code Repository**: [GitHub](https://github.com/t-s-liang/SDPose-OOD)
+- 🤗 **Demo**: [HuggingFace Space](https://huggingface.co/spaces/teemosliang/SDPose-Body)
+- 📧 **Contact**: tsliang2001@gmail.com
+---
+<div align="center">
+**⭐ Star us on GitHub — it motivates us a lot!**
+</div>