---
license: apache-2.0
tags:
  - pytorch
  - keypoint-detection
  - human-pose-estimation
  - heatmap-regression
  - computer-vision
  - detr
  - coco
model-index:
  - name: detr-pose-coco50
    results:
      - task:
          type: pose-estimation
          name: Human Pose Estimation
        dataset:
          type: COCO
          name: COCO 2017 (500-person subset)
        metrics:
          - type: MSELoss
            value: ~0.02
            name: Heatmap MSE
---

# 📌 **DETR + Keypoint Estimation (COCO Subset)**
Author: [@Koushik](https://huggingface.co/Koushim)

---

### 🧠 Model Overview

This project combines:

* 🤖 [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) (object detector)
* 🧱 Custom PyTorch keypoint head
* 📊 Trained on a 500-person subset of [COCO 2017 Keypoints](https://cocodataset.org/#keypoints-2020)

The pipeline is top-down: DETR first detects each person, then the keypoint head predicts 17 COCO-style keypoints for the cropped person using heatmap regression.

---

### 📂 Files Included

| File                            | Description                                |
| ------------------------------- | ------------------------------------------ |
| `pytorch_model.bin`             | Trained PyTorch model weights              |
| `05_detr_pose_coco_colab.ipynb` | Full Colab notebook (training + inference) |
| `config.json`                   | Basic model metadata                       |
| `README.md`                     | Project description                        |

---

### 📚 Dataset

* **Subset**: 500 images from COCO val2017 with visible persons
* **Annotations**: 17 keypoints per person
* **Source**: [COCO Keypoints](https://cocodataset.org/#keypoints-2020)
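
For heatmap regression, each annotated keypoint is usually rendered as a small Gaussian blob that the head learns to reproduce. The sketch below shows one common way to build such targets from a COCO annotation. The function name, the 64×64 heatmap size, and `sigma=2.0` are illustrative assumptions, not values taken from the notebook.

```python
import numpy as np

NUM_KEYPOINTS = 17  # COCO person keypoints


def make_target_heatmaps(keypoints, box, heatmap_size=64, sigma=2.0):
    """Render one Gaussian heatmap per keypoint for a single person.

    keypoints: flat COCO list [x1, y1, v1, ..., x17, y17, v17] in image pixels.
    box: COCO-style [x, y, width, height] of the person.
    heatmap_size and sigma are assumed values chosen for illustration.
    """
    kps = np.asarray(keypoints, dtype=np.float32).reshape(NUM_KEYPOINTS, 3)
    x0, y0, w, h = box
    heatmaps = np.zeros((NUM_KEYPOINTS, heatmap_size, heatmap_size), dtype=np.float32)
    ys, xs = np.mgrid[0:heatmap_size, 0:heatmap_size]
    for k, (x, y, v) in enumerate(kps):
        if v == 0:  # v == 0 means "not labeled": keep an all-zero heatmap
            continue
        # Map image coordinates to heatmap coordinates relative to the box.
        cx = (x - x0) / w * heatmap_size
        cy = (y - y0) / h * heatmap_size
        heatmaps[k] = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return heatmaps


# Example: only the first keypoint is labeled, at the centre of the box.
example_keypoints = [132, 114, 2] + [0, 0, 0] * 16
target = make_target_heatmaps(example_keypoints, box=[100, 50, 64, 128])
```

A target built this way can be compared against the head's output with `MSELoss`, which matches the metric reported in the metadata above.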

---

### 🏗️ Architecture

```text
[ Input Image ]
      │
      ▼
[ DETR (Person BBox) ]
      │
      ▼
[ Crop + Resize (256×256) ]
      │
      ▼
[ CNN Keypoint Head ]
      │
      ▼
[ 17 Heatmaps (Keypoints) ]
```
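
To make the tensor shapes in the diagram concrete, here is a minimal sketch of a CNN head that maps 256×256 crops to 17 heatmaps. `TinyKeypointHead` is a hypothetical stand-in, not the `KeypointHead` shipped with this model; its layer sizes and the 64×64 output resolution are assumptions.

```python
import torch
from torch import nn


class TinyKeypointHead(nn.Module):
    """Hypothetical stand-in for the repository's keypoint head."""

    def __init__(self, num_keypoints=17):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # 256 -> 128
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 128 -> 64
            nn.ReLU(inplace=True),
        )
        # A 1x1 convolution produces one heatmap channel per keypoint.
        self.to_heatmaps = nn.Conv2d(64, num_keypoints, kernel_size=1)

    def forward(self, crops):
        return self.to_heatmaps(self.features(crops))


head = TinyKeypointHead().eval()
crops = torch.rand(2, 3, 256, 256)  # two person crops, already resized
with torch.no_grad():
    heatmaps = head(crops)
print(heatmaps.shape)  # torch.Size([2, 17, 64, 64])
```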

---

### 🚀 Quick Start

```python
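import torch
# model.py is not among the files listed above; this assumes the
# KeypointHead definition from the notebook has been saved there.
from model import KeypointHead

model = KeypointHead()
# map_location="cpu" lets the checkpoint load on machines without a GPU
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode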
```

---

### 🧪 Inference Demo

```python
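import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

img = Image.open("sample.jpg").convert("RGB")  # make sure the input has 3 channels
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
detector = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
detector.eval()

inputs = processor(images=img, return_tensors="pt")
with torch.no_grad():  # no gradients needed for inference
    outputs = detector(**inputs)

# target_sizes expects (height, width); PIL's img.size is (width, height)
results = processor.post_process_object_detection(
    outputs, target_sizes=[img.size[::-1]], threshold=0.8
)[0]

# DETR returns every COCO class, so keep only the "person" detections
person_boxes = [
    box for box, label in zip(results["boxes"], results["labels"])
    if detector.config.id2label[label.item()] == "person"
]

# Crop a box from person_boxes, resize the crop to 256×256,
# and feed it into model(...) to get 17 heatmaps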
```
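
Once the head has produced heatmaps for a crop, the peaks still need to be mapped back to coordinates in the original image. The following sketch takes the argmax of each heatmap and rescales it using the detection box. The function name is hypothetical, and the code assumes the crop was stretched to fill the head's input without padding.

```python
import numpy as np


def heatmaps_to_keypoints(heatmaps, box):
    """Convert (K, H, W) heatmaps for one crop into image-space keypoints.

    box: (x_min, y_min, x_max, y_max), the layout of results["boxes"].
    Returns a (K, 3) array of [x, y, score] rows.
    """
    heatmaps = np.asarray(heatmaps, dtype=np.float32)
    num_kp, hm_h, hm_w = heatmaps.shape
    flat = heatmaps.reshape(num_kp, -1)
    scores = flat.max(axis=1)            # peak value doubles as a confidence
    ys, xs = np.unravel_index(flat.argmax(axis=1), (hm_h, hm_w))
    x_min, y_min, x_max, y_max = box
    # Rescale from heatmap cells to pixels inside the box, then offset.
    x_img = x_min + xs / hm_w * (x_max - x_min)
    y_img = y_min + ys / hm_h * (y_max - y_min)
    return np.stack([x_img, y_img, scores], axis=1)


# Example with a synthetic peak for keypoint 0 at row 32, column 16.
demo = np.zeros((17, 64, 64), dtype=np.float32)
demo[0, 32, 16] = 0.9
keypoints = heatmaps_to_keypoints(demo, box=(100, 50, 164, 178))
```

With real model output, pass `heatmaps[0].numpy()` for one crop and the matching box converted with `box.tolist()`. Keypoints whose score is low are typically treated as not visible; a suitable threshold would have to be chosen empirically.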

---

### 🧠 Training (optional)

To fine-tune on your own dataset:

* Convert your data to COCO format
* Use the notebook provided (`05_detr_pose_coco_colab.ipynb`)
* Update the dataset paths in the notebook and re-run training
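
As a rough guide to the conversion step, the sketch below assembles a minimal COCO-style keypoint annotation file. The helper name and the input layout (`samples`) are hypothetical. `area` is approximated by the box area, and the `skeleton` list is left empty for brevity.

```python
import json

COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def make_coco_keypoint_dataset(samples):
    """Build a COCO-style dict from a simple (hypothetical) sample layout.

    Each sample has file_name, width, height, and a list of persons;
    each person has bbox [x, y, w, h] and 17 (x, y, v) keypoint tuples.
    """
    images, annotations = [], []
    ann_id = 1
    for image_id, sample in enumerate(samples, start=1):
        images.append({
            "id": image_id,
            "file_name": sample["file_name"],
            "width": sample["width"],
            "height": sample["height"],
        })
        for person in sample["persons"]:
            kps = person["keypoints"]
            if len(kps) != len(COCO_KEYPOINT_NAMES):
                raise ValueError("expected 17 keypoints per person")
            x, y, w, h = person["bbox"]
            annotations.append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": 1,
                "bbox": [x, y, w, h],
                "area": w * h,  # approximation: box area, not mask area
                "iscrowd": 0,
                # COCO stores keypoints as a flat [x1, y1, v1, ...] list.
                "keypoints": [value for kp in kps for value in kp],
                "num_keypoints": sum(1 for _, _, v in kps if v > 0),
            })
            ann_id += 1
    categories = [{
        "id": 1, "name": "person", "supercategory": "person",
        "keypoints": COCO_KEYPOINT_NAMES,
        "skeleton": [],  # limb connections omitted in this sketch
    }]
    return {"images": images, "annotations": annotations, "categories": categories}


samples = [{
    "file_name": "img.jpg", "width": 640, "height": 480,
    "persons": [{"bbox": [100, 50, 64, 128],
                 "keypoints": [(132, 114, 2)] + [(0, 0, 0)] * 16}],
}]
dataset = make_coco_keypoint_dataset(samples)
coco_json = json.dumps(dataset)  # ready to write to an annotations file
```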

---

### ✨ Credit

* [Hugging Face Transformers](https://github.com/huggingface/transformers)
* [COCO Dataset](https://cocodataset.org/)
* [facebook/detr](https://huggingface.co/facebook/detr-resnet-50)