---
license: apache-2.0
tags:
  - pytorch
  - keypoint-detection
  - human-pose-estimation
  - heatmap-regression
  - computer-vision
  - detr
  - coco
model-index:
  - name: detr-pose-coco50
    results:
      - task:
          type: pose-estimation
          name: Human Pose Estimation
        dataset:
          type: COCO
          name: COCO 2017 (50-person subset)
        metrics:
          - type: MSELoss
            value: ~0.02
            name: Heatmap MSE
---

# 📌 **DETR + Keypoint Estimation (COCO Subset)**
Author: [@Koushik](https://huggingface.co/Koushim)

---

### 🧠 Model Overview

This project combines:

* 🤖 [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) (object detector)
* 🧱 Custom PyTorch keypoint head
* 📊 Trained on a 500-person subset of [COCO 2017 Keypoints](https://cocodataset.org/#keypoints-2020)

The system is top-down: DETR first detects people, then the keypoint head predicts 17 COCO-style keypoints for each detected person using heatmap regression.

---

### 📂 Files Included

| File                            | Description                                |
| ------------------------------- | ------------------------------------------ |
| `pytorch_model.bin`             | Trained PyTorch model weights              |
| `05_detr_pose_coco_colab.ipynb` | Full Colab notebook (training + inference) |
| `config.json`                   | Basic model metadata                       |
| `README.md`                     | Project description                        |

---

### 📚 Dataset

* **Subset**: 500 images from COCO val2017 with visible persons
* **Annotations**: 17 keypoints per person
* **Source**: [COCO Keypoints](https://cocodataset.org/#keypoints-2020)
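
For reference, COCO stores each person's keypoints as a flat list of 51 numbers: 17 triplets of `(x, y, v)`, where `v` is 0 (not labeled), 1 (labeled but not visible), or 2 (labeled and visible). The sketch below unpacks one annotation into named keypoints. The helper name `parse_keypoints` is hypothetical and is not part of this repository.

```python
# Standard COCO keypoint order (17 keypoints per person).
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def parse_keypoints(annotation):
    """Hypothetical helper: turn a COCO person annotation into a name -> keypoint dict."""
    flat = annotation["keypoints"]
    expected = 3 * len(COCO_KEYPOINT_NAMES)
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")

    parsed = {}
    for i, name in enumerate(COCO_KEYPOINT_NAMES):
        # Each keypoint occupies three consecutive values: x, y, visibility.
        x, y, v = flat[3 * i: 3 * i + 3]
        parsed[name] = {"x": x, "y": y, "visibility": v}
    return parsed
```

Keypoints with visibility 0 carry no usable coordinates, so a training loop would typically skip them when building targets.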

---

### 🏗️ Architecture

```text
[ Input Image ]
      │
      ▼
[ DETR (Person BBox) ]
      │
      ▼
[ Crop + Resize (256×256) ]
      │
      ▼
[ CNN Keypoint Head ]
      │
      ▼
[ 17 Heatmaps (Keypoints) ]
```
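
The crop step in the diagram needs an integer box that stays inside the image. A minimal sketch follows, assuming DETR boxes arrive as `(x_min, y_min, x_max, y_max)` in pixels. The function name `crop_region` and the 10% padding are assumptions for illustration and are not taken from the notebook.

```python
def crop_region(box, img_w, img_h, pad=0.1):
    """Expand a person box by `pad` (fraction of its size) and clamp it to the image.

    Hypothetical helper; the padding value is an assumption, not the trained setting.
    Returns (left, top, right, bottom) as integers.
    """
    x_min, y_min, x_max, y_max = box
    pad_x = (x_max - x_min) * pad
    pad_y = (y_max - y_min) * pad

    # Clamp the padded box so the crop never reaches outside the image.
    left = max(0, int(round(x_min - pad_x)))
    top = max(0, int(round(y_min - pad_y)))
    right = min(img_w, int(round(x_max + pad_x)))
    bottom = min(img_h, int(round(y_max + pad_y)))

    if right <= left or bottom <= top:
        raise ValueError("box has no area inside the image")
    return left, top, right, bottom
```

The returned tuple has the layout expected by `PIL.Image.crop`, so the crop can then be resized to 256×256 before it is passed to the keypoint head.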

---

### 🚀 Quick Start

```python
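import torch
# Assumes the KeypointHead class (e.g. copied from the notebook) is saved as model.py
from model import KeypointHead

model = KeypointHead()
# map_location='cpu' lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()  # switch to inference mode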
```

---

### 🧪 Inference Demo

```python
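import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

img = Image.open('sample.jpg').convert('RGB')
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
detector = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
detector.eval()

inputs = processor(images=img, return_tensors="pt")
with torch.no_grad():  # no gradients needed for inference
    outputs = detector(**inputs)
# target_sizes expects (height, width); PIL's img.size is (width, height)
results = processor.post_process_object_detection(outputs, target_sizes=[img.size[::-1]], threshold=0.8)[0]

# DETR detects every COCO class, so keep only the "person" boxes
person_boxes = [
    box for box, label in zip(results['boxes'], results['labels'])
    if detector.config.id2label[label.item()] == 'person'
]

# Crop the image to person_boxes[0] (x_min, y_min, x_max, y_max), resize to 256x256,
# and feed the crop tensor into the keypoint model to get 17 heatmaps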
```
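
Once the keypoint head has produced its 17 heatmaps, each one has to be turned into a point in the original image. The sketch below takes the peak of every heatmap and maps it back through the crop box. It assumes the heatmaps are available as a NumPy array of shape `(17, H, W)` (for a PyTorch tensor, `heatmaps.detach().cpu().numpy()`); the name `decode_heatmaps` is hypothetical, and the heatmap resolution is read from the array because this card does not state it.

```python
import numpy as np


def decode_heatmaps(heatmaps, crop_box):
    """Convert (K, H, W) heatmaps to a (K, 3) array of (x, y, score) in image pixels.

    crop_box is (left, top, right, bottom), the region the crop was taken from.
    Hypothetical helper using a plain argmax; no sub-pixel refinement is applied.
    """
    num_kp, h, w = heatmaps.shape
    left, top, right, bottom = crop_box

    # Locate the peak of each heatmap.
    flat = heatmaps.reshape(num_kp, -1)
    idx = flat.argmax(axis=1)
    scores = flat[np.arange(num_kp), idx]
    ys, xs = np.unravel_index(idx, (h, w))

    # Map heatmap cell centres back to coordinates in the original image.
    img_x = left + (xs + 0.5) * (right - left) / w
    img_y = top + (ys + 0.5) * (bottom - top) / h
    return np.stack([img_x, img_y, scores], axis=1)
```

The peak value serves as a rough confidence score, so low-scoring keypoints can be filtered out before drawing.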

---

### 🧠 Training (optional)

To fine-tune on your own dataset:

* Convert your data to COCO format
* Use the notebook provided (`05_detr_pose_coco_colab.ipynb`)
* Change paths and re-train
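
Heatmap regression with an MSE loss needs one target heatmap per keypoint. A common choice, assumed here rather than confirmed from the notebook, is a 2D Gaussian centred on the annotated keypoint. The function names, the 64×64 heatmap size, and `sigma=2.0` below are illustrative assumptions.

```python
import numpy as np


def gaussian_heatmap(x, y, size=64, sigma=2.0):
    """Build a (size, size) target heatmap with a Gaussian peak at column x, row y.

    Hypothetical helper; size and sigma are assumed values, not the trained settings.
    """
    xs = np.arange(size, dtype=np.float32)  # column coordinates, shape (size,)
    ys = xs[:, None]                        # row coordinates, shape (size, 1)
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))


def heatmap_mse(pred, target):
    """Mean squared error between a predicted and a target heatmap."""
    return float(np.mean((pred - target) ** 2))
```

In PyTorch the same loss is available as `torch.nn.MSELoss`, applied to the stacked 17-channel prediction and target tensors.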

---

### ✨ Credit

* [Hugging Face Transformers](https://github.com/huggingface/transformers)
* [COCO Dataset](https://cocodataset.org/)
* [facebook/detr](https://huggingface.co/facebook/detr-resnet-50)