---
license: apache-2.0
tags:
- pytorch
- keypoint-detection
- human-pose-estimation
- heatmap-regression
- computer-vision
- detr
- coco
model-index:
- name: detr-pose-coco50
  results:
  - task:
      type: pose-estimation
      name: Human Pose Estimation
    dataset:
      type: COCO
      name: COCO 2017 (50-person subset)
    metrics:
    - type: MSELoss
      value: ~0.02
      name: Heatmap MSE
---
# **DETR + Keypoint Estimation (COCO Subset)**
Author: [@Koushik](https://huggingface.co/Koushim)
---
### Model Overview
This project combines:
* [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) (object detector)
* A custom PyTorch keypoint head
* A 500-person subset of [COCO 2017 Keypoints](https://cocodataset.org/#keypoints-2020) for training
The system detects people using DETR, then predicts 17 COCO-style keypoints (top-down) using heatmap regression.
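As a rough illustration of heatmap regression, the sketch below renders one keypoint as a 2D Gaussian target and scores a prediction with mean squared error. The 64×64 heatmap size and `sigma=2.0` are assumptions chosen for the example; this card does not state the values used in training, and `gaussian_heatmap` and `heatmap_mse` are hypothetical helper names.

```python
import numpy as np

def gaussian_heatmap(x, y, size=64, sigma=2.0):
    """Render a keypoint at (x, y), in heatmap pixel coordinates, as a 2D Gaussian.

    size and sigma are illustrative assumptions, not values from this model card.
    """
    xs = np.arange(size, dtype=np.float32)   # column coordinates, shape (size,)
    ys = xs[:, None]                         # row coordinates, shape (size, 1)
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

def heatmap_mse(pred, target):
    """Mean squared error between a predicted and a target heatmap."""
    return float(np.mean((pred - target) ** 2))

# One target heatmap for a keypoint at column 20, row 30
target = gaussian_heatmap(20, 30)
# A prediction of all zeros is penalised most where the target peaks
loss = heatmap_mse(np.zeros_like(target), target)
```

A full target for one person would stack 17 such maps, one per COCO keypoint.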
---
### Files Included
| File | Description |
| ------------------------------- | ------------------------------------------ |
| `pytorch_model.bin` | Trained PyTorch model weights |
| `05_detr_pose_coco_colab.ipynb` | Full Colab notebook (training + inference) |
| `config.json` | Basic model metadata |
| `README.md` | Project description |
---
### Dataset
* **Subset**: 500 images from COCO val2017 with visible persons
* **Annotations**: 17 keypoints per person
* **Source**: [COCO Keypoints](https://cocodataset.org/#keypoints-2020)
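For reference, COCO stores each person's keypoints as a flat list of 51 numbers: 17 triples of `(x, y, v)`, where `v` is 0 for not labeled, 1 for labeled but not visible, and 2 for labeled and visible. A minimal sketch of unpacking one annotation follows; `parse_coco_keypoints` is a hypothetical helper name and the sample annotation is made up for illustration.

```python
import numpy as np

def parse_coco_keypoints(annotation):
    """Split a COCO person annotation into (17, 2) coordinates and (17,) visibility flags."""
    kps = np.asarray(annotation["keypoints"], dtype=np.float32).reshape(-1, 3)
    xy = kps[:, :2]                      # pixel coordinates in the full image
    visibility = kps[:, 2].astype(int)   # 0 = not labeled, 1 = occluded, 2 = visible
    return xy, visibility

# Made-up annotation: only the first keypoint (nose) is labeled and visible
sample = {"keypoints": [120, 80, 2] + [0, 0, 0] * 16}
xy, visibility = parse_coco_keypoints(sample)
labeled = visibility > 0   # mask for keypoints that should contribute to the loss
```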
---
### Architecture
```text
[ Input Image ]
       │
       ▼
[ DETR (Person BBox) ]
       │
       ▼
[ Crop + Resize (256×256) ]
       │
       ▼
[ CNN Keypoint Head ]
       │
       ▼
[ 17 Heatmaps (Keypoints) ]
```
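The crop-and-resize stage could look roughly like the sketch below. It assumes the box is in `(x_min, y_min, x_max, y_max)` pixel format and that the crop is stretched straight to 256×256 without preserving aspect ratio; the notebook may pad or scale differently. `crop_person` is a hypothetical helper name.

```python
from PIL import Image

def crop_person(image, box, size=256):
    """Crop a person box from a PIL image and resize the crop to size x size.

    Assumes box = (x_min, y_min, x_max, y_max) in pixels. Aspect ratio is
    not preserved here, which is an assumption of this sketch.
    """
    x_min, y_min, x_max, y_max = [float(v) for v in box]
    # Clamp the box to the image so the crop never reads outside it
    left = max(0, int(x_min))
    top = max(0, int(y_min))
    right = min(image.width, int(round(x_max)))
    bottom = min(image.height, int(round(y_max)))
    crop = image.crop((left, top, right, bottom))
    return crop.resize((size, size), Image.BILINEAR)

# Usage with a blank stand-in image
image = Image.new("RGB", (640, 480))
crop = crop_person(image, (100.2, 50.7, 300.9, 400.1))
```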
---
### Quick Start
```python
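import torch
# model.py is not among the listed files; KeypointHead is presumably defined
# in the notebook (05_detr_pose_coco_colab.ipynb) and must be importable here
from model import KeypointHead

model = KeypointHead()
# map_location='cpu' lets the weights load on machines without a GPU
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()  # inference mode: disables dropout and batch-norm updates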
```
---
### Inference Demo
```python
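import torch
from PIL import Image
import cv2, numpy as np
from transformers import DetrImageProcessor, DetrForObjectDetection

img = Image.open('sample.jpg').convert('RGB')  # DETR expects 3-channel input
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
detector = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

inputs = processor(images=img, return_tensors="pt")
with torch.no_grad():  # no gradients needed at inference time
    outputs = detector(**inputs)
# target_sizes expects (height, width); PIL's img.size is (width, height)
results = processor.post_process_object_detection(
    outputs, target_sizes=[img.size[::-1]], threshold=0.8
)[0]

# results['boxes'] covers every detected class, so keep only boxes whose
# detector.config.id2label[label] is "person" before cropping.
# Feed each 256x256 crop into model(...) to get 17 heatmaps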
```
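Once the head has produced its heatmaps, each keypoint can be read off as the location of the per-channel maximum and mapped back into the original image. The sketch below assumes the heatmaps have shape `(17, H, W)` and that the crop was the person box stretched to a square without padding; `decode_heatmaps` is a hypothetical helper name, and the notebook may decode differently.

```python
import numpy as np

def decode_heatmaps(heatmaps, box):
    """Convert (K, H, W) heatmaps into (K, 2) image coordinates plus peak scores.

    box is (x_min, y_min, x_max, y_max) of the person crop in the original image.
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    ys, xs = np.unravel_index(flat.argmax(axis=1), (h, w))  # peak row/column per keypoint
    scores = flat.max(axis=1)                               # peak value as a confidence proxy
    x_min, y_min, x_max, y_max = box
    # Use the cell centre (+0.5) and rescale from heatmap cells to box pixels
    img_x = x_min + (xs + 0.5) / w * (x_max - x_min)
    img_y = y_min + (ys + 0.5) / h * (y_max - y_min)
    return np.stack([img_x, img_y], axis=1), scores

# Usage with a synthetic heatmap: keypoint 0 peaks at row 10, column 20
heatmaps = np.zeros((17, 64, 64), dtype=np.float32)
heatmaps[0, 10, 20] = 1.0
keypoints, scores = decode_heatmaps(heatmaps, (100, 200, 228, 456))
```

Keypoints with a low peak score can be treated as not detected; a suitable threshold depends on how the head was trained.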
---
### Training (optional)
To fine-tune on your own dataset:
* Convert your data to COCO format
* Use the notebook provided (`05_detr_pose_coco_colab.ipynb`)
* Change paths and re-train
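For orientation, a single heatmap-MSE training step might look like the sketch below. The two-layer `head` is a hypothetical stand-in, not the architecture shipped in `pytorch_model.bin`, and the batch size, learning rate, and 64×64 heatmap resolution are assumptions; the notebook holds the actual training code.

```python
import torch
from torch import nn

# Hypothetical stand-in head: 256x256 RGB crop -> 17 heatmaps at 64x64.
# The real architecture is defined in the notebook.
head = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 17, kernel_size=3, stride=2, padding=1),
)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)  # lr is an assumption
criterion = nn.MSELoss()

def train_step(crops, target_heatmaps):
    """Run one optimisation step and return the heatmap MSE as a float."""
    head.train()
    optimizer.zero_grad()
    loss = criterion(head(crops), target_heatmaps)
    loss.backward()
    optimizer.step()
    return loss.item()

# Random stand-in batch: 4 person crops and their (empty) target heatmaps
crops = torch.rand(4, 3, 256, 256)
targets = torch.zeros(4, 17, 64, 64)
loss_value = train_step(crops, targets)
```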
---
### Credit
* [Hugging Face Transformers](https://github.com/huggingface/transformers)
* [COCO Dataset](https://cocodataset.org/)
* [facebook/detr](https://huggingface.co/facebook/detr-resnet-50)