---
license: apache-2.0
tags:
- pytorch
- keypoint-detection
- human-pose-estimation
- heatmap-regression
- computer-vision
- detr
- coco
model-index:
- name: detr-pose-coco50
  results:
  - task:
      type: pose-estimation
      name: Human Pose Estimation
    dataset:
      type: COCO
      name: COCO 2017 (50-person subset)
    metrics:
    - type: MSELoss
      value: ~0.02
      name: Heatmap MSE
---

# 📌 **DETR + Keypoint Estimation (COCO Subset)**

Author: [@Koushik](https://huggingface.co/Koushim)

---

### 🧠 Model Overview

This project combines:

* 🤖 [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) (object detector)
* 🧱 Custom PyTorch keypoint head
* 📊 Trained on a 500-person subset of [COCO 2017 Keypoints](https://cocodataset.org/#keypoints-2020)

The system is a top-down pipeline: DETR detects people, then the keypoint head predicts 17 COCO-style keypoints for each detected person using heatmap regression.

---

### 📂 Files Included

| File                            | Description                                |
| ------------------------------- | ------------------------------------------ |
| `pytorch_model.bin`             | Trained PyTorch model weights              |
| `05_detr_pose_coco_colab.ipynb` | Full Colab notebook (training + inference) |
| `config.json`                   | Basic model metadata                       |
| `README.md`                     | Project description                        |

---

### 📚 Dataset

* **Subset**: 500 images from COCO val2017 with visible persons
* **Annotations**: 17 keypoints per person
* **Source**: [COCO Keypoints](https://cocodataset.org/#keypoints-2020)

---

### 🏗️ Architecture

```text
[ Input Image ]
       │
       ▼
[ DETR (Person BBox) ]
       │
       ▼
[ Crop + Resize (256×256) ]
       │
       ▼
[ CNN Keypoint Head ]
       │
       ▼
[ 17 Heatmaps (Keypoints) ]
```

---

### 🚀 Quick Start

```python
import torch

# Assumes the KeypointHead class definition is importable from a local model.py
from model import KeypointHead

model = KeypointHead()
# map_location lets the weights load on machines without a GPU
model.load_state_dict(torch.load('pytorch_model.bin', map_location='cpu'))
model.eval()
```

---

### 🧪 Inference Demo

```python
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

# DETR expects a 3-channel image
img = Image.open('sample.jpg').convert('RGB')

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
detector = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

inputs = processor(images=img, return_tensors="pt")
with torch.no_grad():
    outputs = detector(**inputs)

# target_sizes takes (height, width); PIL's img.size is (width, height)
results = processor.post_process_object_detection(
    outputs, target_sizes=[img.size[::-1]], threshold=0.8
)[0]

# Use results['boxes'][0] to crop person
# Feed crop into model(img) to get 17 heatmaps
```

---

### 🧠 Training (optional)

To fine-tune on your own dataset:

* Convert your data to COCO format
* Use the provided notebook (`05_detr_pose_coco_colab.ipynb`)
* Change the paths and re-train

---

### ✨ Credit

* [Hugging Face Transformers](https://github.com/huggingface/transformers)
* [COCO Dataset](https://cocodataset.org/)
* [facebook/detr](https://huggingface.co/facebook/detr-resnet-50)
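
---

### 🧪 Decoding Heatmaps (sketch)

The inference demo stops at the 17 heatmaps. One common way to turn them into keypoints is to take the peak of each heatmap and map it back through the person box into original-image coordinates. The sketch below illustrates that idea and is not necessarily what the notebook does. The function name `decode_heatmaps`, the 64×64 heatmap resolution in the example, and the assumption that the crop was stretched to a square without preserving aspect ratio are all hypothetical, since none of them are stated above.

```python
import numpy as np


def decode_heatmaps(heatmaps, box):
    """Map per-keypoint heatmap peaks to original-image pixel coordinates.

    heatmaps: array of shape (K, H, W), one heatmap per keypoint
              (for a torch tensor, pass tensor.detach().cpu().numpy()).
    box:      (x_min, y_min, x_max, y_max) of the person crop in the image.
    Returns an array of shape (K, 3) holding (x, y, peak_score) per keypoint.
    """
    heatmaps = np.asarray(heatmaps, dtype=np.float32)
    num_keypoints, height, width = heatmaps.shape
    x_min, y_min, x_max, y_max = box

    # Locate the highest-scoring cell in each heatmap.
    flat = heatmaps.reshape(num_keypoints, -1)
    peak_index = flat.argmax(axis=1)
    peak_score = flat.max(axis=1)
    rows, cols = np.unravel_index(peak_index, (height, width))

    # Take the centre of the peak cell (+0.5) and rescale to the box size.
    # Assumption: the crop was resized to a square without padding.
    xs = x_min + (cols + 0.5) / width * (x_max - x_min)
    ys = y_min + (rows + 0.5) / height * (y_max - y_min)
    return np.stack([xs, ys, peak_score], axis=1)


# Synthetic example: 17 empty 64x64 heatmaps with one peak for keypoint 0.
heatmaps = np.zeros((17, 64, 64), dtype=np.float32)
heatmaps[0, 10, 20] = 1.0  # row 10, column 20
keypoints = decode_heatmaps(heatmaps, box=(100.0, 50.0, 228.0, 306.0))
# keypoints[0] is x=141.0, y=92.0 with score 1.0
```

In practice `box` would come from `results['boxes'][0].tolist()`, and low `peak_score` values can be used to drop keypoints that are occluded or outside the crop.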