---
license: gpl-3.0
tags:
- human-pose-estimation
- pose-estimation
- instance-segmentation
- detection
- person-detection
- computer-vision
datasets:
- COCO
- AIC
- MPII
- OCHuman
metrics:
- mAP
pipeline_tag: keypoint-detection
---
<div id="toc" align="center">
<h1 style="margin-bottom: 0.0em;">
Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle
</h1>
<h2 style="margin-bottom: 0.2em;">
ICCV 2025
</h2>
</div>
<div style="text-align: justify;">
The BBox-Mask-Pose (BMP) method integrates detection, pose estimation, and segmentation into a self-improving loop by conditioning these tasks on each other.
This approach enhances all three tasks simultaneously.
Using segmentation masks instead of bounding boxes improves performance in crowded scenarios, making top-down methods competitive with bottom-up approaches.
Key contributions:
1. **MaskPose**: a pose estimation model conditioned on segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
- Download pre-trained weights below
2. **BBox-MaskPose (BMP)**: a method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation, and pose estimation
- Try the demo!
3. A fine-tuned RTMDet adapted for iterative detection (ignoring 'holes')
- Download pre-trained weights below
4. Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent from MMPose.
</div>
<div align="left">
[arXiv](https://arxiv.org/abs/2412.01562)
[Code](https://github.com/MiraPurkrabek/BBoxMaskPose)
[Project page](https://mirapurkrabek.github.io/BBox-Mask-Pose/)
</div>
For more details, see the [GitHub repository](https://github.com/MiraPurkrabek/BBoxMaskPose).
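The BMP loop can be sketched in a few lines. `detect`, `estimate_pose`, and `mask_out` below are toy stand-ins for the RTMDet, MaskPose, and mask-out steps (the pose-to-mask refinement via SAM is omitted for brevity); they are not the repository's actual API, and only the loop structure reflects the method:

```python
import numpy as np

def detect(image):
    """Toy detector stand-in (RTMDet in BMP): returns instance masks.
    Reports the left half as one instance unless it is already grayed out."""
    h, w = image.shape[:2]
    if np.all(image[:, : w // 2] == 127):
        return []  # nothing new to find
    mask = np.zeros((h, w), dtype=bool)
    mask[:, : w // 2] = True
    return [mask]

def estimate_pose(image, mask):
    """Toy pose stand-in (MaskPose in BMP): 17 keypoints at the mask centroid."""
    ys, xs = np.nonzero(mask)
    center = np.array([xs.mean(), ys.mean()])
    return np.tile(center, (17, 1))

def mask_out(image, masks):
    """Gray out already-detected instances so the next pass finds new ones."""
    out = image.copy()
    for m in masks:
        out[m] = 127
    return out

def bmp_loop(image, num_rounds=2):
    """Detect -> estimate pose -> mask out, repeated: the BMP loop."""
    all_masks, all_poses = [], []
    work = image
    for _ in range(num_rounds):
        masks = detect(work)
        if not masks:
            break
        for m in masks:
            all_poses.append(estimate_pose(image, m))
        all_masks.extend(masks)
        work = mask_out(work, masks)  # condition the next detection round
    return all_masks, all_poses
```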
## Models List
1. **ViTPose-b multi-dataset**
2. **MaskPose-b**
3. fine-tuned **RTMDet-l**
See details of each model below.
-----------------------------------------
## 1. ViTPose-B [multi-dataset]
- **Model type**: ViT-b backbone with multi-layer decoder
- **Input**: RGB images (192x256)
- **Output**: Keypoint coordinates (one 48x64 heatmap per keypoint, 21 keypoints)
- **Language(s)**: Not language-dependent (vision model)
- **License**: GPL-3.0
- **Framework**: MMPose
#### Training Details
- **Training data**: [COCO Dataset](https://cocodataset.org/#home), [MPII Dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/software-and-datasets/mpii-human-pose-dataset), [AIC Dataset](https://arxiv.org/abs/1711.06475)
- **Training script**: [GitHub - BBoxMaskPose_code](https://github.com/MiraPurkrabek/BBoxMaskPose)
- **Epochs**: 210
- **Batch size**: 64
- **Learning rate**: 5e-5
- **Hardware**: 4x NVIDIA A100
**What's new?**
ViTPose trained on multiple datasets performs much better in multi-body (and crowded) scenarios than COCO-only ViTPose.
The original authors previously trained the model in a multi-dataset setup; this is a reproduction compatible with MMPose 2.0.
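As a concrete example of the output format listed above, a minimal argmax decoder turns the per-keypoint 48x64 heatmaps (stored as a `(21, 64, 48)` array, height by width) back into coordinates in the 192x256 input crop. MMPose's codecs add sub-pixel refinement on top of this; the function below is an illustrative sketch only:

```python
import numpy as np

def decode_heatmaps(heatmaps, input_size=(192, 256)):
    """Decode (K, 64, 48) heatmaps into keypoints in input-image pixels.

    Simple argmax decoding without sub-pixel refinement; MMPose's own
    codecs perform additional post-processing in practice.
    """
    k, hm_h, hm_w = heatmaps.shape
    in_w, in_h = input_size
    flat = heatmaps.reshape(k, -1)
    idx = flat.argmax(axis=1)              # flat index of each peak
    xs = (idx % hm_w) * (in_w / hm_w)      # scale heatmap x back to input width
    ys = (idx // hm_w) * (in_h / hm_h)     # scale heatmap y back to input height
    scores = flat.max(axis=1)              # peak value as keypoint confidence
    return np.stack([xs, ys], axis=1), scores
```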
-----------------------------------------
## 2. MaskPose-B
- **Model type**: ViT-b backbone with multi-layer decoder
- **Input**: RGB images (192x256) + estimated instance segmentation
- **Output**: Keypoint coordinates (one 48x64 heatmap per keypoint, 21 keypoints)
- **Language(s)**: Not language-dependent (vision model)
- **License**: GPL-3.0
- **Framework**: MMPose
#### Training Details
- **Training data**: [COCO Dataset](https://cocodataset.org/#home), [MPII Dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/software-and-datasets/mpii-human-pose-dataset), [AIC Dataset](https://arxiv.org/abs/1711.06475) + SAM-estimated instance masks
- **Training script**: [GitHub - BBoxMaskPose_code](https://github.com/MiraPurkrabek/BBoxMaskPose)
- **Epochs**: 210
- **Batch size**: 64
- **Learning rate**: 5e-5
- **Hardware**: 4x NVIDIA A100
**What's new?**
Compared to ViTPose, MaskPose takes an instance segmentation mask as an additional input and is better at distinguishing instances in multi-body scenes.
It adds no computational overhead compared to ViTPose.
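The card does not spell out how the mask enters the network. One parameter-free possibility, shown purely as an illustration (not necessarily MaskPose's exact scheme), is to fold the mask into the input crop itself: crop around the mask's bounding box and attenuate pixels outside the instance:

```python
import numpy as np

def mask_conditioned_crop(image, mask, out_hw=(256, 192), dim_factor=0.3):
    """Illustrative, parameter-free mask conditioning (an assumption,
    not necessarily MaskPose's exact scheme): crop around the mask's
    bounding box and attenuate pixels outside the instance mask.

    out_hw is (height, width) of the returned crop.
    """
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    crop = image[y0:y1, x0:x1].astype(np.float32)
    crop[~mask[y0:y1, x0:x1]] *= dim_factor  # suppress other bodies / background
    # Crude nearest-neighbour resize to the model input size (numpy-only).
    h, w = out_hw
    yy = (np.arange(h) * crop.shape[0] / h).astype(int)
    xx = (np.arange(w) * crop.shape[1] / w).astype(int)
    return crop[yy][:, xx].astype(np.uint8)
```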
-----------------------------------------
## 3. Fine-tuned RTMDet-L
- **Model type**: CSPNeXt-P5 backbone, CSPNeXtPAFPN neck, RTMDetInsSepBN head
- **Input**: RGB images
- **Output**: Detected instances -- bbox, instance mask and class for each
- **Language(s)**: Not language-dependent (vision model)
- **License**: GPL-3.0
- **Framework**: MMDetection
#### Training Details
- **Training data**: [COCO Dataset](https://cocodataset.org/#home) with randomly masked-out instances
- **Training script**: [GitHub - BBoxMaskPose_code](https://github.com/MiraPurkrabek/BBoxMaskPose)
- **Epochs**: 10
- **Batch size**: 16
- **Learning rate**: 2e-2
- **Hardware**: 4x NVIDIA A100
**What's new?**
RTMDet fine-tuned to ignore masked-out instances, designed for iterative detection.
It is especially effective in multi-body scenes where background instances would otherwise go undetected.
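The "randomly masked-out instances" training data mentioned above can be sketched as a simple augmentation step: gray out some ground-truth instances and drop them from the detection targets, so the detector learns to ignore such 'holes'. The `drop_prob` and `fill` values below are illustrative assumptions, not the repository's actual settings:

```python
import numpy as np

def mask_out_instances(image, inst_masks, drop_prob=0.5, fill=127, rng=None):
    """Training-augmentation sketch: randomly gray out ground-truth
    instances (drop_prob and fill are assumed values, not the actual ones).

    Returns the augmented image and the indices of instances that remain
    valid detection targets.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    keep = []
    for i, m in enumerate(inst_masks):
        if rng.random() < drop_prob:
            out[m] = fill          # erase this instance from the image
        else:
            keep.append(i)         # it stays a detection target
    return out, keep
```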
## Citation
If you use our work, please cite:
```bibtex
@InProceedings{Purkrabek2025ICCV,
  author    = {Purkrabek, Miroslav and Matas, Jiri},
  title     = {Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2025},
  month     = {October},
}
```
## Authors
- Miroslav Purkrabek ([GitHub](https://github.com/MiraPurkrabek))
- Jiri Matas ([personal website](https://cmp.felk.cvut.cz/~matas/))