purkrmir committed on
Commit 53260fc · verified · 1 Parent(s): 231a5a1

Update README.md

Files changed (1)
  1. README.md +63 -13
README.md CHANGED
@@ -30,12 +30,14 @@ library_name: bboxmaskpose
30
  <ul align="center" style="list-style: none; padding: 0; margin: 0;">
31
  <summary>
32
  <h2 style="margin-bottom: 0.2em;">
33
- ICCV 2025
34
  </h2>
35
  </summary>
36
  </ul>
37
  </div>
38
 
39
  <div style="text-align: justify;">
40
  The BBox-Mask-Pose (BMP) method integrates detection, pose estimation, and segmentation into a self-improving loop by conditioning these tasks on each other.
41
  This approach enhances all three tasks simultaneously.
@@ -44,9 +46,10 @@ Using segmentation masks instead of bounding boxes improves performance in crowd
44
  Key contributions:
45
  1. **MaskPose**: a pose estimation model conditioned by segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
46
  - Download pre-trained weights below
47
- 2. **BBox-MaskPose (BMP)**: method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation and pose estimation
48
  - Try the demo!
49
- 3. Fine-tuned RTMDet adapted for iterative detection (ignoring 'holes')
50
  - Download pre-trained weights below
51
  5. Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent in MMPose.
52
  </div>
@@ -64,8 +67,9 @@ For more details, see the [GitHub repository](https://github.com/MiraPurkrabek/B
64
  ## 📝 Models List
65
 
66
  1. **ViTPose-b multi-dataset**
67
- 2. **MaskPose-b**
68
- 3. fine-tuned **RTMDet-l**
69
 
70
  See details of each model below.
71
 
@@ -93,14 +97,15 @@ ViTPose trained on multiple datasets perform much better in multi-body (and crow
93
  The model was previously trained in a multi-dataset setup by the original ViTPose authors; this is a reproduction compatible with MMPose 2.0.
94
 
95
  -----------------------------------------
96
- ## 2. MaskPose-B
97
 
98
  - **Model type**: ViT-b backbone with multi-layer decoder
99
  - **Input**: RGB images (192x256) + estimated instance segmentation
100
- - **Output**: Keypoints Coordinates (48x64 heatmap for each keypoint, 21 keypoints)
101
  - **Language(s)**: Not language-dependent (vision model)
102
  - **License**: GPL-3.0
103
  - **Framework**: MMPose
104
 
105
  #### Training Details
106
 
@@ -115,8 +120,36 @@ The model was trained in multi-dataset setup by authors before, this is reproduc
115
  Compared to ViTPose, MaskPose takes instance segmentation as an input and is even better at distinguishing instances in multi-body scenes.
116
  It adds no computational overhead compared to ViTPose.
117
 
118
  -----------------------------------------
119
- ## 3. fine-tuned RTMDet-L
120
 
121
  - **Model type**: CSPNeXt-P5 backbone, CSPNeXtPAFPN neck, RTMDetInsSepBN head
122
  - **Input**: RGB images
@@ -143,17 +176,34 @@ Especially effective in multi-body scenes where background would not be detected
143
 
144
  If you use our work, please cite:
145
 
146
  ```bibtex
147
  @InProceedings{Purkrabek2025ICCV,
148
- author={Purkrabek, Miroslav and Matas, Jiri},
149
- title={Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle},
150
- booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
151
- year={2025},
152
- month={October},
153
  }
154
  ```
155
 
156
  ## 🧑‍💻 Authors
157
 
158
  - Miroslav Purkrabek ([personal website](https://github.com/MiraPurkrabek))
159
  - Jiri Matas ([personal website](https://cmp.felk.cvut.cz/~matas/))
 
30
  <ul align="center" style="list-style: none; padding: 0; margin: 0;">
31
  <summary>
32
  <h2 style="margin-bottom: 0.2em;">
33
+ ICCV 2025 + CVPR 2025
34
  </h2>
35
  </summary>
36
  </ul>
37
  </div>
38
 
39
+ ![image](https://cdn-uploads.huggingface.co/production/uploads/64bfa064b7375f6b84ad58e9/7wuB6cVsvjWsPf7B56TE4.png)
40
+
41
  <div style="text-align: justify;">
42
  The BBox-Mask-Pose (BMP) method integrates detection, pose estimation, and segmentation into a self-improving loop by conditioning these tasks on each other.
43
  This approach enhances all three tasks simultaneously.
 
46
  Key contributions:
47
  1. **MaskPose**: a pose estimation model conditioned by segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
48
  - Download pre-trained weights below
49
+ 2. **PMPose**: a pose estimation model conditioned by segmentation masks that also predicts a full description of each keypoint; a combination of MaskPose and ProbPose (CVPR'25).
50
+ 3. **BBox-MaskPose (BMP)**: method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation and pose estimation
51
  - Try the demo!
52
+ 4. Fine-tuned RTMDet adapted for iterative detection (ignoring 'holes')
53
  - Download pre-trained weights below
54
  5. Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent in MMPose.
55
  </div>
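A rough sketch of the detect → pose → segment loop described above, including the "ignoring holes" idea behind the fine-tuned RTMDet. Everything here is an assumption for illustration: `bmp_loop`, `mask_out`, and the three stage callables are hypothetical stand-ins, not the BBoxMaskPose API, and hiding already-found people by overwriting their pixels is only one plausible way to make a detector skip them.

```python
import numpy as np
from typing import Callable, List, Tuple

Mask = np.ndarray       # boolean array of shape (H, W)
Keypoints = np.ndarray  # float array of shape (K, 2), (row, col) order in this sketch


def mask_out(image: np.ndarray, masks: List[Mask], fill: int = 0) -> np.ndarray:
    """Return a copy of `image` with pixels of already-found instances set to `fill`."""
    out = image.copy()
    for m in masks:
        out[m] = fill  # works for (H, W) and (H, W, C) images
    return out


def bmp_loop(
    image: np.ndarray,
    detect: Callable[[np.ndarray], List[Mask]],              # hypothetical detector
    estimate_pose: Callable[[np.ndarray, Mask], Keypoints],  # hypothetical mask-conditioned pose model
    segment: Callable[[np.ndarray, Keypoints], Mask],        # hypothetical pose-prompted segmenter
    n_iters: int = 2,
) -> Tuple[List[Keypoints], List[Mask]]:
    """Hypothetical BMP-style loop: each pass detects only people not found before."""
    poses: List[Keypoints] = []
    masks: List[Mask] = []
    for _ in range(n_iters):
        # The detector sees the image with previously found instances removed ("holes").
        new_masks = detect(mask_out(image, masks))
        if not new_masks:
            break  # nothing new found, stop early
        for det_mask in new_masks:
            kpts = estimate_pose(image, det_mask)  # pose conditioned on the detected mask
            poses.append(kpts)
            masks.append(segment(image, kpts))     # mask refined from the estimated pose
    return poses, masks
```

Real models would replace the callables. With a toy detector that returns only the brightest remaining region, the loop finds one instance per pass and stops once the masked image is empty.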
 
67
  ## 📝 Models List
68
 
69
  1. **ViTPose-b multi-dataset**
70
+ 2. **MaskPose**
71
+ 3. **PMPose**
72
+ 4. fine-tuned **RTMDet-l**
73
 
74
  See details of each model below.
75
 
 
97
  The model was previously trained in a multi-dataset setup by the original ViTPose authors; this is a reproduction compatible with MMPose 2.0.
98
 
99
  -----------------------------------------
100
+ ## 2. MaskPose-1.1.0
101
 
102
  - **Model type**: ViT-b backbone with multi-layer decoder
103
  - **Input**: RGB images (192x256) + estimated instance segmentation
104
+ - **Output**: Keypoints Coordinates (48x64 heatmap for each keypoint, 23 keypoints)
105
  - **Language(s)**: Not language-dependent (vision model)
106
  - **License**: GPL-3.0
107
  - **Framework**: MMPose
108
+ - **Size(s)**: -S, -B, -L, -H
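The output bullet implies a stride of 4 between the 192x256 input and the 48x64 heatmaps (assuming width x height order). Below is a minimal, hypothetical `decode_heatmaps` helper that maps such heatmaps back to input-image coordinates by plain argmax. MMPose's own codecs typically add sub-pixel refinement, so treat this as a simplified sketch, not the decoder MaskPose actually uses.

```python
import numpy as np


def decode_heatmaps(heatmaps: np.ndarray, input_size=(192, 256)):
    """Plain argmax decoding of (K, H, W) heatmaps to (x, y) in input pixels.

    Assumes input_size is (width, height), e.g. 48x64 heatmaps for a 192x256 input.
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    idx = flat.argmax(axis=1)               # flat index of the peak per keypoint
    ys, xs = np.unravel_index(idx, (h, w))  # peak row/col in heatmap space
    scale_x = input_size[0] / w             # 192 / 48 = 4
    scale_y = input_size[1] / h             # 256 / 64 = 4
    coords = np.stack([xs * scale_x, ys * scale_y], axis=1).astype(np.float32)
    scores = flat.max(axis=1)               # peak value as a confidence proxy
    return coords, scores
```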
109
 
110
  #### Training Details
111
 
 
120
  Compared to ViTPose, MaskPose takes instance segmentation as an input and is even better at distinguishing instances in multi-body scenes.
121
  It adds no computational overhead compared to ViTPose.
122
 
123
+ **V1.0.0 vs. V1.1.0**
124
+ The previous version (v1.0.0) predicted 21 keypoints and was trained with a different training recipe. V1.1.0 predicts 23 keypoints and uses an improved training recipe with dataset balancing, which improves the results.
125
+
126
+ -----------------------------------------
127
+ ## 3. PMPose-1.0.0
128
+
129
+ - **Model type**: ViT-b backbone with multi-layer decoder
130
+ - **Input**: RGB images (192x256) + estimated instance segmentation
131
+ - **Output**: Keypoints Coordinates (48x64 probmap for each keypoint, 23 keypoints), Presence Probabilities, Visibilities, Expected OKS for each keypoint
132
+ - **Language(s)**: Not language-dependent (vision model)
133
+ - **License**: GPL-3.0
134
+ - **Framework**: MMPose
135
+ - **Size(s)**: -S, -B, -L, -H
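For reference, COCO's Object Keypoint Similarity scores keypoint i as exp(-d_i^2 / (2 s^2 k_i^2)), where d_i is the prediction error in pixels, s^2 is the object area, and k_i = 2σ_i is a per-keypoint constant; OKS averages this over labeled keypoints. Reading PMPose's "Expected OKS" output as a prediction of this per-keypoint term is an assumption, and the helper names below are hypothetical.

```python
import numpy as np


def keypoint_similarity(pred, gt, area, sigmas):
    """Per-keypoint COCO similarity exp(-d^2 / (2 * area * k^2)) with k = 2 * sigma."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    k = 2.0 * np.asarray(sigmas, dtype=float)
    d2 = np.sum((pred - gt) ** 2, axis=1)  # squared pixel error per keypoint
    return np.exp(-d2 / (2.0 * area * k ** 2))


def oks(pred, gt, visible, area, sigmas):
    """Mean similarity over labeled keypoints; 0.0 if none are labeled."""
    vis = np.asarray(visible, dtype=bool)
    if not vis.any():
        return 0.0
    sim = keypoint_similarity(pred, gt, area, sigmas)
    return float(sim[vis].mean())
```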
136
+
137
+ #### Training Details
138
+
139
+ - **Training data**: [COCO Dataset](https://cocodataset.org/#home), [MPII Dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/software-and-datasets/mpii-human-pose-dataset), [AIC Dataset](https://arxiv.org/abs/1711.06475) + SAM-estimated instance masks
140
+ - **Training script**: [GitHub - BBoxMaskPose_code](https://github.com/MiraPurkrabek/BBoxMaskPose)
141
+ - **Epochs**: 20
142
+ - **Batch size**: 64
143
+ - **Learning rate**: 5e-5
144
+ - **Frozen backbone**
145
+ - **Hardware**: 4x NVIDIA A100
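The training details above could be written as an MMEngine-style config (the config system MMPose uses) roughly as follows. This is a hedged sketch, not the repository's actual config: the AdamW optimizer, the weight decay value, and emulating the frozen backbone with `lr_mult=0.0` are assumptions, and the card does not state whether batch size 64 is per GPU or total across the 4 GPUs.

```python
# Hypothetical MMEngine-style fragment mirroring the listed training details.
train_cfg = dict(by_epoch=True, max_epochs=20)  # Epochs: 20

train_dataloader = dict(
    batch_size=64,  # Batch size: 64 (MMEngine treats this as per-GPU; total is unknown)
    num_workers=4,  # assumption
)

optim_wrapper = dict(
    optimizer=dict(type="AdamW", lr=5e-5, weight_decay=0.1),  # optimizer type and decay are assumptions
    paramwise_cfg=dict(
        # Zero learning rate and decay for backbone parameters approximates "Frozen backbone".
        custom_keys={"backbone": dict(lr_mult=0.0, decay_mult=0.0)},
    ),
)
```

Note that `lr_mult=0.0` still computes backbone gradients; setting `requires_grad=False` on the backbone parameters is the stricter way to freeze it.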
146
+
147
+
148
+ **What's new?**
149
+ PMPose combines MaskPose-1.1.0 and [ProbPose (CVPR'25)](https://mirapurkrabek.github.io/ProbPose/). Like MaskPose, it is conditioned by masks and has superior in-crowd performance; like ProbPose, it also predicts probabilities and visibilities.
150
+
151
  -----------------------------------------
152
+ ## 4. fine-tuned RTMDet-L
153
 
154
  - **Model type**: CSPNeXt-P5 backbone, CSPNeXtPAFPN neck, RTMDetInsSepBN head
155
  - **Input**: RGB images
 
176
 
177
  If you use our work, please cite:
178
 
179
+ ```bibtex
180
+ @InProceedings{BMPv2,
181
+ author = {Purkrabek, Miroslav and Kolomiiets, Constantin and Matas, Jiri},
182
+ title = {BBoxMaskPose v2: Expanding Mutual Conditioning to 3D},
183
+ booktitle = {arXiv preprint arXiv:to be added},
184
+ year = {2026}
185
+ }
186
+ ```
187
  ```bibtex
188
  @InProceedings{Purkrabek2025ICCV,
189
+ author = {Purkrabek, Miroslav and Matas, Jiri},
190
+ title = {Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle},
191
+ booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
192
+ month = {October},
193
+ year = {2025}
194
+ }
195
+ ```
196
+ ```bibtex
197
+ @InProceedings{Kolomiiets2026CVWW,
198
+ author = {Kolomiiets, Constantin and Purkrabek, Miroslav and Matas, Jiri},
199
+ title = {SAM-pose2seg: Pose-Guided Human Instance Segmentation in Crowds},
200
+ booktitle = {Computer Vision Winter Workshop (CVWW)},
201
+ year = {2026}
202
  }
203
  ```
204
 
205
  ## 🧑‍💻 Authors
206
 
207
  - Miroslav Purkrabek ([personal website](https://github.com/MiraPurkrabek))
208
+ - Constantin Kolomiiets
209
  - Jiri Matas ([personal website](https://cmp.felk.cvut.cz/~matas/))