---
library_name: bboxmaskpose
---
<ul align="center" style="list-style: none; padding: 0; margin: 0;">
<summary>
<h2 style="margin-bottom: 0.2em;">
ICCV 2025 + CVPR 2025
</h2>
</summary>
</ul>
</div>



<div style="text-align: justify;">
The BBox-Mask-Pose (BMP) method integrates detection, pose estimation, and segmentation into a self-improving loop by conditioning these tasks on each other.
This approach enhances all three tasks simultaneously.
Using segmentation masks instead of bounding boxes improves performance in crowded scenes.

Key contributions:
1. **MaskPose**: a pose estimation model conditioned on segmentation masks instead of bounding boxes, boosting performance in dense scenes without adding parameters
    - Download pre-trained weights below
2. **PMPose**: a pose estimation model conditioned on segmentation masks AND predicting a full description of each keypoint. A combination of MaskPose and ProbPose (CVPR'25).
3. **BBox-MaskPose (BMP)**: a method linking bounding boxes, segmentation masks, and poses to simultaneously address multi-body detection, segmentation, and pose estimation
    - Try the demo!
4. A fine-tuned RTMDet adapted for iterative detection (ignoring 'holes')
    - Download pre-trained weights below
5. Support for multi-dataset training of ViTPose, previously implemented in the official ViTPose repository but absent in MMPose.
</div>

For more details, see the [GitHub repository](https://github.com/MiraPurkrabek/BBoxMaskPose).

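The BMP loop described above can be sketched on toy data. Everything below is a hypothetical stand-in for illustration, not the project's API: instances are plain pixel sets, the "detector" is a flood fill over remaining foreground pixels, and the "pose" is just the mask centroid.

```python
# Toy sketch of the BMP cycle: detect -> estimate pose from the mask ->
# remove ("mask out") the instance -> detect again.
# All components are hypothetical stand-ins, not the real models.

# toy "image": foreground pixels of two people standing apart
person_a = {(x, y) for x in range(0, 3) for y in range(0, 3)}
person_b = {(x, y) for x in range(5, 8) for y in range(0, 3)}
remaining = person_a | person_b

def detect_one(pixels):
    """Hypothetical detector stand-in: flood-fill one connected blob, or None."""
    if not pixels:
        return None
    seed = min(pixels)
    blob, frontier = {seed}, [seed]
    while frontier:
        x, y = frontier.pop()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in pixels and nxt not in blob:
                blob.add(nxt)
                frontier.append(nxt)
    return blob

def estimate_pose(mask):
    """Hypothetical pose stand-in: one 'keypoint' at the mask centroid."""
    n = len(mask)
    return (sum(x for x, _ in mask) / n, sum(y for _, y in mask) / n)

instances = []
while (mask := detect_one(remaining)) is not None:
    pose = estimate_pose(mask)   # pose conditioned on the mask, not a box
    instances.append((mask, pose))
    remaining -= mask            # punch a "hole" before the next detection pass

print(len(instances))  # prints 2
```

The point of the sketch is the control flow: each detected instance is segmented, its pose is read off the mask, and its pixels are removed before the next detection pass, which is how the loop can recover overlapping people one by one.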
## 📝 Models List

1. **ViTPose-b multi-dataset**
2. **MaskPose**
3. **PMPose**
4. fine-tuned **RTMDet-l**

See details of each model below.


## 1. ViTPose-b multi-dataset

ViTPose trained on multiple datasets performs much better in multi-body (and crowded) scenes.
The model was trained in a multi-dataset setup by the authors before; this is a reproduction compatible with MMPose 2.0.

-----------------------------------------
## 2. MaskPose-1.1.0

- **Model type**: ViT-b backbone with multi-layer decoder
- **Input**: RGB images (192x256) + estimated instance segmentation
- **Output**: Keypoint coordinates (48x64 heatmap for each keypoint, 23 keypoints)
- **Language(s)**: Not language-dependent (vision model)
- **License**: GPL-3.0
- **Framework**: MMPose
- **Size(s)**: -S, -B, -L, -H
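Since each keypoint is emitted as a 48x64 heatmap for a 192x256 input crop, decoding can be sketched as an argmax plus a stride-4 upscale. This assumes the common top-down heatmap convention; the repository's actual decoder may differ (e.g. with sub-pixel refinement).

```python
# Decode one 48x64 keypoint heatmap into 192x256 input-crop coordinates.
# Assumes the usual top-down convention (heatmap = input resolution / 4,
# peak = keypoint); the real decoder may refine the peak sub-pixel.

def decode_heatmap(heatmap):
    """heatmap: 64 rows x 48 cols of scores -> (x, y, score) in input pixels."""
    best = (0, 0)
    for row in range(len(heatmap)):
        for col in range(len(heatmap[row])):
            if heatmap[row][col] > heatmap[best[0]][best[1]]:
                best = (row, col)
    row, col = best
    scale = 192 / 48  # == 256 / 64 == 4, the heatmap-to-input stride
    return (col * scale, row * scale, heatmap[row][col])

# synthetic heatmap with a single peak at column 10, row 20
hm = [[0.0] * 48 for _ in range(64)]
hm[20][10] = 0.9
x, y, score = decode_heatmap(hm)
print(x, y, score)  # prints 40.0 80.0 0.9
```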

#### Training Details

Compared to ViTPose, MaskPose takes instance segmentation as an input and is even better at distinguishing instances in multi-body scenes.
It adds no computational overhead compared to ViTPose.

**V1.0.0 vs. V1.1.0**
The previous version (v1.0.0) predicted 21 keypoints and was trained with a different training recipe. V1.1.0 predicts 23 keypoints and uses an improved training recipe with dataset balancing, which improves the numbers.

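The dataset balancing mentioned above can be illustrated with a weighted multi-dataset sampler: each batch slot draws a dataset by a fixed weight rather than in proportion to dataset size. The dataset sizes and weights below are invented for the sketch and are not the ratios used in training.

```python
# Hypothetical sketch of multi-dataset balancing: pick a dataset per batch
# slot with fixed weights, then a uniform sample inside it. Sizes and
# weights are made up for illustration.
import random

datasets = {
    "COCO": range(100_000),  # stand-in sample indices
    "MPII": range(25_000),
    "AIC":  range(200_000),
}
weights = {"COCO": 0.5, "MPII": 0.2, "AIC": 0.3}  # per-slot sampling ratio

rng = random.Random(0)
names = list(datasets)

def sample_batch(batch_size=64):
    """Weighted dataset choice per slot, then uniform within the dataset."""
    chosen = rng.choices(names, weights=[weights[n] for n in names], k=batch_size)
    return [(n, rng.choice(datasets[n])) for n in chosen]

batch = sample_batch()
print(len(batch))  # prints 64
```

Note the design choice this sketch encodes: without such balancing, the largest dataset (here AIC) would dominate every batch.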
-----------------------------------------
## 3. PMPose-1.0.0

- **Model type**: ViT-b backbone with multi-layer decoder
- **Input**: RGB images (192x256) + estimated instance segmentation
- **Output**: Keypoint coordinates (48x64 probmap for each keypoint, 23 keypoints), presence probabilities, visibilities, and expected OKS for each keypoint
- **Language(s)**: Not language-dependent (vision model)
- **License**: GPL-3.0
- **Framework**: MMPose
- **Size(s)**: -S, -B, -L, -H

#### Training Details

- **Training data**: [COCO Dataset](https://cocodataset.org/#home), [MPII Dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/software-and-datasets/mpii-human-pose-dataset), [AIC Dataset](https://arxiv.org/abs/1711.06475) + SAM-estimated instance masks
- **Training script**: [GitHub - BBoxMaskPose_code](https://github.com/MiraPurkrabek/BBoxMaskPose)
- **Epochs**: 20
- **Batch size**: 64
- **Learning rate**: 5e-5
- **Frozen backbone**
- **Hardware**: 4x NVIDIA A100

**What's new?**
PMPose combines MaskPose-1.1.0 and [ProbPose (CVPR'25)](https://mirapurkrabek.github.io/ProbPose/). It is conditioned on masks and has the same superior in-crowd performance as MaskPose, and it also predicts probabilities and visibilities like ProbPose.

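The expected-OKS output refers to the Object Keypoint Similarity used in COCO keypoint evaluation. A single keypoint's contribution can be sketched as follows; the coordinates and area are made up for the example, while k = 0.026 is the published COCO constant for the nose.

```python
# One keypoint's OKS term from the COCO keypoint metric:
#   oks_i = exp(-d_i^2 / (2 * s^2 * k_i^2))
# d_i: pixel distance prediction vs. ground truth,
# s^2: instance area, k_i: per-keypoint falloff constant.
import math

def keypoint_oks(pred, gt, area, k):
    d2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    return math.exp(-d2 / (2 * area * k ** 2))

# k = 0.026 is the COCO constant for the nose; area/coords are illustrative
perfect = keypoint_oks((50.0, 80.0), (50.0, 80.0), area=10000.0, k=0.026)
off = keypoint_oks((55.0, 80.0), (50.0, 80.0), area=10000.0, k=0.026)
print(perfect)  # prints 1.0
```

A 5-pixel miss on a 100x100-pixel instance already drops this term well below 1, which is why a per-keypoint expected OKS is an informative confidence signal.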
-----------------------------------------
## 4. fine-tuned RTMDet-L

- **Model type**: CSPNeXt-P5 backbone, CSPNeXtPAFPN neck, RTMDetInsSepBN head
- **Input**: RGB images

Especially effective in multi-body scenes where background would not be detected.

If you use our work, please cite:

```bibtex
@InProceedings{BMPv2,
  author    = {Purkrabek, Miroslav and Kolomiiets, Constantin and Matas, Jiri},
  title     = {BBoxMaskPose v2: Expanding Mutual Conditioning to 3D},
  booktitle = {arXiv preprint arXiv:to be added},
  year      = {2026}
}
```
```bibtex
@InProceedings{Purkrabek2025ICCV,
  author    = {Purkrabek, Miroslav and Matas, Jiri},
  title     = {Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025}
}
```
```bibtex
@InProceedings{Kolomiiets2026CVWW,
  author    = {Kolomiiets, Constantin and Purkrabek, Miroslav and Matas, Jiri},
  title     = {SAM-pose2seg: Pose-Guided Human Instance Segmentation in Crowds},
  booktitle = {Computer Vision Winter Workshop (CVWW)},
  year      = {2026}
}
```

## 🧑‍💻 Authors

- Miroslav Purkrabek ([personal website](https://github.com/MiraPurkrabek))
- Constantin Kolomiiets
- Jiri Matas ([personal website](https://cmp.felk.cvut.cz/~matas/))