---
license: apache-2.0
library_name: acaua
pipeline_tag: keypoint-detection
tags:
- pose-estimation
- keypoint-detection
- multi-person-pose
- vision
- acaua
- native-pytorch-port
datasets:
- COCO
- AI-Challenger
- CrowdPose
- MPII
- JHMDB
- Halpe
- PoseTrack18
---

# RTMO-s (body7) — acaua mirror (pure-PyTorch port)

This is a **pure-PyTorch port** of RTMO-s hosted under `CondadosAI/` for use with the [acaua](https://github.com/CondadosAI/acaua) computer vision library.

RTMO (Lu et al., CVPR 2024) is a one-stage real-time multi-person pose estimator that integrates coordinate classification into a YOLO-style architecture. This variant was trained on the **body7** composite dataset (COCO + AI Challenger + CrowdPose + MPII + sub-JHMDB + Halpe + PoseTrack18), producing a 17-keypoint COCO-schema skeleton.
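
Since downstream code indexes keypoints by position, here is the standard COCO 17-keypoint order the schema implies (listed for reference only; the adapter exposes its own names at runtime):

```python
# Standard COCO keypoint order; index i matches the i-th keypoint row
# in each per-person output array.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
```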

The architecture has been re-implemented in pure PyTorch under `acaua.adapters.rtmo` — no `mmcv`, no `mmengine`, no `mmpose`, no `trust_remote_code`. The `model.safetensors` in this mirror was converted from the upstream `.pth` checkpoint, with state_dict keys renamed to the acaua adapter's naming scheme. It is **not** drop-in compatible with mmpose — the weights are laid out to load cleanly into our `nn.Module` tree via `load_state_dict(strict=True)`.
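
To inspect the converted key layout without loading the model, the `safetensors` API alone is enough (a minimal sketch; the printed prefixes reflect the adapter's module tree, not mmpose's):

```python
from safetensors import safe_open

# List the state_dict keys as stored in this mirror's checkpoint.
with safe_open("model.safetensors", framework="pt") as f:
    keys = sorted(f.keys())

print(len(keys))
print(keys[:5])  # expect acaua-adapter naming, not mmpose key names
```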

## Provenance

| Field | Value |
|---|---|
| Upstream code | [`open-mmlab/mmpose`](https://github.com/open-mmlab/mmpose) @ `759b39c13fea6ba094afc1fa932f51dc1b11cbf9` (Apache-2.0) |
| Upstream weights URL | `https://download.openmmlab.com/mmpose/v1/projects/rtmo/rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth` |
| Upstream weights SHA256 | `dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e` |
| Conversion script | [`scripts/convert_rtmo.py`](https://github.com/CondadosAI/acaua/blob/main/scripts/convert_rtmo.py) |
| Paper | Lu et al., *"RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation"*, CVPR 2024, arXiv:[2312.07526](https://arxiv.org/abs/2312.07526) |
| Mirrored on | 2026-04-22 |
| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |
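
To re-verify provenance, the upstream checkpoint's digest can be checked with the standard library before conversion (a minimal sketch; the filename is the download target of the URL above):

```python
import hashlib

# SHA256 recorded in the provenance table above.
EXPECTED = "dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e"

with open("rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

assert digest == EXPECTED, f"checksum mismatch: {digest}"
```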

## Usage

```python
import cv2

import acaua

model = acaua.Model.from_pretrained("CondadosAI/rtmo_s_body7")
result = model.predict("image.jpg")

# Result is a PoseResult with shapes:
#   result.boxes           -> (N, 4)     float32, xyxy
#   result.labels          -> (N,)       int64 (person = 0)
#   result.scores          -> (N,)       float32
#   result.keypoints       -> (N, 17, 2) float32, xy in image pixels
#   result.keypoint_scores -> (N, 17)    float32

# Skeleton edges + keypoint names live on the adapter:
import supervision as sv

image = cv2.imread("image.jpg")  # load the same image for annotation
kp = result.to_supervision()
sv.EdgeAnnotator(edges=model.skeleton).annotate(scene=image, key_points=kp)
```
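
If you prefer not to depend on `supervision`, the documented result arrays are enough to render poses directly. A minimal OpenCV sketch, continuing from `result` and `image` in the snippet above (the 0.5 score cutoff is an arbitrary choice, not a tuned threshold):

```python
import cv2

# Draw each person's confident keypoints from the (N, 17, 2) / (N, 17) arrays.
for person, scores in zip(result.keypoints, result.keypoint_scores):
    for (x, y), score in zip(person, scores):
        if score > 0.5:
            cv2.circle(image, (int(x), int(y)), 3, (0, 255, 0), -1)

cv2.imwrite("annotated.jpg", image)
```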

## Architecture

- **Backbone:** CSPDarknet (YOLOX lineage), `widen_factor=0.5`, `deepen_factor=0.33`
- **Neck:** HybridEncoder (RT-DETR–style transformer encoder + FPN/PAN fusion), `hidden_dim=256`
- **Head:** RTMOHead with per-level YOLO-style box + visibility predictions and a Dynamic Coordinate Classifier (DCC) decoded via softmax expectation over `(192 × 256)` coordinate bins (see the decoding sketch after this list)
- **Parameters:** ~9.87M
- **Input:** 640 × 640 letterboxed RGB, raw pixel values (no mean/std normalization, per the upstream `PoseDataPreprocessor`)
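
The DCC decode mentioned above is a SimCC-style expected value over per-axis bin classifications. A minimal sketch of the idea (tensor names and shapes are illustrative, not the adapter's actual internals):

```python
import torch

def dcc_decode_axis(logits: torch.Tensor, bin_centers: torch.Tensor) -> torch.Tensor:
    """Decode one coordinate axis.

    logits:      (N, K, B) classification scores over B coordinate bins
    bin_centers: (B,)      bin positions in input-pixel coordinates
    returns:     (N, K)    coordinates via softmax expectation
    """
    probs = torch.softmax(logits, dim=-1)   # per-bin probabilities
    return (probs * bin_centers).sum(dim=-1)
```

The other axis is decoded the same way over its own bin grid; taking the expectation rather than the argmax makes the output sub-bin accurate even though the head is a classifier.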

## Reported performance (upstream)

| Variant | Dataset | COCO val AP | COCO val AR | V100 FPS |
|---|---|---|---|---|
| **RTMO-s** | body7 | 68.6 | 74.3 | ~141 |

## License and attribution

Redistributed under Apache-2.0, consistent with the upstream code and weights declarations. The acaua adapter is itself a derivative work of the upstream PyTorch implementation — see [`NOTICE`](./NOTICE) for the required attribution chain (code and weights).

## Citation

```bibtex
@misc{lu2023rtmo,
  title={{RTMO}: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation},
  author={Peng Lu and Tao Jiang and Yining Li and Xiangtai Li and Kai Chen and Wenming Yang},
  year={2023},
  eprint={2312.07526},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```