---
license: apache-2.0
library_name: acaua
pipeline_tag: keypoint-detection
tags:
- pose-estimation
- keypoint-detection
- multi-person-pose
- vision
- acaua
- native-pytorch-port
datasets:
- COCO
- AI-Challenger
- CrowdPose
- MPII
- JHMDB
- Halpe
- PoseTrack18
---

# RTMO-s (body7) — acaua mirror (pure-PyTorch port)

This is a **pure-PyTorch port** of RTMO-s hosted under `CondadosAI/` for use with the [acaua](https://github.com/CondadosAI/acaua) computer vision library.

RTMO (Lu et al., CVPR 2024) is a one-stage, real-time multi-person pose estimator that integrates coordinate classification into a YOLO-style architecture. This variant was trained on the **body7** composite dataset (COCO + AI Challenger + CrowdPose + MPII + sub-JHMDB + Halpe + PoseTrack18) and predicts a 17-keypoint COCO-schema skeleton.

The architecture is re-implemented in pure PyTorch under `acaua.adapters.rtmo` — no `mmcv`, no `mmengine`, no `mmpose`, no `trust_remote_code`. The `model.safetensors` in this mirror was converted from the upstream `.pth` checkpoint to safetensors using the acaua adapter's state_dict key naming. It is **not** drop-in compatible with mmpose — the weights are laid out to load cleanly into our `nn.Module` tree via `load_state_dict(strict=True)`.
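In essence, the conversion is a key-renaming pass over the upstream checkpoint so that a strict load into the adapter succeeds. A toy sketch of the idea (the prefix rules and the `ema_` handling below are hypothetical illustrations; the real mapping lives in `scripts/convert_rtmo.py`):

```python
def remap_keys(ckpt_state_dict):
    """Rename upstream checkpoint keys to the adapter's module tree.

    The rename table and EMA rule here are made up for illustration;
    they are NOT the actual convert_rtmo.py mapping.
    """
    renames = {
        "bbox_head.": "head.",      # hypothetical upstream -> adapter rename
        "backbone.": "backbone.",   # identity prefixes pass through
        "neck.": "neck.",
    }
    out = {}
    for key, value in ckpt_state_dict.items():
        if key.startswith("ema_"):  # drop EMA shadow weights, if present
            continue
        for old, new in renames.items():
            if key.startswith(old):
                out[new + key[len(old):]] = value
                break
    return out

demo = {
    "bbox_head.dcc.weight": 1,
    "ema_bbox_head.dcc.weight": 2,
    "backbone.stem.conv.weight": 3,
}
print(remap_keys(demo))
# {'head.dcc.weight': 1, 'backbone.stem.conv.weight': 3}
```

Because every surviving key must match the adapter's `nn.Module` tree exactly, `load_state_dict(strict=True)` doubles as a verification step for the conversion.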
## Provenance

| | |
|---|---|
| Upstream code | [`open-mmlab/mmpose`](https://github.com/open-mmlab/mmpose) @ `759b39c13fea6ba094afc1fa932f51dc1b11cbf9` (Apache-2.0) |
| Upstream weights URL | `https://download.openmmlab.com/mmpose/v1/projects/rtmo/rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth` |
| Upstream weights SHA256 | `dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e` |
| Conversion script | [`scripts/convert_rtmo.py`](https://github.com/CondadosAI/acaua/blob/main/scripts/convert_rtmo.py) |
| Paper | Lu et al., *"RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation"*, CVPR 2024, arXiv:[2312.07526](https://arxiv.org/abs/2312.07526) |
| Mirrored on | 2026-04-22 |
| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |

## Usage

```python
import acaua

model = acaua.Model.from_pretrained("CondadosAI/rtmo_s_body7")
result = model.predict("image.jpg")

# Result is a PoseResult with shape:
#   result.boxes           -> (N, 4) float32, xyxy
#   result.labels          -> (N,) int64 (person = 0)
#   result.scores          -> (N,) float32
#   result.keypoints       -> (N, 17, 2) float32, xy in image pixels
#   result.keypoint_scores -> (N, 17) float32

# Skeleton edges + keypoint names live on the adapter:
import cv2
import supervision as sv

image = cv2.imread("image.jpg")  # annotators expect a numpy image
kp = result.to_supervision()
sv.EdgeAnnotator(edges=model.skeleton).annotate(image, kp)
```

## Architecture

- **Backbone:** CSPDarknet (YOLOX-lineage), `widen_factor=0.5`, `deepen_factor=0.33`
- **Neck:** HybridEncoder (RT-DETR–style transformer encoder + FPN/PAN fusion), `hidden_dim=256`
- **Head:** RTMOHead with per-level YOLO-style box + visibility predictions and a Dynamic Coordinate Classifier (DCC) decoded via softmax expectation over `(192 × 256)` coordinate bins
- **Parameters:** ~9.87M
- **Input:** 640 × 640 letterboxed, RGB raw pixel values (no mean/std normalization, per upstream `PoseDataPreprocessor`)

## Reported performance (upstream)

| Variant | Dataset | COCO val AP | COCO val AR | V100 FPS |
|---|---|---|---|---|
| **RTMO-s** | **body7** | **68.6** | 74.3 | ~141 |

## License and attribution

Redistributed under Apache-2.0, consistent with the upstream code and weights declarations. The acaua adapter is itself a derivative work of the upstream PyTorch implementation — see [`NOTICE`](./NOTICE) for the required attribution chain (code AND weights).

## Citation

```bibtex
@misc{lu2023rtmo,
      title={{RTMO}: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation},
      author={Peng Lu and Tao Jiang and Yining Li and Xiangtai Li and Kai Chen and Wenming Yang},
      year={2023},
      eprint={2312.07526},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
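## Notes on the letterbox geometry

Because inputs are letterboxed to 640 × 640, raw model-space coordinates must be unmapped to original image pixels. A minimal numpy sketch of that geometry (centered padding is an assumption here for illustration; YOLOX-style pipelines often pad bottom-right instead, and the adapter handles this internally):

```python
import numpy as np

def letterbox_params(h, w, size=640):
    """Scale and centered padding that fit an (h, w) image into a
    size x size canvas while preserving aspect ratio."""
    scale = size / max(h, w)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_x, pad_y = (size - new_w) // 2, (size - new_h) // 2
    return scale, (pad_x, pad_y)

def to_image_coords(xy_model, scale, pad):
    """Map model-space (..., 2) keypoints back to original image pixels."""
    return (np.asarray(xy_model) - np.asarray(pad)) / scale

scale, pad = letterbox_params(480, 640)                 # landscape 640x480 image
print(to_image_coords([[[320.0, 320.0]]], scale, pad))  # canvas centre -> (320, 240)
```

Note that per the upstream `PoseDataPreprocessor`, no mean/std normalization is applied after this step: the model consumes raw RGB pixel values.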
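## Notes on DCC decoding

The Dynamic Coordinate Classifier mentioned under Architecture decodes each keypoint coordinate as a softmax expectation: the probability-weighted mean of bin positions. A toy numpy sketch of that decode (the bin count and layout here are illustrative, not the adapter's actual `(192 × 256)` grid):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dcc_decode(x_logits, y_logits, x_bins, y_bins):
    """Softmax-expectation decode of per-keypoint coordinate logits.

    x_logits: (K, Bx) logits over horizontal bins; x_bins: (Bx,) bin centers
    y_logits: (K, By) logits over vertical bins;   y_bins: (By,) bin centers
    Returns (K, 2) xy coordinates.
    """
    x = (softmax(x_logits) * x_bins).sum(-1)
    y = (softmax(y_logits) * y_bins).sum(-1)
    return np.stack([x, y], axis=-1)

# Toy example: one keypoint, x sharply peaked at the middle of 5 bins
x_logits = np.array([[0.0, 0.0, 10.0, 0.0, 0.0]])
y_logits = np.array([[10.0, 0.0, 0.0, 0.0, 0.0]])
bins = np.linspace(0.0, 640.0, 5)   # 5 bin centers across a 640 px axis
print(dcc_decode(x_logits, y_logits, bins, bins))  # x near 320, y near 0
```

Unlike a hard argmax over bins, the expectation is differentiable and yields sub-bin precision, which is what lets a classification head produce continuous keypoint coordinates.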