---
license: apache-2.0
library_name: acaua
pipeline_tag: keypoint-detection
tags:
- pose-estimation
- keypoint-detection
- multi-person-pose
- vision
- acaua
- native-pytorch-port
datasets:
- COCO
- AI-Challenger
- CrowdPose
- MPII
- JHMDB
- Halpe
- PoseTrack18
---
# RTMO-s (body7) — acaua mirror (pure-PyTorch port)
This is a **pure-PyTorch port** of RTMO-s hosted under `CondadosAI/` for use with the [acaua](https://github.com/CondadosAI/acaua) computer vision library.
RTMO (Lu et al., CVPR 2024) is a one-stage real-time multi-person pose estimator that integrates coordinate classification into a YOLO-style architecture. This variant was trained on the **body7** composite dataset (COCO + AI Challenger + CrowdPose + MPII + sub-JHMDB + Halpe + PoseTrack18), producing a 17-keypoint COCO-schema skeleton.
The architecture has been re-implemented in pure PyTorch under `acaua.adapters.rtmo` — no `mmcv`, no `mmengine`, no `mmpose`, no `trust_remote_code`. The `model.safetensors` in this mirror is converted from the upstream `.pth` checkpoint to safetensors with the acaua adapter's state_dict key naming. It is NOT drop-in compatible with mmpose — weights are laid out to load cleanly into our `nn.Module` tree via `load_state_dict(strict=True)`.
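The key-renaming idea can be pictured with a minimal, self-contained sketch. The prefix mapping below is purely illustrative (the actual mapping lives in `scripts/convert_rtmo.py`); the point is that every upstream key must be accounted for, so that `load_state_dict(strict=True)` stays honest downstream:

```python
# Illustrative state_dict key remapping from an mmpose-style layout to a
# flat nn.Module tree. These prefixes are made up for illustration; see
# scripts/convert_rtmo.py for the real mapping.
PREFIX_MAP = {
    "backbone.": "backbone.",   # often unchanged
    "neck.": "encoder.",        # e.g. HybridEncoder renamed
    "head.": "head.",
}

def remap_keys(state_dict: dict) -> dict:
    """Rename every key via the first matching prefix; fail loudly on
    anything unmapped rather than silently dropping weights."""
    out = {}
    for key, value in state_dict.items():
        for old, new in PREFIX_MAP.items():
            if key.startswith(old):
                out[new + key[len(old):]] = value
                break
        else:
            raise KeyError(f"unmapped key: {key}")
    return out
```

Failing loudly on unmapped keys is what makes the strict load guarantee meaningful: the converted file either covers the module tree exactly or the conversion aborts.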
## Provenance
| Field | Value |
|---|---|
| Upstream code | [`open-mmlab/mmpose`](https://github.com/open-mmlab/mmpose) @ `759b39c13fea6ba094afc1fa932f51dc1b11cbf9` (Apache-2.0) |
| Upstream weights URL | `https://download.openmmlab.com/mmpose/v1/projects/rtmo/rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth` |
| Upstream weights SHA256 | `dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e` |
| Conversion script | [`scripts/convert_rtmo.py`](https://github.com/CondadosAI/acaua/blob/main/scripts/convert_rtmo.py) |
| Paper | Lu et al., *"RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation"*, CVPR 2024, arXiv:[2312.07526](https://arxiv.org/abs/2312.07526) |
| Mirrored on | 2026-04-22 |
| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |
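Before converting, the upstream checkpoint can be checked against the SHA256 above using nothing but the standard library. This is a generic sketch, not part of the acaua API:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks so large .pth
    checkpoints never have to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = "dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e"
# assert sha256_of("rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth") == EXPECTED
```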
## Usage
```python
import acaua
import cv2
import supervision as sv

model = acaua.Model.from_pretrained("CondadosAI/rtmo_s_body7")
result = model.predict("image.jpg")
# result is a PoseResult with fields:
#   result.boxes           -> (N, 4) float32, xyxy
#   result.labels          -> (N,) int64 (person = 0)
#   result.scores          -> (N,) float32
#   result.keypoints       -> (N, 17, 2) float32, xy in image pixels
#   result.keypoint_scores -> (N, 17) float32

# Skeleton edges + keypoint names live on the adapter:
image = cv2.imread("image.jpg")
kp = result.to_supervision()
annotated = sv.EdgeAnnotator(edges=model.skeleton).annotate(image, kp)
```
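Because the fields above are plain arrays, downstream filtering needs no special API. A pure-NumPy sketch using dummy data of the documented shapes (acaua itself is not required to run this; the 0.5/0.3 thresholds are illustrative, not defaults):

```python
import numpy as np

# Dummy detections with the documented PoseResult shapes (N = 3 people).
N = 3
boxes = np.zeros((N, 4), dtype=np.float32)
scores = np.array([0.9, 0.3, 0.7], dtype=np.float32)
keypoints = np.zeros((N, 17, 2), dtype=np.float32)
keypoint_scores = np.random.rand(N, 17).astype(np.float32)

# Keep confident people, then mask low-confidence joints per person.
keep = scores >= 0.5
boxes, keypoints, kp_scores = boxes[keep], keypoints[keep], keypoint_scores[keep]
visible = kp_scores >= 0.3  # (M, 17) bool mask of reliable joints
print(boxes.shape, keypoints.shape, visible.shape)  # (2, 4) (2, 17, 2) (2, 17)
```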
## Architecture
- **Backbone:** CSPDarknet (YOLOX-lineage), `widen_factor=0.5`, `deepen_factor=0.33`
- **Neck:** HybridEncoder (RT-DETR–style transformer encoder + FPN/PAN fusion), `hidden_dim=256`
- **Head:** RTMOHead with per-level YOLO-style box + visibility predictions and a Dynamic Coordinate Classifier (DCC) decoded via softmax expectation over `(192 × 256)` coordinate bins
- **Parameters:** ~9.87M
- **Input:** 640 × 640 letterboxed, RGB raw pixel values (no mean/std normalization per upstream `PoseDataPreprocessor`)
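The "softmax expectation" decoding in the DCC is the standard SimCC-style trick: per keypoint and per axis, logits over discrete coordinate bins are softmaxed, and the decoded coordinate is the probability-weighted mean of the bin centers, which recovers sub-bin precision. A minimal NumPy sketch (the bin layout below is illustrative; the real decoder lives in `acaua.adapters.rtmo`):

```python
import numpy as np

def decode_expectation(logits: np.ndarray, bin_centers: np.ndarray) -> np.ndarray:
    """logits: (K, B) per-keypoint scores over B coordinate bins along one axis.
    bin_centers: (B,) coordinate of each bin. Returns (K,) sub-bin coordinates."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ bin_centers                           # expectation over bins

# 17 keypoints, 192 x-bins spanning a 640 px axis (illustrative layout)
bins = np.linspace(0.0, 640.0, 192)
logits = np.random.randn(17, 192)
logits[0, 96] += 20.0  # sharp peak -> x[0] lands near bins[96]
x = decode_expectation(logits, bins)
```

Because the output is an expectation rather than an argmax, decoded coordinates are continuous even though the bins are discrete.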
## Reported performance (upstream)
| Variant | Dataset | COCO val AP | COCO val AR | V100 FPS |
|---|---|---|---|---|
| **RTMO-s** | **body7** | **68.6** | 74.3 | ~141 |
## License and attribution
Redistributed under Apache-2.0, consistent with the upstream code and weights declarations. The acaua adapter is itself a derivative work of the upstream PyTorch implementation — see [`NOTICE`](./NOTICE) for the required attribution chain (code AND weights).
## Citation
```bibtex
@misc{lu2023rtmo,
title={{RTMO}: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation},
author={Peng Lu and Tao Jiang and Yining Li and Xiangtai Li and Kai Chen and Wenming Yang},
year={2023},
eprint={2312.07526},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```