---
license: apache-2.0
library_name: acaua
pipeline_tag: keypoint-detection
tags:
  - pose-estimation
  - keypoint-detection
  - multi-person-pose
  - vision
  - acaua
  - native-pytorch-port
datasets:
  - COCO
  - AI-Challenger
  - CrowdPose
  - MPII
  - JHMDB
  - Halpe
  - PoseTrack18
---

# RTMO-s (body7) — acaua mirror (pure-PyTorch port)

This is a **pure-PyTorch port** of RTMO-s hosted under `CondadosAI/` for use with the [acaua](https://github.com/CondadosAI/acaua) computer vision library.

RTMO (Lu et al., CVPR 2024) is a one-stage real-time multi-person pose estimator that integrates coordinate classification into a YOLO-style architecture. This variant was trained on the **body7** composite dataset (COCO + AI Challenger + CrowdPose + MPII + sub-JHMDB + Halpe + PoseTrack18), producing a 17-keypoint COCO-schema skeleton.

The architecture has been re-implemented in pure PyTorch under `acaua.adapters.rtmo` — no `mmcv`, no `mmengine`, no `mmpose`, no `trust_remote_code`. The `model.safetensors` in this mirror is converted from the upstream `.pth` checkpoint to safetensors with the acaua adapter's state_dict key naming. It is NOT drop-in compatible with mmpose — weights are laid out to load cleanly into our `nn.Module` tree via `load_state_dict(strict=True)`.

## Provenance

| | |
|---|---|
| Upstream code | [`open-mmlab/mmpose`](https://github.com/open-mmlab/mmpose) @ `759b39c13fea6ba094afc1fa932f51dc1b11cbf9` (Apache-2.0) |
| Upstream weights URL | `https://download.openmmlab.com/mmpose/v1/projects/rtmo/rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth` |
| Upstream weights SHA256 | `dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e` |
| Conversion script | [`scripts/convert_rtmo.py`](https://github.com/CondadosAI/acaua/blob/main/scripts/convert_rtmo.py) |
| Paper | Lu et al., *"RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation"*, CVPR 2024, arXiv:[2312.07526](https://arxiv.org/abs/2312.07526) |
| Mirrored on | 2026-04-22 |
| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |
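If you re-download the upstream `.pth` yourself, you can check it against the SHA256 above with a stdlib-only sketch (the local filename is just whatever you saved the weights URL to):

```python
import hashlib

EXPECTED_SHA256 = "dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so a large checkpoint never sits fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# e.g.:
# assert sha256_of("rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth") == EXPECTED_SHA256
```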

## Usage

```python
import acaua

model = acaua.Model.from_pretrained("CondadosAI/rtmo_s_body7")
result = model.predict("image.jpg")

# Result is a PoseResult with these fields:
#   result.boxes            -> (N, 4) float32, xyxy
#   result.labels           -> (N,)   int64  (person = 0)
#   result.scores           -> (N,)   float32
#   result.keypoints        -> (N, 17, 2) float32, xy in image pixels
#   result.keypoint_scores  -> (N, 17)    float32

# Skeleton edges + keypoint names live on the adapter:
import supervision as sv
kp = result.to_supervision()
sv.EdgeAnnotator(edges=model.skeleton).annotate(image, kp)
```
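`predict` handles preprocessing internally, but if you feed tensors yourself, the model expects 640 × 640 letterboxed raw RGB (see Architecture below). A minimal NumPy letterbox sketch follows; the gray pad value 114 and the top-left placement are the usual YOLO conventions, assumed here rather than confirmed acaua behavior:

```python
import numpy as np

def letterbox(img: np.ndarray, size: int = 640, pad_value: int = 114):
    """Resize an (H, W, 3) uint8 RGB image to fit inside size x size while
    keeping aspect ratio, then pad the bottom/right with a constant gray.
    Returns the padded image and the scale for mapping outputs back."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Nearest-neighbor resize via index maps (avoids a cv2/PIL dependency).
    ys = np.clip((np.arange(new_h) / scale).astype(int), 0, h - 1)
    xs = np.clip((np.arange(new_w) / scale).astype(int), 0, w - 1)
    resized = img[ys][:, xs]
    out = np.full((size, size, 3), pad_value, dtype=img.dtype)
    out[:new_h, :new_w] = resized
    return out, scale
```

With top-left placement, keypoints predicted in the 640 × 640 frame map back to original pixels by dividing by `scale`.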

## Architecture

- **Backbone:** CSPDarknet (YOLOX-lineage), `widen_factor=0.5`, `deepen_factor=0.33`
- **Neck:** HybridEncoder (RT-DETR–style transformer encoder + FPN/PAN fusion), `hidden_dim=256`
- **Head:** RTMOHead with per-level YOLO-style box + visibility predictions and a Dynamic Coordinate Classifier (DCC) decoded via softmax expectation over `(192 × 256)` coordinate bins
- **Parameters:** ~9.87M
- **Input:** 640 × 640 letterboxed, RGB raw pixel values (no mean/std normalization per upstream `PoseDataPreprocessor`)
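The "softmax expectation" decode used by the DCC can be illustrated with a toy NumPy sketch: per keypoint, the head emits logits over discretized x and y bins, and the coordinate is the probability-weighted mean of the bin centers. Bin counts, bin placement, and the "dynamic" part of the real head are simplified away here:

```python
import numpy as np

def expectation_decode(logits_x, logits_y, input_w=640, input_h=640):
    """Toy coordinate-classification decode: softmax over bins, then the
    expected bin center.
    logits_x: (K, Bx) logits over horizontal bins for K keypoints
    logits_y: (K, By) logits over vertical bins
    Returns (K, 2) xy coordinates in input pixels."""
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    px = softmax(np.asarray(logits_x, dtype=float))
    py = softmax(np.asarray(logits_y, dtype=float))
    # Uniform bin centers across the input frame.
    centers_x = (np.arange(px.shape[-1]) + 0.5) * (input_w / px.shape[-1])
    centers_y = (np.arange(py.shape[-1]) + 0.5) * (input_h / py.shape[-1])
    x = (px * centers_x).sum(axis=-1)
    y = (py * centers_y).sum(axis=-1)
    return np.stack([x, y], axis=-1)
```

Unlike an argmax over bins, the expectation is differentiable and sub-bin accurate, which is the point of folding coordinate classification into the head.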

## Reported performance (upstream)

| Variant | Dataset | COCO val AP | COCO val AR | V100 FPS |
|---|---|---|---|---|
| **RTMO-s** | **body7** | **68.6** | 74.3 | ~141 |

## License and attribution

Redistributed under Apache-2.0, consistent with the upstream code and weights declarations. The acaua adapter is itself a derivative work of the upstream PyTorch implementation — see [`NOTICE`](./NOTICE) for the required attribution chain (code AND weights).

## Citation

```bibtex
@misc{lu2023rtmo,
      title={{RTMO}: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation},
      author={Peng Lu and Tao Jiang and Yining Li and Xiangtai Li and Kai Chen and Wenming Yang},
      year={2023},
      eprint={2312.07526},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```