---
license: apache-2.0
library_name: acaua
pipeline_tag: keypoint-detection
tags:
- pose-estimation
- keypoint-detection
- multi-person-pose
- vision
- acaua
- native-pytorch-port
datasets:
- COCO
- AI-Challenger
- CrowdPose
- MPII
- JHMDB
- Halpe
- PoseTrack18
---
# RTMO-s (body7) — acaua mirror (pure-PyTorch port)
This is a **pure-PyTorch port** of RTMO-s hosted under `CondadosAI/` for use with the [acaua](https://github.com/CondadosAI/acaua) computer vision library.
RTMO (Lu et al., CVPR 2024) is a one-stage real-time multi-person pose estimator that integrates coordinate classification into a YOLO-style architecture. This variant was trained on the **body7** composite dataset (COCO + AI Challenger + CrowdPose + MPII + sub-JHMDB + Halpe + PoseTrack18), producing a 17-keypoint COCO-schema skeleton.
The architecture has been re-implemented in pure PyTorch under `acaua.adapters.rtmo` — no `mmcv`, no `mmengine`, no `mmpose`, no `trust_remote_code`. The `model.safetensors` in this mirror is converted from the upstream `.pth` checkpoint to safetensors with the acaua adapter's state_dict key naming. It is NOT drop-in compatible with mmpose — weights are laid out to load cleanly into our `nn.Module` tree via `load_state_dict(strict=True)`.
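The key-renaming idea can be pictured with a minimal, self-contained sketch. The prefix mapping below is purely illustrative (the actual mapping lives in `scripts/convert_rtmo.py`); the point is that every upstream key must be accounted for, so that `load_state_dict(strict=True)` stays honest downstream:

```python
# Illustrative state_dict key remapping from an mmpose-style layout to a
# flat nn.Module tree. These prefixes are made up for illustration; see
# scripts/convert_rtmo.py for the real mapping.
PREFIX_MAP = {
    "backbone.": "backbone.",   # often unchanged
    "neck.": "encoder.",        # e.g. HybridEncoder renamed
    "head.": "head.",
}

def remap_keys(state_dict: dict) -> dict:
    """Rename every key via the first matching prefix; fail loudly on
    anything unmapped rather than silently dropping weights."""
    out = {}
    for key, value in state_dict.items():
        for old, new in PREFIX_MAP.items():
            if key.startswith(old):
                out[new + key[len(old):]] = value
                break
        else:
            raise KeyError(f"unmapped key: {key}")
    return out
```

Failing loudly on unmapped keys is what makes the strict load guarantee meaningful: the converted file either covers the module tree exactly or the conversion aborts.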
## Provenance
| Field | Value |
|---|---|
| Upstream code | [`open-mmlab/mmpose`](https://github.com/open-mmlab/mmpose) @ `759b39c13fea6ba094afc1fa932f51dc1b11cbf9` (Apache-2.0) |
| Upstream weights URL | `https://download.openmmlab.com/mmpose/v1/projects/rtmo/rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth` |
| Upstream weights SHA256 | `dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e` |
| Conversion script | [`scripts/convert_rtmo.py`](https://github.com/CondadosAI/acaua/blob/main/scripts/convert_rtmo.py) |
| Paper | Lu et al., *"RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation"*, CVPR 2024, arXiv:[2312.07526](https://arxiv.org/abs/2312.07526) |
| Mirrored on | 2026-04-22 |
| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |
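Before converting, the upstream checkpoint can be checked against the SHA256 above using nothing but the standard library. This is a generic sketch, not part of the acaua API:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks so large .pth
    checkpoints never have to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = "dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e"
# assert sha256_of("rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth") == EXPECTED
```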
## Usage
```python
import acaua
import cv2
import supervision as sv

model = acaua.Model.from_pretrained("CondadosAI/rtmo_s_body7")
result = model.predict("image.jpg")
# result is a PoseResult with fields:
#   result.boxes           -> (N, 4) float32, xyxy
#   result.labels          -> (N,) int64 (person = 0)
#   result.scores          -> (N,) float32
#   result.keypoints       -> (N, 17, 2) float32, xy in image pixels
#   result.keypoint_scores -> (N, 17) float32

# Skeleton edges + keypoint names live on the adapter:
image = cv2.imread("image.jpg")
kp = result.to_supervision()
annotated = sv.EdgeAnnotator(edges=model.skeleton).annotate(image, kp)
```
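Because the fields above are plain arrays, downstream filtering needs no special API. A pure-NumPy sketch using dummy data of the documented shapes (acaua itself is not required to run this; the 0.5/0.3 thresholds are illustrative, not defaults):

```python
import numpy as np

# Dummy detections with the documented PoseResult shapes (N = 3 people).
N = 3
boxes = np.zeros((N, 4), dtype=np.float32)
scores = np.array([0.9, 0.3, 0.7], dtype=np.float32)
keypoints = np.zeros((N, 17, 2), dtype=np.float32)
keypoint_scores = np.random.rand(N, 17).astype(np.float32)

# Keep confident people, then mask low-confidence joints per person.
keep = scores >= 0.5
boxes, keypoints, kp_scores = boxes[keep], keypoints[keep], keypoint_scores[keep]
visible = kp_scores >= 0.3  # (M, 17) bool mask of reliable joints
print(boxes.shape, keypoints.shape, visible.shape)  # (2, 4) (2, 17, 2) (2, 17)
```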
## Architecture
- **Backbone:** CSPDarknet (YOLOX-lineage), `widen_factor=0.5`, `deepen_factor=0.33`
- **Neck:** HybridEncoder (RT-DETR–style transformer encoder + FPN/PAN fusion), `hidden_dim=256`
- **Head:** RTMOHead with per-level YOLO-style box + visibility predictions and a Dynamic Coordinate Classifier (DCC) decoded via softmax expectation over `(192 × 256)` coordinate bins
- **Parameters:** ~9.87M
- **Input:** 640 × 640 letterboxed, RGB raw pixel values (no mean/std normalization per upstream `PoseDataPreprocessor`)
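The "softmax expectation" decoding in the DCC is the standard SimCC-style trick: per keypoint and per axis, logits over discrete coordinate bins are softmaxed, and the decoded coordinate is the probability-weighted mean of the bin centers, which recovers sub-bin precision. A minimal NumPy sketch (the bin layout below is illustrative; the real decoder lives in `acaua.adapters.rtmo`):

```python
import numpy as np

def decode_expectation(logits: np.ndarray, bin_centers: np.ndarray) -> np.ndarray:
    """logits: (K, B) per-keypoint scores over B coordinate bins along one axis.
    bin_centers: (B,) coordinate of each bin. Returns (K,) sub-bin coordinates."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ bin_centers                           # expectation over bins

# 17 keypoints, 192 x-bins spanning a 640 px axis (illustrative layout)
bins = np.linspace(0.0, 640.0, 192)
logits = np.random.randn(17, 192)
logits[0, 96] += 20.0  # sharp peak -> x[0] lands near bins[96]
x = decode_expectation(logits, bins)
```

Because the output is an expectation rather than an argmax, decoded coordinates are continuous even though the bins are discrete.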
## Reported performance (upstream)
| Variant | Dataset | COCO val AP | COCO val AR | V100 FPS |
|---|---|---|---|---|
| **RTMO-s** | **body7** | **68.6** | 74.3 | ~141 |
## License and attribution
Redistributed under Apache-2.0, consistent with the upstream code and weights declarations. The acaua adapter is itself a derivative work of the upstream PyTorch implementation — see [`NOTICE`](./NOTICE) for the required attribution chain (code AND weights).
## Citation
```bibtex
@misc{lu2023rtmo,
title={{RTMO}: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation},
author={Peng Lu and Tao Jiang and Yining Li and Xiangtai Li and Kai Chen and Wenming Yang},
year={2023},
eprint={2312.07526},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```