---
license: mit
library_name: acaua
pipeline_tag: keypoint-detection
tags:
- pose-estimation
- keypoint-detection
- vision
- acaua
- native-pytorch-port
datasets:
- coco
---
# UniFormer-S COCO top-down pose – acaua mirror (pure-PyTorch port)
Pure-PyTorch port of **UniFormer-S** (top-down COCO 17-keypoint pose at
256x192) hosted under `CondadosAI/` for use with the
[acaua](https://github.com/CondadosAI/acaua) computer vision library.
The architecture has been re-implemented in pure PyTorch under
`acaua.adapters.uniformer.pose`: no `mmcv`, no `mmengine`, no
`mmpose`, no `trust_remote_code`, no `timm` runtime dependency. The
backbone reuses `UniFormer2DDense` (already shipped via PR #9 for the
Stage 1.5 dense-prediction work); the pose head is a fresh port of
mmpose's `TopDownSimpleHead`. Decoding follows the upstream
`post_process='default'` path (argmax plus a 0.25-pixel shift toward
the higher-valued neighbor, NOT the DARK unbiased decoder). The
inverse warp back to original-image coordinates uses the shared
`acaua.pose.topdown_utils` module (introduced in PR #8 ahead of this
stage).
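For orientation, here is a minimal NumPy sketch of that default decode
for one person's heatmaps. The function name and shapes are
illustrative, not acaua's internal API; the subsequent inverse warp to
image coordinates is handled by `acaua.pose.topdown_utils`.

```python
import numpy as np

def decode_default(heatmaps: np.ndarray):
    """Argmax + 0.25-pixel shift over (K, H, W) heatmaps.

    Returns (K, 2) heatmap-space xy coords and (K,) peak scores.
    Sketch of the upstream post_process='default' path only; it
    deliberately omits the DARK unbiased decoder.
    """
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    idx = flat.argmax(axis=1)
    scores = flat[np.arange(K), idx]
    xs = (idx % W).astype(np.float32)
    ys = (idx // W).astype(np.float32)
    for k in range(K):
        x, y = int(xs[k]), int(ys[k])
        # Shift a quarter pixel toward the higher-valued neighbor.
        if 0 < x < W - 1:
            xs[k] += 0.25 * np.sign(heatmaps[k, y, x + 1] - heatmaps[k, y, x - 1])
        if 0 < y < H - 1:
            ys[k] += 0.25 * np.sign(heatmaps[k, y + 1, x] - heatmaps[k, y - 1, x])
    return np.stack([xs, ys], axis=1), scores
```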
## Provenance
| Field | Value |
|---|---|
| Upstream code | [`Sense-X/UniFormer`](https://github.com/Sense-X/UniFormer) @ `main` (Apache-2.0); files derived: `pose_estimation/mmpose/models/backbones/uniformer.py` (backbone, identical to the detection variant up to module-class identity) + `pose_estimation/mmpose/models/keypoint_heads/top_down_simple_head.py` (head) |
| Upstream weights | Google Drive file id `162R0JuTpf3gpLe1IK6oxRoQK7JSj4ylx`, filename `top_down_256x192_global_small.pth` (101MB) |
| Upstream SHA256 | `d77059e3e9322c0e20dc89dc0cf2a583ffe2ced7d3e9b350233738add570bc30` |
| Upstream report | AP 74.0 / AP@50 90.3 / AP@75 82.2 on COCO val2017, 256x192, single-scale |
| Architecture | UniFormer-S backbone (`hybrid=False, windows=False`, depth=[3,4,8,3], embed_dims=[64,128,320,512], head_dim=64) + TopDownSimpleHead (3x ConvTranspose2d-stride-2 + BN+ReLU upsample, 1x1 conv to 17 channels) |
| Total params | 25.23M (backbone 21.04M + head 4.19M) |
| Mirrored on | 2026-04-25 |
| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |
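To make the architecture row concrete, here is a self-contained sketch
of a `TopDownSimpleHead`-style deconv head in plain PyTorch. The
256-filter width and 4x4 deconv kernels are assumptions (SimpleBaseline
defaults), not read from this checkpoint; the actual port lives in
`acaua.adapters.uniformer.pose`.

```python
import torch
import torch.nn as nn

def make_pose_head(in_channels: int = 512, num_keypoints: int = 17) -> nn.Sequential:
    """Three stride-2 deconv+BN+ReLU stages (8x upsample), then a 1x1
    conv predicting one heatmap per COCO keypoint."""
    layers, c = [], in_channels
    for _ in range(3):
        layers += [
            nn.ConvTranspose2d(c, 256, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
        ]
        c = 256
    layers.append(nn.Conv2d(256, num_keypoints, kernel_size=1))
    return nn.Sequential(*layers)

head = make_pose_head()
x = torch.randn(1, 512, 8, 6)  # backbone output for a 256x192 crop (stride 32)
print(head(x).shape)           # torch.Size([1, 17, 64, 48])
```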
## Usage via acaua
```python
import acaua

# MIT-declared weights -> explicit opt-in (same posture as RTMPose +
# UniFormer image / video classification). The bundled RTMDet-tiny
# detector is loaded automatically from CondadosAI/rtmdet_t_coco.
model = acaua.Model.from_pretrained(
    "CondadosAI/uniformer_s_coco_pose", allow_non_apache=True
)
result = model.predict("image.jpg")
print(result.keypoints.shape)        # (N_persons, 17, 2)
print(result.keypoint_scores.shape)  # (N_persons, 17)

# COCO skeleton edges are surfaced on the adapter:
import cv2
import supervision as sv

scene = cv2.imread("image.jpg")
annotated = sv.EdgeAnnotator(edges=model.skeleton).annotate(
    scene, result.to_supervision()
)
```
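Since `result.keypoints` and `result.keypoint_scores` have the shapes
printed above, per-keypoint confidence filtering is a short mask
operation. The 0.3 threshold is arbitrary, and the NumPy indexing
assumes the outputs are array-like:

```python
import numpy as np

# Mask out low-confidence keypoints before any downstream use.
visible = np.asarray(result.keypoint_scores) > 0.3  # (N_persons, 17)
for kpts, mask in zip(np.asarray(result.keypoints), visible):
    for x, y in kpts[mask]:
        print(f"keypoint at ({x:.1f}, {y:.1f})")
```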
## Files in this mirror
- `model.safetensors` – full pose model weights (backbone + head, 352
  tensors). Loaded with `load_state_dict(strict=True)` at adapter
  init time.
- `config.json` – `acaua_task=pose`, COCO-17 `keypoint_names` +
  `skeleton`, `detector_repo_id=CondadosAI/rtmdet_t_coco`. The adapter
  surfaces these as `model.keypoint_names` / `model.skeleton`.
- `NOTICE` – attribution chain (code AND weights).
- `LICENSE` – Apache-2.0.
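A quick local sanity check of the mirrored weights against the tensor
and parameter counts above (assumes a local download and the
`safetensors` package):

```python
from safetensors.torch import load_file

state_dict = load_file("model.safetensors")
print(len(state_dict))                                     # expected: 352 tensors
print(sum(t.numel() for t in state_dict.values()) / 1e6)   # expected: ~25.23M params
```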
## License and attribution
The adapter code is redistributed under Apache-2.0. The underlying
weights carry upstream's MIT declaration (compatible). The acaua
UniFormer-pose adapter is a derivative work of the upstream PyTorch
implementation; see [`NOTICE`](./NOTICE) for the attribution chain.
## Citation
```bibtex
@inproceedings{li2022uniformer,
title = {UniFormer: Unifying Convolution and Self-attention for Visual Recognition},
author = {Li, Kunchang and Wang, Yali and Zhang, Junhao and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
booktitle = {ICLR},
year = {2022},
}
```