	---
	license: apache-2.0
	library_name: acaua
	pipeline_tag: image-feature-extraction
	tags:
	- backbone
	- vision
	- acaua
	- native-pytorch-port
	datasets:
	- coco
	---

	# UniFormer-S_h14 — COCO-pretrained backbone (acaua mirror)

	Backbone-only mirror. This is not a runnable detection or
	segmentation model on its own. It ships the UniFormer-S backbone
	weights as trained jointly with the upstream Mask R-CNN detector on
	COCO; the task head (FPN + RPN + ROI + mask head) has been stripped.
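
As an illustration of what stripping the task head involves, the sketch below filters a checkpoint's state dict down to its backbone entries. It assumes the mmdetection-style convention that backbone parameters are stored under a `backbone.` prefix. The helper `extract_backbone` and the key names in the toy dictionary are hypothetical, and this is not necessarily the script used to produce this mirror.

```python
from typing import Dict, TypeVar

T = TypeVar("T")


def extract_backbone(state_dict: Dict[str, T], prefix: str = "backbone.") -> Dict[str, T]:
    """Keep only entries stored under `prefix` and drop the prefix from their keys."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy checkpoint: integers stand in for tensors, and the key names are
# hypothetical examples of a backbone, neck, and ROI head parameter.
full_checkpoint = {
    "backbone.patch_embed1.proj.weight": 1,
    "neck.lateral_convs.0.conv.weight": 2,
    "roi_head.bbox_head.fc_cls.weight": 3,
}

backbone_only = extract_backbone(full_checkpoint)
```

Removing the prefix matters because a standalone backbone module usually expects bare parameter names when `load_state_dict` is called with `strict=True`.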

	The mirror exists to gate acaua's [Stage 1.5 UniFormer-dense-prediction
	spike](https://github.com/CondadosAI/acaua), which tests whether
	UniFormer-S can be hosted as a backbone inside
	`torchvision.models.detection.MaskRCNN` and
	`transformers.UperNetForSemanticSegmentation` without forcing users to
	download the full 165 MB upstream checkpoint from Google Drive every
	time.

	## Provenance

	| Field | Value |
	|---|---|
	| Upstream code | [`Sense-X/UniFormer`](https://github.com/Sense-X/UniFormer) @ `main` (Apache-2.0); specifically `object_detection/mmdet/models/backbones/uniformer.py` |
	| Upstream weights | Google Drive file id `13KhBYkHKQg-CyhAgn1LQM1K0R4bwSpWT`, filename `mask_rcnn_1x_uniformer_s_h14.pth` (165 MB full MaskRCNN; we stripped the head) |
	| Upstream SHA256 (full checkpoint) | `aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf` |
	| Upstream report | box mAP 45.6 / mask mAP 41.6 on COCO val, 1x schedule, single-clip single-scale eval |
	| Architecture | depth=[3,4,8,3], embed_dim=[64,128,320,512], head_dim=64, `hybrid=True`, `window_size=14`; upstream's standard UniFormer-S_h14 dense-prediction variant |
	| Backbone params | 21.04M (of 41M full MaskRCNN total) |
	| Mirrored on | 2026-04-24 |
	| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |
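
To check a locally downloaded copy of the full upstream checkpoint against the SHA256 listed above, something like the following sketch could be used. It relies only on the standard library. The helper names `sha256_of_file` and `matches_upstream` are illustrative and not part of acaua, and the digest applies to the full upstream `.pth` file, not to the `model.safetensors` in this mirror.

```python
import hashlib

# Digest of the full upstream checkpoint, copied from the provenance table.
EXPECTED_SHA256 = "aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_upstream(path):
    """True if the file at `path` hashes to the upstream digest."""
    return sha256_of_file(path) == EXPECTED_SHA256
```

For example, `matches_upstream("mask_rcnn_1x_uniformer_s_h14.pth")` would return `True` only for a byte-identical copy of the upstream file.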

	## Usage via acaua

	This checkpoint cannot be loaded through `acaua.Model.from_pretrained`
	yet; it ships as backbone-only infrastructure for the upcoming Stage
	1.5.b spike. To use it directly:

	```python
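	from huggingface_hub import hf_hub_download
	from safetensors.torch import load_file
	from acaua.adapters.uniformer._backbone_dense import UniFormer2DDense
	from acaua.adapters.uniformer._config import DENSE_VARIANTS

	# Download the backbone-only weights from the Hub (cached after the first call).
	path = hf_hub_download(
	    "CondadosAI/uniformer_s_h14_backbone_coco", "model.safetensors"
	)
	sd = load_file(path)  # dict mapping parameter names to tensors

	# Build the S_h14 dense-prediction backbone and load the weights strictly,
	# so any missing or unexpected key raises an error.
	m = UniFormer2DDense(DENSE_VARIANTS["s_h14_det"])
	m.load_state_dict(sd, strict=True)
	m.eval()
	# ...plug into a detection/segmentation head...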
	```

	The backbone emits 4 multi-scale feature maps with strides 4/8/16/32.
	At 800x1280 input, the shapes are
	`(1, 64, 200, 320)`, `(1, 128, 100, 160)`, `(1, 320, 50, 80)`,
	`(1, 512, 25, 40)`.
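
The shapes above follow from dividing the input height and width by each stride. A minimal sketch of that arithmetic, assuming the input size is divisible by 32 (for other sizes the true shapes depend on the backbone's padding, which this sketch does not model); `feature_shapes` is an illustrative helper, not an acaua function:

```python
def feature_shapes(height, width, batch=1,
                   channels=(64, 128, 320, 512),
                   strides=(4, 8, 16, 32)):
    """Expected (N, C, H, W) shape of each feature map for a given input size."""
    return [
        (batch, c, height // s, width // s)
        for c, s in zip(channels, strides)
    ]


shapes = feature_shapes(800, 1280)
for shape in shapes:
    print(shape)
```

The channel counts match the `embed_dim` values listed under Provenance, which is what a downstream FPN or decode head would need as its input channel list.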

	## Files

	- `model.safetensors` — backbone weights (332 tensors, 21M params).
	- `NOTICE` — attribution chain (code + weights).
	- `LICENSE` — Apache-2.0.

	## License

	Apache-2.0. The upstream UniFormer code and weights are redistributed
	under their original license declaration; see [`NOTICE`](./NOTICE) for
	the attribution chain.

	## Citation

	```bibtex
	@inproceedings{li2022uniformer,
	title = {UniFormer: Unifying Convolution and Self-attention for Visual Recognition},
	author = {Li, Kunchang and Wang, Yali and Zhang, Junhao and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
	booktitle = {ICLR},
	year = {2022},
	}
	```