---
license: apache-2.0
library_name: acaua
pipeline_tag: image-feature-extraction
tags:
- backbone
- vision
- acaua
- native-pytorch-port
datasets:
- coco
---
# UniFormer-S_h14: COCO-pretrained backbone (acaua mirror)
**Backbone-only mirror.** This is not a runnable detection or
segmentation model on its own. It ships the UniFormer-S backbone
weights as trained jointly with the upstream Mask R-CNN detector on
COCO; the task head (FPN + RPN + ROI + mask head) has been stripped.
The mirror exists to gate acaua's [Stage 1.5 UniFormer-dense-prediction
spike](https://github.com/CondadosAI/acaua), which tests whether
UniFormer-S can be hosted as a backbone inside
`torchvision.models.detection.MaskRCNN` and
`transformers.UperNetForSemanticSegmentation`
without forcing users to download the full 165MB upstream checkpoint
from Google Drive every time.
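The head-stripping step described above can be sketched as a key filter over the full checkpoint's state dict. This is a minimal sketch, not the script used to build this mirror: it assumes the upstream checkpoint stores the backbone under an mmdetection-style `backbone.` key prefix, the helper name `extract_backbone` is hypothetical, and the key names in the toy dict are illustrative only.

```python
from typing import Any, Dict, Mapping


def extract_backbone(
    state_dict: Mapping[str, Any], prefix: str = "backbone."
) -> Dict[str, Any]:
    """Keep only entries under `prefix` and drop the prefix from their keys.

    The "backbone." default is an assumption about the upstream key layout.
    """
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy stand-in for a full Mask R-CNN state dict (real values would be tensors).
full = {
    "backbone.patch_embed1.proj.weight": "w0",
    "neck.lateral_convs.0.conv.weight": "w1",
    "roi_head.mask_head.convs.0.conv.weight": "w2",
}
backbone_only = extract_backbone(full)
print(sorted(backbone_only))
```

Everything outside the prefix (FPN, RPN, ROI and mask heads in this toy layout) is discarded, which is why the resulting file is much smaller than the full checkpoint.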
## Provenance
| Field | Value |
|---|---|
| Upstream code | [`Sense-X/UniFormer`](https://github.com/Sense-X/UniFormer) @ `main` (Apache-2.0); specifically `object_detection/mmdet/models/backbones/uniformer.py` |
| Upstream weights | Google Drive file id `13KhBYkHKQg-CyhAgn1LQM1K0R4bwSpWT`, filename `mask_rcnn_1x_uniformer_s_h14.pth` (165MB full MaskRCNN; we stripped the head) |
| Upstream SHA256 (full checkpoint) | `aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf` |
| Upstream report | box mAP 45.6 / mask mAP 41.6 on COCO val, 1x schedule, single-clip single-scale eval |
| Architecture | depth=[3,4,8,3], embed_dim=[64,128,320,512], head_dim=64, `hybrid=True`, `window_size=14`; upstream's standard UniFormer-S_h14 dense-prediction variant |
| Backbone params | 21.04M (of 41M full MaskRCNN total) |
| Mirrored on | 2026-04-24 |
| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |
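Anyone who downloads the full upstream checkpoint can check it against the SHA256 listed in the table. The following is a minimal sketch using only the Python standard library; the function names and the local file path in the usage note are assumptions, not part of acaua.

```python
import hashlib
from pathlib import Path

# SHA256 of the full upstream checkpoint, copied from the provenance table.
EXPECTED_SHA256 = "aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so the checkpoint is never fully held in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_checkpoint(Path("mask_rcnn_1x_uniformer_s_h14.pth"))` should return `True` for an unmodified download, assuming the file was saved under its upstream name in the working directory.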
## Usage via acaua
This mirror is not yet usable through `acaua.Model.from_pretrained`; it
ships as backbone-only infrastructure for the upcoming Stage 1.5.b spike.
Direct usage:
```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from acaua.adapters.uniformer._backbone_dense import UniFormer2DDense
from acaua.adapters.uniformer._config import DENSE_VARIANTS

# Download (or reuse the cached copy of) the backbone-only weights.
path = hf_hub_download(
    "CondadosAI/uniformer_s_h14_backbone_coco", "model.safetensors"
)
sd = load_file(path)

# Build the dense-prediction UniFormer-S_h14 variant and load the weights;
# strict=True raises on any missing or unexpected key.
m = UniFormer2DDense(DENSE_VARIANTS["s_h14_det"])
m.load_state_dict(sd, strict=True)
m.eval()
# ...plug into a detection/segmentation head...
```
The backbone emits 4 multi-scale feature maps with strides 4/8/16/32.
For a single 800x1280 (height x width) input, the shapes are
`(1, 64, 200, 320)`, `(1, 128, 100, 160)`, `(1, 320, 50, 80)`,
`(1, 512, 25, 40)`.
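The shapes above follow from dividing the input height and width by each stage's stride and pairing the result with that stage's channel width. A small sketch of that arithmetic, assuming input sizes divisible by 32 (the helper name `expected_feature_shapes` is hypothetical, and the behavior for other sizes is not covered here):

```python
from typing import List, Tuple

# Channel widths and strides as stated in this card.
EMBED_DIMS = (64, 128, 320, 512)
STRIDES = (4, 8, 16, 32)


def expected_feature_shapes(
    height: int, width: int, batch: int = 1
) -> List[Tuple[int, int, int, int]]:
    """Return (N, C, H, W) per stage, assuming H and W are divisible by 32."""
    if height % STRIDES[-1] or width % STRIDES[-1]:
        raise ValueError("this sketch only covers inputs divisible by 32")
    return [
        (batch, channels, height // stride, width // stride)
        for channels, stride in zip(EMBED_DIMS, STRIDES)
    ]


shapes = expected_feature_shapes(800, 1280)
print(shapes)
```

A helper like this can be handy when sizing the input channels of an FPN or decoder head before the backbone weights are loaded.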
## Files
- `model.safetensors`: backbone weights (332 tensors, 21M params).
- `NOTICE`: attribution chain (code + weights).
- `LICENSE`: Apache-2.0.
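The tensor count of `model.safetensors` can be checked without loading any weights, because a safetensors file begins with an 8-byte little-endian header length followed by a JSON header that lists each tensor's dtype and shape. The sketch below reads only that header using the standard library; the function name `summarize_safetensors` is hypothetical. Note that it counts every stored element, including any non-trainable buffers, so its total may differ slightly from a trainable-parameter count.

```python
import json
import struct
from math import prod
from pathlib import Path
from typing import Tuple


def summarize_safetensors(path: Path) -> Tuple[int, int]:
    """Return (tensor_count, element_count) from the file header only."""
    with path.open("rb") as handle:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", handle.read(8))
        header = json.loads(handle.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    n_elements = sum(prod(entry["shape"]) for entry in header.values())
    return len(header), n_elements
```

Passing the path returned by `hf_hub_download` to `summarize_safetensors` gives the pair to compare against the figures listed above.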
## License
Apache-2.0. The upstream UniFormer code and weights are redistributed
under their original license declaration; see [`NOTICE`](./NOTICE) for the
attribution chain.
## Citation
```bibtex
@misc{li2022uniformer,
title = {UniFormer: Unifying Convolution and Self-attention for Visual Recognition},
author = {Li, Kunchang and Wang, Yali and Zhang, Junhao and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
year = {2022},
eprint = {2201.09450},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
}
```