---
license: apache-2.0
library_name: acaua
pipeline_tag: image-feature-extraction
tags:
  - backbone
  - vision
  - acaua
  - native-pytorch-port
datasets:
  - coco
---

# UniFormer-S_h14 — COCO-pretrained backbone (acaua mirror)

**Backbone-only mirror.** This is not a runnable detection or
segmentation model on its own. It ships the UniFormer-S backbone
weights as trained jointly with the upstream Mask R-CNN detector on
COCO; the task head (FPN + RPN + ROI + mask head) has been stripped.

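The head-stripping step can be sketched as follows. This is a minimal illustration, assuming the upstream file follows the usual mmdetection checkpoint layout (weights nested under a top-level `state_dict` key, backbone tensors prefixed with `backbone.`). The function name `strip_to_backbone` is hypothetical, and the conversion actually used for this mirror may remap keys differently.

```python
from typing import Any, Dict, Mapping


def strip_to_backbone(
    checkpoint: Mapping[str, Any], prefix: str = "backbone."
) -> Dict[str, Any]:
    """Keep only backbone tensors and drop the prefix from their names.

    Assumes an mmdetection-style checkpoint; `prefix` is an assumption
    about the upstream key naming, not something this card documents.
    """
    # mmdetection checkpoints usually nest weights under "state_dict";
    # otherwise treat the mapping itself as the state dict.
    state_dict = checkpoint.get("state_dict", checkpoint)
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }
```
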
The mirror exists to gate acaua's [Stage 1.5 UniFormer-dense-prediction
spike](https://github.com/CondadosAI/acaua), which tests whether
UniFormer-S can be hosted as a backbone inside
`torchvision.models.detection.MaskRCNN` and
`transformers.UperNetForSemanticSegmentation` without forcing users to
download the full 165MB upstream checkpoint from Google Drive every
time.

## Provenance

| | |
|---|---|
| Upstream code | [`Sense-X/UniFormer`](https://github.com/Sense-X/UniFormer) @ `main` (Apache-2.0); specifically `object_detection/mmdet/models/backbones/uniformer.py` |
| Upstream weights | Google Drive file id `13KhBYkHKQg-CyhAgn1LQM1K0R4bwSpWT`, filename `mask_rcnn_1x_uniformer_s_h14.pth` (165MB full MaskRCNN; we stripped the head) |
| Upstream SHA256 (full checkpoint) | `aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf` |
| Upstream report | box mAP 45.6 / mask mAP 41.6 on COCO val, 1x schedule, single-clip single-scale eval |
| Architecture | depth=[3,4,8,3], embed_dim=[64,128,320,512], head_dim=64, `hybrid=True`, `window_size=14` — upstream's standard UniFormer-S_h14 dense-prediction variant |
| Backbone params | 21.04M (of 41M full MaskRCNN total) |
| Mirrored on | 2026-04-24 |
| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |

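Anyone who downloads the full upstream checkpoint can check it against the SHA256 listed above. The sketch below uses only the standard library. The helper name `sha256_of_file` is illustrative, and the local filename in the usage note is assumed to match the upstream one.

```python
import hashlib
from pathlib import Path

# Digest of the full upstream Mask R-CNN checkpoint, copied from the table above.
UPSTREAM_SHA256 = "aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf"


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_file("mask_rcnn_1x_uniformer_s_h14.pth") == UPSTREAM_SHA256` should hold for an intact download. The digest covers the full 165MB checkpoint, not the stripped `model.safetensors` in this repository.
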
## Usage via acaua

This mirror is not yet usable through `acaua.Model.from_pretrained`; it
ships as backbone-only infrastructure for the upcoming Stage 1.5.b
spike. To load the weights directly:

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from acaua.adapters.uniformer._backbone_dense import UniFormer2DDense
from acaua.adapters.uniformer._config import DENSE_VARIANTS

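# Download (or reuse the cached copy of) the backbone weights.
path = hf_hub_download(
    "CondadosAI/uniformer_s_h14_backbone_coco", "model.safetensors"
)
sd = load_file(path)

# Build the dense-prediction variant and load the weights.
# strict=True raises if any key is missing or unexpected.
m = UniFormer2DDense(DENSE_VARIANTS["s_h14_det"])
m.load_state_dict(sd, strict=True)
m.eval()
# ...plug into a detection/segmentation head...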
```

The backbone emits 4 multi-scale feature maps with strides 4/8/16/32.
For a single 800x1280 (height x width) input, the shapes are
`(1, 64, 200, 320)`, `(1, 128, 100, 160)`, `(1, 320, 50, 80)`,
`(1, 512, 25, 40)`.

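The shapes above follow from dividing the input size by each stride and pairing the result with the per-stage `embed_dim`. A small sketch of that arithmetic, useful for sizing a head before running the model. The helper name `expected_feature_shapes` is hypothetical, and plain floor division is an assumption that is only exact when both input sides are multiples of 32.

```python
from typing import List, Tuple

# Per-stage channel widths and strides, taken from this card.
EMBED_DIMS = (64, 128, 320, 512)
STRIDES = (4, 8, 16, 32)


def expected_feature_shapes(
    height: int, width: int, batch: int = 1
) -> List[Tuple[int, int, int, int]]:
    """Return the (N, C, H, W) shape of each of the 4 feature maps."""
    return [
        (batch, channels, height // stride, width // stride)
        for channels, stride in zip(EMBED_DIMS, STRIDES)
    ]


print(expected_feature_shapes(800, 1280))
# [(1, 64, 200, 320), (1, 128, 100, 160), (1, 320, 50, 80), (1, 512, 25, 40)]
```
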
## Files

- `model.safetensors` — backbone weights (332 tensors, 21M params).
- `NOTICE` — attribution chain (code + weights).
- `LICENSE` — Apache-2.0.

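The tensor and parameter counts can be checked from tensor shapes alone. The sketch below takes a mapping from tensor name to shape, which could be built from the loaded state dict as `{name: tuple(t.shape) for name, t in sd.items()}`. The helper name `summarize_shapes` is hypothetical. A state dict can also contain non-trainable buffers such as normalization statistics, so the element total may differ slightly from a trainable-parameter count.

```python
import math
from typing import Mapping, Sequence, Tuple


def summarize_shapes(shapes: Mapping[str, Sequence[int]]) -> Tuple[int, int]:
    """Return (number of tensors, total number of elements)."""
    n_tensors = len(shapes)
    # math.prod(()) is 1, so scalar tensors count as one element each.
    n_elements = sum(math.prod(shape) for shape in shapes.values())
    return n_tensors, n_elements
```
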
## License

Apache-2.0. The upstream UniFormer code and weights are redistributed
under their original license declaration; see [`NOTICE`](./NOTICE) for
the attribution chain.

## Citation

```bibtex
@inproceedings{li2022uniformer,
  title     = {UniFormer: Unifying Convolution and Self-attention for Visual Recognition},
  author    = {Li, Kunchang and Wang, Yali and Zhang, Junhao and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
  booktitle = {ICLR},
  year      = {2022},
}
```