---
license: apache-2.0
library_name: acaua
pipeline_tag: image-feature-extraction
tags:
- backbone
- vision
- acaua
- native-pytorch-port
datasets:
- coco
---

# UniFormer-S_h14: COCO-pretrained backbone (acaua mirror)

**Backbone-only mirror.** This is not a runnable detection or segmentation model on its own. It ships the UniFormer-S backbone weights as trained jointly with the upstream Mask R-CNN detector on COCO; the task head (FPN + RPN + ROI + mask head) has been stripped.

The mirror exists to gate acaua's [Stage 1.5 UniFormer-dense-prediction spike](https://github.com/CondadosAI/acaua), which tests whether UniFormer-S can be hosted as a backbone inside `torchvision.models.detection.MaskRCNN` and `transformers.UperNetForSemanticSegmentation` without forcing users to download the full 165MB upstream checkpoint from Google Drive every time.

## Provenance

| | |
|---|---|
| Upstream code | [`Sense-X/UniFormer`](https://github.com/Sense-X/UniFormer) @ `main` (Apache-2.0); specifically `object_detection/mmdet/models/backbones/uniformer.py` |
| Upstream weights | Google Drive file id `13KhBYkHKQg-CyhAgn1LQM1K0R4bwSpWT`, filename `mask_rcnn_1x_uniformer_s_h14.pth` (165MB full MaskRCNN; we stripped the head) |
| Upstream SHA256 (full checkpoint) | `aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf` |
| Upstream report | box mAP 45.6 / mask mAP 41.6 on COCO val, 1x schedule, single-clip single-scale eval |
| Architecture | depth=[3,4,8,3], embed_dim=[64,128,320,512], head_dim=64, `hybrid=True`, `window_size=14`; upstream's standard UniFormer-S_h14 dense-prediction variant |
| Backbone params | 21.04M (of 41M full MaskRCNN total) |
| Mirrored on | 2026-04-24 |
| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |

## Usage via acaua

Not usable through `acaua.Model.from_pretrained` yet; this ships as backbone-only infrastructure for the upcoming Stage 1.5.b spike.
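For anyone who fetches the full upstream checkpoint instead of this mirror, the SHA256 from the Provenance table can be checked locally. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_upstream` and the local file path are illustrative assumptions, not part of acaua or the upstream repository.

```python
import hashlib
from pathlib import Path

# SHA256 of the full upstream checkpoint, copied from the Provenance table.
UPSTREAM_SHA256 = "aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a 165MB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_upstream(path):
    """Return True when the file's digest equals the published upstream digest."""
    return sha256_of_file(path) == UPSTREAM_SHA256


if __name__ == "__main__":
    # Hypothetical location of a manually downloaded upstream checkpoint.
    ckpt = Path("mask_rcnn_1x_uniformer_s_h14.pth")
    if ckpt.exists():
        print("checksum OK" if matches_upstream(ckpt) else "checksum MISMATCH")
```

This digest covers the full upstream file only; it will not match `model.safetensors` in this repository, which holds the stripped backbone.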
Direct usage:

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from acaua.adapters.uniformer._backbone_dense import UniFormer2DDense
from acaua.adapters.uniformer._config import DENSE_VARIANTS

path = hf_hub_download(
    "CondadosAI/uniformer_s_h14_backbone_coco", "model.safetensors"
)
sd = load_file(path)
m = UniFormer2DDense(DENSE_VARIANTS["s_h14_det"])
m.load_state_dict(sd, strict=True)
m.eval()
# ...plug into a detection/segmentation head...
```

The backbone emits 4 multi-scale feature maps with strides 4/8/16/32. At 800x1280 input, the shapes are `(1, 64, 200, 320)`, `(1, 128, 100, 160)`, `(1, 320, 50, 80)`, `(1, 512, 25, 40)`.

## Files

- `model.safetensors`: backbone weights (332 tensors, 21M params).
- `NOTICE`: attribution chain (code + weights).
- `LICENSE`: Apache-2.0.

## License

Apache-2.0. This mirror redistributes the upstream UniFormer code and weights under their original license declaration; see [`NOTICE`](./NOTICE) for the attribution chain.

## Citation

```bibtex
@inproceedings{li2022uniformer,
  title     = {UniFormer: Unifying Convolution and Self-attention for Visual Recognition},
  author    = {Li, Kunchang and Wang, Yali and Zhang, Junhao and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
  booktitle = {ICLR},
  year      = {2022},
}
```
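As a sanity check on the feature-map shapes quoted in the usage section, the arithmetic can be reproduced without loading the model. This is a sketch in plain Python, assuming each stage's spatial size is simply the input size divided by its stride; the helper name `expected_feature_shapes` is hypothetical, and the sketch only covers inputs whose height and width are multiples of 32.

```python
# Values taken from this card: per-stage channel widths and output strides.
EMBED_DIM = (64, 128, 320, 512)
STRIDES = (4, 8, 16, 32)


def expected_feature_shapes(height, width, batch=1):
    """Return the (N, C, H, W) shape expected for each of the four stages.

    Assumption: height and width are multiples of 32, so every stride divides
    them exactly. Other sizes may round differently inside the real backbone.
    """
    if height % STRIDES[-1] or width % STRIDES[-1]:
        raise ValueError("this sketch only covers inputs divisible by 32")
    return [
        (batch, channels, height // stride, width // stride)
        for channels, stride in zip(EMBED_DIM, STRIDES)
    ]


for shape in expected_feature_shapes(800, 1280):
    print(shape)
```

For an 800x1280 input this prints the four shapes listed above; comparing them with the shapes of the tensors the loaded backbone actually returns is a quick way to confirm that the weights and the variant config line up.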