---
license: apache-2.0
library_name: acaua
pipeline_tag: image-feature-extraction
tags:
- backbone
- vision
- acaua
- native-pytorch-port
datasets:
- coco
---

# UniFormer-S_h14: COCO-pretrained backbone (acaua mirror)

**Backbone-only mirror.** This is not a runnable detection or
segmentation model on its own. It ships the UniFormer-S backbone
weights as trained jointly with the upstream Mask R-CNN detector on
COCO; the task head (FPN + RPN + ROI + mask head) has been stripped.

The mirror exists to gate acaua's [Stage 1.5 UniFormer-dense-prediction
spike](https://github.com/CondadosAI/acaua), which tests whether
UniFormer-S can be hosted as a backbone inside
`torchvision.models.detection.MaskRCNN` and
`transformers.UperNetForSemanticSegmentation` without forcing users to
download the full 165 MB upstream checkpoint from Google Drive every
time.

## Provenance

| | |
|---|---|
| Upstream code | [`Sense-X/UniFormer`](https://github.com/Sense-X/UniFormer) @ `main` (Apache-2.0); specifically `object_detection/mmdet/models/backbones/uniformer.py` |
| Upstream weights | Google Drive file id `13KhBYkHKQg-CyhAgn1LQM1K0R4bwSpWT`, filename `mask_rcnn_1x_uniformer_s_h14.pth` (165 MB full Mask R-CNN checkpoint; the head was stripped for this mirror) |
| Upstream SHA256 (full checkpoint) | `aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf` |
| Upstream report | box mAP 45.6 / mask mAP 41.6 on COCO val, 1x schedule, single-clip single-scale eval |
| Architecture | `depth=[3,4,8,3]`, `embed_dim=[64,128,320,512]`, `head_dim=64`, `hybrid=True`, `window_size=14`; upstream's standard UniFormer-S_h14 dense-prediction variant |
| Backbone params | 21.04M (of 41M full Mask R-CNN total) |
| Mirrored on | 2026-04-24 |
| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |

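Anyone re-deriving this mirror from the upstream file may want to check the download against the SHA256 listed above before stripping the head. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_upstream` and the local path are illustrative assumptions, not part of acaua or the upstream repository.

```python
import hashlib
from pathlib import Path

# SHA256 of the full upstream checkpoint, copied from the Provenance table.
EXPECTED_SHA256 = "aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_upstream(path):
    """True if the file at `path` hashes to the upstream checksum."""
    return sha256_of_file(path) == EXPECTED_SHA256


if __name__ == "__main__":
    # Hypothetical local path; point this at wherever the download landed.
    ckpt = Path("mask_rcnn_1x_uniformer_s_h14.pth")
    if ckpt.exists():
        print("checksum OK" if matches_upstream(ckpt) else "checksum MISMATCH")
```

Note that this checksum covers the full 165 MB upstream checkpoint, not the stripped `model.safetensors` shipped here.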
## Usage via acaua

Not yet usable through `acaua.Model.from_pretrained`; this ships as
backbone-only infrastructure for the upcoming Stage 1.5.b spike. Direct
usage:

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from acaua.adapters.uniformer._backbone_dense import UniFormer2DDense
from acaua.adapters.uniformer._config import DENSE_VARIANTS

# Fetch the backbone-only weights from the Hub (cached after the first call).
path = hf_hub_download(
    "CondadosAI/uniformer_s_h14_backbone_coco", "model.safetensors"
)
sd = load_file(path)

# Build the dense-prediction backbone and load the weights strictly,
# so any missing or unexpected key raises instead of passing silently.
m = UniFormer2DDense(DENSE_VARIANTS["s_h14_det"])
m.load_state_dict(sd, strict=True)
m.eval()
# ...plug into a detection/segmentation head...
```

The backbone emits 4 multi-scale feature maps with strides 4/8/16/32.
For an 800x1280 (height x width) input with batch size 1, the shapes are
`(1, 64, 200, 320)`, `(1, 128, 100, 160)`, `(1, 320, 50, 80)`,
`(1, 512, 25, 40)`.

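The shapes follow directly from dividing the input size by each stride and pairing the result with the stage's channel width. A small sketch of that arithmetic, which does not touch the model itself; the function name `expected_feature_shapes` is hypothetical, and the sketch assumes input sizes divisible by 32:

```python
# Channel widths and strides as stated in this card for UniFormer-S_h14.
EMBED_DIMS = (64, 128, 320, 512)
STRIDES = (4, 8, 16, 32)


def expected_feature_shapes(height, width, batch=1):
    """Return the (N, C, H, W) shape of each stage for a given input size.

    Assumes height and width are divisible by the largest stride (32),
    which holds for 800x1280; other sizes may round differently
    depending on how the backbone pads its input.
    """
    if height % STRIDES[-1] or width % STRIDES[-1]:
        raise ValueError("this sketch only handles sizes divisible by 32")
    return [
        (batch, channels, height // stride, width // stride)
        for channels, stride in zip(EMBED_DIMS, STRIDES)
    ]


for shape in expected_feature_shapes(800, 1280):
    print(shape)
# (1, 64, 200, 320)
# (1, 128, 100, 160)
# (1, 320, 50, 80)
# (1, 512, 25, 40)
```

This can serve as a quick reference when wiring the backbone into a head that needs per-level channel counts and spatial sizes.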
## Files

- `model.safetensors`: backbone weights (332 tensors, 21M params).
- `NOTICE`: attribution chain (code + weights).
- `LICENSE`: Apache-2.0.

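One way to sanity-check a download against the tensor and parameter counts listed for `model.safetensors` is to total the element counts of the loaded state dict. The sketch below works on a plain mapping of tensor name to shape so it runs without PyTorch; `summarize_shapes` and the toy key names are hypothetical and not part of acaua.

```python
import math


def summarize_shapes(shapes):
    """Return (tensor_count, element_count) for a name -> shape mapping.

    A scalar tensor has shape () and counts as one element, because
    math.prod of an empty sequence is 1.
    """
    tensor_count = len(shapes)
    element_count = sum(math.prod(shape) for shape in shapes.values())
    return tensor_count, element_count


# Toy example with made-up keys; real keys come from the checkpoint.
toy = {"stem.weight": (64, 3, 4, 4), "stem.bias": (64,)}
print(summarize_shapes(toy))  # (2, 3136)
```

With the state dict `sd` from the usage snippet, `summarize_shapes({k: tuple(v.shape) for k, v in sd.items()})` should report 332 tensors. The element total should be close to 21M, though it may differ slightly from the parameter count if the file also stores non-trainable buffers.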
## License

Apache-2.0. The upstream UniFormer code and weights are redistributed
under their original license declaration; see [`NOTICE`](./NOTICE) for
the attribution chain.

## Citation

```bibtex
@inproceedings{li2022uniformer,
  title = {UniFormer: Unifying Convolution and Self-attention for Visual Recognition},
  author = {Li, Kunchang and Wang, Yali and Zhang, Junhao and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
  booktitle = {ICLR},
  year = {2022},
}
```