# UniFormer-S_h14 – COCO-pretrained backbone (acaua mirror)
Backbone-only mirror. This is not a runnable detection or segmentation model on its own. It ships the UniFormer-S backbone weights as trained jointly with the upstream Mask R-CNN detector on COCO; the task head (FPN + RPN + ROI + mask head) has been stripped.
The mirror exists to gate acaua's Stage 1.5 UniFormer-dense-prediction spike: testing whether UniFormer-S can be hosted as a backbone inside `torchvision.models.detection.MaskRCNN` and `transformers.UperNetForSemanticSegmentation` without forcing users to download the full 165 MB upstream checkpoint from Google Drive every time.
## Provenance
| Field | Value |
|---|---|
| Upstream code | `Sense-X/UniFormer` @ `main` (Apache-2.0); specifically `object_detection/mmdet/models/backbones/uniformer.py` |
| Upstream weights | Google Drive file id `13KhBYkHKQg-CyhAgn1LQM1K0R4bwSpWT`, filename `mask_rcnn_1x_uniformer_s_h14.pth` (165 MB full Mask R-CNN; we stripped the head) |
| Upstream SHA256 (full checkpoint) | `aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf` |
| Upstream report | box mAP 45.6 / mask mAP 41.6 on COCO val, 1x schedule, single-clip single-scale eval |
| Architecture | `depth=[3,4,8,3]`, `embed_dim=[64,128,320,512]`, `head_dim=64`, `hybrid=True`, `window_size=14` (upstream's standard UniFormer-S_h14 dense-prediction variant) |
| Backbone params | 21.04M (of 41M full Mask R-CNN total) |
| Mirrored on | 2026-04-24 |
| Mirrored by | `CondadosAI/acaua` |
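Two steps of the mirroring described above can be sketched in plain Python: verifying the downloaded full checkpoint against the upstream SHA256, and filtering the detector head out of the state dict. This is a sketch, not the exact script used for this mirror; the `backbone.` key prefix is an assumption based on the usual mmdet checkpoint layout (and mmdet checkpoints typically nest tensors under a `state_dict` key), so adjust if the upstream keys differ.

```python
import hashlib

UPSTREAM_SHA256 = "aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf"


def sha256sum(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so a 165 MB checkpoint never sits in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def strip_head(state_dict, prefix="backbone."):
    """Keep only backbone tensors and drop the FPN/RPN/ROI/mask-head entries.

    The 'backbone.' prefix is an assumption from the common mmdet layout;
    keys are re-rooted so the result loads directly into a bare backbone.
    """
    return {
        k[len(prefix):]: v
        for k, v in state_dict.items()
        if k.startswith(prefix)
    }
```

A full-checkpoint workflow would then be roughly: `assert sha256sum(pth_path) == UPSTREAM_SHA256`, load with `torch.load`, take its `"state_dict"` entry if present, and pass it through `strip_head` before saving as safetensors.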
## Usage via acaua
Not usable through `acaua.Model.from_pretrained` yet: this repo ships backbone-only infrastructure for the upcoming Stage 1.5.b spike. Direct usage:
```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from acaua.adapters.uniformer._backbone_dense import UniFormer2DDense
from acaua.adapters.uniformer._config import DENSE_VARIANTS

# Download once; hf_hub_download caches the file locally on later calls.
path = hf_hub_download(
    "CondadosAI/uniformer_s_h14_backbone_coco", "model.safetensors"
)
sd = load_file(path)

m = UniFormer2DDense(DENSE_VARIANTS["s_h14_det"])
m.load_state_dict(sd, strict=True)
m.eval()
# ...plug into a detection/segmentation head...
```
The backbone emits four multi-scale feature maps with strides 4/8/16/32. At 800×1280 input, the shapes are `(1, 64, 200, 320)`, `(1, 128, 100, 160)`, `(1, 320, 50, 80)`, and `(1, 512, 25, 40)`.
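The stride/shape relationship above is plain integer arithmetic: each output spatial size is the input size divided by the stage stride, and the channel widths are the `embed_dim` values from the Provenance table. A small sketch (the helper name is ours, not part of acaua):

```python
def dense_feature_shapes(
    height,
    width,
    channels=(64, 128, 320, 512),  # embed_dim per stage (UniFormer-S)
    strides=(4, 8, 16, 32),        # cumulative downsampling per stage
    batch=1,
):
    """Expected (N, C, H, W) shapes of the four backbone feature maps.

    Each stage downsamples by exactly its stride, so integer division
    matches the real output shapes for inputs divisible by 32.
    """
    return [(batch, c, height // s, width // s) for c, s in zip(channels, strides)]


print(dense_feature_shapes(800, 1280))
# -> [(1, 64, 200, 320), (1, 128, 100, 160), (1, 320, 50, 80), (1, 512, 25, 40)]
```

Inputs not divisible by 32 get padded by the detection pipeline before reaching the backbone, so the divisions above stay exact in practice.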
## Files
- `model.safetensors` – backbone weights (332 tensors, 21M params).
- `NOTICE` – attribution chain (code + weights).
- `LICENSE` – Apache-2.0.
## License
Apache-2.0. The upstream UniFormer code and weights are redistributed under their original Apache-2.0 declaration; see `NOTICE` for the attribution chain.
## Citation
```bibtex
@inproceedings{li2022uniformer,
  title     = {UniFormer: Unifying Convolution and Self-attention for Visual Recognition},
  author    = {Li, Kunchang and Wang, Yali and Zhang, Junhao and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
  booktitle = {ICLR},
  year      = {2022},
}
```