UniFormer-S_h14: COCO-pretrained backbone (acaua mirror)

Backbone-only mirror. This is not a runnable detection or segmentation model on its own. It ships the UniFormer-S backbone weights as trained jointly with the upstream Mask R-CNN detector on COCO; the task head (FPN + RPN + ROI + mask head) has been stripped.
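The card does not spell out how the head was removed. Below is a minimal sketch of one common approach. It assumes the upstream file is an mmdetection-style checkpoint whose `state_dict` keys carry a `backbone.` prefix, with `neck.`, `rpn_head.` and `roi_head.` prefixes for the parts being dropped. `strip_to_backbone` is a hypothetical helper, not part of acaua, and the toy key names are illustrative only.

```python
from typing import Any, Dict, Mapping


def strip_to_backbone(state_dict: Mapping[str, Any], prefix: str = "backbone.") -> Dict[str, Any]:
    """Keep only backbone entries and drop their prefix.

    Assumes mmdetection-style naming: backbone weights live under `backbone.`,
    while FPN/RPN/ROI/mask-head weights use other prefixes and are discarded.
    """
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy dict standing in for checkpoint["state_dict"] (illustrative keys only).
toy = {
    "backbone.patch_embed1.proj.weight": "w0",
    "neck.lateral_convs.0.conv.weight": "w1",
    "roi_head.mask_head.conv_logits.weight": "w2",
}
print(strip_to_backbone(toy))
# {'patch_embed1.proj.weight': 'w0'}
```

Whether the stripped keys then match acaua's `UniFormer2DDense` parameter names one-to-one is not stated on this card.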

The mirror exists to gate acaua's Stage 1.5 UniFormer-dense-prediction spike: testing whether UniFormer-S can be hosted as a backbone inside torchvision.models.detection.MaskRCNN and transformers.UperNetForSemanticSegmentation without forcing users to download the full 165 MB upstream checkpoint from Google Drive every time.

Provenance

Upstream code: Sense-X/UniFormer @ main (Apache-2.0), specifically object_detection/mmdet/models/backbones/uniformer.py
Upstream weights: Google Drive file id 13KhBYkHKQg-CyhAgn1LQM1K0R4bwSpWT, filename mask_rcnn_1x_uniformer_s_h14.pth (165 MB full MaskRCNN; we stripped the head)
Upstream SHA256 (full checkpoint): aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf
Upstream report: box mAP 45.6 / mask mAP 41.6 on COCO val, 1x schedule, single-clip single-scale eval
Architecture: depth=[3,4,8,3], embed_dim=[64,128,320,512], head_dim=64, hybrid=True, window_size=14 (upstream's standard UniFormer-S_h14 dense-prediction variant)
Backbone params: 21.04M (of 41M for the full MaskRCNN)
Mirrored on: 2026-04-24
Mirrored by: CondadosAI/acaua
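If you download the full upstream checkpoint by hand, you can compare it against the SHA256 listed above. A streaming hash with Python's standard `hashlib` does this without loading the whole 165 MB file into memory. This is a minimal sketch: `sha256_of` is a hypothetical helper name, and the local file path in the usage comment is a placeholder.

```python
import hashlib
from pathlib import Path

# Digest of the full upstream MaskRCNN checkpoint, as listed in Provenance.
EXPECTED_SHA256 = "aa1e6bbec1c83344de96705f0e1aee853f1eec78df365e41ec802c202f00d9cf"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks (1 MiB by default) and return the hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until fh.read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (placeholder path):
# sha256_of("mask_rcnn_1x_uniformer_s_h14.pth") == EXPECTED_SHA256
```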

Usage via acaua

Not usable through acaua.Model.from_pretrained yet: this ships as backbone-only infrastructure for the upcoming Stage 1.5.b spike. Direct usage:

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from acaua.adapters.uniformer._backbone_dense import UniFormer2DDense
from acaua.adapters.uniformer._config import DENSE_VARIANTS

# Fetch only the stripped backbone weights from the Hub (no Google Drive download).
path = hf_hub_download(
    "CondadosAI/uniformer_s_h14_backbone_coco", "model.safetensors"
)
sd = load_file(path)
# Build the UniFormer-S_h14 dense-prediction backbone and load the weights;
# strict=True raises if any key is missing or unexpected.
m = UniFormer2DDense(DENSE_VARIANTS["s_h14_det"])
m.load_state_dict(sd, strict=True)
m.eval()
# ...plug into a detection/segmentation head...
```

The backbone emits four multi-scale feature maps at strides 4/8/16/32, with channel counts equal to embed_dim. For a single 800x1280 (H x W) input, the (N, C, H, W) shapes are (1, 64, 200, 320), (1, 128, 100, 160), (1, 320, 50, 80), (1, 512, 25, 40).
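Each shape pairs one stage's embed_dim with the input height and width divided by that stage's stride. The sketch below reproduces that arithmetic. It assumes input sides that are multiples of 32, as in the 800x1280 example; `feature_shapes` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

EMBED_DIM = (64, 128, 320, 512)  # per-stage channels (Architecture entry above)
STRIDES = (4, 8, 16, 32)         # per-stage output strides


def feature_shapes(height: int, width: int, batch: int = 1,
                   channels: Sequence[int] = EMBED_DIM,
                   strides: Sequence[int] = STRIDES) -> List[Tuple[int, int, int, int]]:
    """Return the (N, C, H, W) shape of each backbone output level."""
    if height % strides[-1] or width % strides[-1]:
        raise ValueError(f"input sides must be multiples of {strides[-1]}")
    return [(batch, c, height // s, width // s) for c, s in zip(channels, strides)]


print(feature_shapes(800, 1280))
# [(1, 64, 200, 320), (1, 128, 100, 160), (1, 320, 50, 80), (1, 512, 25, 40)]
```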

Files

  • model.safetensors: backbone weights (332 tensors, 21M params, F32).
  • NOTICE: attribution chain (code + weights).
  • LICENSE: Apache-2.0.
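You can check the tensor and parameter counts without PyTorch. A `.safetensors` file begins with an 8-byte little-endian header length, followed by a JSON header that records each tensor's dtype, shape and byte offsets. The stdlib sketch below reads only that header; `summarize_safetensors` is a hypothetical helper.

```python
import json
import math
import struct
from collections import Counter
from pathlib import Path


def summarize_safetensors(path):
    """Return (tensor_count, param_count, dtype_counts) from a .safetensors header.

    Reads only the header: an unsigned 64-bit little-endian length, then that many
    bytes of UTF-8 JSON mapping tensor names to {"dtype", "shape", "data_offsets"}.
    """
    with Path(path).open("rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    # math.prod([]) == 1, so scalar tensors correctly count as one element.
    params = sum(math.prod(entry["shape"]) for entry in header.values())
    dtypes = Counter(entry["dtype"] for entry in header.values())
    return len(header), params, dtypes
```

Try it on the `path` returned by `hf_hub_download` in the usage snippet. The Files list reports 332 tensors and roughly 21M F32 parameters.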

License

Apache-2.0. The upstream UniFormer code and weights are redistributed under their original license declaration; see NOTICE for the attribution chain.

Citation

@inproceedings{li2022uniformer,
  title     = {UniFormer: Unifying Convolution and Self-attention for Visual Recognition},
  author    = {Li, Kunchang and Wang, Yali and Zhang, Junhao and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
  booktitle = {ICLR},
  year      = {2022},
}