---
license: mit
library_name: acaua
pipeline_tag: video-classification
tags:
- video-classification
- vision
- acaua
- native-pytorch-port
datasets:
- kinetics-400
---

# UniFormer-S (Kinetics-400): acaua mirror (pure-PyTorch port)

Pure-PyTorch port of **UniFormer-S** (video classification, trained on Kinetics-400 with 16-frame clips at sampling stride 8) hosted under `CondadosAI/` for use with the [acaua](https://github.com/CondadosAI/acaua) computer vision library.

The architecture has been re-implemented in pure PyTorch under `acaua.adapters.uniformer.video`: no `mmcv`, no `mmengine`, no `mmaction2`, no `trust_remote_code`, no `timm` runtime dependency.

The weights are converted from the upstream `.pth` checkpoint to safetensors with acaua's state-dict key naming (`backbone.*` + `head.fc.*`). They are **not** drop-in compatible with timm or Sense-X/UniFormer loaders; they are designed to load cleanly into acaua's `nn.Module` tree under `load_state_dict(strict=True)`.

## Provenance

| | |
|---|---|
| Upstream code | [`Sense-X/UniFormer`](https://github.com/Sense-X/UniFormer) @ `main` (Apache-2.0) |
| Upstream weights | [`Sense-X/uniformer_video`](https://huggingface.co/Sense-X/uniformer_video) at revision `f9448914e6161573b14ba47b72fcef170e03a1f9` (MIT) |
| Upstream file | `uniformer_small_k400_16x8.pth` |
| Upstream SHA256 | `d5fd7b0c49ee6a5422ef5d0c884d962c742003bfbd900747485eb99fa269d0db` |
| Upstream factory | `uniformer_small()` in `video_classification/models/uniformer.py` |
| Conversion script | [`scripts/convert_uniformer_video.py`](https://github.com/CondadosAI/acaua/blob/main/scripts/convert_uniformer_video.py) |
| Paper | Li et al., [*UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning*](https://arxiv.org/abs/2201.04676), 2022 |
| Params | 22M |
| Top-1 (Kinetics-400, 16 frames x 1 clip x 1 crop) | 78.4% |
| FLOPs | 41.8G |
| Training recipe | 16 input frames, sampling stride 8, 224x224 center-crop, ImageNet-mean/std normalization |
| Mirrored on | 2026-04-24 |
| Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) |

## Usage via acaua

```python
import acaua

# MIT-declared weights require the explicit opt-in.
model = acaua.Model.from_pretrained(
    "CondadosAI/uniformer_s_k400", allow_non_apache=True
)
result = model.predict("video.mp4")
print(result.labels)  # tuple of top-5 Kinetics-400 action labels
print(result.scores)  # aligned float32 probabilities
```

Requires `pip install 'acaua[video]'` for the TorchCodec-backed video decoder and a system-level `ffmpeg` install.

## Files in this mirror

- `model.safetensors`: acaua-format weights (key-remapped, verified round-trip under `load_state_dict(strict=True)` at conversion time).
- `labels.json`: JSON array of 400 Kinetics-400 action labels in index order. Read by the adapter at load time.
- `config.json`: minimal metadata: `acaua_task=video_classification`, `num_frames`, `num_classes`.
- `NOTICE`: attribution chain (code AND weights).
- `LICENSE`: Apache-2.0.

## License and attribution

The adapter code (this repository) is redistributed under Apache-2.0. The underlying weights carry upstream's MIT declaration (compatible, permissively redistributable). The acaua UniFormer-video adapter is itself a derivative work of the upstream PyTorch implementation; see [`NOTICE`](./NOTICE) for the required attribution chain.

## Citation

```bibtex
@misc{li2022uniformervideo,
  title         = {UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning},
  author        = {Li, Kunchang and Wang, Yali and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
  year          = {2022},
  eprint        = {2201.04676},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
}
```
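
## Checking provenance values locally

The Provenance table lists a SHA256 for the upstream checkpoint and a "16 frames, sampling stride 8" clip recipe. The sketch below shows one way to check both with the Python standard library only. It is illustrative, not part of acaua: `sha256_of_file`, `matches_upstream`, and `clip_frame_indices` are hypothetical helper names, and the temporally centred clip with clamping for short videos is an assumption about how a single test clip might be chosen, not a description of the adapter's actual sampler.

```python
import hashlib
from pathlib import Path

# Values copied from the Provenance table.
UPSTREAM_FILE = "uniformer_small_k400_16x8.pth"
UPSTREAM_SHA256 = "d5fd7b0c49ee6a5422ef5d0c884d962c742003bfbd900747485eb99fa269d0db"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_upstream(path, expected=UPSTREAM_SHA256):
    """True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()


def clip_frame_indices(total_frames, num_frames=16, stride=8):
    """Hypothetical helper: frame indices of one temporally centred clip.

    Assumption: the clip is centred in the video, and indices past the
    end of a short video are clamped to the last frame.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    span = (num_frames - 1) * stride + 1  # frames covered by the clip
    start = max((total_frames - span) // 2, 0)
    last = total_frames - 1
    return [min(start + i * stride, last) for i in range(num_frames)]


if __name__ == "__main__":
    local = Path(UPSTREAM_FILE)
    if local.exists():
        print("checksum matches:", matches_upstream(local))
    print(clip_frame_indices(300))
```

With the default arguments a clip spans (16 - 1) x 8 + 1 = 121 source frames, so a 300-frame video yields indices running from 89 to 209 in steps of 8. The checksum helper only applies to the upstream `.pth` file; the converted `model.safetensors` has a different digest that is not listed here.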