| --- |
| license: mit |
| library_name: acaua |
| pipeline_tag: video-classification |
| tags: |
| - video-classification |
| - vision |
| - acaua |
| - native-pytorch-port |
| datasets: |
| - kinetics-400 |
| --- |
| |
| # UniFormer-S (Kinetics-400): acaua mirror (pure-PyTorch port) |
|
|
| Pure-PyTorch port of **UniFormer-S** (video classification, trained on |
| Kinetics-400 with 16-frame clips at sampling stride 8) hosted under |
| `CondadosAI/` for use with the [acaua](https://github.com/CondadosAI/acaua) |
| computer vision library. |
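
As a rough illustration of what "16-frame clips at sampling stride 8" implies, the sketch below computes which source-frame indices one such clip would cover. The helper name `clip_indices`, the centered placement of the clip, and the clamping of short videos to their last frame are assumptions made for illustration; acaua's decoder may sample frames differently.

```python
def clip_indices(total_frames: int, num_frames: int = 16, stride: int = 8) -> list[int]:
    """Return frame indices for one centered clip (hypothetical helper)."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    # 16 frames at stride 8 span (16 - 1) * 8 + 1 = 121 source frames.
    span = (num_frames - 1) * stride + 1
    # Center the clip when the video is long enough, otherwise start at frame 0.
    start = max((total_frames - span) // 2, 0)
    # Clamp to the last frame so short videos repeat their final frame.
    return [min(start + i * stride, total_frames - 1) for i in range(num_frames)]


indices = clip_indices(300)
print(indices[0], indices[-1], len(indices))
```

With the defaults, a single clip covers about 121 consecutive source frames, roughly four seconds of video at 30 fps.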
|
|
| The architecture has been re-implemented in pure PyTorch under |
| `acaua.adapters.uniformer.video`, with no runtime dependency on `mmcv`, |
| `mmengine`, `mmaction2`, or `timm`, and no need for `trust_remote_code`. |
| The weights are converted from the upstream `.pth` checkpoint to |
| safetensors with acaua's state-dict key naming (`backbone.*` + |
| `head.fc.*`). They are **not** drop-in compatible with timm or |
| Sense-X/UniFormer loaders; they are designed to load cleanly into |
| acaua's `nn.Module` tree under `load_state_dict(strict=True)`. |
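
One way to check the key naming described above without importing PyTorch is to read only the JSON header of the safetensors file. The sketch below relies on the published safetensors layout (an 8-byte little-endian header length followed by a UTF-8 JSON header). The helper names `safetensors_keys` and `unexpected_keys` are hypothetical and not part of acaua.

```python
import json
import struct

# Prefixes taken from the key naming described in this card.
ALLOWED_PREFIXES = ("backbone.", "head.fc.")


def safetensors_keys(path: str) -> list[str]:
    """List tensor names by reading only the safetensors JSON header."""
    with open(path, "rb") as f:
        # The file starts with an 8-byte little-endian header length,
        # followed by that many bytes of UTF-8 JSON.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(k for k in header if k != "__metadata__")


def unexpected_keys(keys: list[str]) -> list[str]:
    """Return keys that fall outside the documented prefixes."""
    return [k for k in keys if not k.startswith(ALLOWED_PREFIXES)]
```

If the description above holds, `unexpected_keys(safetensors_keys("model.safetensors"))` should return an empty list for a local copy of the weights.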
|
|
| ## Provenance |
|
|
| | | | |
| |---|---| |
| | Upstream code | [`Sense-X/UniFormer`](https://github.com/Sense-X/UniFormer) @ `main` (Apache-2.0) | |
| | Upstream weights | [`Sense-X/uniformer_video`](https://huggingface.co/Sense-X/uniformer_video) at revision `f9448914e6161573b14ba47b72fcef170e03a1f9` (MIT) | |
| | Upstream file | `uniformer_small_k400_16x8.pth` | |
| | Upstream SHA256 | `d5fd7b0c49ee6a5422ef5d0c884d962c742003bfbd900747485eb99fa269d0db` | |
| | Upstream factory | `uniformer_small()` in `video_classification/models/uniformer.py` | |
| | Conversion script | [`scripts/convert_uniformer_video.py`](https://github.com/CondadosAI/acaua/blob/main/scripts/convert_uniformer_video.py) | |
| | Paper | Li et al., [*UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning*](https://arxiv.org/abs/2201.04676), 2022 | |
| | Params | 22M | |
| | Top-1 (Kinetics-400, 16 frames x 1 clip x 1 crop) | 78.4% | |
| | FLOPs | 41.8G | |
| | Training recipe | 16 input frames, sampling stride 8, 224x224 center-crop, ImageNet-mean/std normalization | |
| | Mirrored on | 2026-04-24 | |
| | Mirrored by | [CondadosAI/acaua](https://github.com/CondadosAI/acaua) | |
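
To confirm that a locally downloaded copy of the upstream checkpoint matches the digest listed in the table, something like the following could be used. The function names are illustrative, and the path passed in is whatever location the file was saved to.

```python
import hashlib

# Digest copied from the "Upstream SHA256" row above.
EXPECTED_SHA256 = "d5fd7b0c49ee6a5422ef5d0c884d962c742003bfbd900747485eb99fa269d0db"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_upstream(path: str) -> bool:
    """True if the file's digest equals the value in the provenance table."""
    return sha256_of(path) == EXPECTED_SHA256
```

Note that this digest refers to the upstream `.pth` file, not to the converted `model.safetensors` in this mirror.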
|
|
| ## Usage via acaua |
|
|
| ```python |
| import acaua |
| |
| # MIT-declared weights require the explicit opt-in. |
| model = acaua.Model.from_pretrained( |
|     "CondadosAI/uniformer_s_k400", allow_non_apache=True |
| ) |
| result = model.predict("video.mp4") |
| print(result.labels) # tuple of top-5 Kinetics-400 action labels |
| print(result.scores) # aligned float32 probabilities |
| ``` |
|
|
| Video decoding requires `pip install 'acaua[video]'` (for the |
| TorchCodec-backed decoder) plus a system-level `ffmpeg` install. |
|
|
| ## Files in this mirror |
|
|
| - `model.safetensors`: acaua-format weights (key-remapped, verified |
| round-trip under `load_state_dict(strict=True)` at conversion time). |
| - `labels.json`: JSON array of 400 Kinetics-400 action labels in |
| index order. Read by the adapter at load time. |
| - `config.json`: minimal metadata (`acaua_task=video_classification`, |
| `num_frames`, `num_classes`). |
| - `NOTICE`: attribution chain (code and weights). |
| - `LICENSE`: Apache-2.0. |
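
For readers who want to inspect the metadata files directly, a minimal sketch of reading them from a local copy of this mirror follows. The function names and the consistency check between `num_classes` and the label count are assumptions for illustration; the adapter's own loading code may differ.

```python
import json
from pathlib import Path


def load_mirror_metadata(repo_dir: str) -> tuple[dict, list[str]]:
    """Read config.json and labels.json from a local copy of this mirror."""
    root = Path(repo_dir)
    config = json.loads((root / "config.json").read_text(encoding="utf-8"))
    labels = json.loads((root / "labels.json").read_text(encoding="utf-8"))
    # labels.json is described as a JSON array in class-index order.
    if not isinstance(labels, list):
        raise ValueError("labels.json should contain a JSON array")
    # Assumption: num_classes in config.json should equal the label count.
    if config.get("num_classes") != len(labels):
        raise ValueError("num_classes does not match the number of labels")
    return config, labels


def label_for(index: int, labels: list[str]) -> str:
    """Map a class index (for example an argmax over logits) to its label."""
    return labels[index]
```

Because the labels are stored in index order, position `i` in the array names the class whose logit sits at index `i` of the model output.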
|
|
| ## License and attribution |
|
|
| The adapter code (this repository) is redistributed under Apache-2.0. |
| The underlying weights carry upstream's MIT declaration (compatible, |
| permissively redistributable). The acaua UniFormer-video adapter is |
| itself a derivative work of the upstream PyTorch implementation; see |
| [`NOTICE`](./NOTICE) for the required attribution chain. |
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{li2022uniformervideo, |
| title = {UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning}, |
| author = {Li, Kunchang and Wang, Yali and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu}, |
| year = {2022}, |
| eprint = {2201.04676}, |
| archivePrefix = {arXiv}, |
| primaryClass = {cs.CV}, |
| } |
| ``` |
|
|