---
library_name: lucid
license: apache-2.0
tags:
- image-classification
- maxvit
- lucid
datasets:
- imagenet-1k
pipeline_tag: image-classification
model-index:
- name: maxvit-base
  results:
  - task: { type: image-classification }
    dataset: { name: ImageNet-1k, type: imagenet-1k }
    metrics:
    - { type: acc@1, value: 84.95 }
    - { type: acc@5, value: 97.04 }
---

# MaxViT-Base

> Tu et al., 2022. *MaxViT: Multi-Axis Vision Transformer* (arXiv:2204.01697)

[Lucid](https://github.com/ChanLumerico/lucid) port of `timm/maxvit_base_tf_224.in1k`,
converted to Lucid-native safetensors.

## Available weights

| Tag | acc@1 (%) | acc@5 (%) | Params | GFLOPs | Size | Source |
|---|---|---|---|---|---|---|
| `IN1K` *(default)* | 84.95 | 97.04 | 119.5M | — | 456.43 MB | timm |

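The `acc@1` and `acc@5` figures are top-1 and top-5 accuracy on ImageNet-1k. The evaluation script behind them is not described here, so the following is only a minimal pure-Python sketch of how top-k accuracy is commonly computed. The function name `topk_accuracy` and the toy logits are illustrative assumptions, not part of Lucid.

```python
def topk_accuracy(logits, labels, k=1):
    """Percentage of samples whose true label is among the k highest logits."""
    hits = 0
    for row, label in zip(logits, labels):
        # Class indices sorted by descending logit; keep the first k.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)


# Toy batch (hypothetical values): 4 samples, 6 classes.
logits = [
    [0.1, 2.0, 0.3, 0.0, 0.0, 0.0],  # label 1: top-1 hit
    [1.5, 0.2, 1.0, 0.0, 0.0, 0.0],  # label 2: hit only within the top 2
    [0.0, 0.1, 0.2, 3.0, 0.0, 0.0],  # label 3: top-1 hit
    [2.0, 1.0, 0.5, 0.4, 0.3, 0.0],  # label 5: miss even within the top 2
]
labels = [1, 2, 3, 5]

acc1 = topk_accuracy(logits, labels, k=1)
acc2 = topk_accuracy(logits, labels, k=2)
```

On real data, `logits` would hold one row of 1000 class scores per validation image, with `k=1` and `k=5` giving the two reported metrics.
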
## Usage

```python
import lucid.models as models
from lucid.models.weights import MaxViTBaseWeights

# Load the default pretrained weights (the `IN1K` tag).
model = models.maxvit_base_cls(pretrained=True)

# Or select the tag explicitly, as an enum member or as a string.
model = models.maxvit_base_cls(weights=MaxViTBaseWeights.IN1K)
model = models.maxvit_base_cls(pretrained="IN1K")

# Preprocessing travels with the weights.
# `image` is a single input image that you have already loaded.
weights = MaxViTBaseWeights.IN1K
preprocess = weights.transforms()
# `[None]` adds a leading batch dimension before the forward pass.
logits = model(preprocess(image)[None]).logits
```

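The usage snippet stops at raw logits. As a hedged follow-up, the sketch below turns one row of logits into ranked class probabilities with a numerically stable softmax. It assumes the logits have already been converted to a plain Python list of floats (that conversion from a Lucid tensor is not shown here), and the helper names `softmax` and `top_k` are illustrative, not Lucid API.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(row, k=5):
    """Return (class_index, probability) pairs for the k most likely classes."""
    probs = softmax(row)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(i, probs[i]) for i in order]


# Hypothetical 6-class logits; a real ImageNet-1k row has 1000 entries.
example_logits = [0.5, 3.0, -1.0, 2.0, 0.0, 1.0]
predictions = top_k(example_logits, k=3)
for class_index, prob in predictions:
    print(f"class {class_index}: {prob:.3f}")
```

Mapping class indices to human-readable ImageNet labels needs a separate label file, which this card does not specify.
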
## Conversion

Converted from `timm/maxvit_base_tf_224.in1k` via
`python -m tools.convert_weights maxvit_base --tag IN1K`.
Key mapping and numerical parity were verified against the source.

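This card does not say how numerical parity was checked or which tolerance was used. One common approach is to feed the same input to the source and converted models and compare the outputs element by element. The sketch below illustrates that idea on flattened outputs; the helper names, the sample values, and the `1e-4` tolerance are all assumptions, not the behavior of `tools.convert_weights`.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    return max(abs(x - y) for x, y in zip(a, b))


def outputs_match(reference, converted, atol=1e-4):
    """True if every element agrees within the absolute tolerance `atol`."""
    return max_abs_diff(reference, converted) <= atol


# Hypothetical logits from the source model and from the converted model.
reference_logits = [0.1234, -2.5000, 3.7501]
converted_logits = [0.12341, -2.50002, 3.75008]

diff = max_abs_diff(reference_logits, converted_logits)
ok = outputs_match(reference_logits, converted_logits)
print(f"max abs diff = {diff:.2e}, parity ok = {ok}")
```

A stricter check would repeat the comparison over several random inputs and also compare intermediate activations, since small per-layer errors can cancel out in the final logits.
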
## License

`apache-2.0`, inherited from the original weights.

## Citation

```
@inproceedings{tu2022maxvit,
  title={MaxViT: Multi-Axis Vision Transformer},
  author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
  booktitle={ECCV},
  year={2022}
}
```