mobilevitv2_125

Converted TIMM image classification model for LiteRT.

  • Source architecture: mobilevitv2_125
  • Source checkpoint: timm/mobilevitv2_125.cvnets_in1k
  • File: model.tflite
  • Input: float32 tensor in NCHW layout, shape [1, 3, 256, 256]
  • Output: ImageNet-1K logits, shape [1, 1000]
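The I/O contract above can be sketched as a minimal Python inference loop. This is a non-authoritative sketch using the LiteRT Python `Interpreter`; the preprocessing (scale pixels to [0, 1], no mean/std normalization, which matches typical cvnets configs) is an assumption and should be checked against the timm source checkpoint's config before relying on the logits.

```python
# Hedged sketch: run model.tflite (from this repo) with the LiteRT Interpreter.
# Assumptions: input is RGB, scaled to [0, 1], no mean/std normalization.
import os
import numpy as np

def preprocess(rgb_uint8: np.ndarray) -> np.ndarray:
    """HWC uint8 image (256x256x3) -> NCHW float32 batch [1, 3, 256, 256]."""
    assert rgb_uint8.shape == (256, 256, 3)
    x = rgb_uint8.astype(np.float32) / 255.0   # scale to [0, 1] (assumed)
    x = np.transpose(x, (2, 0, 1))             # HWC -> CHW
    return x[np.newaxis, ...]                  # add batch dim -> NCHW

if os.path.exists("model.tflite"):
    from ai_edge_litert.interpreter import Interpreter

    interp = Interpreter(model_path="model.tflite")
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    out = interp.get_output_details()[0]

    image = np.zeros((256, 256, 3), dtype=np.uint8)  # replace with a real image
    interp.set_tensor(inp["index"], preprocess(image))
    interp.invoke()
    logits = interp.get_tensor(out["index"])         # shape [1, 1000]
    print("top-1 class index:", int(np.argmax(logits)))
```

Mapping the top-1 index to a label requires an ImageNet-1K class list, which is not bundled with the `.tflite` file.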

Runtime Status

  • CPU smoke test: passed with LiteRT CompiledModel.
  • GPU delegation: currently blocked for this model. The GPU backend does not yet handle its rank-5 tensor patterns, mostly RESHAPE, TRANSPOSE, and related window/attention operations. The model is published as CPU-ready while GPU support is improved.

Model Details

Citation

@article{Mehta2022SeparableSF,
  title={Separable Self-attention for Mobile Vision Transformers},
  author={Sachin Mehta and Mohammad Rastegari},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.02680}
}