pvt_v2_b2

Converted TIMM image classification model for LiteRT.

Source architecture: pvt_v2_b2
Source checkpoint: timm/pvt_v2_b2.in1k
File: model.tflite
Input: float32 tensor in NCHW layout, shape [1, 3, 224, 224]
Output: ImageNet-1K logits, shape [1, 1000]
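
The input and output contract above can be exercised with a short script. This is a minimal sketch, not the publisher's reference pipeline. It assumes the `ai-edge-litert` and `numpy` packages are installed, that the caller has already resized the image to 224 x 224 RGB, and that the converted model expects the standard ImageNet mean/std normalization (an assumption; the card does not say whether normalization is baked into the graph). It uses the `Interpreter` API rather than the `CompiledModel` API mentioned under Runtime Status. The helper names `hwc_to_nchw`, `top_k`, and `classify` are hypothetical.

```python
import math

# Assumption: standard ImageNet statistics. Confirm against the source
# checkpoint's preprocessing config before relying on them.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def hwc_to_nchw(pixels):
    """Convert an H x W x 3 nested list of 0-255 RGB values to [1][3][H][W] floats."""
    height, width = len(pixels), len(pixels[0])
    planes = [
        [
            [(pixels[y][x][c] / 255.0 - MEAN[c]) / STD[c] for x in range(width)]
            for y in range(height)
        ]
        for c in range(3)
    ]
    return [planes]  # leading batch dimension of size 1


def top_k(logits, k=5):
    """Return the k highest-scoring (class_index, probability) pairs via softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]


def classify(model_path, pixels, k=5):
    """Run the model on CPU; `pixels` must already be 224 x 224 x 3."""
    # Imported lazily so the pure-Python helpers above work without these packages.
    import numpy as np
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_info = interpreter.get_input_details()[0]
    output_info = interpreter.get_output_details()[0]

    batch = np.asarray(hwc_to_nchw(pixels), dtype=np.float32)  # [1, 3, 224, 224]
    interpreter.set_tensor(input_info["index"], batch)
    interpreter.invoke()
    logits = interpreter.get_tensor(output_info["index"])[0]  # 1000 values
    return top_k(logits.tolist(), k)
```

Calling `classify("model.tflite", pixels)` returns class indices in ImageNet-1K order. Mapping them to label names requires a separate label file, which this card does not provide.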

Runtime Status

CPU smoke test: passed with LiteRT CompiledModel.
GPU delegation: currently blocked because the GPU backend does not yet handle the rank-5 tensor patterns in this model, mostly in RESHAPE, TRANSPOSE, and related window/attention operations. The model is published as CPU-ready while GPU support is being improved.
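
To see which tensors are involved, one option is to list every tensor in the model whose rank exceeds four. The sketch below assumes the `ai-edge-litert` package and its `Interpreter.get_tensor_details()` method. The `max_rank=4` threshold is an assumption drawn from the rank-5 note above, and the helper names are hypothetical. It reports tensor names and shapes only, not the operations that produce them.

```python
def find_high_rank_tensors(tensor_details, max_rank=4):
    """Return (name, shape) for every tensor whose rank exceeds max_rank."""
    flagged = []
    for detail in tensor_details:
        shape = [int(dim) for dim in detail["shape"]]
        if len(shape) > max_rank:
            flagged.append((detail["name"], shape))
    return flagged


def inspect_model(model_path):
    """Load a .tflite file and list its tensors above rank 4."""
    # Imported lazily so find_high_rank_tensors can be used on plain dicts.
    from ai_edge_litert.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    return find_high_rank_tensors(interpreter.get_tensor_details())
```

Running `inspect_model("model.tflite")` and printing the result gives a rough inventory of the tensors that would need reshaping or fusing before GPU delegation could succeed.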

Model Details

Model Type: Image classification / feature backbone
Model Stats:
- Params (M): 25.4
- GMACs: 4.0
- Activations (M): 27.5
- Image size: 224 x 224
Papers:
- PVT v2: Improved Baselines with Pyramid Vision Transformer: https://arxiv.org/abs/2106.13797
Dataset: ImageNet-1k
Original: https://github.com/whai362/PVT

Citation

@article{wang2021pvtv2,
  title={PVT v2: Improved Baselines with Pyramid Vision Transformer},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
  journal={Computational Visual Media},
  volume={8},
  number={3},
  pages={1--10},
  year={2022},
  publisher={Springer}
}
