---
library_name: lucid
license: apache-2.0
tags:
- image-classification
- cvt
- lucid
datasets:
- imagenet-1k
pipeline_tag: image-classification
model-index:
- name: cvt-21
  results:
  - task: { type: image-classification }
    dataset: { name: ImageNet-1k, type: imagenet-1k }
    metrics:
    - { type: acc@1, value: 82.5 }
---

# CvT-21

> Wu et al., 2021: *CvT: Introducing Convolutions to Vision Transformers* (arXiv:2103.15808)

[Lucid](https://github.com/ChanLumerico/lucid) port of `transformers/microsoft/cvt-21`,
converted to Lucid-native safetensors.

## Available weights

| | Tag | acc@1 | acc@5 | Params | GFLOPs | Size | Source | |
| |---|---|---|---|---|---|---| |
| | `IN1K` *(default)* | 82.5 | — | 31.6M | — | 120.87 MB | transformers | |

## Usage

```python
import lucid.models as models
from lucid.models.weights import CvT21Weights

# default tag
model = models.cvt_21_cls(pretrained=True)

# explicit tag (enum or string)
model = models.cvt_21_cls(weights=CvT21Weights.IN1K)
model = models.cvt_21_cls(pretrained="IN1K")

# preprocessing travels with the weights
weights = CvT21Weights.IN1K
preprocess = weights.transforms()

# `image` is an input image you have already loaded; [None] adds the batch dimension
logits = model(preprocess(image)[None]).logits
```

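The model returns raw logits, so a softmax followed by a top-k selection is usually the next step. The sketch below is a minimal illustration in plain Python and does not call any Lucid API: it assumes one row of `logits` has already been converted to a list of floats (how to do that depends on Lucid's tensor type and is not shown here). `softmax` and `top_k` are illustrative helper names, not part of Lucid.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities; subtract the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5):
    """Return the k most likely (class_index, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Toy scores standing in for one row of `logits`
# (an ImageNet-1k classifier would produce 1000 entries per image).
example_logits = [0.1, 2.5, -1.0, 0.7]
predictions = top_k(example_logits, k=2)
for class_index, probability in predictions:
    print(f"class {class_index}: {probability:.3f}")
```

Mapping a class index to a human-readable ImageNet label needs a separate label file, which this card does not specify.
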
## Conversion

Converted from `transformers/microsoft/cvt-21` via
`python -m tools.convert_weights cvt_21 --tag IN1K`.
Key mapping and numerical parity were verified against the source.

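The conversion script itself is not reproduced in this card. As a rough illustration of what a key mapping and a parity check involve, the sketch below renames checkpoint keys by prefix and compares two sets of outputs by their largest absolute difference. The key names, the prefix map, and the `1e-5` tolerance are all hypothetical and are not taken from `tools.convert_weights`.

```python
def remap_keys(state, prefix_map):
    """Rename checkpoint keys by replacing the first matching prefix."""
    remapped = {}
    for key, value in state.items():
        for old, new in prefix_map.items():
            if key.startswith(old):
                key = new + key[len(old):]
                break
        remapped[key] = value
    return remapped


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


# Hypothetical key names, for illustration only.
source_state = {"encoder.stage0.weight": [0.25, -1.5], "classifier.bias": [0.0]}
converted = remap_keys(source_state, {"encoder.": "backbone.", "classifier.": "head."})

# Made-up outputs of the source and ported models on the same input.
source_logits = [1.2000, -0.3000, 0.8000]
ported_logits = [1.2000004, -0.3000001, 0.7999998]
parity_ok = max_abs_diff(source_logits, ported_logits) < 1e-5  # assumed tolerance
```

A real check would compare full output tensors on several inputs, with a tolerance chosen for the dtype in use.
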
## License

`apache-2.0`, inherited from the original weights.

## Citation

```
@inproceedings{wu2021cvt,
  title={CvT: Introducing Convolutions to Vision Transformers},
  author={Wu, Haiping and Xiao, Bin and Codella, Noel and Liu, Mengchen and Dai, Xiyang and Yuan, Lu and Zhang, Lei},
  booktitle={ICCV},
  year={2021}
}
```