ILSVRC/imagenet-1k
Dosovitskiy et al., 2021 — An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (arXiv:2010.11929)
Lucid port of torchvision/ViT_B_32_Weights.IMAGENET1K_V1,
converted to Lucid-native safetensors.
| Tag | acc@1 | acc@5 | Params | GFLOPs | Size | Source |
|---|---|---|---|---|---|---|
| IMAGENET1K_V1 (default) | 75.912 | 92.466 | 88.2M | 4.409 | 336.56 MB | torchvision |
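The Params and Size columns can be cross-checked with a little arithmetic. The sketch below assumes the weights are stored as float32 (4 bytes per parameter) and that "MB" in the table means MiB (2**20 bytes); neither is stated in this card, so treat both as assumptions.

```python
# Rough consistency check between the "Params" and "Size" columns.
# Assumptions (not stated in the card): float32 storage, and MB == MiB (2**20 bytes).
params = 88.2e6           # rounded value from the "Params" column
bytes_per_param = 4       # float32, assumed
size_mib = params * bytes_per_param / 2**20

print(f"estimated size: {size_mib:.1f} MiB")
```

Under those assumptions the estimate comes out within a fraction of a MiB of the listed 336.56 MB. The remaining gap would be plausibly explained by rounding in the parameter count and file metadata.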
import lucid.models as models
from lucid.models.weights import ViTBase32Weights
# default tag
model = models.vit_base_32_cls(pretrained=True)
# explicit tag (enum or string)
model = models.vit_base_32_cls(weights=ViTBase32Weights.IMAGENET1K_V1)
model = models.vit_base_32_cls(pretrained="IMAGENET1K_V1")
# preprocessing travels with the weights
weights = ViTBase32Weights.IMAGENET1K_V1
preprocess = weights.transforms()
logits = model(preprocess(image)[None]).logits
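To turn one row of logits into readable predictions, a softmax followed by a top-k selection is the usual approach. The following is a minimal, library-independent sketch: `top_k` and `fake_logits` are hypothetical names, and how to obtain a flat list of floats from `.logits` depends on Lucid's tensor API, which is not shown in this card.

```python
import math

def top_k(logits, k=5):
    """Return [(class_index, probability), ...] for the k largest logits."""
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # Rank class indices by logit, highest first.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(i, exps[i] / total) for i in ranked[:k]]

# Stand-in for one row of model output (1000 ImageNet-1k classes); not real predictions.
fake_logits = [0.0] * 1000
fake_logits[207] = 8.0
fake_logits[208] = 5.0

best = top_k(fake_logits)
print(best[0])   # (class index, probability) of the top prediction
```

Mapping the returned indices to class names requires an ImageNet-1k label list, which has to come from elsewhere.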
Converted from `torchvision/ViT_B_32_Weights.IMAGENET1K_V1` via
`python -m tools.convert_weights vit_base_32 --tag IMAGENET1K_V1`.
Key mapping and numerical parity were verified against the source.
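For readers unfamiliar with this kind of verification, here is a hedged sketch of what a key-mapping and parity check can look like. It is not the code in `tools.convert_weights`: the function names, the parameter names in the toy data, and the tolerance are all hypothetical, and plain Python lists stand in for tensors.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat lists."""
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)

def check_parity(source, converted, key_map, atol=1e-6):
    """Compare each mapped parameter and report its worst-case difference.

    source, converted: dict of parameter name -> flat list of floats.
    key_map: dict of source name -> converted name.
    atol: illustrative tolerance, not the one used by the real tool.
    """
    report = {}
    for src_key, dst_key in key_map.items():
        if dst_key not in converted:
            raise KeyError(f"missing converted key: {dst_key}")
        a, b = source[src_key], converted[dst_key]
        if len(a) != len(b):
            raise ValueError(f"size mismatch for {src_key} -> {dst_key}")
        report[dst_key] = max_abs_diff(a, b)
    failures = {k: d for k, d in report.items() if d > atol}
    return report, failures

# Toy data with made-up parameter names, for illustration only.
source = {"src.head.weight": [0.1, -0.2, 0.3], "src.head.bias": [0.0]}
converted = {"dst.head.weight": [0.1, -0.2, 0.3], "dst.head.bias": [0.0]}
key_map = {"src.head.weight": "dst.head.weight", "src.head.bias": "dst.head.bias"}

report, failures = check_parity(source, converted, key_map)
print(report, failures)
```

A full check would typically also compare model outputs on the same input, since matching weights alone do not rule out differences in the forward pass.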
License: bsd-3-clause, inherited from the original weights.
@inproceedings{dosovitskiy2021image,
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
booktitle={ICLR}, year={2021}
}