lucid-dl/vit-base-16


ChanLumerico committed 7 days ago

Commit 520f50f · verified · 1 Parent(s): fa972c6

Update model card

Files changed (1)

README.md (added) +71 -0

	@@ -0,0 +1,71 @@

+---
+library_name: lucid
+license: bsd-3-clause
+tags:
+  - image-classification
+  - vit
+  - lucid
+datasets:
+  - imagenet-1k
+pipeline_tag: image-classification
+model-index:
+  - name: vit-base-16
+    results:
+      - task: { type: image-classification }
+        dataset: { name: ImageNet-1K, type: imagenet-1k }
+        metrics:
+          - { type: acc@1, value: 81.072 }
+          - { type: acc@5, value: 95.318 }
+---
+# ViT-Base/16
+> Dosovitskiy et al., 2021 — *An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale* (arXiv:2010.11929)
+[Lucid](https://github.com/ChanLumerico/lucid) port of `torchvision/ViT_B_16_Weights.IMAGENET1K_V1`,
+converted to Lucid-native safetensors.
+## Available weights
+| Tag | acc@1 | acc@5 | Params | GFLOPs | Size | Source |
+|---|---|---|---|---|---|---|
+| `IMAGENET1K_V1` *(default)* | 81.072 | 95.318 | 86.6M | 17.564 | 330.24 MB | torchvision |
+## Usage
+```python
+import lucid.models as models
+from lucid.models.vision.resnet import VitBase16Weights
+# default tag
+model = models.vit_base_16_cls(pretrained=True)
+# explicit tag (enum or string)
+model = models.vit_base_16_cls(weights=VitBase16Weights.IMAGENET1K_V1)
+model = models.vit_base_16_cls(pretrained="IMAGENET1K_V1")
+# preprocessing travels with the weights
+weights = VitBase16Weights.IMAGENET1K_V1
+preprocess = weights.transforms()
+logits = model(preprocess(image)[None]).logits
+```
+## Conversion
+Converted from `torchvision/ViT_B_16_Weights.IMAGENET1K_V1` via
+`python -m tools.convert_weights vit_base_16 --tag IMAGENET1K_V1`.
+Key mapping and numerical parity were verified against the source.
+## License
+`bsd-3-clause` — inherited from the original weights.
+## Citation
+```
+@inproceedings{dosovitskiy2021image,
+  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
+  booktitle={ICLR}, year={2021}
+}
+```
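
The usage snippet in the added README stops at `logits`, and the model card reports `acc@1` and `acc@5`. As a rough illustration of what those two steps involve, the sketch below ranks classes from a vector of raw logits and computes top-k accuracy over a small batch. It is written in plain Python and assumes the logits have already been converted to ordinary lists of floats; how to do that with a Lucid tensor is not shown in the README and is not assumed here. The helper names `softmax`, `top_k`, and `top_k_accuracy` are hypothetical and are not part of Lucid or torchvision.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (max subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=5):
    """Return the k highest-scoring (class_index, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


def top_k_accuracy(batch_logits, labels, k=1):
    """Percentage of samples whose true label is among the k highest logits."""
    hits = 0
    for logits, label in zip(batch_logits, labels):
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy 4-class batch standing in for the 1000 ImageNet-1K logits per image.
batch = [
    [2.0, 0.5, 0.1, -1.0],  # true label 0: ranked first
    [0.2, 1.5, 1.4, 0.0],   # true label 2: ranked second
]
labels = [0, 2]

print(top_k(batch[0], k=2))                 # two best classes for the first sample
print(top_k_accuracy(batch, labels, k=1))   # 50.0
print(top_k_accuracy(batch, labels, k=2))   # 100.0
```

On the full ImageNet-1K validation set, the same computation with `k=1` and `k=5` is what the `acc@1` and `acc@5` columns summarize. The toy batch here only demonstrates the mechanics and says nothing about the reported 81.072 / 95.318 figures.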