ChanLumerico committed
Commit 520f50f · verified · 1 parent: fa972c6

Update model card

Files changed (1)
  1. README.md +71 -0
README.md ADDED
@@ -0,0 +1,71 @@
+ ---
+ library_name: lucid
+ license: bsd-3-clause
+ tags:
+ - image-classification
+ - vit
+ - lucid
+ datasets:
+ - imagenet-1k
+ pipeline_tag: image-classification
+ model-index:
+ - name: vit-base-16
+   results:
+   - task: { type: image-classification }
+     dataset: { name: ImageNet-1K, type: imagenet-1k }
+     metrics:
+     - { type: acc@1, value: 81.072 }
+     - { type: acc@5, value: 95.318 }
+ ---
+
+ # ViT-Base/16
+
+ > Dosovitskiy et al., 2021. *An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale* (arXiv:2010.11929)
+
+ [Lucid](https://github.com/ChanLumerico/lucid) port of `torchvision/ViT_B_16_Weights.IMAGENET1K_V1`,
+ converted to Lucid-native safetensors.
+
+ ## Available weights
+
+ | Tag | acc@1 | acc@5 | Params | GFLOPs | Size | Source |
+ |---|---|---|---|---|---|---|
+ | `IMAGENET1K_V1` *(default)* | 81.072 | 95.318 | 86.6M | 17.564 | 330.24 MB | torchvision |
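For readers unfamiliar with the acc@1 and acc@5 columns, the following is a minimal sketch of how top-k accuracy is usually computed: a sample counts as correct when its true label is among the k highest-scoring classes. The function name `topk_accuracy` and the toy scores are illustrative assumptions; this is not the evaluation code behind the numbers in the table.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k top-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        topk = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)


# Toy batch: 4 samples, 6 classes (made-up scores, not model output).
scores = [
    [0.1, 2.0, 0.3, 0.0, 0.5, 0.2],
    [1.5, 0.2, 1.4, 0.1, 0.0, 0.3],
    [0.0, 0.1, 0.2, 0.3, 0.4, 3.0],
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
]
labels = [1, 2, 5, 5]

acc1 = topk_accuracy(scores, labels, k=1)
acc5 = topk_accuracy(scores, labels, k=5)
print(f"acc@1 = {acc1:.1f}, acc@5 = {acc5:.1f}")
```

On ImageNet-1K the same computation would run over the 50,000 validation images with 1,000 classes per row.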
+
+ ## Usage
+
+ ```python
+ import lucid.models as models
+ from lucid.models.vision.resnet import VitBase16Weights
+
+ # default tag
+ model = models.vit_base_16_cls(pretrained=True)
+
+ # explicit tag (enum or string)
+ model = models.vit_base_16_cls(weights=VitBase16Weights.IMAGENET1K_V1)
+ model = models.vit_base_16_cls(pretrained="IMAGENET1K_V1")
+
+ # preprocessing travels with the weights
+ weights = VitBase16Weights.IMAGENET1K_V1
+ preprocess = weights.transforms()
+ # `image` is a single input image loaded by the caller; [None] adds the batch dimension
+ logits = model(preprocess(image)[None]).logits
+ ```
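The usage snippet stops at raw logits. As a hedged sketch of the usual next step, the code below turns one row of logits into ranked class probabilities with a numerically stable softmax. It works on a plain Python list and assumes the logits have already been converted out of Lucid's tensor type by whatever means Lucid provides; `softmax`, `top_k`, and the example values are illustrative names and data, not part of the Lucid API.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(row, k=5):
    """Return the k most probable (class_index, probability) pairs, best first."""
    probs = softmax(row)
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Made-up logits for a 6-class toy problem; a real row would have 1000 entries.
example_logits = [0.5, 2.0, -1.0, 0.0, 1.0, 3.0]
predictions = top_k(example_logits, k=3)
for class_index, prob in predictions:
    print(class_index, round(prob, 4))
```

Mapping a class index to a human-readable ImageNet label requires a separate index-to-name table, which this card does not describe.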
+
+ ## Conversion
+
+ Converted from `torchvision/ViT_B_16_Weights.IMAGENET1K_V1` via
+ `python -m tools.convert_weights vit_base_16 --tag IMAGENET1K_V1`.
+ Key mapping and numerical parity were verified against the source.
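The card does not show how the key mapping and parity checks were done. The sketch below illustrates what such checks commonly look like, under stated assumptions: the key names, the `remap_keys` and `outputs_match` helpers, and the tolerances are all hypothetical, and this is not the code in `tools.convert_weights`.

```python
def remap_keys(source_state, key_map):
    """Rename source keys via key_map, failing loudly on any unmapped key."""
    missing = [k for k in source_state if k not in key_map]
    if missing:
        raise KeyError(f"unmapped source keys: {missing}")
    return {key_map[k]: v for k, v in source_state.items()}


def outputs_match(reference, candidate, atol=1e-5, rtol=1e-4):
    """True if |c - r| <= atol + rtol * |r| holds for every element pair."""
    if len(reference) != len(candidate):
        return False
    return all(abs(c - r) <= atol + rtol * abs(r)
               for r, c in zip(reference, candidate))


# Hypothetical key names and values, for illustration only.
key_map = {"encoder.ln.weight": "norm.weight", "heads.head.bias": "head.bias"}
source_state = {"encoder.ln.weight": [1.0, 1.0], "heads.head.bias": [0.0, 0.1]}
converted = remap_keys(source_state, key_map)

# Made-up logits standing in for the two models' outputs on the same input.
reference_logits = [0.12, -1.5, 3.25]
converted_logits = [0.120001, -1.500002, 3.249999]
parity_ok = outputs_match(reference_logits, converted_logits)
print(sorted(converted), parity_ok)
```

Raising on unmapped keys, rather than silently dropping them, makes a partially converted checkpoint fail early instead of producing a model with randomly initialized layers.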
+
+ ## License
+
+ `bsd-3-clause`, inherited from the original weights.
+
+ ## Citation
+
+ ```
+ @inproceedings{dosovitskiy2021image,
+   title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+   author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
+   booktitle={ICLR}, year={2021}
+ }
+ ```