billpsomas/vits_supervised_official_ep100

billpsomas committed on Dec 1, 2023

Commit 6bb7d0d · 1 Parent(s): 4e7652d

Update README.md

Files changed (1)

README.md +32 -0

@@ -1,3 +1,35 @@
 ---
 license: cc-by-4.0
+datasets:
+- imagenet-1k
+metrics:
+- accuracy
+pipeline_tag: image-classification
+language:
+- en
+tags:
+- vision transformer
+- simpool
+- computer vision
+- deep learning
 ---
+# Supervised ViT-S/16 (small-sized Vision Transformer with patch size 16) model
+Official ViT-S model trained on ImageNet-1k for 100 epochs. Reproduced for the ICCV 2023 [SimPool](https://arxiv.org/abs/2309.06891) paper.
+SimPool is a simple attention-based pooling method applied at the end of the network, released in this [repository](https://github.com/billpsomas/simpool/).
+Disclaimer: This model card was written by the author of SimPool, i.e., [Bill Psomas](http://users.ntua.gr/psomasbill/).
+## BibTeX entry and citation info
+```
+@misc{psomas2023simpool,
+      title={Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?},
+      author={Bill Psomas and Ioannis Kakogeorgiou and Konstantinos Karantzalos and Yannis Avrithis},
+      year={2023},
+      eprint={2309.06891},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```