---
library_name: keras
---

## Model description

**This model is an implementation of the distillation recipe proposed in DeiT.**
Visit the Keras example on [Distilling Vision Transformers](https://keras.io/examples/vision/deit/).

Full credit to: [Sayak Paul](https://twitter.com/RisingSayak)

In the original Vision Transformers (ViT) paper (Dosovitskiy et al.), the authors concluded that, to perform on par with Convolutional Neural Networks (CNNs), ViTs need to be pre-trained on larger datasets: the larger, the better. This is mainly due to the lack of inductive biases in the ViT architecture; unlike CNNs, ViTs don't have layers that exploit locality.

Many groups have proposed different ways to address the data-intensiveness of ViT training. One such way was shown in the Data-efficient image Transformers (DeiT) paper (Touvron et al.): the authors introduced a distillation technique specific to transformer-based vision models. DeiT is among the first works to show that it's possible to train ViTs well without using larger datasets.
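
The sketch below illustrates the distillation objective. It implements the hard-label variant described in the DeiT paper:

- The head on the class token is trained against the ground-truth label.
- The head on the distillation token is trained against the teacher's argmax prediction.
- The two cross-entropy terms are weighted equally.

At inference, the two heads are fused by averaging their softmax outputs. This is a minimal plain-Python sketch working on per-example logit vectors, not the code from the Keras example. The function names (`softmax`, `cross_entropy`, `deit_hard_distillation_loss`, `deit_predict`) are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """-log softmax(logits)[label], computed via log-sum-exp for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def deit_hard_distillation_loss(
    cls_logits: Sequence[float],
    dist_logits: Sequence[float],
    teacher_logits: Sequence[float],
    true_label: int,
) -> float:
    """Per-example hard-label distillation loss (illustrative sketch).

    - The class-token head is supervised by the ground-truth label.
    - The distillation-token head is supervised by the teacher's argmax.
    - The two cross-entropy terms are weighted equally.
    """
    teacher_label = max(range(len(teacher_logits)), key=lambda c: teacher_logits[c])
    return 0.5 * cross_entropy(cls_logits, true_label) + 0.5 * cross_entropy(
        dist_logits, teacher_label
    )


def deit_predict(
    cls_logits: Sequence[float], dist_logits: Sequence[float]
) -> List[float]:
    """Late fusion at inference: average the softmax outputs of both heads."""
    p_cls = softmax(cls_logits)
    p_dist = softmax(dist_logits)
    return [(a + b) / 2 for a, b in zip(p_cls, p_dist)]
```

With uniform logits on both heads, each cross-entropy term equals ln 3, so the loss is ln 3 ≈ 1.0986 whatever the labels are. The DeiT paper describes adding the two heads' softmax outputs. Averaging them instead gives the same predicted class while keeping the fused output a valid probability distribution.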

## Intended uses & limitations

The model was trained for demonstration purposes and is not guaranteed to give the best results in production.
For better results, follow and optimize the [Keras example](https://keras.io/examples/vision/deit/) to suit your needs.

## Training and evaluation data

The model is trained and evaluated on the [TF Flowers dataset](https://www.tensorflow.org/datasets/catalog/tf_flowers).

## Training procedure

The training procedure follows the [Keras example](https://keras.io/examples/vision/deit/) exactly, except that the batch size is decreased from the original 256 to 16 so that the model fits in the memory of a single V100 GPU.
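
The sketch below makes the effect of the smaller batch concrete by counting optimizer updates per epoch for both batch sizes. For illustration only, it assumes that all 3,670 images of TF Flowers are used for training; the actual train/validation split in the Keras example may differ. The function name `steps_per_epoch` is hypothetical.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of optimizer updates in one pass over the data.

    The last, partial batch counts as a full step, matching the default
    behaviour of `tf.data.Dataset.batch` (drop_remainder=False).
    """
    return math.ceil(num_examples / batch_size)


# Illustrative assumption: every TF Flowers image is used for training.
NUM_IMAGES = 3670

original = steps_per_epoch(NUM_IMAGES, 256)  # batch size in the Keras example
reduced = steps_per_epoch(NUM_IMAGES, 16)    # batch size used for this model
print(original, reduced)  # 15 230
```

Under this assumption, the reduced batch size means about 15 times as many gradient updates per epoch, each computed from 16 rather than 256 images.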

### Training hyperparameters

The following hyperparameters were used during training: