---
license: mit
datasets:
- mnist
metrics:
- accuracy
---
# Model Card for mnistvit
A vision transformer (ViT) trained on MNIST with a PyTorch-only implementation,
achieving 99.65% test set accuracy.
## Model Details
### Model Description
The model is a vision transformer, following the architecture described in
Dosovitskiy et al. (ICLR 2021).
- **Developed by:** Arno Onken
- **Model type:** Vision Transformer
- **License:** MIT
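For orientation, a vision transformer splits an image into fixed-size patches, linearly
embeds them, prepends a class token, and feeds the resulting sequence to a standard
transformer encoder. Below is a minimal sketch of the patch-embedding step for 28x28
MNIST images; it is not the mnistvit implementation, and the patch size and embedding
dimension are illustrative assumptions.
```
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split a 28x28 grayscale image into patches and embed them (illustrative values)."""

    def __init__(self, patch_size=7, embed_dim=64):
        super().__init__()
        # A strided convolution extracts and linearly embeds non-overlapping patches.
        self.projection = nn.Conv2d(1, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, images):  # images: (batch, 1, 28, 28)
        patches = self.projection(images)             # (batch, embed_dim, 4, 4)
        patches = patches.flatten(2).transpose(1, 2)  # (batch, 16, embed_dim)
        cls = self.cls_token.expand(images.size(0), -1, -1)
        return torch.cat([cls, patches], dim=1)       # (batch, 17, embed_dim)
```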
### Model Sources
- **Repository:**
[https://github.com/asnelt/mnistvit/](https://github.com/asnelt/mnistvit/)
- **Python Package Index:**
[https://pypi.org/project/mnistvit/](https://pypi.org/project/mnistvit/)
- **Paper:** [Dosovitskiy et al., ICLR 2021](https://openreview.net/forum?id=YicbFdNTTy)
## Uses
The model is intended for learning about vision transformers. It is small and trained
on MNIST, a simple and well-understood dataset. Together with the mnistvit package
code, it can be used to explore the impact of various hyperparameters.
## How to Get Started with the Model
Install the mnistvit package, which provides code for training and running the model:
```
pip install mnistvit
```
Place the `config.json` and `model.pt` files from this repository in a directory of
your choice and run Python from that directory.
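Alternatively, the files can be fetched programmatically with the `huggingface_hub`
client; the repository id `asnelt/mnistvit` below is an assumption based on this model
card's location:
```
from huggingface_hub import hf_hub_download

# Download the model weights and configuration into the current directory.
# The repository id asnelt/mnistvit is assumed.
for filename in ("config.json", "model.pt"):
    hf_hub_download(repo_id="asnelt/mnistvit", filename=filename, local_dir=".")
```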
To evaluate the test set accuracy and loss of the model stored in `model.pt` with
configuration `config.json`:
```
python -m mnistvit --use-accuracy --use-loss
```
Individual images can be classified as well. To predict the class of a digit image
stored in a file `sample.jpg`:
```
python -m mnistvit --image-file sample.jpg
```
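If no digit image is at hand, one quick way to produce a test image (assuming
torchvision is installed) is to save a sample from the MNIST test set:
```
from torchvision.datasets import MNIST

# Download the MNIST test set and save its first image as a JPEG.
test_set = MNIST(root="data", train=False, download=True)
image, label = test_set[0]  # image is a PIL.Image, label is an int
image.save("sample.jpg")
print(f"Saved a digit labeled {label} to sample.jpg")
```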
## Training Details
### Training Data
This model was trained on the 60,000 training set images of the
[MNIST](https://huggingface.co/datasets/ylecun/mnist/) dataset. Data augmentation was
used in the form of random rotations, translations and scaling as detailed in the
`mnistvit.preprocess` module.
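As a rough illustration of such an augmentation pipeline (the rotation, translation,
and scale ranges below are assumptions, not the exact values used in
`mnistvit.preprocess`), a torchvision version could look like:
```
from torchvision import transforms

# Illustrative augmentation pipeline; the actual ranges used by
# mnistvit.preprocess may differ.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # standard MNIST statistics
])
```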
### Training Procedure
- **Training regime:** fp32
Hyperparameters were obtained by splitting the original MNIST training set 80:20 into
training and validation sets and running Ray Tune with Optuna, as detailed in the
`mnistvit.tune` module. The resulting hyperparameters were then set as defaults in
the `mnistvit.train` module.
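For readers unfamiliar with this setup, below is a minimal sketch of a Ray Tune search
with an Optuna backend; `train_and_validate` and the search space are hypothetical
placeholders, not the actual search defined in `mnistvit.tune`:
```
from ray import tune
from ray.tune.search.optuna import OptunaSearch

def train_and_validate(config):
    # Hypothetical placeholder: train a model with the given hyperparameters
    # on the 80% split and return accuracy on the 20% validation split.
    return 0.0

def objective(config):
    accuracy = train_and_validate(config)
    return {"accuracy": accuracy}  # final result reported to Ray Tune

tuner = tune.Tuner(
    objective,
    tune_config=tune.TuneConfig(
        metric="accuracy",
        mode="max",
        search_alg=OptunaSearch(),
        num_samples=50,  # number of hyperparameter configurations to try
    ),
    param_space={
        "learning_rate": tune.loguniform(1e-4, 1e-2),
        "dropout": tune.uniform(0.0, 0.3),
    },
)
results = tuner.fit()
print(results.get_best_result().config)
```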
## Evaluation
### Testing Data
This model was evaluated on the 10,000 test set images of the
[MNIST](https://huggingface.co/datasets/ylecun/mnist/) dataset.
### Results
- **Test set accuracy:** 99.65%
- **Test set cross-entropy loss:** 0.011