---
license: mit
datasets:
- mnist
metrics:
- accuracy
---

# Model Card for mnistvit

A vision transformer (ViT) trained on MNIST with a PyTorch-only implementation, achieving 99.65% test set accuracy.

## Model Details

### Model Description

The model is a vision transformer, as described in the original paper by Dosovitskiy et al. (ICLR 2021).

- **Developed by:** Arno Onken
- **Model type:** Vision Transformer
- **License:** MIT

### Model Sources

- **Repository:** [https://github.com/asnelt/mnistvit/](https://github.com/asnelt/mnistvit/)
- **Python Package Index:** [https://pypi.org/project/mnistvit/](https://pypi.org/project/mnistvit/)
- **Paper:** [Dosovitskiy et al., ICLR 2021](https://openreview.net/forum?id=YicbFdNTTy)

## Uses

The model is intended for learning about vision transformers. It is small and trained on MNIST, a simple and well-understood dataset. Together with the mnistvit package code, it can be used to explore the importance of various hyperparameters.

## How to Get Started with the Model

Install the mnistvit package, which provides code for training and running the model:

```
pip install mnistvit
```

Place the `config.json` and `model.pt` files from this repository in a directory of your choice and run Python from that directory. To evaluate the test set accuracy and loss of the model stored in `model.pt` with configuration `config.json`:

```
python -m mnistvit --use-accuracy --use-loss
```

Individual images can be classified as well. To predict the class of a digit image stored in a file `sample.jpg`:

```
python -m mnistvit --image-file sample.jpg
```

## Training Details

### Training Data

This model was trained on the 60,000 training set images of the [MNIST](https://huggingface.co/datasets/ylecun/mnist/) dataset. Data augmentation was used in the form of random rotations, translations and scaling, as detailed in the `mnistvit.preprocess` module.
### Training Procedure

- **Training regime:** fp32

Hyperparameters were obtained from an 80:20 training/validation split of the original MNIST training set, running Ray Tune with Optuna as detailed in the `mnistvit.tune` module. The resulting parameters were then set as default parameters in the `mnistvit.train` module.

## Evaluation

### Testing Data

This model was evaluated on the 10,000 test set images of the [MNIST](https://huggingface.co/datasets/ylecun/mnist/) dataset.

### Results

- Test set accuracy: 99.65%
- Test set cross-entropy loss: 0.011
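The accuracy metric reported above can also be computed manually. The sketch below is not the package's own evaluation code; it assumes a trained classifier (e.g. loaded from `model.pt`) and a `DataLoader` over the MNIST test split:

```python
# Sketch of a manual test set accuracy computation; `model` is assumed
# to be a trained MNIST classifier, e.g. loaded from model.pt.
# This is not the mnistvit evaluation code itself.
import torch


def accuracy(model: torch.nn.Module, loader, device: str = "cpu") -> float:
    """Return the fraction of samples in `loader` classified correctly."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)  # predicted class per image
            correct += (preds.cpu() == labels).sum().item()
            total += labels.numel()
    return correct / total
```

Run over a `DataLoader` built from `torchvision.datasets.MNIST(train=False)`, this yields the test set accuracy; 99.65% corresponds to 9,965 of the 10,000 test digits classified correctly.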