---
license: mit
datasets:
- mnist
metrics:
- accuracy
---
# Model Card for mnistvit

A vision transformer (ViT) trained on MNIST with a PyTorch-only implementation,
achieving 99.65% test set accuracy.

## Model Details

### Model Description

The model is a vision transformer, as described in the original
Dosovitskiy et al., ICLR 2021 paper.
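
As a rough sketch of how a ViT tokenizes an MNIST digit into patches (the patch
size below is an illustrative assumption, not necessarily the one this model
uses; the actual architecture parameters are stored in `config.json`):

```
# Hypothetical patch arithmetic for a 28x28 grayscale MNIST image.
image_size = 28
patch_size = 7  # assumed for illustration; the real value comes from config.json
channels = 1

num_patches = (image_size // patch_size) ** 2  # 16 patches per image
patch_dim = channels * patch_size ** 2         # 49 pixel values per patch

print(num_patches, patch_dim)
```

Each patch is flattened and linearly projected to form the transformer's input
token sequence.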

- **Developed by:** Arno Onken
- **Model type:** Vision Transformer
- **License:** MIT

### Model Sources

- **Repository:**
  [https://github.com/asnelt/mnistvit/](https://github.com/asnelt/mnistvit/)
- **Python Package Index:**
  [https://pypi.org/project/mnistvit/](https://pypi.org/project/mnistvit/)
- **Paper:** [Dosovitskiy et al., ICLR 2021](https://openreview.net/forum?id=YicbFdNTTy)

## Uses

The model is intended for learning about vision transformers. It is small and
trained on MNIST, a simple and well-understood dataset. Together with the
`mnistvit` package code, it can be used to explore the effect of various
hyperparameters.

## How to Get Started with the Model

Install the `mnistvit` package, which provides code for training and running the model:

```
pip install mnistvit
```

Place the `config.json` and `model.pt` files from this repository in a directory
of your choice and run Python from that directory.

To evaluate the test set accuracy and loss of the model stored in `model.pt` with
configuration `config.json`:

```
python -m mnistvit --use-accuracy --use-loss
```

Individual images can be classified as well. To predict the class of a digit
image stored in a file `sample.jpg`:

```
python -m mnistvit --image-file sample.jpg
```
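
If no digit image is at hand, one way to produce a correctly sized `sample.jpg`
is to draw a crude stroke on a 28x28 grayscale canvas. This sketch uses Pillow
and is only for illustration; any 28x28 grayscale digit image should work:

```
from PIL import Image, ImageDraw

# MNIST images are 28x28 grayscale with a white digit on a black background.
img = Image.new("L", (28, 28), color=0)
draw = ImageDraw.Draw(img)
draw.line([(14, 5), (14, 23)], fill=255, width=3)  # a crude "1"
img.save("sample.jpg")
```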

## Training Details

### Training Data

This model was trained on the 60,000 training set images of the
[MNIST](https://huggingface.co/datasets/ylecun/mnist/) dataset. Data
augmentation was used in the form of random rotations, translations, and
scaling, as detailed in the `mnistvit.preprocess` module.

### Training Procedure

- **Training regime:** fp32

Hyperparameters were obtained from an 80:20 training/validation split of the
original MNIST training set, running Ray Tune with Optuna as detailed in the
`mnistvit.tune` module. The resulting parameters were then set as defaults in
the `mnistvit.train` module.
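
The 80:20 split itself is straightforward; a minimal sketch with plain Python
(the actual split logic is in `mnistvit.tune`):

```
import random

# Shuffle the 60,000 MNIST training indices, then cut at 80%.
random.seed(0)  # fix the seed so the split is reproducible
indices = list(range(60000))
random.shuffle(indices)

cut = int(0.8 * len(indices))  # 48,000 training examples
train_indices, val_indices = indices[:cut], indices[cut:]
```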

## Evaluation

### Testing Data

This model was evaluated on the 10,000 test set images of the
[MNIST](https://huggingface.co/datasets/ylecun/mnist/) dataset.

### Results

- Test set accuracy: 99.65%
- Test set cross-entropy loss: 0.011
|