---
license: mit
datasets:
- mnist
metrics:
- accuracy
---

# Model Card for mnistvit

A vision transformer (ViT) trained on MNIST with a PyTorch-only implementation, achieving 99.65% test set accuracy.

## Model Details

### Model Description

The model is a vision transformer, as described in the original paper by Dosovitskiy et al. (ICLR 2021).

- **Developed by:** Arno Onken
- **Model type:** Vision Transformer
- **License:** MIT

### Model Sources

- **Repository:** [https://github.com/asnelt/mnistvit/](https://github.com/asnelt/mnistvit/)
- **Python Package Index:** [https://pypi.org/project/mnistvit/](https://pypi.org/project/mnistvit/)
- **Paper:** [Dosovitskiy et al., ICLR 2021](https://openreview.net/forum?id=YicbFdNTTy)

## Uses

The model is intended for learning about vision transformers. It is small and trained on MNIST, a simple and well-understood dataset. Together with the mnistvit package code, it can be used to explore the importance of various hyperparameters.

## How to Get Started with the Model

Install the mnistvit package, which provides code for training and running the model:

```
pip install mnistvit
```

Place the `config.json` and `model.pt` files from this repository in a directory of your choice and run Python from that directory. To evaluate the test set accuracy and loss of the model stored in `model.pt` with configuration `config.json`:

```
python -m mnistvit --use-accuracy --use-loss
```

Individual images can be classified as well. To predict the class of a digit image stored in a file `sample.jpg`:

```
python -m mnistvit --image-file sample.jpg
```

## Training Details

### Training Data

This model was trained on the 60,000 training set images of the [MNIST](https://huggingface.co/datasets/ylecun/mnist/) dataset. Data augmentation was used in the form of random rotations, translations and scaling, as detailed in the `mnistvit.preprocess` module.
### Training Procedure

- **Training regime:** fp32

Hyperparameters were obtained from an 80:20 training/validation split of the original MNIST training set, running Ray Tune with Optuna as detailed in the `mnistvit.tune` module. The resulting parameters were then set as default parameters in the `mnistvit.train` module.

## Evaluation

### Testing Data

This model was evaluated on the 10,000 test set images of the [MNIST](https://huggingface.co/datasets/ylecun/mnist/) dataset.

### Results

- Test set accuracy: 99.65%
- Test set cross-entropy loss: 0.011
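The accuracy metric reported above can also be computed manually. The sketch below is not the package's own evaluation code; it assumes a trained classifier (e.g. loaded from `model.pt`) and a `DataLoader` over the MNIST test split:

```python
# Sketch of a manual test set accuracy computation; `model` is assumed
# to be a trained MNIST classifier, e.g. loaded from model.pt.
# This is not the mnistvit evaluation code itself.
import torch


def accuracy(model: torch.nn.Module, loader, device: str = "cpu") -> float:
    """Return the fraction of samples in `loader` classified correctly."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)  # predicted class per image
            correct += (preds.cpu() == labels).sum().item()
            total += labels.numel()
    return correct / total
```

Run over a `DataLoader` built from `torchvision.datasets.MNIST(train=False)`, this yields the test set accuracy; 99.65% corresponds to 9,965 of the 10,000 test digits classified correctly.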