---
license: mit
datasets:
- mnist
metrics:
- accuracy
---
# Model Card for mnistvit

A vision transformer (ViT) trained on MNIST with a PyTorch-only implementation,
achieving 99.65% test set accuracy.

## Model Details

### Model Description

The model is a vision transformer, as described in the original
Dosovitskiy et al., ICLR 2021 paper.
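
As a rough sketch of how a ViT tokenizes an MNIST digit into patches (the patch
size below is an illustrative assumption, not necessarily the one this model
uses; the actual architecture parameters are stored in `config.json`):

```
# Hypothetical patch arithmetic for a 28x28 grayscale MNIST image.
image_size = 28
patch_size = 7  # assumed for illustration; the real value comes from config.json
channels = 1

num_patches = (image_size // patch_size) ** 2  # 16 patches per image
patch_dim = channels * patch_size ** 2         # 49 pixel values per patch

print(num_patches, patch_dim)
```

Each patch is flattened and linearly projected to form the transformer's input
token sequence.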

- **Developed by:** Arno Onken
- **Model type:** Vision Transformer
- **License:** MIT

### Model Sources

- **Repository:**
  [https://github.com/asnelt/mnistvit/](https://github.com/asnelt/mnistvit/)
- **Python Package Index:**
  [https://pypi.org/project/mnistvit/](https://pypi.org/project/mnistvit/)
- **Paper:** [Dosovitskiy et al., ICLR 2021](https://openreview.net/forum?id=YicbFdNTTy)

## Uses

The model is intended for learning about vision transformers. It is small and
trained on MNIST, a simple and well-understood dataset. Together with the
`mnistvit` package code, it can be used to explore the effect of various
hyperparameters.

## How to Get Started with the Model

Install the `mnistvit` package, which provides code for training and running the model:

```
pip install mnistvit
```

Place the `config.json` and `model.pt` files from this repository in a directory
of your choice and run Python from that directory.

To evaluate the test set accuracy and loss of the model stored in `model.pt` with
configuration `config.json`:

```
python -m mnistvit --use-accuracy --use-loss
```

Individual images can be classified as well. To predict the class of a digit
image stored in a file `sample.jpg`:

```
python -m mnistvit --image-file sample.jpg
```
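
If no digit image is at hand, one way to produce a correctly sized `sample.jpg`
is to draw a crude stroke on a 28x28 grayscale canvas. This sketch uses Pillow
and is only for illustration; any 28x28 grayscale digit image should work:

```
from PIL import Image, ImageDraw

# MNIST images are 28x28 grayscale with a white digit on a black background.
img = Image.new("L", (28, 28), color=0)
draw = ImageDraw.Draw(img)
draw.line([(14, 5), (14, 23)], fill=255, width=3)  # a crude "1"
img.save("sample.jpg")
```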

## Training Details

### Training Data

This model was trained on the 60,000 training set images of the
[MNIST](https://huggingface.co/datasets/ylecun/mnist/) dataset. Data
augmentation was used in the form of random rotations, translations, and
scaling, as detailed in the `mnistvit.preprocess` module.

### Training Procedure

- **Training regime:** fp32

Hyperparameters were obtained from an 80:20 training/validation split of the
original MNIST training set, running Ray Tune with Optuna as detailed in the
`mnistvit.tune` module. The resulting parameters were then set as defaults in
the `mnistvit.train` module.
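
The 80:20 split itself is straightforward; a minimal sketch with plain Python
(the actual split logic is in `mnistvit.tune`):

```
import random

# Shuffle the 60,000 MNIST training indices, then cut at 80%.
random.seed(0)  # fix the seed so the split is reproducible
indices = list(range(60000))
random.shuffle(indices)

cut = int(0.8 * len(indices))  # 48,000 training examples
train_indices, val_indices = indices[:cut], indices[cut:]
```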

## Evaluation

### Testing Data

This model was evaluated on the 10,000 test set images of the
[MNIST](https://huggingface.co/datasets/ylecun/mnist/) dataset.

### Results

- Test set accuracy: 99.65%
- Test set cross-entropy loss: 0.011
|