nickpai/vit-layer12-32-cifar10

nickpai committed on Oct 26, 2024

Commit ffdafeb · verified · 1 Parent(s): c6cc994

Update README.md

Files changed (1)

README.md +63 -3

@@ -1,3 +1,63 @@
----
-license: mit
----

+---
+license: mit
+tags:
+- vision-transformer
+- ViT
+- classification
+- cifar10
+- computer-vision
+- deep-learning
+- machine-learning
+---
+# ViT-Classification-CIFAR10
+## Model Description
+This model is a Vision Transformer (ViT) trained from scratch on the CIFAR-10 dataset for image classification, with no pre-training on a larger dataset.
+**Metrics:**
+* Test accuracy: 82.04%
+* Test loss: 0.5560
+## Training Configuration
+**Hardware:** NVIDIA RTX 3090
+**Training parameters:**
+* Epochs: 200
+* Batch size: 2048
+* Input size: 3x32x32
+* Patch size: 4
+* Sequence length: 8*8 (64 patches)
+* Embed size: 128
+* Num of layers: 12
+* Num of heads: 4
+* Forward multiplier: 2
+* Dropout: 0.1
+* Optimizer: AdamW
+## Intended Uses & Limitations
+This model is intended for practice purposes and exploration of ViT architectures on the CIFAR-10 dataset. It can be used for image classification tasks on similar datasets.
+**Limitations:**
+* This model is trained on a relatively small dataset (CIFAR-10) and might not generalize well to images outside that distribution.
+* The model is trained from scratch without pre-training, which potentially limits its performance compared to a fine-tuned pre-trained model.
+* Training is performed on a single RTX 3090.
+## Training Data
+The model is trained on the CIFAR-10 dataset, containing 60,000 32x32 color images in 10 classes.
+* Training set: 50,000 images
+* Test set: 10,000 images
+**Data Source:** [https://paperswithcode.com/dataset/cifar-10](https://paperswithcode.com/dataset/cifar-10)
+## Documentation
+* GitHub Repository: [ViT-Classification-CIFAR10](https://github.com/nick8592/ViT-Classification-CIFAR10.git)
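
The training parameters listed in the README imply a few derived shapes that are easy to check by hand. The following is a minimal sketch of that arithmetic, not code from the linked repository: the `ViTConfig` class and `batches_per_epoch` helper are hypothetical names, "forward multiplier" is assumed to mean the ratio of the feed-forward hidden width to the embed size, and the final partial batch of each epoch is assumed to be kept. Whether the model prepends a class token to the 64 patch tokens is not stated in the README, so it is not modeled here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Hypothetical container; field values are copied from the README."""
    image_size: int = 32          # Input size: 3x32x32
    channels: int = 3
    patch_size: int = 4           # Patch size: 4
    embed_size: int = 128         # Embed size: 128
    num_layers: int = 12          # Num of layers: 12
    num_heads: int = 4            # Num of heads: 4
    forward_multiplier: int = 2   # Forward multiplier: 2
    batch_size: int = 2048        # Batch size: 2048

    @property
    def num_patches(self) -> int:
        # 32 / 4 = 8 patches per side, so 8 * 8 patch tokens per image.
        per_side = self.image_size // self.patch_size
        return per_side * per_side

    @property
    def patch_dim(self) -> int:
        # Flattened pixel values in one patch: channels * patch_size^2.
        return self.channels * self.patch_size ** 2

    @property
    def head_dim(self) -> int:
        # Embed size is split evenly across the attention heads.
        return self.embed_size // self.num_heads

    @property
    def mlp_hidden_dim(self) -> int:
        # Assumption: the multiplier scales the feed-forward hidden width.
        return self.embed_size * self.forward_multiplier


def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    # Assumption: the last, smaller batch is kept rather than dropped.
    return math.ceil(num_samples / batch_size)


cfg = ViTConfig()
print(cfg.num_patches)                             # 64
print(cfg.patch_dim)                               # 48
print(cfg.head_dim)                                # 32
print(cfg.mlp_hidden_dim)                          # 256
print(batches_per_epoch(50_000, cfg.batch_size))   # 25
```

With a batch size of 2048 and 50,000 training images, each epoch is only about 25 optimizer steps under these assumptions, so 200 epochs correspond to roughly 5,000 steps.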