---
language: en
tags:
- image-classification
- vision-transformer
- transfer-learning
- motorcycles
datasets:
- imagefolder
metrics:
- accuracy
---

# motorcycle-vit-model

A Vision Transformer (ViT) fine-tuned to classify motorcycles into 4 types: **cruiser**, **sport**, **naked**, and **roller**.

## Model Details

- **Base Architecture**: `google/vit-base-patch16-224`
- **Fine-tuning**: Transfer learning on a custom motorcycle dataset
- **Framework**: Hugging Face `transformers` + PyTorch
- **Task**: Image Classification (4 classes)

## Classes

| Label | Description |
|---|---|
| cruiser | Low seat height, forward foot pegs, relaxed riding position |
| sport | Full fairings, aggressive aerodynamic design |
| naked | Minimal fairings, exposed engine, upright seating |
| roller | Scooters with step-through frames and smaller wheels |

## Training

- **Dataset**: ~34 images per class from the [Kaggle Vietnamese Bike and Motorbike Dataset](https://www.kaggle.com/datasets/nqa112/vietnamese-bike-and-motorbike)
- **Split**: 60% train / 20% validation / 20% test
- **Preprocessing**: Resize to 224×224, ImageNet normalization
- **Optimizer**: AdamW, lr = 2e-5
- **Batch size**: 16
- **Epochs**: 5

## Performance

| Metric | Value |
|---|---|
| Validation accuracy | 70.59% |
| Validation loss | 1.116 |

| Class | Per-class accuracy |
|---|---|
| Cruiser | 72% |
| Sport | 68% |
| Naked | 69% |
| Roller | 73% |

## Demo

A live demo is available at [nbacchi/abgabe2_motorbikes](https://huggingface.co/spaces/nbacchi/abgabe2_motorbikes).

## Usage
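A minimal inference sketch, assuming the fine-tuned weights and image processor were saved with `save_pretrained`, so that `AutoImageProcessor` and `AutoModelForImageClassification` can load them, and that the saved processor reproduces the training preprocessing. The `model_id` argument is a placeholder (a Hub repo id or a local directory). The alphabetical `ID2LABEL` fallback is also an assumption; `classify` uses `model.config.id2label` from the checkpoint instead.

```python
import math
from typing import Dict, List, Mapping, Sequence, Tuple

# ASSUMPTION: alphabetical class order, as an ImageFolder-style loader would
# produce. classify() below uses model.config.id2label from the checkpoint instead.
ID2LABEL: Dict[int, str] = {0: "cruiser", 1: "naked", 2: "roller", 3: "sport"}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(
    logits: Sequence[float], id2label: Mapping[int, str] = ID2LABEL
) -> List[Tuple[str, float]]:
    """Map raw logits to (label, probability) pairs, highest probability first."""
    probs = softmax(logits)
    pairs = [(id2label[i], p) for i, p in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def classify(image_path: str, model_id: str) -> List[Tuple[str, float]]:
    """Run the fine-tuned ViT on one image (needs torch, Pillow, transformers)."""
    # Imported lazily so softmax/decode work without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    # The saved image processor applies the resize and normalization.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0]  # shape: (num_labels,)
    return decode(logits.tolist(), model.config.id2label)
```

For example, `classify("my_bike.jpg", "<namespace>/motorcycle-vit-model")` would return all four classes sorted by probability; both arguments are hypothetical placeholders. Keeping `softmax` and `decode` free of `torch` makes the post-processing easy to check without downloading the model.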