---
language: en
tags:
- image-classification
- vision-transformer
- transfer-learning
- motorcycles
datasets:
- imagefolder
metrics:
- accuracy
---

# motorcycle-vit-model

A Vision Transformer (ViT) fine-tuned to classify motorcycle images into four types: **cruiser**, **sport**, **naked**, and **roller** (scooter).

## Model Details

- **Base Architecture**: `google/vit-base-patch16-224`
- **Fine-tuning**: Transfer learning on a custom motorcycle dataset
- **Framework**: Hugging Face `transformers` + PyTorch
- **Task**: Image Classification (4 classes)

## Classes

| Label | Description |
|---|---|
| cruiser | Low seat height, forward foot pegs, relaxed riding position |
| sport | Full fairings, aggressive aerodynamic design |
| naked | Minimal fairings, exposed engine, upright seating |
| roller | Scooters with step-through frames and smaller wheels |
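
The four labels can be attached to the base checkpoint when it is loaded for fine-tuning. The sketch below is one way to do this, not necessarily how this model was built. The alphabetical index order is an assumption (folder-based loaders usually sort class names), and `build_model` is a hypothetical helper name.

```python
# Class names from the table above. The index order is an assumption:
# folder-based dataset loaders typically sort class names alphabetically.
CLASS_NAMES = sorted(["cruiser", "sport", "naked", "roller"])

id2label = {i: name for i, name in enumerate(CLASS_NAMES)}
label2id = {name: i for i, name in id2label.items()}


def build_model():
    """Load the base checkpoint with a fresh 4-class head (sketch)."""
    # Imported lazily so the label maps can be used without transformers.
    from transformers import ViTForImageClassification

    return ViTForImageClassification.from_pretrained(
        "google/vit-base-patch16-224",
        num_labels=len(CLASS_NAMES),
        id2label=id2label,
        label2id=label2id,
        # The base checkpoint ships a 1000-class ImageNet head, so the
        # classifier layer has to be re-initialised for 4 classes.
        ignore_mismatched_sizes=True,
    )
```

Storing `id2label` in the model config means predictions come back as readable class names rather than integer indices.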

## Training

- **Dataset**: About 34 images per class from the [Kaggle Vietnamese Bike and Motorbike Dataset](https://www.kaggle.com/datasets/nqa112/vietnamese-bike-and-motorbike)
- **Split**: 60% train / 20% validation / 20% test
- **Preprocessing**: Resize to 224x224, ImageNet normalization
- **Optimizer**: AdamW, lr=2e-5
- **Batch size**: 16
- **Epochs**: 5
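
To make the split and the training budget concrete, here is a minimal sketch of a 60/20/20 split over one class of 34 images, plus the number of optimizer steps it implies. The helper `split_indices`, the seed, and the `HPARAMS` dictionary are illustrative assumptions. The class size is treated as exactly 34, although the card only gives an approximate figure.

```python
import math
import random


def split_indices(n_items, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle item indices and cut them into train/val/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train_frac)
    n_val = round(n_items * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test split
    return train, val, test


# Hyperparameters listed above, collected in one place.
HPARAMS = {"learning_rate": 2e-5, "batch_size": 16, "epochs": 5, "image_size": 224}

train, val, test = split_indices(34)  # one class, assuming exactly 34 images
print(len(train), len(val), len(test))  # 20 7 7

# With 4 classes, that is about 80 training images in total.
train_images = 4 * len(train)
steps_per_epoch = math.ceil(train_images / HPARAMS["batch_size"])
total_steps = steps_per_epoch * HPARAMS["epochs"]
print(steps_per_epoch, total_steps)  # 5 25
```

Splitting each class separately keeps the class balance the same across the three subsets, which matters when each class has only a few dozen images.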

## Performance

| Metric | Value |
|---|---|
| Validation Accuracy | 70.59% |
| Validation Loss | 1.116 |

Per-class accuracy:

| Class | Accuracy |
|---|---|
| Cruiser | 72% |
| Sport | 68% |
| Naked | 69% |
| Roller | 73% |
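
Overall and per-class accuracy can both be derived from a list of true and predicted labels. The sketch below shows one way to compute them. `accuracy_report` is a hypothetical helper, and the sample labels are made up to exercise the function. They are not this model's predictions.

```python
from collections import Counter


def accuracy_report(y_true, y_pred):
    """Return overall accuracy and a per-class accuracy dict."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(hits.values()) / len(y_true)
    # Per-class accuracy: correct predictions for a class / images of that class.
    per_class = {label: hits[label] / totals[label] for label in totals}
    return overall, per_class


# Made-up labels purely for illustration.
y_true = ["cruiser", "cruiser", "sport", "sport", "naked", "roller"]
y_pred = ["cruiser", "naked", "sport", "sport", "naked", "sport"]
overall, per_class = accuracy_report(y_true, y_pred)
print(f"{overall:.2%}", per_class)
```

With evaluation sets this small, a single misclassified image moves a class's accuracy by several percentage points.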

## Usage
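
A minimal inference sketch using the `transformers` image-classification pipeline. The repository id in `MODEL_ID` is a placeholder and must be replaced with the model's actual Hub id. `classify` and `best_label` are hypothetical helper names.

```python
# Placeholder repository id: replace with the actual Hub id of this model.
MODEL_ID = "your-username/motorcycle-vit-model"


def classify(image, model_id=MODEL_ID, top_k=4):
    """Return a list of {"label", "score"} dicts for one image.

    `image` can be a local file path, a URL, or a PIL image.
    """
    # Imported here so the helper below works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    return classifier(image, top_k=top_k)


def best_label(predictions):
    """Pick the highest-scoring label from pipeline-style output."""
    return max(predictions, key=lambda p: p["score"])["label"]
```

For example, `best_label(classify("motorcycle.jpg"))` would return one of `cruiser`, `sport`, `naked`, or `roller`, where `motorcycle.jpg` is a placeholder path to a local image.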

## Demo

A live demo is available at [nbacchi/abgabe2_motorbikes](https://huggingface.co/spaces/nbacchi/abgabe2_motorbikes).