motorcycle-vit-model
A Vision Transformer (ViT) fine-tuned to classify motorcycle images into four types: cruiser, sport, naked, and roller (scooter).
Model Details
- Base Architecture: google/vit-base-patch16-224
- Fine-tuning: Transfer learning on custom motorcycle dataset
- Framework: Hugging Face transformers + PyTorch
- Task: Image Classification (4 classes)
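
As a rough sketch of the setup listed above, the base checkpoint could be loaded with a fresh 4-class head along these lines. The label order, the helper name `build_model`, and the choice of `ignore_mismatched_sizes=True` are assumptions for illustration, not the actual training script.

```python
BASE_CHECKPOINT = "google/vit-base-patch16-224"

# Assumed label order; the real training script may index classes differently.
LABELS = ["cruiser", "sport", "naked", "roller"]
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def build_model():
    """Load the base ViT and swap its 1000-class ImageNet head for a 4-class one."""
    # Imported lazily so the label maps above work without transformers installed.
    from transformers import ViTForImageClassification

    return ViTForImageClassification.from_pretrained(
        BASE_CHECKPOINT,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
        # The checkpoint's classifier has 1000 outputs, so its weights are
        # dropped and a new, randomly initialized head is created instead.
        ignore_mismatched_sizes=True,
    )
```

Passing `id2label` and `label2id` stores the class names in the model config, so predictions come back as readable labels rather than `LABEL_0` to `LABEL_3`.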
Classes
| Label | Description |
|---|---|
| cruiser | Low seat height, forward foot pegs, relaxed riding position |
| sport | Full fairings, aggressive aerodynamic design |
| naked | Minimal fairings, exposed engine, upright seating |
| roller | Scooters with step-through frames and smaller wheels |
Training
- Dataset: ~34 images/class from the Kaggle Vietnamese Bike and Motorbike Dataset
- Split: 60% train / 20% validation / 20% test
- Preprocessing: Resize to 224x224, ImageNet normalization
- Optimizer: AdamW, lr=2e-5
- Batch size: 16
- Epochs: 5
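
To make the 60/20/20 split concrete, here is a minimal sketch using only the standard library. It assumes the split was stratified per class and uses hypothetical file names; the helper `stratified_split`, the seed, and the rounding rule are illustrative choices, not details taken from the original training code.

```python
import random


def stratified_split(items_by_class, train=0.6, val=0.2, seed=42):
    """Split each class 60/20/20 so every subset keeps the class balance."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_train = round(train * len(items))
        n_val = round(val * len(items))
        splits["train"] += [(x, label) for x in items[:n_train]]
        splits["val"] += [(x, label) for x in items[n_train:n_train + n_val]]
        # Whatever is left over becomes the test set.
        splits["test"] += [(x, label) for x in items[n_train + n_val:]]
    return splits


# Hypothetical file names standing in for the real images (34 per class).
dataset = {
    label: [f"{label}_{i:02d}.jpg" for i in range(34)]
    for label in ["cruiser", "sport", "naked", "roller"]
}
splits = stratified_split(dataset)
split_sizes = {name: len(rows) for name, rows in splits.items()}
print(split_sizes)
```

With exactly 34 images per class, this rounding gives 20 training, 7 validation, and 7 test images per class, or 80 / 28 / 28 overall.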
Performance
| Metric | Value |
|---|---|
| Validation Accuracy | 70.59% |
| Validation Loss | 1.116 |

| Class | Accuracy |
|---|---|
| Cruiser | 72% |
| Sport | 68% |
| Naked | 69% |
| Roller | 73% |
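
For readers who want to reproduce figures of this kind, overall and per-class accuracy can be computed from true and predicted labels roughly as follows. The helper `accuracy_report` and the toy labels are made up for illustration; they are not the model's actual predictions.

```python
from collections import Counter


def accuracy_report(y_true, y_pred):
    """Return overall accuracy and a per-class accuracy dict."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(correct.values()) / len(y_true)
    # Per-class accuracy: correct predictions divided by samples of that class.
    per_class = {label: correct[label] / n for label, n in total.items()}
    return overall, per_class


# Toy labels for illustration only.
y_true = ["cruiser", "cruiser", "sport", "sport", "naked", "roller"]
y_pred = ["cruiser", "naked", "sport", "sport", "naked", "sport"]
overall, per_class = accuracy_report(y_true, y_pred)
```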
Usage
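
A minimal inference sketch using the transformers `pipeline` API is shown below. The repo id in `MODEL_ID` is a placeholder, since the model's Hub location is not stated here, and the helper names `classify` and `top_label` are illustrative.

```python
# Hypothetical repo id; replace with the actual Hub location of this model.
MODEL_ID = "your-username/motorcycle-vit-model"


def classify(image_path, model_id=MODEL_ID):
    """Return (label, score) pairs for one image."""
    # Lazy import so this module loads even where transformers is not installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    return [(r["label"], r["score"]) for r in classifier(image_path)]


def top_label(predictions):
    """Pick the highest-scoring label from classify() output."""
    return max(predictions, key=lambda pair: pair[1])[0]
```

Calling `top_label(classify("bike.jpg"))` with a local image path (here a made-up file name) should return one of the four class labels.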
Demo
Live demo available at: nbacchi/abgabe2_motorbikes