vit-mushroom-classifier

Fine-tuned google/vit-base-patch16-224 image classifier for 16 mushroom species.

This model was trained for an educational mushroom image classification project. It predicts the most likely species among the 16 trained classes and can be used for candidate ranking, demos, and model comparison. It is not suitable for real-world foraging, edibility, or toxicity decisions.

Model Details

  • Base model: google/vit-base-patch16-224
  • Architecture: ViTForImageClassification
  • Task: single-label image classification
  • Image size: 224 x 224
  • Classes: 16 mushroom species
  • Final model directory: models/vit_mushroom_classifier
  • Final evaluation date: 2026-06-03
  • Transformers version in config: 4.57.6

Classes

Amanita_muscaria
Amanita_phalloides
Armillaria_mellea
Cerioporus_squamosus
Chlorophyllum_brunneum
Clitocybe_nuda
Coprinellus_micaceus
Coprinus_comatus
Flammulina_velutipes
Gliophorus_psittacinus
Hygrophoropsis_aurantiaca
Hypholoma_lateritium
Stereum_hirsutum
Suillus_luteus
Tricholomopsis_rutilans
Tylopilus_felleus
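
Since the model is meant for candidate ranking over these 16 labels, here is a minimal sketch of turning raw logits into a top-3 ranking. It assumes the model's label indices follow the order listed above, which should be checked against `id2label` in the model's `config.json`. The logits are made up for illustration, and `softmax` and `top_k` are hypothetical helper names.

```python
import math

# Assumption: label index i corresponds to the i-th class in the list above.
CLASSES = [
    "Amanita_muscaria", "Amanita_phalloides", "Armillaria_mellea",
    "Cerioporus_squamosus", "Chlorophyllum_brunneum", "Clitocybe_nuda",
    "Coprinellus_micaceus", "Coprinus_comatus", "Flammulina_velutipes",
    "Gliophorus_psittacinus", "Hygrophoropsis_aurantiaca",
    "Hypholoma_lateritium", "Stereum_hirsutum", "Suillus_luteus",
    "Tricholomopsis_rutilans", "Tylopilus_felleus",
]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, k=3):
    """Return the k most probable (class_name, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(CLASSES[i], probs[i]) for i in ranked[:k]]

# Made-up logits, for illustration only.
example_logits = [0.0] * len(CLASSES)
example_logits[1] = 4.0   # Amanita_phalloides
example_logits[0] = 2.5   # Amanita_muscaria
example_logits[13] = 1.0  # Suillus_luteus

ranking = top_k(example_logits)
for name, prob in ranking:
    print(f"{name}: {prob:.3f}")
```

In practice the logits would come from the fine-tuned model, for example one loaded with `ViTForImageClassification.from_pretrained` from the transformers library. The probabilities always sum to 1 over the 16 known classes, so a high score does not indicate that the image belongs to one of them.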

Dataset

The image dataset combines a Kaggle mushroom image dataset with additional GBIF image records for the selected species.

Final image scope:

  • Total images: 11,200
  • Classes: 16
  • Images per class: 700
  • Train images: 7,839
  • Validation images: 1,680
  • Test images: 1,681
  • Split: stratified 70% / 15% / 15% (train / validation / test)
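
A per-class stratified split can be sketched as below. This is not necessarily the procedure used for this model: the function name, the seed, and the rounding rule are assumptions. With exactly 700 images per class, this variant yields 7,840 / 1,680 / 1,680 images, one image off the counts reported above, so the original split was presumably produced slightly differently.

```python
import random

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split item indices into train/val/test, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * fractions[0])
        n_val = round(len(idxs) * fractions[1])
        splits["train"] += idxs[:n_train]
        splits["val"] += idxs[n_train:n_train + n_val]
        splits["test"] += idxs[n_train + n_val:]  # remainder goes to test
    return splits

# 16 classes with 700 images each, as in the dataset summary.
labels = [c for c in range(16) for _ in range(700)]
splits = stratified_split(labels)
sizes = {name: len(idxs) for name, idxs in splits.items()}
print(sizes)
```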

Training Procedure

The original ImageNet classifier head was replaced with a new 16-class head. Training used two phases:

  1. Train the new classifier head.
  2. Fine-tune the last 4 ViT encoder blocks.
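
The two phases amount to choosing which parameters receive gradients. The sketch below makes that choice by parameter name, assuming the naming used by `ViTForImageClassification` in transformers (`classifier.*` for the head, `vit.encoder.layer.<i>.*` for the 12 encoder blocks of ViT-base). The helper name `is_trainable` is hypothetical, and keeping the head trainable during phase 2 while leaving the final layer norm frozen are assumptions that the description above does not settle.

```python
import re

def is_trainable(param_name, phase, num_blocks=12, last_n=4):
    """Decide whether a parameter is updated in the given phase.

    phase "head":     only the new classifier head.
    phase "finetune": the head plus the last `last_n` encoder blocks.
    """
    if param_name.startswith("classifier."):
        return True
    if phase == "finetune":
        match = re.match(r"vit\.encoder\.layer\.(\d+)\.", param_name)
        if match:
            return int(match.group(1)) >= num_blocks - last_n
    return False  # embeddings, earlier blocks, final layer norm stay frozen

# With a loaded PyTorch model, the rule would be applied like this:
#     for name, param in model.named_parameters():
#         param.requires_grad = is_trainable(name, phase="finetune")
```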

Main configuration:

  • Batch size: 16
  • Head training epochs: 1
  • Fine-tuning epochs: 4
  • Head learning rate: 3e-4
  • Fine-tuning learning rate: 2e-5
  • Weight decay: 0.01
  • Random crop scale: 0.85-1.0
  • Color jitter: 0.1
  • Best model metric: macro F1
  • Early stopping: enabled, patience 2
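
For a sense of scale, the configuration above implies the following number of optimizer steps. This is a back-of-the-envelope sketch that assumes a single device, no gradient accumulation, a kept final partial batch, and no early stop.

```python
import math

TRAIN_IMAGES = 7839      # from the dataset summary
BATCH_SIZE = 16
HEAD_EPOCHS = 1
FINETUNE_EPOCHS = 4

# Assumption: the last, smaller batch of each epoch is kept.
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
head_steps = steps_per_epoch * HEAD_EPOCHS
finetune_steps = steps_per_epoch * FINETUNE_EPOCHS
total_steps = head_steps + finetune_steps

print(steps_per_epoch, head_steps, finetune_steps, total_steps)
```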

Validation metrics improved at every fine-tuning epoch:

| Fine-tune epoch | Validation loss | Accuracy | Macro F1 | Top-3 accuracy |
|---|---|---|---|---|
| 1 | 0.3553 | 0.8994 | 0.8986 | 0.9762 |
| 2 | 0.2701 | 0.9196 | 0.9195 | 0.9804 |
| 3 | 0.2437 | 0.9250 | 0.9251 | 0.9833 |
| 4 | 0.2325 | 0.9304 | 0.9303 | 0.9845 |
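
Replaying the validation macro F1 values above through a simple early stopping rule with patience 2 shows why training ran for all four fine-tuning epochs: the metric improved each time, so the patience counter never advanced. The function is an illustrative sketch, not the training code itself.

```python
def best_epoch_with_patience(scores, patience=2):
    """Return (best_epoch, stopped_early) for a higher-is-better metric.

    Epochs are 1-indexed. Training stops once `patience` consecutive
    epochs fail to beat the best score seen so far.
    """
    best_score = float("-inf")
    best_epoch = 0
    stale = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, True
    return best_epoch, False

# Validation macro F1 per fine-tuning epoch, from the table above.
val_macro_f1 = [0.8986, 0.9195, 0.9251, 0.9303]
result = best_epoch_with_patience(val_macro_f1)
print(result)
```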

Test Results

Final test set results:

| Metric | Score |
|---|---|
| Accuracy | 0.9161 |
| Balanced accuracy | 0.9161 |
| Macro F1 | 0.9162 |
| Top-3 accuracy | 0.9780 |
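
The four metrics can be computed from true labels, top-1 predictions, and ranked top-k predictions as sketched below. The toy data is made up. It also illustrates why accuracy and balanced accuracy coincide here: balanced accuracy is the mean per-class recall, which equals plain accuracy when every class has the same number of test images.

```python
def classification_metrics(y_true, y_pred, topk_preds, num_classes):
    """Accuracy, balanced accuracy, macro F1, and top-k accuracy."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    recalls, f1s = [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)

    return {
        "accuracy": accuracy,
        "balanced_accuracy": sum(recalls) / num_classes,
        "macro_f1": sum(f1s) / num_classes,
        "topk_accuracy": sum(t in ks for t, ks in zip(y_true, topk_preds)) / n,
    }

# Made-up toy data: 3 classes, 2 examples each, top-2 candidate lists.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
top2 = [[0, 1], [1, 0], [1, 2], [1, 0], [2, 0], [0, 2]]
metrics = classification_metrics(y_true, y_pred, top2, num_classes=3)
print(metrics)
```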

Bootstrap 95% confidence intervals:

| Metric | Mean | 95% CI |
|---|---|---|
| Accuracy | 0.9165 | 0.9036-0.9286 |
| Macro F1 | 0.9162 | 0.9035-0.9282 |
| Top-3 accuracy | 0.9780 | 0.9714-0.9845 |
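
Intervals of this kind are commonly obtained with a percentile bootstrap over test examples. The sketch below shows that approach for accuracy. Whether the reported intervals used the percentile method, and with how many resamples, is not stated, so both are assumptions. The outcome vector is a synthetic stand-in with 1,540 of 1,681 predictions correct, chosen only to match the reported accuracy.

```python
import random

def bootstrap_ci(correct, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap for accuracy from a list of 0/1 outcomes.

    Returns (mean of resampled accuracies, lower bound, upper bound).
    """
    rng = random.Random(seed)
    n = len(correct)
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(correct, k=n)  # resample with replacement
        means.append(sum(sample) / n)
    means.sort()
    lower = means[round((alpha / 2) * n_resamples)]
    upper = means[round((1 - alpha / 2) * n_resamples) - 1]
    return sum(means) / n_resamples, lower, upper

# Synthetic stand-in for per-image correctness on the test set.
outcomes = [1] * 1540 + [0] * 141
mean, lo, hi = bootstrap_ci(outcomes)
print(f"mean={mean:.4f}  95% CI={lo:.4f}-{hi:.4f}")
```

For macro F1 or top-3 accuracy, the same loop would resample example indices and recompute the metric on each resample.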

Lowest per-class F1 scores:

| Class | Precision | Recall | F1 |
|---|---|---|---|
| Flammulina_velutipes | 0.8302 | 0.8381 | 0.8341 |
| Armillaria_mellea | 0.8762 | 0.8762 | 0.8762 |
| Suillus_luteus | 0.9020 | 0.8762 | 0.8889 |
| Gliophorus_psittacinus | 0.8649 | 0.9143 | 0.8889 |
| Hygrophoropsis_aurantiaca | 0.8727 | 0.9143 | 0.8930 |
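
Each F1 value is the harmonic mean of the precision and recall in the same row, F1 = 2PR / (P + R). As a quick consistency check, the following sketch recomputes F1 from the rounded precision and recall values listed above and compares it with the reported column.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the table above.
rows = {
    "Flammulina_velutipes": (0.8302, 0.8381, 0.8341),
    "Armillaria_mellea": (0.8762, 0.8762, 0.8762),
    "Suillus_luteus": (0.9020, 0.8762, 0.8889),
    "Gliophorus_psittacinus": (0.8649, 0.9143, 0.8889),
    "Hygrophoropsis_aurantiaca": (0.8727, 0.9143, 0.8930),
}

recomputed = {name: round(f1_score(p, r), 4) for name, (p, r, _) in rows.items()}
for name, (_, _, reported) in rows.items():
    print(name, recomputed[name], reported)
```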

Common confusion patterns included:

  • Suillus_luteus predicted as Tylopilus_felleus
  • Coprinus_comatus predicted as Chlorophyllum_brunneum
  • Several confusions involving Flammulina_velutipes, Gliophorus_psittacinus, and Hygrophoropsis_aurantiaca
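
Patterns like these are typically read off the confusion matrix by counting misclassified (true, predicted) pairs. A minimal sketch follows, with made-up labels that are not the model's actual predictions; `top_confusions` is a hypothetical helper name.

```python
from collections import Counter

def top_confusions(y_true, y_pred, k=3):
    """Return the k most frequent (true, predicted) error pairs with counts."""
    pairs = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return pairs.most_common(k)

# Made-up labels, for illustration only.
y_true = ["Suillus_luteus"] * 3 + ["Coprinus_comatus"] * 2 + ["Suillus_luteus"]
y_pred = (
    ["Tylopilus_felleus"] * 2
    + ["Suillus_luteus"]
    + ["Chlorophyllum_brunneum", "Coprinus_comatus"]
    + ["Tylopilus_felleus"]
)

confusions = top_confusions(y_true, y_pred, k=2)
for (true_label, predicted), count in confusions:
    print(f"{true_label} predicted as {predicted}: {count}")
```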

Known Limitations

  • The model only predicts among the 16 trained species.
  • It does not know whether an image belongs to an unseen species.
  • It may make confident mistakes on visually similar species.
  • Exact duplicate images (identical hashes) were found in the dataset: 55 duplicate rows in 27 hash groups, 19 of which span more than one of the train/validation/test splits. Because of this leakage, the reported test performance may be slightly overestimated.
  • The dataset combines images from different sources, so source-specific visual patterns may influence predictions.
  • The model must not be used for mushroom consumption, toxicity, or foraging decisions.
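
The duplicate issue listed above can be detected by hashing image bytes and looking for hashes that occur in more than one split. The sketch below assumes SHA-256 over raw file contents, since the hash actually used is not stated. The function name and the toy byte strings are placeholders.

```python
import hashlib
from collections import defaultdict

def cross_split_duplicates(records):
    """Find exact duplicates that appear in more than one split.

    records: iterable of (split_name, image_bytes).
    Returns {hex_digest: set_of_split_names} for leaking hashes only.
    """
    splits_by_hash = defaultdict(set)
    for split, data in records:
        digest = hashlib.sha256(data).hexdigest()
        splits_by_hash[digest].add(split)
    return {h: s for h, s in splits_by_hash.items() if len(s) > 1}

# Placeholder byte strings standing in for image file contents.
records = [
    ("train", b"img-a"),
    ("test", b"img-a"),   # same bytes as a training image: leakage
    ("train", b"img-b"),
    ("train", b"img-b"),  # duplicate within one split: not a leak
    ("val", b"img-c"),
]
leaks = cross_split_duplicates(records)
print(len(leaks), [sorted(s) for s in leaks.values()])
```

Dropping all but one image per hash group before splitting would keep such duplicates from crossing splits. Exact hashing does not catch resized or re-encoded copies.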

Intended Use

Suitable uses:

  • Educational image classification demos
  • Candidate species ranking among the 16 known classes
  • Comparison against structured ecological/context models
  • Prototype user interfaces for mushroom image classification

Unsuitable uses:

  • Real-world mushroom identification without expert review
  • Edibility or toxicity classification
  • Safety-critical decisions
  • Open-world species recognition beyond the 16 labels

Summary

The final ViT model achieved strong test performance: about 91.6% macro F1 and 97.8% top-3 accuracy on a balanced 16-class test set, with a bootstrap 95% confidence interval of roughly 90.4% to 92.8% for macro F1. The result is promising for educational candidate prediction, but because some duplicate images cross the splits, a deduplicated re-split and re-evaluation is recommended before treating the score as final.
