Model Card for Fruit Classification ViT Model

Model Details

Model Description

This model is a Vision Transformer (ViT) fine-tuned for fruit image classification. It classifies images into six categories:

  • Banana
  • Mango
  • Orange
  • Pitaya
  • Pomegranate
  • Tomato

The model is based on transfer learning using a pretrained ViT architecture and has been fine-tuned on a subset of the Fruit Recognition dataset.

  • Developed by: Mario Soler Vidal
  • Model type: Vision Transformer (ViT) for image classification
  • Language(s): Not applicable (Computer Vision)
  • License: MIT
  • Finetuned from model: google/vit-base-patch16-224

Model Sources


Uses

Direct Use

This model can be used to classify images of fruits into one of the six supported categories. It is suitable for:

  • Educational demonstrations
  • Image classification tasks
  • Computer vision experiments
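A minimal sketch of the inference flow follows. To keep it self-contained and runnable offline, a randomly initialized ViT with the six class labels stands in for the fine-tuned checkpoint; in practice you would load the published weights as shown in the comment.

```python
import torch
from PIL import Image
from transformers import ViTConfig, ViTForImageClassification, ViTImageProcessor

# Label mapping for the six supported classes.
labels = ["Banana", "Mango", "Orange", "Pitaya", "Pomegranate", "Tomato"]
id2label = {i: name for i, name in enumerate(labels)}

# Randomly initialized stand-in; for real predictions load the checkpoint:
#   model = ViTForImageClassification.from_pretrained("Mariosolerzhawhugging/fruit-vit-model")
config = ViTConfig(num_labels=len(labels), id2label=id2label,
                   label2id={v: k for k, v in id2label.items()})
model = ViTForImageClassification(config)
model.eval()

processor = ViTImageProcessor()  # 224x224 resize + normalization defaults
image = Image.new("RGB", (320, 258), color=(255, 200, 0))  # placeholder image

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 6)
prediction = id2label[int(logits.argmax(-1))]
print(prediction)
```

With the fine-tuned weights loaded, the same three steps (preprocess, forward pass, argmax over logits) return one of the six class names.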

Downstream Use

The model can be integrated into applications such as:

  • Web apps (e.g., Gradio, Streamlit)
  • Retail or inventory systems
  • Automated fruit recognition pipelines

Out-of-Scope Use

  • Classification of fruits outside the six trained classes
  • Complex real-world environments with heavy occlusion
  • Non-fruit image classification

Bias, Risks, and Limitations

The model was trained on images with mostly clean, controlled backgrounds, which may bias it toward similar studio-style inputs.

Potential limitations:

  • Reduced performance in highly cluttered or noisy environments
  • Limited generalization to unseen fruit types
  • Sensitivity to extreme lighting or image distortions

Recommendations

  • Use the model primarily for the six trained fruit classes
  • Validate performance in real-world scenarios before deployment
  • Consider further fine-tuning for more diverse datasets

Training Details

Training Data

The model was trained on a subset of the Fruit Recognition dataset, which contains over 44,000 images.

Key characteristics:

  • Images captured in a controlled lab environment
  • Resolution: 320 × 258 pixels
  • Mostly clean backgrounds
  • Variations in lighting, shadows, and pose

Training Procedure

Preprocessing

  • Images resized to 224 × 224 pixels
  • Normalized using ImageNet statistics
  • Converted to RGB format

Data augmentation:

  • Random horizontal flip
  • Rotation
  • Color jitter

Training Hyperparameters

  • Training regime: fp32
  • Approach: Transfer learning (frozen backbone, trained classification head)
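The frozen-backbone approach can be sketched as follows. A randomly initialized ViT-base is used so the example runs without downloading weights; in practice you would start from the pretrained checkpoint noted in the comment.

```python
from transformers import ViTConfig, ViTForImageClassification

# Stand-in model; in practice start from the pretrained backbone:
#   model = ViTForImageClassification.from_pretrained(
#       "google/vit-base-patch16-224", num_labels=6, ignore_mismatched_sizes=True)
model = ViTForImageClassification(ViTConfig(num_labels=6))

# Freeze the ViT backbone; only the classification head stays trainable.
for param in model.vit.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,}")
```

Freezing the backbone keeps the pretrained visual features intact and reduces the optimization problem to the small linear head, which is why training is fast and stable on a modest dataset.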

Evaluation

Testing Data, Factors & Metrics

Testing Data

Validation and test splits from the Fruit Recognition dataset.

Metrics

  • Accuracy
  • Macro F1 Score
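Both metrics can be computed with scikit-learn; the labels below are illustrative dummy data, not the model's actual evaluation outputs.

```python
from sklearn.metrics import accuracy_score, f1_score

# Illustrative predictions over the six classes (0..5); not real evaluation data.
y_true = [0, 1, 2, 3, 4, 5, 0, 1, 2, 3]
y_pred = [0, 1, 2, 3, 4, 5, 0, 1, 2, 5]

accuracy = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1
print(f"accuracy={accuracy:.3f}  macro_f1={macro_f1:.3f}")
```

Macro F1 averages the per-class F1 scores without weighting by class frequency, so it surfaces poor performance on rare classes that plain accuracy can hide.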

Results

  • Accuracy: ~0.999
  • Macro F1 Score: ~0.999

Summary

The model achieves near-perfect accuracy and macro F1 on the held-out splits, with confident predictions across all six classes.


Technical Specifications

Model Architecture and Objective

  • Vision Transformer (ViT)
  • Image classification objective

Compute Infrastructure

Hardware

GPU (training environment)

Software

  • Python
  • Hugging Face Transformers
  • PyTorch

Model Card Authors

Mario Soler Vidal

Model size: 85.8M parameters (F32, Safetensors format)