Project 2 - Bean Disease Classification and Model Comparison
This project fulfills the assignment requirements by providing:
- A transfer learning pipeline trained on the Beans dataset from Hugging Face.
- A Gradio web app for image upload and prediction display.
- A three-model comparison:
  - Custom transfer learning model (your own ViT model)
  - Open-source model (CLIP zero-shot)
  - Closed-source model (OpenAI Vision)
- Example bean images in the web app.
- Reproducible evaluation output as CSV.
Project Structure
Projekt 2/
  app.py
  evaluate_models.py
  labels.py
  model_comparison.py
  training_custom_transfer_learning.ipynb
  requirements_runtime.txt
  requirements_training.txt
  .env.example
  labels.txt
  models/
  results/
Dataset Description
This project classifies bean leaf diseases into three distinct categories using the Beans Dataset from Hugging Face:
- Angular Leaf Spot: A fungal disease causing angular lesions on bean leaves
- Bean Rust: A rust disease characterized by reddish-brown pustules
- Healthy: Uninfected, healthy bean leaves
Dataset statistics:
- Total images: 1,295 (1,034 training + 133 validation + 128 test)
- Classes: 3 (two diseases plus healthy)
- Source: Hugging Face Datasets - beans
- Resolution: 500×500 originals (resized to 224×224 for ViT)
- Splits: train/validation/test loaded directly from Hugging Face as separate splits
Preprocessing
The training notebook applies the following transformations:
Image Loading & Conversion
- Load directly from Hugging Face Datasets
- Convert all images to RGB
- Verify dimensions are valid for ViT
ViT Image Processor
- Resize to 224×224 (ViT-Base standard)
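- Normalize with the processor's statistics for google/vit-base-patch16-224: mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5] (not the ImageNet values mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])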
- Convert to torch tensor
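The Normalize step can be checked by hand: each 8-bit channel value is first rescaled to [0, 1] and then standardized with the per-channel mean and std. This is a minimal sketch, assuming that rescale-then-standardize order; `normalize_pixel` and `normalize_rgb` are hypothetical helper names, and the real processor operates on whole tensors rather than single pixels.

```python
def normalize_pixel(value: int, mean: float, std: float) -> float:
    """Rescale an 8-bit channel value to [0, 1], then standardize it (hypothetical helper)."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit channel value")
    return (value / 255.0 - mean) / std


def normalize_rgb(pixel, means, stds):
    """Apply per-channel normalization to one (R, G, B) pixel."""
    return tuple(normalize_pixel(v, m, s) for v, m, s in zip(pixel, means, stds))


# With mean = std = 0.5, the 8-bit range [0, 255] maps onto [-1, 1].
print(normalize_pixel(0, 0.5, 0.5))    # -1.0
print(normalize_pixel(255, 0.5, 0.5))  # 1.0
# With ImageNet statistics, the red channel uses mean 0.485 and std 0.229 instead.
print(normalize_pixel(255, 0.485, 0.229))
```

Whichever statistics are used, training and inference must share them; a mismatch shifts every input the model sees.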
Data Augmentation (Training only)
- Random horizontal flips (50%)
- Random crops to 224×224
- Color jittering (brightness, contrast, saturation)
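The augmentation steps can be illustrated on a toy single-channel image stored as nested lists. This is a conceptual sketch only, not the notebook's implementation (which presumably uses library transforms on real tensors); the crop size, the ±20% brightness range, and all helper names are assumptions.

```python
import random

Image = list[list[int]]  # toy single-channel image: rows of 0..255 pixel values


def random_horizontal_flip(img: Image, rng: random.Random, p: float = 0.5) -> Image:
    """Mirror each row left-to-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in img]
    return [row[:] for row in img]


def random_crop(img: Image, size: int, rng: random.Random) -> Image:
    """Cut a size x size window at a random top-left position."""
    height, width = len(img), len(img[0])
    if size > height or size > width:
        raise ValueError("crop larger than image")
    top = rng.randint(0, height - size)   # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in img[top:top + size]]


def jitter_brightness(img: Image, rng: random.Random, amount: float = 0.2) -> Image:
    """Scale all pixels by a random factor in [1 - amount, 1 + amount], clamped to 0..255."""
    factor = rng.uniform(1 - amount, 1 + amount)
    return [[min(255, max(0, round(v * factor))) for v in row] for row in img]


def augment(img: Image, crop: int, rng: random.Random) -> Image:
    """Training-only pipeline (assumed order): flip, crop, then brightness jitter."""
    return jitter_brightness(random_crop(random_horizontal_flip(img, rng), crop, rng), rng)


rng = random.Random(0)  # fixed seed so the example is reproducible
toy = [[10 * r + c for c in range(6)] for r in range(6)]  # 6x6 stand-in for a leaf image
out = augment(toy, crop=4, rng=rng)
print(len(out), len(out[0]))
```

Validation and test images skip `augment` entirely, so evaluation always sees the same deterministic input.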
Label Encoding
- Automatic from HF dataset: angular_leaf_spot=0, bean_rust=1, healthy=2
- Matches labels.txt
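Because the new head has only three outputs, the integer ids must line up with labels.txt. A minimal sketch of building the two lookup tables from that file's contents; `load_label_maps` is a hypothetical name, and labels.py may implement this differently.

```python
def load_label_maps(text: str) -> tuple[dict[int, str], dict[str, int]]:
    """Build id2label / label2id from labels.txt content (one class name per line)."""
    names = [line.strip() for line in text.splitlines() if line.strip()]
    if len(set(names)) != len(names):
        raise ValueError("duplicate class names in labels file")
    id2label = dict(enumerate(names))  # line order defines the class id
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


# In practice: text = open("labels.txt", encoding="utf-8").read()
labels_txt = "angular_leaf_spot\nbean_rust\nhealthy\n"
id2label, label2id = load_label_maps(labels_txt)
print(id2label)
print(label2id)
```

These dictionaries have the shape transformers expects for the `id2label` and `label2id` arguments when loading a checkpoint with a new head, e.g. `ViTForImageClassification.from_pretrained(..., num_labels=3, id2label=id2label, label2id=label2id, ignore_mismatched_sizes=True)`.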
Model Architecture & Training
1. Custom Transfer Learning Model
Base Model: google/vit-base-patch16-224
- Vision Transformer, pre-trained on ImageNet-21k + fine-tuned on ImageNet-1k
- 12 transformer blocks, 768-dim embeddings, ~86M parameters
- Input: 224×224 RGB images
Fine-tuning Strategy:
- Replace classification head: 1000 → 3 classes
- Optimizer: AdamW with learning rate 2e-5
- Batch size: 16 (train & eval)
- Epochs: 5 with early stopping
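Early stopping halts training once the validation metric has stopped improving for a set number of epochs (the patience). The logic is sketched below on made-up validation accuracies; the patience of 2 and `min_delta` of 0 are assumptions, since the notebook's values are not stated here.

```python
def early_stopping_epoch(val_accuracies, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if all epochs run."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best + min_delta:
            best = acc                      # new best: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation accuracies, for illustration only.
print(early_stopping_epoch([0.80, 0.86, 0.85, 0.84, 0.88]))  # stops at epoch 4
print(early_stopping_epoch([0.80, 0.85, 0.88, 0.90, 0.91]))  # None: improves every epoch
```

If the notebook uses the Hugging Face Trainer, the built-in equivalent is `transformers.EarlyStoppingCallback(early_stopping_patience=...)`, which requires `load_best_model_at_end=True` and a `metric_for_best_model` in `TrainingArguments`.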
Expected Performance:
- Training accuracy: ~93-95%
- Validation accuracy: ~88-92%
- Test accuracy: ~85-90% (depending on dataset quality)
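The accuracy figures above are the share of images whose predicted label equals the true label. A per-class breakdown is also worth reporting, because overall accuracy can hide confusion between the two disease classes. A minimal sketch on made-up predictions (not project results):

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """For each true class, the fraction of its images predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / count for label, count in totals.items()}


# Made-up labels for illustration only.
y_true = ["healthy", "bean_rust", "bean_rust", "angular_leaf_spot"]
y_pred = ["healthy", "bean_rust", "angular_leaf_spot", "angular_leaf_spot"]
print(accuracy(y_true, y_pred))  # 0.75
print(per_class_recall(y_true, y_pred))
```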
2. Open-Source Model: CLIP
Model: openai/clip-vit-large-patch14
- Zero-shot image classification (no fine-tuning)
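- Learned text-image alignment during pre-training on image-caption pairs
- Class names used as text prompts: "angular_leaf_spot", "bean_rust", "healthy"
- Generally robust to domain shift on natural images, but never trained specifically on bean leaf diseases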
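Zero-shot classification with CLIP compares one image embedding against one text embedding per class name and turns the scaled cosine similarities into probabilities. The sketch below shows only that scoring step on made-up 3-dimensional vectors; real embeddings come from CLIP's image and text encoders and are much larger, and the logit scale of 100 (the learned temperature of the released OpenAI checkpoints, as far as we know) should be treated as an assumption.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot(image_emb, text_embs, labels, logit_scale=100.0):
    """Pick the label whose text embedding is most similar to the image embedding."""
    logits = [logit_scale * cosine(image_emb, t) for t in text_embs]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


# Toy embeddings standing in for CLIP's real image/text features (assumption).
labels = ["angular_leaf_spot", "bean_rust", "healthy"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.1, 0.2, 0.95]
label, prob = zero_shot(image_emb, text_embs, labels)
print(label, round(prob, 3))
```

With transformers, `CLIPModel` returns these scaled similarities as `logits_per_image`, so applying a softmax over the class dimension gives the per-class scores.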
3. Closed-Source Model: OpenAI Vision
Model: gpt-4-vision or gpt-4-mini
- Multimodal reasoning combining vision and language
- Provides reasoning for predictions (e.g., "Brown pustules indicate rust disease")
- Requires a valid OPENAI_API_KEY
- Strong at describing visible disease patterns (lesions, pustules)
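The OpenAI model returns free text, so each reply has to be mapped onto one of the three labels. The sketch below assumes, hypothetically, that the prompt asks for a JSON object with `label`, `confidence` and `reasoning` fields; the real prompt and parsing in model_comparison.py may differ. Anything unparseable falls back to the "Not available" placeholder the app already uses for a missing key.

```python
import json

LABELS = ("angular_leaf_spot", "bean_rust", "healthy")


def parse_vision_reply(reply: str) -> dict:
    """Turn a (hypothetical) JSON reply into label/confidence/reasoning, with a safe fallback."""
    fallback = {"label": "Not available", "confidence": None, "reasoning": ""}
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    label = str(data.get("label", "")).strip().lower()
    if label not in LABELS:          # reject labels outside the three known classes
        return fallback
    try:
        confidence = float(data.get("confidence"))
    except (TypeError, ValueError):  # missing or non-numeric confidence
        confidence = None
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        confidence = None
    return {"label": label, "confidence": confidence, "reasoning": str(data.get("reasoning", ""))}


reply = '{"label": "bean_rust", "confidence": 0.95, "reasoning": "Reddish-brown pustules characteristic of rust"}'
print(parse_vision_reply(reply))
print(parse_vision_reply("I think this leaf is fine."))
```

A confidence reported by a language model is not a calibrated probability, so it is not directly comparable to the softmax scores of the other two models.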
Model Comparison Example
Input: Healthy Bean Leaf Image
| Model | Prediction | Confidence | Notes |
|---|---|---|---|
| Custom ViT | healthy | 0.96 | Strong clean leaf detection |
| CLIP | healthy | 0.89 | Text-image alignment |
| OpenAI | healthy | 0.94 | Reasoning: "No visible lesions or pustules" |
Input: Bean Rust Image
| Model | Prediction | Confidence | Notes |
|---|---|---|---|
| Custom ViT | bean_rust | 0.92 | Clear pustule detection |
| CLIP | bean_rust | 0.87 | Disease pattern recognition |
| OpenAI | bean_rust | 0.95 | Reasoning: "Reddish-brown pustules characteristic of rust" |
Evaluation Output
Run:
python evaluate_models.py --examples-dir example_images --output results/model_comparison_results.csv
Generates a CSV table with predictions from all three models for each image in the examples directory.
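A CSV with one row per (image, model) pair is easy to filter and pivot later. The sketch below shows that layout with the standard csv module; the column names and file names are hypothetical, and evaluate_models.py may use a different (for example one-row-per-image) layout.

```python
import csv
import io

# Hypothetical column layout: one row per (image, model) pair.
FIELDS = ["image", "model", "prediction", "confidence"]


def write_results(rows: list[dict], stream) -> None:
    """Write prediction rows as CSV with a header line."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty fields


rows = [
    {"image": "healthy_01.jpg", "model": "custom_vit", "prediction": "healthy", "confidence": 0.96},
    {"image": "healthy_01.jpg", "model": "clip", "prediction": "healthy", "confidence": 0.89},
    {"image": "healthy_01.jpg", "model": "openai", "prediction": "Not available", "confidence": None},
]
buffer = io.StringIO()
write_results(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the file with `open(path, "w", newline="", encoding="utf-8")`; the csv module documentation recommends `newline=""` so line endings are not doubled.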
Training Workflow
1. Load the dataset from Hugging Face (automatic in the notebook):
   from datasets import load_dataset
   dataset = load_dataset("beans")
2. Run the training notebook training_custom_transfer_learning.ipynb:
   - Loads the beans dataset
   - Fine-tunes ViT-Base on the training split
   - Evaluates on the validation split
   - Saves the trained model to models/custom-vit-model/
3. Test locally with the Gradio app:
   python app.py
   - Upload bean leaf images
   - Get predictions from all three models
   - Compare results
4. Evaluate and generate the CSV:
   python evaluate_models.py
   - Compares all three models on the test set
   - Generates results/model_comparison_results.csv
Running the Project
Quickstart
1. Install dependencies:
   pip install -r requirements_runtime.txt
2. Set up the environment (optional, for the OpenAI API):
   cp .env.example .env
   # Edit .env and add your OPENAI_API_KEY
3. Run training (first time only):
   pip install -r requirements_training.txt
   jupyter notebook training_custom_transfer_learning.ipynb
4. Launch the web app:
   python app.py
   - Open a browser to http://localhost:7860
   - Upload bean leaf images to get predictions
Advanced: Full Reproducibility
# Run full evaluation
python evaluate_models.py --examples-dir example_images --output results/evaluation.csv
# Export model to Hugging Face Hub
# (see training notebook for instructions)
File Descriptions
- app.py: Gradio interface for interactive predictions from all three models
- model_comparison.py: Core comparison logic, handles CLIP, OpenAI Vision, and custom model inference
- labels.py: Utility for loading and managing class labels
- evaluate_models.py: Script to generate evaluation CSV comparing all models
- training_custom_transfer_learning.ipynb: Complete training pipeline
- requirements_runtime.txt: Dependencies for running the app and inference
- requirements_training.txt: Additional dependencies for training
- .env.example: Template for environment variables (OpenAI API key)
- labels.txt: Bean disease class names (one per line)
API Keys & Environment
To use the OpenAI Vision model, you need a valid OpenAI API key:
1. Create a .env file:
   cp .env.example .env
2. Add your OpenAI API key:
   OPENAI_API_KEY=sk-...
3. Verify in the app:
   - The app handles a missing API key gracefully
   - OpenAI predictions are shown as "Not available" if the key is missing
   - CLIP and the custom model always work without an API key
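Locally the key comes from .env, while on a Hugging Face Space a secret is exposed to the app as an environment variable. A minimal stdlib sketch of that lookup, with hypothetical helper names; the project may instead use python-dotenv's `load_dotenv()`, and this simple parser ignores inline comments and `export` prefixes.

```python
import os
from typing import Optional


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments; strip matching surrounding quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def resolve_openai_key(env_text: str = "") -> Optional[str]:
    """Hypothetical lookup: real environment variable first (e.g. a Space secret), then .env text."""
    key = os.environ.get("OPENAI_API_KEY") or parse_env(env_text).get("OPENAI_API_KEY")
    return key or None


# Example contents of a local .env file (placeholder key, not a real one).
example_env = '# local settings\nOPENAI_API_KEY="sk-placeholder"\n'
print(parse_env(example_env))
key = resolve_openai_key(example_env)
print("OpenAI enabled" if key else "OpenAI predictions: Not available")
```

Returning `None` rather than raising lets the caller keep CLIP and the custom model running when no key is configured.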
Performance Considerations
- Custom ViT: Fastest inference (ViT-Base, ~86M parameters); requires a GPU for training, but CPU is acceptable for inference
- CLIP: No domain-specific training needed; ViT-L/14 is larger than ViT-Base, so inference is slower than the custom model
- OpenAI Vision: Slowest (one network API call per image), but provides reasoning
Deployment to Hugging Face Space
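1. Upload the model to the HF Hub from the training notebook, after training. Trainer.push_to_hub() reads its target repo from TrainingArguments(hub_model_id=...) rather than a repo_id argument, so push the model directly:
   trainer.model.push_to_hub("<your-username>/bean-disease-classifier")
2. Create a Space:
   - Go to https://huggingface.co/spaces/create
   - Select Gradio as the SDK
   - Clone the repo and add: app.py, model_comparison.py, labels.py, labels.txt, requirements_runtime.txt (Spaces install dependencies from requirements.txt, so copy or rename it to that name)
   - Set OPENAI_API_KEY as a secret in the Space settings
3. Push and deploy:
   git add .
   git commit -m "Deploy bean disease classifier"
   git push
   - Your Space is now live at: https://huggingface.co/spaces//bean-disease-classifier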
Submission Checklist
- Notebook executed: model trained and saved
- Model uploaded to Hugging Face hub
- Gradio app deployed as Space (public, working)
- README complete with:
  - Dataset description (from beans HF dataset)
  - Preprocessing details
  - Model architecture and training parameters
  - Model comparison examples
  - Performance metrics
- Evaluation CSV generated
- App features working:
  - Image upload
  - Predictions from 3 models
  - Example output visible
Troubleshooting
OpenAI model not working:
- Verify OPENAI_API_KEY in .env (local) or in Space Secrets (HF)
- Check the API key at https://platform.openai.com/account/api-keys
- Ensure the account has credits
Out of GPU memory (training):
- Reduce batch size to 8: per_device_train_batch_size=8
- Reduce epochs to 3 to shorten training: num_train_epochs=3 (this does not lower peak GPU memory)
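Halving the batch size lowers the activation memory needed per step but roughly doubles the number of optimizer steps per epoch, so the two settings pull training time in opposite directions. The arithmetic, assuming a single GPU, no gradient accumulation, the last partial batch kept, and the 1,034-image train split of the Hugging Face beans dataset:

```python
import math


def training_steps(num_examples: int, batch_size: int, epochs: int) -> tuple[int, int]:
    """Steps per epoch and total steps on one device (the last partial batch still counts)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# 1,034 = size of the beans train split (assumed to be the training set used here).
print(training_steps(1034, 16, 5))  # (65, 325): batch 16, 5 epochs
print(training_steps(1034, 8, 3))   # (130, 390): batch 8, 3 epochs
```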
Dataset loading error:
- Check internet connection (HF downloads dataset)
- The datasets library reuses its local cache after the first download
- If needed, set a custom cache directory:
  export HF_DATASETS_CACHE="/path/to/cache"
App crashes on upload:
- Check that all dependencies are installed:
  pip list | grep -E "transformers|gradio|torch"
- Verify that the model path exists:
  ls -la models/custom-vit-model/
- Check the console output for error messages
References
Last Updated: April 2026
Dataset: Beans (Hugging Face)
Framework: PyTorch + Transformers
Model: Vision Transformer (ViT-Base)