Spaces:

nbacchi
/

Abgabe2

Configuration error

App Files Files Community

Abgabe2 / README.md

nbacchi

Upload 6 files

16c7630 verified about 2 months ago

preview code

raw

history blame contribute delete

9.79 kB

	# Project 2 - Bean Disease Classification and Model Comparison

	This project fulfills the assignment requirements by providing:

	- A transfer learning pipeline trained on the Beans dataset from Hugging Face.
	- A Gradio web app for image upload and prediction display.
	- A three-model comparison:
	- Custom transfer learning model (your own ViT model)
	- Open-source model (CLIP zero-shot)
	- Closed-source model (OpenAI Vision)
	- Example bean images in the web app.
	- Reproducible evaluation output as CSV.

	## Project Structure

	```text
	Projekt 2/
	app.py
	evaluate_models.py
	labels.py
	model_comparison.py
	training_custom_transfer_learning.ipynb
	requirements_runtime.txt
	requirements_training.txt
	.env.example
	labels.txt
	models/
	results/
	```

	## Dataset Description

	This project classifies bean leaf diseases into three distinct categories using the Beans Dataset from Hugging Face:

	- Angular Leaf Spot: A fungal disease causing angular lesions on bean leaves
	- Bean Rust: A rust disease characterized by reddish-brown pustules
	- Healthy: Uninfected, healthy bean leaves

	Dataset statistics:

	- Total images: ~1,400 training + 200 validation + 200 test
	- Classes: 3 disease types
	- Source: [Hugging Face Datasets - beans](https://huggingface.co/datasets/beans)
	- Resolution: Various sizes (automatically resized to 224×224)
	- Splits: Automatically loaded from HF with train/validation/test separate splits

	## Preprocessing

	The training notebook applies the following transformations:

	1. Image Loading & Conversion
	- Load directly from Hugging Face Datasets
	- Convert all images to RGB
	- Verify dimensions are valid for ViT

	2. ViT Image Processor
	- Resize to 224×224 (ViT-Base standard)
	- Normalize using ImageNet statistics: mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
	- Convert to torch tensor

	3. Data Augmentation (Training only)
	- Random horizontal flips (50%)
	- Random crops to 224×224
	- Color jittering (brightness, contrast, saturation)

	4. Label Encoding
	- Automatic from HF dataset: `angular_leaf_spot=0`, `bean_rust=1`, `healthy=2`
	- Matching `labels.txt`

	## Model Architecture & Training

	### 1. Custom Transfer Learning Model

	Base Model: `google/vit-base-patch16-224`
	- Vision Transformer, pre-trained on ImageNet-21k + fine-tuned on ImageNet-1k
	- 12 transformer blocks, 768-dim embeddings, ~86M parameters
	- Input: 224×224 RGB images

	Fine-tuning Strategy:
	- Replace classification head: 1000 → 3 classes
	- Optimizer: AdamW with learning rate 2e-5
	- Batch size: 16 (train & eval)
	- Epochs: 5 with early stopping

	Expected Performance:
	- Training accuracy: ~93-95%
	- Validation accuracy: ~88-92%
	- Test accuracy: ~85-90% (depending on dataset quality)

	### 2. Open-Source Model: CLIP

	Model: `openai/clip-vit-large-patch14`
	- Zero-shot image classification (no fine-tuning)
	- Learns text-image alignment during pre-training
	- Class names: "angular_leaf_spot", "bean_rust", "healthy"
	- Robust to domain variations

	### 3. Closed-Source Model: OpenAI Vision

	Model: `gpt-4-vision` or `gpt-4-mini`
	- Multimodal reasoning combining vision and language
	- Provides reasoning for predictions (e.g., "Brown pustules indicate rust disease")
	- Requires valid `OPENAI_API_KEY`
	- Excellent for disease pattern recognition

	### Model Comparison Example

	Input: Healthy Bean Leaf Image

	\| Model \| Prediction \| Confidence \| Notes \|
	\|-------\|-----------\|-----------\|-------\|
	\| Custom ViT \| healthy \| 0.96 \| Strong clean leaf detection \|
	\| CLIP \| healthy \| 0.89 \| Text-image alignment \|
	\| OpenAI \| healthy \| 0.94 \| Reasoning: "No visible lesions or pustules" \|

	Input: Bean Rust Image

	\| Model \| Prediction \| Confidence \| Notes \|
	\|-------\|-----------\|-----------\|-------\|
	\| Custom ViT \| bean_rust \| 0.92 \| Clear pustule detection \|
	\| CLIP \| bean_rust \| 0.87 \| Disease pattern recognition \|
	\| OpenAI \| bean_rust \| 0.95 \| Reasoning: "Reddish-brown pustules characteristic of rust" \|

	### Evaluation Output

	Run:

	```bash
	python evaluate_models.py --examples-dir example_images --output results/model_comparison_results.csv
	```

	Generates CSV table with predictions from all three models for each test image.

	## Training Workflow

	1. Load dataset from Hugging Face (automatic in notebook)
	```python
	from datasets import load_dataset
	dataset = load_dataset("beans")
	```

	2. Run the training notebook `training_custom_transfer_learning.ipynb`
	- Loads the beans dataset
	- Fine-tunes ViT-Base on the training split
	- Evaluates on validation split
	- Saves trained model to `models/custom-vit-model/`

	3. Test locally with the Gradio app
	```bash
	python app.py
	```
	- Upload bean leaf images
	- Get predictions from all three models
	- Compare results

	4. Evaluate and generate CSV
	```bash
	python evaluate_models.py
	```
	- Compares all three models on test set
	- Generates `results/model_comparison_results.csv`

	## Running the Project

	### Quickstart

	1. Install dependencies
	```bash
	pip install -r requirements_runtime.txt
	```

	2. Set up environment (optional for OpenAI API)
	```bash
	cp .env.example .env
	# Edit .env and add your OPENAI_API_KEY
	```

	3. Run training (first time only)
	```bash
	pip install -r requirements_training.txt
	jupyter notebook training_custom_transfer_learning.ipynb
	```

	4. Launch web app
	```bash
	python app.py
	```
	- Open browser to `http://localhost:7860`
	- Upload bean leaf images to get predictions

	### Advanced: Full Reproducibility

	```bash
	# Run full evaluation
	python evaluate_models.py --examples-dir example_images --output results/evaluation.csv

	# Export model to Hugging Face Hub
	# (see training notebook for instructions)
	```

	## File Descriptions

	- app.py: Gradio interface for interactive predictions from all three models
	- model_comparison.py: Core comparison logic, handles CLIP, OpenAI Vision, and custom model inference
	- labels.py: Utility for loading and managing class labels
	- evaluate_models.py: Script to generate evaluation CSV comparing all models
	- training_custom_transfer_learning.ipynb: Complete training pipeline
	- requirements_runtime.txt: Dependencies for running the app and inference
	- requirements_training.txt: Additional dependencies for training
	- .env.example: Template for environment variables (OpenAI API key)
	- labels.txt: Bean disease class names (one per line)

	## API Keys & Environment

	To use the OpenAI Vision model, you need a valid OpenAI API key:

	1. Create .env file
	```bash
	cp .env.example .env
	```

	2. Add your OpenAI API key
	```
	OPENAI_API_KEY=sk-...
	```

	3. Verify in app
	- The app gracefully handles missing API key
	- Will keep OpenAI predictions as "Not available" if key is missing
	- CLIP and custom model always work without API key

	## Performance Considerations

	- Custom ViT: Fastest inference, requires GPU for training but CPU acceptable for inference
	- CLIP: Very fast zero-shot inference, no domain-specific training needed
	- OpenAI Vision: Slowest (API call), but most robust and provides reasoning

	## Deployment to Hugging Face Space

	1. Upload model to HF Hub
	```python
	# In training notebook, after training:
	trainer.push_to_hub(repo_id="<your-username>/bean-disease-classifier")
	```

	2. Create Space
	- Go to https://huggingface.co/spaces/create
	- Select Gradio as SDK
	- Clone the repo and add:
	- `app.py`
	- `model_comparison.py`
	- `labels.py`
	- `labels.txt`
	- `requirements_runtime.txt`
	- Set `OPENAI_API_KEY` as secret in Space settings

	3. Push and Deploy
	```bash
	git add .
	git commit -m "Deploy bean disease classifier"
	git push
	```
	- Your Space is now live at: https://huggingface.co/spaces/<your-username>/bean-disease-classifier

	## Submission Checklist

	- [ ] Notebook executed: model trained and saved
	- [ ] Model uploaded to Hugging Face hub
	- [ ] Gradio app deployed as Space (public, working)
	- [ ] README complete with:
	- [ ] Dataset description (from beans HF dataset)
	- [ ] Preprocessing details
	- [ ] Model architecture and training parameters
	- [ ] Model comparison examples
	- [ ] Performance metrics
	- [ ] Evaluation CSV generated
	- [ ] App features working:
	- [ ] Image upload
	- [ ] Predictions from 3 models
	- [ ] Example output visible

	## Troubleshooting

	OpenAI model not working:
	- Verify `OPENAI_API_KEY` in `.env` (local) or Space Secrets (HF)
	- Check API key at https://platform.openai.com/account/api-keys
	- Ensure account has credits

	Out of GPU memory (training):
	- Reduce batch size to 8: `per_device_train_batch_size=8`
	- Reduce epochs to 3: `num_train_epochs=3`

	Dataset loading error:
	- Check internet connection (HF downloads dataset)
	- Datasets library should auto-load from cache
	- If needed, set cache dir: `export HF_DATASETS_CACHE="/path/to/cache"`

	App crashes on upload:
	- Check all dependencies installed: `pip list \| grep -E "transformers\|gradio\|torch"`
	- Verify model path exists: `ls -la models/custom-vit-model/`
	- Check console output for error messages

	## References

	- [Vision Transformers (ViT)](https://huggingface.co/google/vit-base-patch16-224)
	- [CLIP Model](https://huggingface.co/openai/clip-vit-large-patch14)
	- [Beans Dataset](https://huggingface.co/datasets/beans)
	- [Hugging Face Transformers](https://huggingface.co/docs/transformers/)
	- [Gradio Documentation](https://www.gradio.app/)

	---

	Last Updated: April 2026
	Dataset: Beans (Hugging Face)
	Framework: PyTorch + Transformers
	Model: Vision Transformer (ViT-Base)