---
license: mit
language:
- en
tags:
- image-classification
- pytorch
- efficientnet
- flowers
- computer-vision
datasets:
- oxford-flowers-102
metrics:
- accuracy
- f1
pipeline_tag: image-classification
library_name: pytorch
base_model:
- google/efficientnet-b4
---

# 🌸 EfficientNet-B4 Flower Classifier

An image classification model that identifies **102 flower species** from the Oxford Flowers-102 dataset.

## Model Details

### Model Description

This model is built on the **EfficientNet-B4** backbone with a custom classifier head, trained using a **6-Phase Progressive Training** strategy. Training progressively increases both the image resolution (280px → 400px) and the augmentation difficulty (None → MixUp → CutMix → Hybrid).

- **Developed by:** fth2745
- **Model type:** Image Classification (CNN)
- **License:** MIT
- **Finetuned from:** EfficientNet-B4 (ImageNet pretrained)

## Performance

| Metric | Test Set | Validation Set |
|--------|----------|----------------|
| **Top-1 Accuracy** | 94.49% | 97.45% |
| **Top-3 Accuracy** | 97.61% | 98.82% |
| **Top-5 Accuracy** | 98.49% | 99.31% |
| **Macro F1-Score** | 94.75% | 97.13% |

## Training Details

### Training Data

Oxford Flowers-102 dataset with offline data augmentation (tier-based augmentation for class balancing).

### Training Procedure

#### 6-Phase Progressive Training

| Phase | Epochs | Resolution | Augmentation | Dropout |
|-------|--------|------------|--------------|---------|
| 1. Basic | 1-5 | 280×280 | Basic preprocessing | 0.4 |
| 2. MixUp Soft | 6-10 | 320×320 | MixUp α=0.2 | 0.2 |
| 3. MixUp Hard | 11-15 | 320×320 | MixUp α=0.4 | 0.2 |
| 4. CutMix Soft | 16-20 | 380×380 | CutMix α=0.2 | 0.2 |
| 5. CutMix Hard | 21-30 | 380×380 | CutMix α=0.5 | 0.2 |
| 6. Grand Finale | 31-40 | 400×400 | Hybrid | 0.2 |

#### Preprocessing

- Resize → RandomCrop → HorizontalFlip → Rotation (±20°) → Affine → ColorJitter → Normalize (ImageNet)

#### Training Hyperparameters

- **Optimizer:** AdamW
- **Learning Rate:** 1e-3
- **Weight Decay:** 1e-4
- **Scheduler:** CosineAnnealingWarmRestarts (T_0=5, T_mult=2)
- **Loss:** CrossEntropyLoss (label_smoothing=0.1)
- **Batch Size:** 8
- **Training Regime:** fp16 mixed precision (AMP)
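
As a minimal sketch, the hyperparameters above translate to the following PyTorch setup (the placeholder `model` stands in for the EfficientNet-B4 classifier described further below; the actual training loop is not shown in this card):

```python
import torch
from torch import nn, optim

# Placeholder module; in the real setup this is the EfficientNet-B4 classifier below.
model = nn.Linear(1792, 102)

# Optimizer / scheduler / loss exactly as listed above.
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=5, T_mult=2)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
scaler = torch.cuda.amp.GradScaler()  # fp16 mixed precision (AMP)
```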

---

## 🎯 6-Phase Progressive Training

```
Phase 1 ──▶ Phase 2 ──▶ Phase 3 ──▶ Phase 4 ──▶ Phase 5 ──▶ Phase 6
 280px       320px       320px       380px       380px       400px
 None        MixUp       MixUp       CutMix      CutMix      Hybrid
             α=0.2       α=0.4       α=0.2       α=0.5       MixUp+Cut
```

### Phase Details

| Phase | Epochs | Resolution | Technique | Alpha | Dropout | Purpose |
|-------|--------|------------|-----------|-------|---------|---------|
| 1️⃣ **Basic** | 1-5 | 280×280 | Basic preprocessing | - | 0.4 | Learn fundamental features |
| 2️⃣ **MixUp Soft** | 6-10 | 320×320 | MixUp | 0.2 | 0.2 | Gentle texture blending |
| 3️⃣ **MixUp Hard** | 11-15 | 320×320 | MixUp | 0.4 | 0.2 | Strong texture mixing |
| 4️⃣ **CutMix Soft** | 16-20 | 380×380 | CutMix | 0.2 | 0.2 | Learn partial structures |
| 5️⃣ **CutMix Hard** | 21-30 | 380×380 | CutMix | 0.5 | 0.2 | Handle occlusions |
| 6️⃣ **Grand Finale** | 31-40 | 400×400 | Hybrid | 0.1-0.3 | 0.2 | Final polish with both |

> **💡 Why Progressive Training?** Starting at low resolution helps the model learn general shapes first. Gradually increasing the augmentation difficulty builds robustness incrementally.
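
For reference, the same schedule can be written down as a plain configuration list. This is an illustrative sketch only; the dictionary keys and structure are assumptions, not the author's actual code:

```python
# Illustrative config for the 6-phase schedule above (key names are assumptions).
PHASES = [
    {"name": "Basic",        "epochs": (1, 5),   "img_size": 280, "aug": None,     "alpha": None,       "dropout": 0.4},
    {"name": "MixUp Soft",   "epochs": (6, 10),  "img_size": 320, "aug": "mixup",  "alpha": 0.2,        "dropout": 0.2},
    {"name": "MixUp Hard",   "epochs": (11, 15), "img_size": 320, "aug": "mixup",  "alpha": 0.4,        "dropout": 0.2},
    {"name": "CutMix Soft",  "epochs": (16, 20), "img_size": 380, "aug": "cutmix", "alpha": 0.2,        "dropout": 0.2},
    {"name": "CutMix Hard",  "epochs": (21, 30), "img_size": 380, "aug": "cutmix", "alpha": 0.5,        "dropout": 0.2},
    {"name": "Grand Finale", "epochs": (31, 40), "img_size": 400, "aug": "hybrid", "alpha": (0.1, 0.3), "dropout": 0.2},
]
```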

---

## 🖼️ Preprocessing Pipeline (All Phases)

> **⚠️ Note:** These preprocessing steps are applied in **ALL PHASES**. Only `img_size` changes per phase.

### Complete Training Flow

```
┌───────────────────────────────────────────────────────────────┐
│ 📷 RAW IMAGE INPUT                                             │
└───────────────────────────────────────────────────────────────┘
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ 📋 STEP 1: IMAGE-LEVEL PREPROCESSING (per image)               │
├───────────────────────────────────────────────────────────────┤
│ 1️⃣ Resize          → (img_size + 32) × (img_size + 32)         │
│ 2️⃣ RandomCrop      → img_size × img_size                       │
│ 3️⃣ HorizontalFlip  → p=0.5                                     │
│ 4️⃣ RandomRotation  → ±20°                                      │
│ 5️⃣ RandomAffine    → scale=(0.8, 1.2)                          │
│ 6️⃣ ColorJitter     → brightness, contrast, saturation=0.2      │
│ 7️⃣ ToTensor        → [0-255] → [0.0-1.0]                       │
│ 8️⃣ Normalize       → ImageNet mean/std                         │
└───────────────────────────────────────────────────────────────┘
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ 🎲 STEP 2: BATCH-LEVEL AUGMENTATION (phase-specific)           │
├───────────────────────────────────────────────────────────────┤
│ Phase 1:   None   (preprocessing only)                         │
│ Phase 2-3: MixUp  (λ×ImageA + (1-λ)×ImageB)                    │
│ Phase 4-5: CutMix (patch swap between images)                  │
│ Phase 6:   Hybrid (MixUp + CutMix combined)                    │
└───────────────────────────────────────────────────────────────┘
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ 🎯 READY FOR MODEL TRAINING                                    │
└───────────────────────────────────────────────────────────────┘
```

### Phase-Specific Image Sizes

| Phase | img_size | Resize To | RandomCrop To |
|-------|----------|-----------|---------------|
| 1️⃣ Basic | 280 | 312×312 | 280×280 |
| 2️⃣ MixUp Soft | 320 | 352×352 | 320×320 |
| 3️⃣ MixUp Hard | 320 | 352×352 | 320×320 |
| 4️⃣ CutMix Soft | 380 | 412×412 | 380×380 |
| 5️⃣ CutMix Hard | 380 | 412×412 | 380×380 |
| 6️⃣ Grand Finale | 400 | 432×432 | 400×400 |

### Preprocessing Details (All Phases)

| Step | Transform | Parameters | Purpose |
|------|-----------|------------|---------|
| 1️⃣ | **Resize** | (size+32, size+32) | Prepare for random crop |
| 2️⃣ | **RandomCrop** | (size, size) | Random position augmentation |
| 3️⃣ | **RandomHorizontalFlip** | p=0.5 | Left-right invariance |
| 4️⃣ | **RandomRotation** | degrees=20 | Rotation invariance |
| 5️⃣ | **RandomAffine** | scale=(0.8, 1.2) | Scale variation |
| 6️⃣ | **ColorJitter** | (0.2, 0.2, 0.2) | Brightness/contrast/saturation |
| 7️⃣ | **ToTensor** | - | Convert to PyTorch tensor |
| 8️⃣ | **Normalize** | mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] | ImageNet normalization |

### Test/Validation Preprocessing

| Step | Transform | Parameters |
|------|-----------|------------|
| 1️⃣ | **Resize** | (size, size) |
| 2️⃣ | **ToTensor** | - |
| 3️⃣ | **Normalize** | ImageNet mean/std |

> **💡 Key Insight:** Preprocessing (8 steps) is applied per image in every phase. MixUp/CutMix is applied **AFTER** preprocessing as batch-level augmentation.
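
The per-image pipelines above correspond to the following torchvision sketch. It is a minimal reconstruction from the tables, not the author's exact code; in particular, `degrees=0` for `RandomAffine` is an assumption (rotation is handled by the separate `RandomRotation` step):

```python
from torchvision import transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def build_train_transform(img_size: int) -> transforms.Compose:
    """Training pipeline; img_size is the phase-specific resolution (280/320/380/400)."""
    return transforms.Compose([
        transforms.Resize((img_size + 32, img_size + 32)),
        transforms.RandomCrop(img_size),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(degrees=20),
        transforms.RandomAffine(degrees=0, scale=(0.8, 1.2)),  # degrees=0: assumed
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ])

def build_eval_transform(img_size: int) -> transforms.Compose:
    """Deterministic test/validation pipeline: resize, tensor, ImageNet normalization."""
    return transforms.Compose([
        transforms.Resize((img_size, img_size)),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ])
```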

---

## 🔄 Batch-Level Augmentation Techniques (Phase-Specific)

### MixUp

```
Image A (Rose) + Image B (Sunflower)
                ↓
λ = Beta(α, α) → New Image = λ×A + (1-λ)×B
                ↓
Blended Image (70% Rose + 30% Sunflower features)
```

**Benefits:** ✅ Smoother decision boundaries ✅ Reduces overconfidence ✅ Better generalization
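
A minimal batch-level MixUp sketch (the function name and return convention are illustrative; the card does not show the author's implementation):

```python
import torch

def mixup_batch(images: torch.Tensor, labels: torch.Tensor, alpha: float = 0.2):
    """Blend each image with a randomly paired image from the same batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1.0 - lam) * images[perm]
    # Loss: lam * CE(logits, labels) + (1 - lam) * CE(logits, labels[perm])
    return mixed, labels, labels[perm], lam
```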

### CutMix

```
Image A (Rose) + Random BBox from Image B (Sunflower)
                ↓
Paste B's region onto A
                ↓
Composite Image (Rose background + Sunflower patch)
```

**Benefits:** ✅ Object completion ability ✅ Occlusion robustness ✅ Localization skills
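
A corresponding CutMix sketch (again illustrative; the box sampling follows the standard CutMix recipe rather than the author's exact code):

```python
import torch

def cutmix_batch(images: torch.Tensor, labels: torch.Tensor, alpha: float = 0.2):
    """Paste a random box from a shuffled copy of the batch onto each image."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    _, _, h, w = images.shape
    # Box sides follow sqrt(1 - lam) so the pasted area covers roughly (1 - lam) of the image.
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = images.clone()
    mixed[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Re-weight lambda by the actual pasted area.
    lam = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)
    return mixed, labels, labels[perm], lam
```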

### Hybrid (Grand Finale)

1. Apply MixUp (blend two images)
2. Apply CutMix (cut on the blended image)
3. Result: maximum augmentation challenge

---

## 🛡️ Smart Training Features

### Two-Layer Early Stopping

| Layer | Condition | Patience | Action |
|-------|-----------|----------|--------|
| **Phase-level** | Train loss ↓ + Val loss ↑ (overfitting) | 2 epochs | Skip to next phase |
| **Global** | Val loss not improving | 8 epochs | Stop training |
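
An illustrative sketch of how the two layers could be tracked per epoch; the class name and exact bookkeeping are assumptions, since the card only states the conditions and patience values:

```python
class TwoLayerEarlyStopping:
    """Phase-level patience on overfitting, global patience on stalled validation loss."""

    def __init__(self, phase_patience: int = 2, global_patience: int = 8):
        self.phase_patience = phase_patience
        self.global_patience = global_patience
        self.phase_bad = 0
        self.global_bad = 0
        self.best_val = float("inf")

    def update(self, train_improved: bool, val_loss: float):
        """Return (skip_phase, stop_training) for the current epoch."""
        overfitting = train_improved and val_loss > self.best_val
        self.phase_bad = self.phase_bad + 1 if overfitting else 0
        if val_loss < self.best_val:
            self.best_val = val_loss
            self.global_bad = 0
        else:
            self.global_bad += 1
        return self.phase_bad >= self.phase_patience, self.global_bad >= self.global_patience
```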

### Smart Dropout Mechanism

| Signal | Condition | Action |
|--------|-----------|--------|
| ⚠️ **Overfitting** | Train loss ↓ + Val loss ↑ | Dropout += 0.05 |
| 📉 **Underfitting** | Train loss ↑ + Val loss ↑ | Dropout -= 0.05 |
| ✅ **Normal** | Train loss ↓ + Val loss ↓ | No change |

**Bounds:** min=0.10, max=0.50
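
A sketch of that adjustment rule, assuming the head's dropout layers are standard `nn.Dropout` modules; the helper name and signal booleans are illustrative:

```python
from torch import nn

def adjust_dropout(model: nn.Module, train_loss_down: bool, val_loss_down: bool,
                   step: float = 0.05, low: float = 0.10, high: float = 0.50) -> None:
    """Raise dropout on overfitting, lower it on underfitting, clamp to [low, high]."""
    if train_loss_down and not val_loss_down:        # overfitting: train ↓, val ↑
        delta = step
    elif not train_loss_down and not val_loss_down:  # underfitting: train ↑, val ↑
        delta = -step
    else:                                            # normal: no change
        return
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = min(max(module.p + delta, low), high)
```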

## Model Architecture

```
EfficientNet-B4 (pretrained)
└── Custom Classifier Head
    ├── BatchNorm1d (1792)
    ├── Dropout
    ├── Linear (1792 → 512)
    ├── GELU
    ├── BatchNorm1d (512)
    ├── Dropout
    └── Linear (512 → 102)
```

**Total Parameters:** ~19M (all trainable)
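
A minimal PyTorch sketch of that head on top of a torchvision EfficientNet-B4. The torchvision backbone and the initial dropout value are assumptions; the card does not name the specific EfficientNet implementation used:

```python
from torch import nn
from torchvision import models

# EfficientNet-B4 backbone with ImageNet weights; its pooled feature size is 1792.
backbone = models.efficientnet_b4(weights="IMAGENET1K_V1")

# Replace the stock classifier with the custom head described above.
backbone.classifier = nn.Sequential(
    nn.BatchNorm1d(1792),
    nn.Dropout(p=0.4),        # starting value; adjusted per phase / by smart dropout
    nn.Linear(1792, 512),
    nn.GELU(),
    nn.BatchNorm1d(512),
    nn.Dropout(p=0.4),
    nn.Linear(512, 102),      # 102 flower classes
)
```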

## Supported Flower Classes

102 flower species including: Rose, Sunflower, Tulip, Orchid, Lily, Daisy, Hibiscus, Lotus, Magnolia, and 93 more.

## Limitations

- Trained only on the Oxford Flowers-102 dataset
- Best performance at 400×400 resolution
- May not generalize well to flowers outside the 102 trained classes

## Citation

```bibtex
@misc{efficientnet-b4-flowers102,
  title={EfficientNet-B4 Flower Classifier with 6-Phase Progressive Training},
  author={fth2745},
  year={2024},
  url={https://huggingface.co/fth2745/efficientnet-b4-flowers102}
}
```