---
license: mit
language:
- en
tags:
- image-classification
- pytorch
- efficientnet
- flowers
- computer-vision
datasets:
- oxford-flowers-102
metrics:
- accuracy
- f1
pipeline_tag: image-classification
library_name: pytorch
base_model:
- google/efficientnet-b4
---
# 🌸 EfficientNet-B4 Flower Classifier
An EfficientNet-B4-based image classification model for identifying **102 flower species** from the Oxford Flowers-102 dataset.
## Model Details
### Model Description
This model is built on the **EfficientNet-B4** backbone with a custom classifier head, trained using a novel **6-Phase Progressive Training** strategy. The training progressively increases image resolution (280px β†’ 400px) and augmentation difficulty (None β†’ MixUp β†’ CutMix β†’ Hybrid).
- **Developed by:** fth2745
- **Model type:** Image Classification (CNN)
- **License:** MIT
- **Finetuned from:** EfficientNet-B4 (ImageNet pretrained)
## Performance
| Metric | Test Set | Validation Set |
|--------|----------|----------------|
| **Top-1 Accuracy** | 94.49% | 97.45% |
| **Top-3 Accuracy** | 97.61% | 98.82% |
| **Top-5 Accuracy** | 98.49% | 99.31% |
| **Macro F1-Score** | 94.75% | 97.13% |
## Training Details
### Training Data
Oxford Flowers-102 dataset with offline data augmentation (tier-based augmentation for class balancing).
### Training Procedure
#### 6-Phase Progressive Training
Training proceeds through six phases, detailed in the **🎯 6-Phase Progressive Training** section below: resolution grows from 280Γ—280 to 400Γ—400 while batch-level augmentation advances from none through MixUp and CutMix to a hybrid of both.
#### Preprocessing
- Resize β†’ RandomCrop β†’ HorizontalFlip β†’ Rotation (Β±20Β°) β†’ Affine β†’ ColorJitter β†’ Normalize (ImageNet)
#### Training Hyperparameters
- **Optimizer:** AdamW
- **Learning Rate:** 1e-3
- **Weight Decay:** 1e-4
- **Scheduler:** CosineAnnealingWarmRestarts (T_0=5, T_mult=2)
- **Loss:** CrossEntropyLoss (label_smoothing=0.1)
- **Batch Size:** 8
- **Training Regime:** fp16 mixed precision (AMP)
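These hyperparameters can be wired together as follows (a minimal sketch: the tiny `model` and the random batch are stand-ins, and phase-specific augmentation is omitted):

```python
import torch
from torch import nn

# Stand-in model; the real network is EfficientNet-B4 with a custom head.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 102))

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # fp16 autocast only makes sense on GPU

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=5, T_mult=2
)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

def train_step(images, labels):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=use_amp):  # mixed precision
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

# One toy step on random data (batch size 8, as above), then the per-epoch
# scheduler update.
loss = train_step(torch.randn(8, 3, 8, 8), torch.randint(0, 102, (8,)))
scheduler.step()
```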
---
## 🎯 6-Phase Progressive Training
```
Phase 1 ──→ Phase 2 ──→ Phase 3 ──→ Phase 4 ──→ Phase 5 ──→ Phase 6
280px 320px 320px 380px 380px 400px
None MixUp MixUp CutMix CutMix Hybrid
Ξ±=0.2 Ξ±=0.4 Ξ±=0.2 Ξ±=0.5 MixUp+Cut
```
### Phase Details
| Phase | Epochs | Resolution | Technique | Alpha | Dropout | Purpose |
|-------|--------|------------|-----------|-------|---------|---------|
| 1️⃣ **Basic** | 1-5 | 280Γ—280 | Basic Preprocessing | - | 0.4 | Learn fundamental features |
| 2️⃣ **MixUp Soft** | 6-10 | 320Γ—320 | MixUp | 0.2 | 0.2 | Gentle texture blending |
| 3️⃣ **MixUp Hard** | 11-15 | 320Γ—320 | MixUp | 0.4 | 0.2 | Strong texture mixing |
| 4️⃣ **CutMix Soft** | 16-20 | 380Γ—380 | CutMix | 0.2 | 0.2 | Learn partial structures |
| 5️⃣ **CutMix Hard** | 21-30 | 380Γ—380 | CutMix | 0.5 | 0.2 | Handle occlusions |
| 6️⃣ **Grand Finale** | 31-40 | 400Γ—400 | Hybrid | 0.1-0.3 | 0.2 | Final polish with both |
> **πŸ’‘ Why Progressive Training?** Starting with low resolution helps the model learn general shapes first. Gradual augmentation increase builds robustness incrementally.
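The schedule above can be captured as a plain config that a training loop indexes by epoch (a sketch; the field names are illustrative, not the actual training code):

```python
# Phase table as data: epoch ranges, input resolution, batch-level
# augmentation, its alpha, and the starting dropout.
PHASES = [
    {"name": "Basic",        "epochs": range(1, 6),   "img_size": 280, "aug": None,     "alpha": None, "dropout": 0.4},
    {"name": "MixUp Soft",   "epochs": range(6, 11),  "img_size": 320, "aug": "mixup",  "alpha": 0.2,  "dropout": 0.2},
    {"name": "MixUp Hard",   "epochs": range(11, 16), "img_size": 320, "aug": "mixup",  "alpha": 0.4,  "dropout": 0.2},
    {"name": "CutMix Soft",  "epochs": range(16, 21), "img_size": 380, "aug": "cutmix", "alpha": 0.2,  "dropout": 0.2},
    {"name": "CutMix Hard",  "epochs": range(21, 31), "img_size": 380, "aug": "cutmix", "alpha": 0.5,  "dropout": 0.2},
    {"name": "Grand Finale", "epochs": range(31, 41), "img_size": 400, "aug": "hybrid", "alpha": 0.2,  "dropout": 0.2},  # alpha varies 0.1-0.3
]

def phase_for_epoch(epoch):
    """Look up the active phase for a 1-indexed epoch."""
    return next(p for p in PHASES if epoch in p["epochs"])
```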
---
## πŸ–ΌοΈ Preprocessing Pipeline (All Phases)
> **⚠️ Note:** These preprocessing steps are applied in **ALL PHASES**. Only `img_size` changes per phase.
### Complete Training Flow
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ πŸ“· RAW IMAGE INPUT β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ πŸ”„ STEP 1: IMAGE-LEVEL PREPROCESSING (Per image) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1️⃣ Resize β”‚ (img_size + 32) Γ— (img_size + 32) β”‚
β”‚ 2️⃣ RandomCrop β”‚ img_size Γ— img_size β”‚
β”‚ 3️⃣ HorizontalFlip β”‚ p=0.5 β”‚
β”‚ 4️⃣ RandomRotation β”‚ Β±20Β° β”‚
β”‚ 5️⃣ RandomAffine β”‚ scale=(0.8, 1.2) β”‚
β”‚ 6️⃣ ColorJitter β”‚ brightness, contrast, saturation=0.2 β”‚
β”‚ 7️⃣ ToTensor β”‚ [0-255] β†’ [0.0-1.0] β”‚
β”‚ 8️⃣ Normalize β”‚ ImageNet mean/std β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 🎲 STEP 2: BATCH-LEVEL AUGMENTATION (Phase-specific) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Phase 1: None (Preprocessing only) β”‚
β”‚ Phase 2-3: MixUp (λ×ImageA + (1-Ξ»)Γ—ImageB) β”‚
β”‚ Phase 4-5: CutMix (Patch swap between images) β”‚
β”‚ Phase 6: Hybrid (MixUp + CutMix combined) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 🎯 READY FOR MODEL TRAINING β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
### Phase-Specific Image Sizes
| Phase | img_size | Resize To | RandomCrop To |
|-------|----------|-----------|---------------|
| 1️⃣ Basic | 280 | 312Γ—312 | 280Γ—280 |
| 2️⃣ MixUp Soft | 320 | 352Γ—352 | 320Γ—320 |
| 3️⃣ MixUp Hard | 320 | 352Γ—352 | 320Γ—320 |
| 4️⃣ CutMix Soft | 380 | 412Γ—412 | 380Γ—380 |
| 5️⃣ CutMix Hard | 380 | 412Γ—412 | 380Γ—380 |
| 6️⃣ Grand Finale | 400 | 432Γ—432 | 400Γ—400 |
### Preprocessing Details (All Phases)
| Step | Transform | Parameters | Purpose |
|------|-----------|------------|---------|
| 1️⃣ | **Resize** | (size+32, size+32) | Prepare for random crop |
| 2️⃣ | **RandomCrop** | (size, size) | Random position augmentation |
| 3️⃣ | **RandomHorizontalFlip** | p=0.5 | Left-right invariance |
| 4️⃣ | **RandomRotation** | degrees=20 | Rotation invariance |
| 5️⃣ | **RandomAffine** | scale=(0.8, 1.2) | Scale variation |
| 6️⃣ | **ColorJitter** | (0.2, 0.2, 0.2) | Brightness/Contrast/Saturation |
| 7️⃣ | **ToTensor** | - | Convert to PyTorch tensor |
| 8️⃣ | **Normalize** | mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] | ImageNet normalization |
### Test/Validation Preprocessing
| Step | Transform | Parameters |
|------|-----------|------------|
| 1️⃣ | **Resize** | (size, size) |
| 2️⃣ | **ToTensor** | - |
| 3️⃣ | **Normalize** | ImageNet mean/std |
> **πŸ’‘ Key Insight:** Preprocessing (8 steps) is applied per image in every phase. MixUp/CutMix is applied **AFTER** preprocessing as batch-level augmentation.
---
## πŸ”€ Batch-Level Augmentation Techniques (Phase-Specific)
### MixUp
```
Image A (Rose) + Image B (Sunflower)
↓
Ξ» = Beta(Ξ±, Ξ±) β†’ New Image = λ×A + (1-Ξ»)Γ—B
↓
Blended Image (70% Rose + 30% Sunflower features)
```
**Benefits:** βœ… Smoother decision boundaries βœ… Reduces overconfidence βœ… Better generalization
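The diagram above corresponds to the standard MixUp formulation, sketched here over a batch (not the exact training code):

```python
import torch

def mixup(images, targets, alpha=0.2):
    """Blend a batch with a shuffled copy of itself (MixUp)."""
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1 - lam) * images[perm]
    # The loss is mixed the same way:
    # loss = lam * criterion(out, y_a) + (1 - lam) * criterion(out, y_b)
    return mixed, targets, targets[perm], lam

images, targets = torch.randn(8, 3, 32, 32), torch.randint(0, 102, (8,))
mixed, y_a, y_b, lam = mixup(images, targets, alpha=0.2)
```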
### CutMix
```
Image A (Rose) + Random BBox from Image B (Sunflower)
↓
Paste B's region onto A
↓
Composite Image (Rose background + Sunflower patch)
```
**Benefits:** βœ… Object completion ability βœ… Occlusion robustness βœ… Localization skills
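Likewise, the standard CutMix recipe sketched over a batch (the box is sampled at a random center with area fraction 1βˆ’Ξ», then Ξ» is corrected for edge clipping):

```python
import torch

def cutmix(images, targets, alpha=0.5):
    """Paste a random box from a shuffled copy of the batch (CutMix)."""
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(images.size(0))
    _, _, h, w = images.shape
    cut = (1 - lam) ** 0.5                      # box side fraction
    ch, cw = int(h * cut), int(w * cut)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - ch // 2, 0), min(cy + ch // 2, h)
    x1, x2 = max(cx - cw // 2, 0), min(cx + cw // 2, w)
    images = images.clone()
    images[:, :, y1:y2, x1:x2] = images[perm][:, :, y1:y2, x1:x2]
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)   # correct lam for clipping
    return images, targets, targets[perm], lam

batch, labels = torch.randn(8, 3, 64, 64), torch.randint(0, 102, (8,))
mixed, y_a, y_b, lam = cutmix(batch, labels)
```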
### Hybrid (Grand Finale)
1. Apply MixUp (blend two images)
2. Apply CutMix (cut on blended image)
3. Result: Maximum augmentation challenge
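One way to compose the two steps is with soft (one-hot) labels, so the two mixtures combine cleanly; this is a sketch of the idea (corner-box CutMix for brevity), not necessarily the exact training implementation:

```python
import torch
import torch.nn.functional as F

def hybrid_batch(images, targets, num_classes=102, alpha=0.2):
    """MixUp, then CutMix on the blended result, tracking soft labels."""
    y = F.one_hot(targets, num_classes).float()
    # -- Step 1: MixUp --
    lam1 = float(torch.distributions.Beta(alpha, alpha).sample())
    p1 = torch.randperm(images.size(0))
    images = lam1 * images + (1 - lam1) * images[p1]
    y = lam1 * y + (1 - lam1) * y[p1]
    # -- Step 2: CutMix on the blended batch --
    lam2 = float(torch.distributions.Beta(alpha, alpha).sample())
    p2 = torch.randperm(images.size(0))
    h, w = images.shape[-2:]
    ch, cw = int(h * (1 - lam2) ** 0.5), int(w * (1 - lam2) ** 0.5)
    images[..., :ch, :cw] = images[p2][..., :ch, :cw]
    lam2 = 1 - ch * cw / (h * w)
    y = lam2 * y + (1 - lam2) * y[p2]
    # Train with soft-label cross-entropy: -(y * log_softmax(out)).sum(1).mean()
    return images, y

imgs, ys = torch.randn(4, 3, 32, 32), torch.randint(0, 102, (4,))
hmixed, hsoft = hybrid_batch(imgs, ys)
```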
---
## πŸ›‘οΈ Smart Training Features
### Two-Layer Early Stopping
| Layer | Condition (loss trend) | Patience | Action |
|-------|------------------------|----------|--------|
| **Phase-level** | Train↓ + Val↑ (overfitting) | 2 epochs | Skip to next phase |
| **Global** | Val loss not improving | 8 epochs | Stop training |
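The two layers can be sketched as a small state machine (illustrative; the exact overfitting signal is an assumption based on the table above):

```python
class EarlyStopping:
    """Phase-level (patience 2) and global (patience 8) early stopping."""

    def __init__(self, phase_patience=2, global_patience=8):
        self.phase_patience, self.global_patience = phase_patience, global_patience
        self.phase_bad = self.global_bad = 0
        self.best_val = float("inf")
        self.prev_train = self.prev_val = float("inf")

    def update(self, train_loss, val_loss):
        """Return 'next_phase', 'stop', or None after each epoch."""
        # Global layer: val loss must improve within `global_patience` epochs.
        if val_loss < self.best_val:
            self.best_val, self.global_bad = val_loss, 0
        else:
            self.global_bad += 1
        # Phase layer: train falling while val rises => overfitting this phase.
        overfit = train_loss < self.prev_train and val_loss > self.prev_val
        self.phase_bad = self.phase_bad + 1 if overfit else 0
        self.prev_train, self.prev_val = train_loss, val_loss
        if self.global_bad >= self.global_patience:
            return "stop"
        if self.phase_bad >= self.phase_patience:
            self.phase_bad = 0
            return "next_phase"
        return None
```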
### Smart Dropout Mechanism
| Signal | Condition (loss trend) | Action |
|--------|------------------------|--------|
| ⚠️ **Overfitting** | Train↓ + Val↑ | Dropout += 0.05 |
| πŸš‘ **Underfitting** | Train↑ + Val↑ | Dropout -= 0.05 |
| βœ… **Normal** | Train↓ + Val↓ | No change |
**Bounds:** min=0.10, max=0.50
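A sketch of the adjustment rule; encoding the signals as per-epoch loss deltas is an assumption about how the mechanism reads the trends:

```python
import torch.nn as nn

def adjust_dropout(model, train_delta, val_delta, step=0.05, lo=0.10, hi=0.50):
    """Nudge every Dropout layer based on loss deltas (negative = improving)."""
    if train_delta < 0 and val_delta > 0:    # overfitting: regularize harder
        change = +step
    elif train_delta > 0 and val_delta > 0:  # underfitting: relax regularization
        change = -step
    else:                                    # normal: leave dropout alone
        change = 0.0
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = min(max(m.p + change, lo), hi)  # clamp to [0.10, 0.50]
    return change

# Overfitting signal raises dropout from 0.20 to 0.25 on this toy model.
model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(0.2))
adjust_dropout(model, train_delta=-0.1, val_delta=0.05)
```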
## Model Architecture
```
EfficientNet-B4 (pretrained)
└── Custom Classifier Head
β”œβ”€β”€ BatchNorm1d (1792)
β”œβ”€β”€ Dropout
β”œβ”€β”€ Linear (1792 β†’ 512)
β”œβ”€β”€ GELU
β”œβ”€β”€ BatchNorm1d (512)
β”œβ”€β”€ Dropout
└── Linear (512 β†’ 102)
```
**Total Parameters:** ~19M (all trainable)
## Supported Flower Classes
102 flower species including: Rose, Sunflower, Tulip, Orchid, Lily, Daisy, Hibiscus, Lotus, Magnolia, and 93 more.
## Limitations
- Trained only on Oxford Flowers-102 dataset
- Best performance at 400Γ—400 resolution
- May not generalize well to flowers outside the 102 trained classes
## Citation
```bibtex
@misc{efficientnet-b4-flowers102,
  title={EfficientNet-B4 Flower Classifier with 6-Phase Progressive Training},
  author={fth2745},
  year={2024},
  url={https://huggingface.co/fth2745/efficientnet-b4-flowers102}
}
```