---
license: mit
language:
- en
tags:
- image-classification
- pytorch
- efficientnet
- flowers
- computer-vision
datasets:
- oxford-flowers-102
metrics:
- accuracy
- f1
pipeline_tag: image-classification
library_name: pytorch
base_model:
- google/efficientnet-b4
---

# 🌸 EfficientNet-B4 Flower Classifier

An image classification model that identifies **102 flower species** from the Oxford Flowers-102 dataset.

## Model Details

### Model Description

This model is built on the **EfficientNet-B4** backbone with a custom classifier head, trained using a **6-Phase Progressive Training** strategy. Training progressively increases both the image resolution (280px → 400px) and the augmentation difficulty (None → MixUp → CutMix → Hybrid).

- **Developed by:** fth2745
- **Model type:** Image Classification (CNN)
- **License:** MIT
- **Finetuned from:** EfficientNet-B4 (ImageNet pretrained)

## Performance

| Metric | Test Set | Validation Set |
|--------|----------|----------------|
| **Top-1 Accuracy** | 94.49% | 97.45% |
| **Top-3 Accuracy** | 97.61% | 98.82% |
| **Top-5 Accuracy** | 98.49% | 99.31% |
| **Macro F1-Score** | 94.75% | 97.13% |

## Training Details

### Training Data

Oxford Flowers-102 dataset with offline data augmentation (tier-based augmentation for class balancing).

### Training Procedure

#### 6-Phase Progressive Training

| Phase | Epochs | Resolution | Augmentation | Dropout |
|-------|--------|------------|--------------|---------|
| 1. Basic | 1-5 | 280×280 | Basic preprocessing | 0.4 |
| 2. MixUp Soft | 6-10 | 320×320 | MixUp α=0.2 | 0.2 |
| 3. MixUp Hard | 11-15 | 320×320 | MixUp α=0.4 | 0.2 |
| 4. CutMix Soft | 16-20 | 380×380 | CutMix α=0.2 | 0.2 |
| 5. CutMix Hard | 21-30 | 380×380 | CutMix α=0.5 | 0.2 |
| 6. Grand Finale | 31-40 | 400×400 | Hybrid | 0.2 |

#### Preprocessing

- Resize → RandomCrop → HorizontalFlip → Rotation (±20°) → Affine → ColorJitter → Normalize (ImageNet)

#### Training Hyperparameters

- **Optimizer:** AdamW
- **Learning Rate:** 1e-3
- **Weight Decay:** 1e-4
- **Scheduler:** CosineAnnealingWarmRestarts (T_0=5, T_mult=2)
- **Loss:** CrossEntropyLoss (label_smoothing=0.1)
- **Batch Size:** 8
- **Training Regime:** fp16 mixed precision (AMP)
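
As a minimal sketch, the hyperparameters above translate to the following PyTorch setup (the placeholder `model` stands in for the EfficientNet-B4 classifier described further below; the actual training loop is not shown in this card):

```python
import torch
from torch import nn, optim

# Placeholder module; in the real setup this is the EfficientNet-B4 classifier below.
model = nn.Linear(1792, 102)

# Optimizer / scheduler / loss exactly as listed above.
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=5, T_mult=2)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
scaler = torch.cuda.amp.GradScaler()  # fp16 mixed precision (AMP)
```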

---

## 🎯 6-Phase Progressive Training

```
Phase 1 ──▶ Phase 2 ──▶ Phase 3 ──▶ Phase 4 ──▶ Phase 5 ──▶ Phase 6
 280px       320px       320px       380px       380px       400px
 None        MixUp       MixUp       CutMix      CutMix      Hybrid
             α=0.2       α=0.4       α=0.2       α=0.5       MixUp+Cut
```

### Phase Details

| Phase | Epochs | Resolution | Technique | Alpha | Dropout | Purpose |
|-------|--------|------------|-----------|-------|---------|---------|
| 1️⃣ **Basic** | 1-5 | 280×280 | Basic preprocessing | - | 0.4 | Learn fundamental features |
| 2️⃣ **MixUp Soft** | 6-10 | 320×320 | MixUp | 0.2 | 0.2 | Gentle texture blending |
| 3️⃣ **MixUp Hard** | 11-15 | 320×320 | MixUp | 0.4 | 0.2 | Strong texture mixing |
| 4️⃣ **CutMix Soft** | 16-20 | 380×380 | CutMix | 0.2 | 0.2 | Learn partial structures |
| 5️⃣ **CutMix Hard** | 21-30 | 380×380 | CutMix | 0.5 | 0.2 | Handle occlusions |
| 6️⃣ **Grand Finale** | 31-40 | 400×400 | Hybrid | 0.1-0.3 | 0.2 | Final polish with both |

> **💡 Why Progressive Training?** Starting at low resolution helps the model learn general shapes first. Gradually increasing the augmentation difficulty builds robustness incrementally.
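
For reference, the same schedule can be written down as a plain configuration list. This is an illustrative sketch only; the dictionary keys and structure are assumptions, not the author's actual code:

```python
# Illustrative config for the 6-phase schedule above (key names are assumptions).
PHASES = [
    {"name": "Basic",        "epochs": (1, 5),   "img_size": 280, "aug": None,     "alpha": None,       "dropout": 0.4},
    {"name": "MixUp Soft",   "epochs": (6, 10),  "img_size": 320, "aug": "mixup",  "alpha": 0.2,        "dropout": 0.2},
    {"name": "MixUp Hard",   "epochs": (11, 15), "img_size": 320, "aug": "mixup",  "alpha": 0.4,        "dropout": 0.2},
    {"name": "CutMix Soft",  "epochs": (16, 20), "img_size": 380, "aug": "cutmix", "alpha": 0.2,        "dropout": 0.2},
    {"name": "CutMix Hard",  "epochs": (21, 30), "img_size": 380, "aug": "cutmix", "alpha": 0.5,        "dropout": 0.2},
    {"name": "Grand Finale", "epochs": (31, 40), "img_size": 400, "aug": "hybrid", "alpha": (0.1, 0.3), "dropout": 0.2},
]
```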

---

## 🖼️ Preprocessing Pipeline (All Phases)

> **⚠️ Note:** These preprocessing steps are applied in **ALL PHASES**. Only `img_size` changes per phase.

### Complete Training Flow

```
┌───────────────────────────────────────────────────────────────┐
│ 📷 RAW IMAGE INPUT                                             │
└───────────────────────────────────────────────────────────────┘
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ 📋 STEP 1: IMAGE-LEVEL PREPROCESSING (per image)               │
├───────────────────────────────────────────────────────────────┤
│ 1️⃣ Resize          → (img_size + 32) × (img_size + 32)         │
│ 2️⃣ RandomCrop      → img_size × img_size                       │
│ 3️⃣ HorizontalFlip  → p=0.5                                     │
│ 4️⃣ RandomRotation  → ±20°                                      │
│ 5️⃣ RandomAffine    → scale=(0.8, 1.2)                          │
│ 6️⃣ ColorJitter     → brightness, contrast, saturation=0.2      │
│ 7️⃣ ToTensor        → [0-255] → [0.0-1.0]                       │
│ 8️⃣ Normalize       → ImageNet mean/std                         │
└───────────────────────────────────────────────────────────────┘
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ 🎲 STEP 2: BATCH-LEVEL AUGMENTATION (phase-specific)           │
├───────────────────────────────────────────────────────────────┤
│ Phase 1:   None   (preprocessing only)                         │
│ Phase 2-3: MixUp  (λ×ImageA + (1-λ)×ImageB)                    │
│ Phase 4-5: CutMix (patch swap between images)                  │
│ Phase 6:   Hybrid (MixUp + CutMix combined)                    │
└───────────────────────────────────────────────────────────────┘
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ 🎯 READY FOR MODEL TRAINING                                    │
└───────────────────────────────────────────────────────────────┘
```

### Phase-Specific Image Sizes

| Phase | img_size | Resize To | RandomCrop To |
|-------|----------|-----------|---------------|
| 1️⃣ Basic | 280 | 312×312 | 280×280 |
| 2️⃣ MixUp Soft | 320 | 352×352 | 320×320 |
| 3️⃣ MixUp Hard | 320 | 352×352 | 320×320 |
| 4️⃣ CutMix Soft | 380 | 412×412 | 380×380 |
| 5️⃣ CutMix Hard | 380 | 412×412 | 380×380 |
| 6️⃣ Grand Finale | 400 | 432×432 | 400×400 |

### Preprocessing Details (All Phases)

| Step | Transform | Parameters | Purpose |
|------|-----------|------------|---------|
| 1️⃣ | **Resize** | (size+32, size+32) | Prepare for random crop |
| 2️⃣ | **RandomCrop** | (size, size) | Random position augmentation |
| 3️⃣ | **RandomHorizontalFlip** | p=0.5 | Left-right invariance |
| 4️⃣ | **RandomRotation** | degrees=20 | Rotation invariance |
| 5️⃣ | **RandomAffine** | scale=(0.8, 1.2) | Scale variation |
| 6️⃣ | **ColorJitter** | (0.2, 0.2, 0.2) | Brightness/contrast/saturation |
| 7️⃣ | **ToTensor** | - | Convert to PyTorch tensor |
| 8️⃣ | **Normalize** | mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] | ImageNet normalization |

### Test/Validation Preprocessing

| Step | Transform | Parameters |
|------|-----------|------------|
| 1️⃣ | **Resize** | (size, size) |
| 2️⃣ | **ToTensor** | - |
| 3️⃣ | **Normalize** | ImageNet mean/std |

> **💡 Key Insight:** Preprocessing (8 steps) is applied per image in every phase. MixUp/CutMix is applied **AFTER** preprocessing as batch-level augmentation.
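
The per-image pipelines above correspond to the following torchvision sketch. It is a minimal reconstruction from the tables, not the author's exact code; in particular, `degrees=0` for `RandomAffine` is an assumption (rotation is handled by the separate `RandomRotation` step):

```python
from torchvision import transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def build_train_transform(img_size: int) -> transforms.Compose:
    """Training pipeline; img_size is the phase-specific resolution (280/320/380/400)."""
    return transforms.Compose([
        transforms.Resize((img_size + 32, img_size + 32)),
        transforms.RandomCrop(img_size),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(degrees=20),
        transforms.RandomAffine(degrees=0, scale=(0.8, 1.2)),  # degrees=0: assumed
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ])

def build_eval_transform(img_size: int) -> transforms.Compose:
    """Deterministic test/validation pipeline: resize, tensor, ImageNet normalization."""
    return transforms.Compose([
        transforms.Resize((img_size, img_size)),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ])
```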

---

## 🔄 Batch-Level Augmentation Techniques (Phase-Specific)

### MixUp

```
Image A (Rose) + Image B (Sunflower)
                ↓
λ = Beta(α, α) → New Image = λ×A + (1-λ)×B
                ↓
Blended Image (70% Rose + 30% Sunflower features)
```

**Benefits:** ✅ Smoother decision boundaries ✅ Reduces overconfidence ✅ Better generalization
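
A minimal batch-level MixUp sketch (the function name and return convention are illustrative; the card does not show the author's implementation):

```python
import torch

def mixup_batch(images: torch.Tensor, labels: torch.Tensor, alpha: float = 0.2):
    """Blend each image with a randomly paired image from the same batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1.0 - lam) * images[perm]
    # Loss: lam * CE(logits, labels) + (1 - lam) * CE(logits, labels[perm])
    return mixed, labels, labels[perm], lam
```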

### CutMix

```
Image A (Rose) + Random BBox from Image B (Sunflower)
                ↓
Paste B's region onto A
                ↓
Composite Image (Rose background + Sunflower patch)
```

**Benefits:** ✅ Object completion ability ✅ Occlusion robustness ✅ Localization skills
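
A corresponding CutMix sketch (again illustrative; the box sampling follows the standard CutMix recipe rather than the author's exact code):

```python
import torch

def cutmix_batch(images: torch.Tensor, labels: torch.Tensor, alpha: float = 0.2):
    """Paste a random box from a shuffled copy of the batch onto each image."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    _, _, h, w = images.shape
    # Box sides follow sqrt(1 - lam) so the pasted area covers roughly (1 - lam) of the image.
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = images.clone()
    mixed[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Re-weight lambda by the actual pasted area.
    lam = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)
    return mixed, labels, labels[perm], lam
```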

### Hybrid (Grand Finale)

1. Apply MixUp (blend two images)
2. Apply CutMix (cut on the blended image)
3. Result: maximum augmentation challenge

---

## 🛡️ Smart Training Features

### Two-Layer Early Stopping

| Layer | Condition | Patience | Action |
|-------|-----------|----------|--------|
| **Phase-level** | Train loss ↓ + Val loss ↑ (overfitting) | 2 epochs | Skip to next phase |
| **Global** | Val loss not improving | 8 epochs | Stop training |
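
An illustrative sketch of how the two layers could be tracked per epoch; the class name and exact bookkeeping are assumptions, since the card only states the conditions and patience values:

```python
class TwoLayerEarlyStopping:
    """Phase-level patience on overfitting, global patience on stalled validation loss."""

    def __init__(self, phase_patience: int = 2, global_patience: int = 8):
        self.phase_patience = phase_patience
        self.global_patience = global_patience
        self.phase_bad = 0
        self.global_bad = 0
        self.best_val = float("inf")

    def update(self, train_improved: bool, val_loss: float):
        """Return (skip_phase, stop_training) for the current epoch."""
        overfitting = train_improved and val_loss > self.best_val
        self.phase_bad = self.phase_bad + 1 if overfitting else 0
        if val_loss < self.best_val:
            self.best_val = val_loss
            self.global_bad = 0
        else:
            self.global_bad += 1
        return self.phase_bad >= self.phase_patience, self.global_bad >= self.global_patience
```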

### Smart Dropout Mechanism

| Signal | Condition | Action |
|--------|-----------|--------|
| ⚠️ **Overfitting** | Train loss ↓ + Val loss ↑ | Dropout += 0.05 |
| 📉 **Underfitting** | Train loss ↑ + Val loss ↑ | Dropout -= 0.05 |
| ✅ **Normal** | Train loss ↓ + Val loss ↓ | No change |

**Bounds:** min=0.10, max=0.50
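
A sketch of that adjustment rule, assuming the head's dropout layers are standard `nn.Dropout` modules; the helper name and signal booleans are illustrative:

```python
from torch import nn

def adjust_dropout(model: nn.Module, train_loss_down: bool, val_loss_down: bool,
                   step: float = 0.05, low: float = 0.10, high: float = 0.50) -> None:
    """Raise dropout on overfitting, lower it on underfitting, clamp to [low, high]."""
    if train_loss_down and not val_loss_down:        # overfitting: train ↓, val ↑
        delta = step
    elif not train_loss_down and not val_loss_down:  # underfitting: train ↑, val ↑
        delta = -step
    else:                                            # normal: no change
        return
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = min(max(module.p + delta, low), high)
```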

## Model Architecture

```
EfficientNet-B4 (pretrained)
└── Custom Classifier Head
    ├── BatchNorm1d (1792)
    ├── Dropout
    ├── Linear (1792 → 512)
    ├── GELU
    ├── BatchNorm1d (512)
    ├── Dropout
    └── Linear (512 → 102)
```

**Total Parameters:** ~19M (all trainable)
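
A minimal PyTorch sketch of that head on top of a torchvision EfficientNet-B4. The torchvision backbone and the initial dropout value are assumptions; the card does not name the specific EfficientNet implementation used:

```python
from torch import nn
from torchvision import models

# EfficientNet-B4 backbone with ImageNet weights; its pooled feature size is 1792.
backbone = models.efficientnet_b4(weights="IMAGENET1K_V1")

# Replace the stock classifier with the custom head described above.
backbone.classifier = nn.Sequential(
    nn.BatchNorm1d(1792),
    nn.Dropout(p=0.4),        # starting value; adjusted per phase / by smart dropout
    nn.Linear(1792, 512),
    nn.GELU(),
    nn.BatchNorm1d(512),
    nn.Dropout(p=0.4),
    nn.Linear(512, 102),      # 102 flower classes
)
```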

## Supported Flower Classes

102 flower species including: Rose, Sunflower, Tulip, Orchid, Lily, Daisy, Hibiscus, Lotus, Magnolia, and 93 more.

## Limitations

- Trained only on the Oxford Flowers-102 dataset
- Best performance at 400×400 resolution
- May not generalize well to flowers outside the 102 trained classes

## Citation

```bibtex
@misc{efficientnet-b4-flowers102,
  title={EfficientNet-B4 Flower Classifier with 6-Phase Progressive Training},
  author={fth2745},
  year={2024},
  url={https://huggingface.co/fth2745/efficientnet-b4-flowers102}
}
```