---
license: mit
language:
- en
tags:
- image-classification
- pytorch
- efficientnet
- flowers
- computer-vision
datasets:
- oxford-flowers-102
metrics:
- accuracy
- f1
pipeline_tag: image-classification
library_name: pytorch
base_model:
- google/efficientnet-b4
---
# 🌸 EfficientNet-B4 Flower Classifier
An image classification model for identifying **102 flower species** from the Oxford Flowers-102 dataset.
## Model Details
### Model Description
This model is built on the **EfficientNet-B4** backbone with a custom classifier head, trained using a **6-Phase Progressive Training** strategy. The training progressively increases image resolution (280px → 400px) and augmentation difficulty (None → MixUp → CutMix → Hybrid).
- **Developed by:** fth2745
- **Model type:** Image Classification (CNN)
- **License:** MIT
- **Finetuned from:** EfficientNet-B4 (ImageNet pretrained)
## Performance
| Metric | Test Set | Validation Set |
|--------|----------|----------------|
| **Top-1 Accuracy** | 94.49% | 97.45% |
| **Top-3 Accuracy** | 97.61% | 98.82% |
| **Top-5 Accuracy** | 98.49% | 99.31% |
| **Macro F1-Score** | 94.75% | 97.13% |
## Training Details
### Training Data
Oxford Flowers-102 dataset with offline data augmentation (tier-based augmentation for class balancing).
### Training Procedure
#### 6-Phase Progressive Training
| Phase | Epochs | Resolution | Augmentation | Dropout |
|-------|--------|------------|--------------|---------|
| 1. Basic | 1-5 | 280×280 | Basic Preprocessing | 0.4 |
| 2. MixUp Soft | 6-10 | 320×320 | MixUp α=0.2 | 0.2 |
| 3. MixUp Hard | 11-15 | 320×320 | MixUp α=0.4 | 0.2 |
| 4. CutMix Soft | 16-20 | 380×380 | CutMix α=0.2 | 0.2 |
| 5. CutMix Hard | 21-30 | 380×380 | CutMix α=0.5 | 0.2 |
| 6. Grand Finale | 31-40 | 400×400 | Hybrid | 0.2 |
#### Preprocessing
- Resize → RandomCrop → HorizontalFlip → Rotation (±20°) → Affine → ColorJitter → Normalize (ImageNet)
#### Training Hyperparameters
- **Optimizer:** AdamW
- **Learning Rate:** 1e-3
- **Weight Decay:** 1e-4
- **Scheduler:** CosineAnnealingWarmRestarts (T_0=5, T_mult=2)
- **Loss:** CrossEntropyLoss (label_smoothing=0.1)
- **Batch Size:** 8
- **Training Regime:** fp16 mixed precision (AMP)
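The hyperparameters above map directly onto PyTorch built-ins. A minimal sketch of the setup (the `model` here is a stand-in layer for illustration, not the actual network):

```python
import torch
import torch.nn as nn

# Stand-in model so the snippet is self-contained; see the
# architecture section of this card for the real classifier head.
model = nn.Linear(1792, 102)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=5, T_mult=2
)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# fp16 mixed precision (AMP); the scaler is a no-op without a GPU.
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())
```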
---
## 🎯 6-Phase Progressive Training
```
Phase 1 ──► Phase 2 ──► Phase 3 ──► Phase 4 ──► Phase 5 ──► Phase 6
 280px       320px       320px       380px       380px       400px
 None        MixUp       MixUp       CutMix      CutMix      Hybrid
             α=0.2       α=0.4       α=0.2       α=0.5       MixUp+Cut
```
### Phase Details
| Phase | Epochs | Resolution | Technique | Alpha | Dropout | Purpose |
|-------|--------|------------|-----------|-------|---------|---------|
| 1️⃣ **Basic** | 1-5 | 280×280 | Basic Preprocessing | - | 0.4 | Learn fundamental features |
| 2️⃣ **MixUp Soft** | 6-10 | 320×320 | MixUp | 0.2 | 0.2 | Gentle texture blending |
| 3️⃣ **MixUp Hard** | 11-15 | 320×320 | MixUp | 0.4 | 0.2 | Strong texture mixing |
| 4️⃣ **CutMix Soft** | 16-20 | 380×380 | CutMix | 0.2 | 0.2 | Learn partial structures |
| 5️⃣ **CutMix Hard** | 21-30 | 380×380 | CutMix | 0.5 | 0.2 | Handle occlusions |
| 6️⃣ **Grand Finale** | 31-40 | 400×400 | Hybrid | 0.1-0.3 | 0.2 | Final polish with both |
> **💡 Why Progressive Training?** Starting with low resolution helps the model learn general shapes first. Gradual augmentation increase builds robustness incrementally.
---
## 🖼️ Preprocessing Pipeline (All Phases)
> **⚠️ Note:** These preprocessing steps are applied in **ALL PHASES**. Only `img_size` changes per phase.
### Complete Training Flow
```
📷 RAW IMAGE INPUT
        │
        ▼
🔄 STEP 1: IMAGE-LEVEL PREPROCESSING (per image)
   1️⃣ Resize          → (img_size + 32) × (img_size + 32)
   2️⃣ RandomCrop      → img_size × img_size
   3️⃣ HorizontalFlip  → p=0.5
   4️⃣ RandomRotation  → ±20°
   5️⃣ RandomAffine    → scale=(0.8, 1.2)
   6️⃣ ColorJitter     → brightness, contrast, saturation = 0.2
   7️⃣ ToTensor        → [0-255] → [0.0-1.0]
   8️⃣ Normalize       → ImageNet mean/std
        │
        ▼
🎲 STEP 2: BATCH-LEVEL AUGMENTATION (phase-specific)
   Phase 1:   None (preprocessing only)
   Phase 2-3: MixUp  (λ×ImageA + (1-λ)×ImageB)
   Phase 4-5: CutMix (patch swap between images)
   Phase 6:   Hybrid (MixUp + CutMix combined)
        │
        ▼
🎯 READY FOR MODEL TRAINING
```
### Phase-Specific Image Sizes
| Phase | img_size | Resize To | RandomCrop To |
|-------|----------|-----------|---------------|
| 1️⃣ Basic | 280 | 312×312 | 280×280 |
| 2️⃣ MixUp Soft | 320 | 352×352 | 320×320 |
| 3️⃣ MixUp Hard | 320 | 352×352 | 320×320 |
| 4️⃣ CutMix Soft | 380 | 412×412 | 380×380 |
| 5️⃣ CutMix Hard | 380 | 412×412 | 380×380 |
| 6️⃣ Grand Finale | 400 | 432×432 | 400×400 |
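The schedule in the tables above can be captured in a small lookup. `PHASES` and `phase_for_epoch` are hypothetical names (the original training script is not published); the hybrid phase's α is taken as 0.2, the midpoint of the 0.1-0.3 range stated above:

```python
# Phase schedule mirroring the card's tables: 1-indexed epoch ranges,
# input resolution, batch-level augmentation technique, and alpha.
PHASES = [
    (range(1, 6),   280, "none",   None),
    (range(6, 11),  320, "mixup",  0.2),
    (range(11, 16), 320, "mixup",  0.4),
    (range(16, 21), 380, "cutmix", 0.2),
    (range(21, 31), 380, "cutmix", 0.5),
    (range(31, 41), 400, "hybrid", 0.2),  # assumed midpoint of 0.1-0.3
]

def phase_for_epoch(epoch):
    """Return (img_size, technique, alpha) for a 1-indexed epoch."""
    for epochs, size, technique, alpha in PHASES:
        if epoch in epochs:
            return size, technique, alpha
    raise ValueError(f"epoch {epoch} is outside the 40-epoch schedule")
```

The resize target for each phase is then simply `img_size + 32`, followed by a random crop back to `img_size`.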
### Preprocessing Details (All Phases)
| Step | Transform | Parameters | Purpose |
|------|-----------|------------|---------|
| 1️⃣ | **Resize** | (size+32, size+32) | Prepare for random crop |
| 2️⃣ | **RandomCrop** | (size, size) | Random position augmentation |
| 3️⃣ | **RandomHorizontalFlip** | p=0.5 | Left-right invariance |
| 4️⃣ | **RandomRotation** | degrees=20 | Rotation invariance |
| 5️⃣ | **RandomAffine** | scale=(0.8, 1.2) | Scale variation |
| 6️⃣ | **ColorJitter** | (0.2, 0.2, 0.2) | Brightness/Contrast/Saturation |
| 7️⃣ | **ToTensor** | - | Convert to PyTorch tensor |
| 8️⃣ | **Normalize** | mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] | ImageNet normalization |
### Test/Validation Preprocessing
| Step | Transform | Parameters |
|------|-----------|------------|
| 1️⃣ | **Resize** | (size, size) |
| 2️⃣ | **ToTensor** | - |
| 3️⃣ | **Normalize** | ImageNet mean/std |
> **💡 Key Insight:** Preprocessing (8 steps) is applied per image in every phase. MixUp/CutMix is applied **AFTER** preprocessing as batch-level augmentation.
---
## 🎲 Batch-Level Augmentation Techniques (Phase-Specific)
### MixUp
```
Image A (Rose) + Image B (Sunflower)
        ↓
λ = Beta(α, α) → New Image = λ×A + (1-λ)×B
        ↓
Blended Image (70% Rose + 30% Sunflower features)
```
**Benefits:** ✅ Smoother decision boundaries ✅ Reduces overconfidence ✅ Better generalization
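A minimal batch-level MixUp sketch consistent with the diagram above (`mixup` is an illustrative helper, not the card's actual code):

```python
import torch

def mixup(images, labels, alpha=0.2):
    """Blend a batch with a shuffled copy of itself.

    Returns mixed images plus both label sets and λ, so the loss can be
    computed as lam * loss(pred, y_a) + (1 - lam) * loss(pred, y_b).
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1 - lam) * images[perm]
    return mixed, labels, labels[perm], lam
```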
### CutMix
```
Image A (Rose) + Random BBox from Image B (Sunflower)
        ↓
Paste B's region onto A
        ↓
Composite Image (Rose background + Sunflower patch)
```
**Benefits:** ✅ Object completion ability ✅ Occlusion robustness ✅ Localization skills
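A matching CutMix sketch (again an illustrative helper, not the card's actual code); the label weight λ is corrected to the area actually kept from image A:

```python
import torch

def cutmix(images, labels, alpha=0.2):
    """Paste a random box from a shuffled copy of the batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    _, _, h, w = images.shape

    # Box with area ≈ (1 - λ) of the image, centred at a random point.
    cut_ratio = (1.0 - lam) ** 0.5
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed = images.clone()
    mixed[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)  # area-corrected weight
    return mixed, labels, labels[perm], lam
```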
### Hybrid (Grand Finale)
1. Apply MixUp (blend two images)
2. Apply CutMix (cut on blended image)
3. Result: Maximum augmentation challenge
---
## 🛡️ Smart Training Features
### Two-Layer Early Stopping
| Layer | Condition | Patience | Action |
|-------|-----------|----------|--------|
| **Phase-level** | Train↑ + Val↓ (Overfitting) | 2 epochs | Skip to next phase |
| **Global** | Val loss not improving | 8 epochs | Stop training |
### Smart Dropout Mechanism
| Signal | Condition | Action |
|--------|-----------|--------|
| ⚠️ **Overfitting** | Train↑ + Val↓ | Dropout += 0.05 |
| 📉 **Underfitting** | Train↓ + Val↓ | Dropout -= 0.05 |
| ✅ **Normal** | Train↑ + Val↑ | No change |
**Bounds:** min=0.10, max=0.50
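The rule can be sketched as a small helper. `adjust_dropout` and its boolean inputs are assumptions, since the card gives only the table above:

```python
def adjust_dropout(dropout, train_improved, val_improved,
                   step=0.05, lo=0.10, hi=0.50):
    """Nudge dropout by the card's signals and clamp to [0.10, 0.50].

    train_improved / val_improved are booleans for this epoch's metrics.
    """
    if train_improved and not val_improved:        # overfitting signal
        dropout += step
    elif not train_improved and not val_improved:  # underfitting signal
        dropout -= step
    return min(max(dropout, lo), hi)
```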
## Model Architecture
```
EfficientNet-B4 (pretrained)
└── Custom Classifier Head
    ├── BatchNorm1d (1792)
    ├── Dropout
    ├── Linear (1792 → 512)
    ├── GELU
    ├── BatchNorm1d (512)
    ├── Dropout
    └── Linear (512 → 102)
```
**Total Parameters:** ~19M (all trainable)
## Supported Flower Classes
102 flower species including: Rose, Sunflower, Tulip, Orchid, Lily, Daisy, Hibiscus, Lotus, Magnolia, and 93 more.
## Limitations
- Trained only on Oxford Flowers-102 dataset
- Best performance at 400Γ400 resolution
- May not generalize well to flowers outside the 102 trained classes
## Citation
```bibtex
@misc{efficientnet-b4-flowers102,
title={EfficientNet-B4 Flower Classifier with 6-Phase Progressive Training},
author={fth2745},
year={2024},
url={https://huggingface.co/fth2745/efficientnet-b4-flowers102}
}
``` |