---

license: mit
language:
- en
tags:
- image-classification
- pytorch
- efficientnet
- flowers
- computer-vision
datasets:
- oxford-flowers-102
metrics:
- accuracy
- f1
pipeline_tag: image-classification
library_name: pytorch
base_model:
- google/efficientnet-b4
---


# 🌸 EfficientNet-B4 Flower Classifier

An image classification model for identifying **102 flower species** from the Oxford Flowers-102 dataset.

## Model Details

### Model Description

This model is built on the **EfficientNet-B4** backbone with a custom classifier head, trained with a **6-Phase Progressive Training** strategy that progressively increases image resolution (280px → 400px) and augmentation difficulty (None → MixUp → CutMix → Hybrid).

- **Developed by:** fth2745
- **Model type:** Image Classification (CNN)
- **License:** MIT
- **Finetuned from:** EfficientNet-B4 (ImageNet pretrained)

## Performance

| Metric | Test Set | Validation Set |
|--------|----------|----------------|
| **Top-1 Accuracy** | 94.49% | 97.45% |
| **Top-3 Accuracy** | 97.61% | 98.82% |
| **Top-5 Accuracy** | 98.49% | 99.31% |
| **Macro F1-Score** | 94.75% | 97.13% |

## Training Details

### Training Data

Oxford Flowers-102 dataset with offline data augmentation (tier-based augmentation for class balancing).

### Training Procedure

#### 6-Phase Progressive Training

| Phase | Epochs | Resolution | Augmentation | Dropout |
|-------|--------|------------|--------------|---------|
| 1. Basic | 1-5 | 280×280 | Basic Preprocessing | 0.4 |
| 2. MixUp Soft | 6-10 | 320×320 | MixUp α=0.2 | 0.2 |
| 3. MixUp Hard | 11-15 | 320×320 | MixUp α=0.4 | 0.2 |
| 4. CutMix Soft | 16-20 | 380×380 | CutMix α=0.2 | 0.2 |
| 5. CutMix Hard | 21-30 | 380×380 | CutMix α=0.5 | 0.2 |
| 6. Grand Finale | 31-40 | 400×400 | Hybrid | 0.2 |

#### Preprocessing

- Resize → RandomCrop → HorizontalFlip → Rotation (±20°) → Affine → ColorJitter → Normalize (ImageNet)

#### Training Hyperparameters

- **Optimizer:** AdamW
- **Learning Rate:** 1e-3
- **Weight Decay:** 1e-4
- **Scheduler:** CosineAnnealingWarmRestarts (T_0=5, T_mult=2)
- **Loss:** CrossEntropyLoss (label_smoothing=0.1)
- **Batch Size:** 8
- **Training Regime:** fp16 mixed precision (AMP)
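
The hyperparameters above can be wired up in PyTorch roughly as follows. This is a minimal sketch, not the author's training script; `build_training_setup` and its return layout are illustrative names, and `model` is any `nn.Module`:

```python
import torch
from torch import nn

def build_training_setup(model: nn.Module):
    # AdamW with the listed learning rate and weight decay
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
    # Cosine annealing with warm restarts: first cycle is 5 epochs, doubling each restart
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
        optimizer, T_0=5, T_mult=2
    )
    # Cross-entropy with label smoothing, as listed above
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    # Gradient scaler for fp16 mixed-precision (AMP) training
    scaler = torch.cuda.amp.GradScaler()
    return optimizer, scheduler, criterion, scaler
```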

---

## 🎯 6-Phase Progressive Training

```
Phase 1 ──→ Phase 2 ──→ Phase 3 ──→ Phase 4 ──→ Phase 5 ──→ Phase 6
 280px      320px       320px       380px       380px       400px
  None      MixUp       MixUp      CutMix      CutMix      Hybrid
            α=0.2       α=0.4       α=0.2       α=0.5      MixUp+Cut
```

### Phase Details

| Phase | Epochs | Resolution | Technique | Alpha | Dropout | Purpose |
|-------|--------|------------|-----------|-------|---------|---------|
| 1️⃣ **Basic** | 1-5 | 280×280 | Basic Preprocessing | - | 0.4 | Learn fundamental features |
| 2️⃣ **MixUp Soft** | 6-10 | 320×320 | MixUp | 0.2 | 0.2 | Gentle texture blending |
| 3️⃣ **MixUp Hard** | 11-15 | 320×320 | MixUp | 0.4 | 0.2 | Strong texture mixing |
| 4️⃣ **CutMix Soft** | 16-20 | 380×380 | CutMix | 0.2 | 0.2 | Learn partial structures |
| 5️⃣ **CutMix Hard** | 21-30 | 380×380 | CutMix | 0.5 | 0.2 | Handle occlusions |
| 6️⃣ **Grand Finale** | 31-40 | 400×400 | Hybrid | 0.1-0.3 | 0.2 | Final polish with both |

> **💡 Why Progressive Training?** Starting at low resolution helps the model learn general shapes first. Gradually increasing the augmentation difficulty builds robustness incrementally.

---

## 🖼️ Preprocessing Pipeline (All Phases)

> **⚠️ Note:** These preprocessing steps are applied in **ALL PHASES**. Only `img_size` changes per phase.

### Complete Training Flow

```
┌─────────────────────────────────────────────────────────────┐
│              📷 RAW IMAGE INPUT                             │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│     🔄 STEP 1: IMAGE-LEVEL PREPROCESSING (Per image)        │
├─────────────────────────────────────────────────────────────┤
│  1️⃣ Resize         │ (img_size + 32) × (img_size + 32)     │
│  2️⃣ RandomCrop     │ img_size × img_size                   │
│  3️⃣ HorizontalFlip │ p=0.5                                 │
│  4️⃣ RandomRotation │ ±20°                                  │
│  5️⃣ RandomAffine   │ scale=(0.8, 1.2)                      │
│  6️⃣ ColorJitter    │ brightness, contrast, saturation=0.2  │
│  7️⃣ ToTensor       │ [0-255] → [0.0-1.0]                   │
│  8️⃣ Normalize      │ ImageNet mean/std                     │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│        🎲 STEP 2: BATCH-LEVEL AUGMENTATION (Phase-specific) │
├─────────────────────────────────────────────────────────────┤
│  Phase 1: None (Preprocessing only)                         │
│  Phase 2-3: MixUp (λ×ImageA + (1-λ)×ImageB)                 │
│  Phase 4-5: CutMix (Patch swap between images)              │
│  Phase 6: Hybrid (MixUp + CutMix combined)                  │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│              🎯 READY FOR MODEL TRAINING                    │
└─────────────────────────────────────────────────────────────┘
```

### Phase-Specific Image Sizes

| Phase | img_size | Resize To | RandomCrop To |
|-------|----------|-----------|---------------|
| 1️⃣ Basic | 280 | 312×312 | 280×280 |
| 2️⃣ MixUp Soft | 320 | 352×352 | 320×320 |
| 3️⃣ MixUp Hard | 320 | 352×352 | 320×320 |
| 4️⃣ CutMix Soft | 380 | 412×412 | 380×380 |
| 5️⃣ CutMix Hard | 380 | 412×412 | 380×380 |
| 6️⃣ Grand Finale | 400 | 432×432 | 400×400 |

### Preprocessing Details (All Phases)

| Step | Transform | Parameters | Purpose |
|------|-----------|------------|---------|
| 1️⃣ | **Resize** | (size+32, size+32) | Prepare for random crop |
| 2️⃣ | **RandomCrop** | (size, size) | Random position augmentation |
| 3️⃣ | **RandomHorizontalFlip** | p=0.5 | Left-right invariance |
| 4️⃣ | **RandomRotation** | degrees=20 | Rotation invariance |
| 5️⃣ | **RandomAffine** | scale=(0.8, 1.2) | Scale variation |
| 6️⃣ | **ColorJitter** | (0.2, 0.2, 0.2) | Brightness/Contrast/Saturation |
| 7️⃣ | **ToTensor** | - | Convert to PyTorch tensor |
| 8️⃣ | **Normalize** | mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] | ImageNet normalization |

### Test/Validation Preprocessing

| Step | Transform | Parameters |
|------|-----------|------------|
| 1️⃣ | **Resize** | (size, size) |
| 2️⃣ | **ToTensor** | - |
| 3️⃣ | **Normalize** | ImageNet mean/std |

> **💡 Key Insight:** Preprocessing (8 steps) is applied per image in every phase. MixUp/CutMix is applied **AFTER** preprocessing as batch-level augmentation.

---

## 🔀 Batch-Level Augmentation Techniques (Phase-Specific)

### MixUp
```
Image A (Rose) + Image B (Sunflower) 
    ↓
λ = Beta(α, α)  →  New Image = λ×A + (1-λ)×B
    ↓
Blended Image (70% Rose + 30% Sunflower features)
```

**Benefits:** ✅ Smoother decision boundaries ✅ Reduces overconfidence ✅ Better generalization
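
The blending above can be sketched framework-agnostically. A minimal NumPy version, assuming an `(N, C, H, W)` batch and one-hot labels (function and argument names are illustrative):

```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2, rng=None):
    """Blend each image (and its one-hot label) with a random partner from the batch."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing coefficient from Beta(alpha, alpha)
    perm = rng.permutation(len(images))     # random partner for each sample
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels, lam
```

With a small `alpha` such as 0.2 (Phase 2), Beta(α, α) concentrates λ near 0 or 1, so blends are gentle; α=0.4 (Phase 3) pushes λ toward 0.5 and mixes more aggressively.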

### CutMix
```
Image A (Rose) + Random BBox from Image B (Sunflower)
    ↓
Paste B's region onto A
    ↓
Composite Image (Rose background + Sunflower patch)
```

**Benefits:** ✅ Object completion ability ✅ Occlusion robustness ✅ Localization skills
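
The patch swap can likewise be sketched in NumPy, with labels softened by the pasted area. Again a sketch with illustrative names, not the author's implementation:

```python
import numpy as np

def cutmix_batch(images, labels, alpha=0.5, rng=None):
    """Paste a random box from a partner image; soften labels by the pasted area."""
    rng = rng or np.random.default_rng()
    n, _, h, w = images.shape
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(n)
    # Box whose area is (1 - lam) of the image, centred at a random point
    ratio = np.sqrt(1.0 - lam)
    bh, bw = int(h * ratio), int(w * ratio)
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - bh // 2, 0), min(cy + bh // 2, h)
    x1, x2 = max(cx - bw // 2, 0), min(cx + bw // 2, w)
    out = images.copy()
    out[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Re-derive lambda from the box actually pasted (it may be clipped at the border)
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_labels = lam_adj * labels + (1.0 - lam_adj) * labels[perm]
    return out, mixed_labels, lam_adj
```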

### Hybrid (Grand Finale)
1. Apply MixUp (blend two images)
2. Apply CutMix (cut on blended image)
3. Result: Maximum augmentation challenge

---

## 🛡️ Smart Training Features

### Two-Layer Early Stopping

| Layer | Condition | Patience | Action |
|-------|-----------|----------|--------|
| **Phase-level** | Train↓ + Val↑ (Overfitting) | 2 epochs | Skip to next phase |
| **Global** | Val loss not improving | 8 epochs | Stop training |
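
The global layer reduces to a patience counter on validation loss; a minimal sketch (class and attribute names are illustrative, and the phase-level layer applies the same idea per phase with patience 2):

```python
class EarlyStopping:
    """Global early stopping: stop when val loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=8, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0   # improvement: reset counter
        else:
            self.bad_epochs += 1                       # no improvement this epoch
        return self.bad_epochs >= self.patience
```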

### Smart Dropout Mechanism

| Signal | Condition | Action |
|--------|-----------|--------|
| ⚠️ **Overfitting** | Train↓ + Val↑ | Dropout += 0.05 |
| 🚑 **Underfitting** | Train↑ + Val↑ | Dropout -= 0.05 |
| ✅ **Normal** | Train↓ + Val↓ | No change |

**Bounds:** min=0.10, max=0.50
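
Reading the arrows as per-epoch loss deltas (negative meaning the loss improved), the adjustment rule above reduces to a few lines. A sketch with illustrative names:

```python
def adjust_dropout(p, train_delta, val_delta, step=0.05, lo=0.10, hi=0.50):
    """Nudge the dropout rate based on the last epoch's train/val loss deltas."""
    if train_delta < 0 and val_delta > 0:
        p += step   # overfitting: train loss improving while val loss worsens
    elif train_delta > 0 and val_delta > 0:
        p -= step   # underfitting: both losses worsening
    # normal (both improving): leave p unchanged
    return min(max(p, lo), hi)   # clamp to [0.10, 0.50]
```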

## Model Architecture

```
EfficientNet-B4 (pretrained)
    └── Custom Classifier Head
        ├── BatchNorm1d (1792)
        ├── Dropout
        ├── Linear (1792 → 512)
        ├── GELU
        ├── BatchNorm1d (512)
        ├── Dropout
        └── Linear (512 → 102)
```

**Total Parameters:** ~19M (all trainable)
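
The head in the diagram corresponds to roughly the following PyTorch module, where 1792 is EfficientNet-B4's pooled feature width. A sketch derived from the diagram, not the author's exact code; `make_classifier_head` is an illustrative name:

```python
import torch
from torch import nn

def make_classifier_head(p_drop: float = 0.2) -> nn.Sequential:
    """Custom head: 1792-d EfficientNet-B4 features -> 102 flower classes."""
    return nn.Sequential(
        nn.BatchNorm1d(1792),
        nn.Dropout(p_drop),
        nn.Linear(1792, 512),
        nn.GELU(),
        nn.BatchNorm1d(512),
        nn.Dropout(p_drop),
        nn.Linear(512, 102),
    )
```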

## Supported Flower Classes

102 flower species including: Rose, Sunflower, Tulip, Orchid, Lily, Daisy, Hibiscus, Lotus, Magnolia, and 93 more.

## Limitations

- Trained only on Oxford Flowers-102 dataset
- Best performance at 400×400 resolution
- May not generalize well to flowers outside the 102 trained classes

## Citation

```bibtex
@misc{efficientnet-b4-flowers102,
  title={EfficientNet-B4 Flower Classifier with 6-Phase Progressive Training},
  author={fth2745},
  year={2024},
  url={https://huggingface.co/fth2745/efficientnet-b4-flowers102}
}
```