---

license: mit
language:
- en
tags:
- image-classification
- pytorch
- efficientnet
- flowers
- computer-vision
datasets:
- oxford-flowers-102
metrics:
- accuracy
- f1
pipeline_tag: image-classification
library_name: pytorch
base_model:
- google/efficientnet-b4
---


# 🌸 EfficientNet-B4 Flower Classifier

An image classification model for identifying **102 flower species** from the Oxford Flowers-102 dataset.

## Model Details

### Model Description

This model is built on the **EfficientNet-B4** backbone with a custom classifier head, trained with a **6-Phase Progressive Training** strategy that progressively increases image resolution (280px → 400px) and augmentation difficulty (None → MixUp → CutMix → Hybrid).

- **Developed by:** fth2745
- **Model type:** Image Classification (CNN)
- **License:** MIT
- **Finetuned from:** EfficientNet-B4 (ImageNet pretrained)

## Performance

| Metric | Test Set | Validation Set |
|--------|----------|----------------|
| **Top-1 Accuracy** | 94.49% | 97.45% |
| **Top-3 Accuracy** | 97.61% | 98.82% |
| **Top-5 Accuracy** | 98.49% | 99.31% |
| **Macro F1-Score** | 94.75% | 97.13% |

## Training Details

### Training Data

Oxford Flowers-102 dataset with offline data augmentation (tier-based augmentation for class balancing).

### Training Procedure

#### 6-Phase Progressive Training

| Phase | Epochs | Resolution | Augmentation | Dropout |
|-------|--------|------------|--------------|---------|
| 1. Basic | 1-5 | 280×280 | Basic Preprocessing | 0.4 |
| 2. MixUp Soft | 6-10 | 320×320 | MixUp α=0.2 | 0.2 |
| 3. MixUp Hard | 11-15 | 320×320 | MixUp α=0.4 | 0.2 |
| 4. CutMix Soft | 16-20 | 380×380 | CutMix α=0.2 | 0.2 |
| 5. CutMix Hard | 21-30 | 380×380 | CutMix α=0.5 | 0.2 |
| 6. Grand Finale | 31-40 | 400×400 | Hybrid | 0.2 |

#### Preprocessing

- Resize → RandomCrop → HorizontalFlip → Rotation (±20°) → Affine → ColorJitter → Normalize (ImageNet)

#### Training Hyperparameters

- **Optimizer:** AdamW
- **Learning Rate:** 1e-3
- **Weight Decay:** 1e-4
- **Scheduler:** CosineAnnealingWarmRestarts (T_0=5, T_mult=2)
- **Loss:** CrossEntropyLoss (label_smoothing=0.1)
- **Batch Size:** 8
- **Training Regime:** fp16 mixed precision (AMP)
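
The hyperparameters above can be wired up in PyTorch roughly as follows. This is a minimal sketch, not the author's training script; `build_training_setup` and its return layout are illustrative names, and `model` is any `nn.Module`:

```python
import torch
from torch import nn

def build_training_setup(model: nn.Module):
    # AdamW with the listed learning rate and weight decay
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
    # Cosine annealing with warm restarts: first cycle is 5 epochs, doubling each restart
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
        optimizer, T_0=5, T_mult=2
    )
    # Cross-entropy with label smoothing, as listed above
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    # Gradient scaler for fp16 mixed-precision (AMP) training
    scaler = torch.cuda.amp.GradScaler()
    return optimizer, scheduler, criterion, scaler
```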

---

## 🎯 6-Phase Progressive Training

```
Phase 1 ──→ Phase 2 ──→ Phase 3 ──→ Phase 4 ──→ Phase 5 ──→ Phase 6
 280px      320px       320px       380px       380px       400px
  None      MixUp       MixUp      CutMix      CutMix      Hybrid
            α=0.2       α=0.4       α=0.2       α=0.5      MixUp+Cut
```

### Phase Details

| Phase | Epochs | Resolution | Technique | Alpha | Dropout | Purpose |
|-------|--------|------------|-----------|-------|---------|---------|
| 1️⃣ **Basic** | 1-5 | 280×280 | Basic Preprocessing | - | 0.4 | Learn fundamental features |
| 2️⃣ **MixUp Soft** | 6-10 | 320×320 | MixUp | 0.2 | 0.2 | Gentle texture blending |
| 3️⃣ **MixUp Hard** | 11-15 | 320×320 | MixUp | 0.4 | 0.2 | Strong texture mixing |
| 4️⃣ **CutMix Soft** | 16-20 | 380×380 | CutMix | 0.2 | 0.2 | Learn partial structures |
| 5️⃣ **CutMix Hard** | 21-30 | 380×380 | CutMix | 0.5 | 0.2 | Handle occlusions |
| 6️⃣ **Grand Finale** | 31-40 | 400×400 | Hybrid | 0.1-0.3 | 0.2 | Final polish with both |

> **💡 Why Progressive Training?** Starting at low resolution helps the model learn general shapes first. Gradually increasing the augmentation difficulty builds robustness incrementally.

---

## 🖼️ Preprocessing Pipeline (All Phases)

> **⚠️ Note:** These preprocessing steps are applied in **ALL PHASES**. Only `img_size` changes per phase.

### Complete Training Flow

```
┌─────────────────────────────────────────────────────────────┐
│              📷 RAW IMAGE INPUT                             │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│     🔄 STEP 1: IMAGE-LEVEL PREPROCESSING (Per image)        │
├─────────────────────────────────────────────────────────────┤
│  1️⃣ Resize         │ (img_size + 32) × (img_size + 32)     │
│  2️⃣ RandomCrop     │ img_size × img_size                   │
│  3️⃣ HorizontalFlip │ p=0.5                                 │
│  4️⃣ RandomRotation │ ±20°                                  │
│  5️⃣ RandomAffine   │ scale=(0.8, 1.2)                      │
│  6️⃣ ColorJitter    │ brightness, contrast, saturation=0.2  │
│  7️⃣ ToTensor       │ [0-255] → [0.0-1.0]                   │
│  8️⃣ Normalize      │ ImageNet mean/std                     │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│        🎲 STEP 2: BATCH-LEVEL AUGMENTATION (Phase-specific) │
├─────────────────────────────────────────────────────────────┤
│  Phase 1: None (Preprocessing only)                         │
│  Phase 2-3: MixUp (λ×ImageA + (1-λ)×ImageB)                 │
│  Phase 4-5: CutMix (Patch swap between images)              │
│  Phase 6: Hybrid (MixUp + CutMix combined)                  │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│              🎯 READY FOR MODEL TRAINING                    │
└─────────────────────────────────────────────────────────────┘
```

### Phase-Specific Image Sizes

| Phase | img_size | Resize To | RandomCrop To |
|-------|----------|-----------|---------------|
| 1️⃣ Basic | 280 | 312×312 | 280×280 |
| 2️⃣ MixUp Soft | 320 | 352×352 | 320×320 |
| 3️⃣ MixUp Hard | 320 | 352×352 | 320×320 |
| 4️⃣ CutMix Soft | 380 | 412×412 | 380×380 |
| 5️⃣ CutMix Hard | 380 | 412×412 | 380×380 |
| 6️⃣ Grand Finale | 400 | 432×432 | 400×400 |

### Preprocessing Details (All Phases)

| Step | Transform | Parameters | Purpose |
|------|-----------|------------|---------|
| 1️⃣ | **Resize** | (size+32, size+32) | Prepare for random crop |
| 2️⃣ | **RandomCrop** | (size, size) | Random position augmentation |
| 3️⃣ | **RandomHorizontalFlip** | p=0.5 | Left-right invariance |
| 4️⃣ | **RandomRotation** | degrees=20 | Rotation invariance |
| 5️⃣ | **RandomAffine** | scale=(0.8, 1.2) | Scale variation |
| 6️⃣ | **ColorJitter** | (0.2, 0.2, 0.2) | Brightness/Contrast/Saturation |
| 7️⃣ | **ToTensor** | - | Convert to PyTorch tensor |
| 8️⃣ | **Normalize** | mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] | ImageNet normalization |

### Test/Validation Preprocessing

| Step | Transform | Parameters |
|------|-----------|------------|
| 1️⃣ | **Resize** | (size, size) |
| 2️⃣ | **ToTensor** | - |
| 3️⃣ | **Normalize** | ImageNet mean/std |

> **💡 Key Insight:** Preprocessing (8 steps) is applied per image in every phase. MixUp/CutMix is applied **AFTER** preprocessing as batch-level augmentation.

---

## 🔀 Batch-Level Augmentation Techniques (Phase-Specific)

### MixUp
```
Image A (Rose) + Image B (Sunflower) 
    ↓
λ = Beta(α, α)  →  New Image = λ×A + (1-λ)×B
    ↓
Blended Image (70% Rose + 30% Sunflower features)
```

**Benefits:** ✅ Smoother decision boundaries ✅ Reduces overconfidence ✅ Better generalization
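
The blending above can be sketched framework-agnostically. A minimal NumPy version, assuming an `(N, C, H, W)` batch and one-hot labels (function and argument names are illustrative):

```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2, rng=None):
    """Blend each image (and its one-hot label) with a random partner from the batch."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing coefficient from Beta(alpha, alpha)
    perm = rng.permutation(len(images))     # random partner for each sample
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels, lam
```

With a small `alpha` such as 0.2 (Phase 2), Beta(α, α) concentrates λ near 0 or 1, so blends are gentle; α=0.4 (Phase 3) pushes λ toward 0.5 and mixes more aggressively.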

### CutMix
```
Image A (Rose) + Random BBox from Image B (Sunflower)
    ↓
Paste B's region onto A
    ↓
Composite Image (Rose background + Sunflower patch)
```

**Benefits:** ✅ Object completion ability ✅ Occlusion robustness ✅ Localization skills
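
The patch swap can likewise be sketched in NumPy, with labels softened by the pasted area. Again a sketch with illustrative names, not the author's implementation:

```python
import numpy as np

def cutmix_batch(images, labels, alpha=0.5, rng=None):
    """Paste a random box from a partner image; soften labels by the pasted area."""
    rng = rng or np.random.default_rng()
    n, _, h, w = images.shape
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(n)
    # Box whose area is (1 - lam) of the image, centred at a random point
    ratio = np.sqrt(1.0 - lam)
    bh, bw = int(h * ratio), int(w * ratio)
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - bh // 2, 0), min(cy + bh // 2, h)
    x1, x2 = max(cx - bw // 2, 0), min(cx + bw // 2, w)
    out = images.copy()
    out[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Re-derive lambda from the box actually pasted (it may be clipped at the border)
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_labels = lam_adj * labels + (1.0 - lam_adj) * labels[perm]
    return out, mixed_labels, lam_adj
```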

### Hybrid (Grand Finale)
1. Apply MixUp (blend two images)
2. Apply CutMix (cut on blended image)
3. Result: Maximum augmentation challenge

---

## 🛡️ Smart Training Features

### Two-Layer Early Stopping

| Layer | Condition | Patience | Action |
|-------|-----------|----------|--------|
| **Phase-level** | Train↓ + Val↑ (Overfitting) | 2 epochs | Skip to next phase |
| **Global** | Val loss not improving | 8 epochs | Stop training |
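
The global layer reduces to a patience counter on validation loss; a minimal sketch (class and attribute names are illustrative, and the phase-level layer applies the same idea per phase with patience 2):

```python
class EarlyStopping:
    """Global early stopping: stop when val loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=8, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0   # improvement: reset counter
        else:
            self.bad_epochs += 1                       # no improvement this epoch
        return self.bad_epochs >= self.patience
```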

### Smart Dropout Mechanism

| Signal | Condition | Action |
|--------|-----------|--------|
| ⚠️ **Overfitting** | Train↓ + Val↑ | Dropout += 0.05 |
| 🚑 **Underfitting** | Train↑ + Val↑ | Dropout -= 0.05 |
| ✅ **Normal** | Train↓ + Val↓ | No change |

**Bounds:** min=0.10, max=0.50
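
Reading the arrows as per-epoch loss deltas (negative meaning the loss improved), the adjustment rule above reduces to a few lines. A sketch with illustrative names:

```python
def adjust_dropout(p, train_delta, val_delta, step=0.05, lo=0.10, hi=0.50):
    """Nudge the dropout rate based on the last epoch's train/val loss deltas."""
    if train_delta < 0 and val_delta > 0:
        p += step   # overfitting: train loss improving while val loss worsens
    elif train_delta > 0 and val_delta > 0:
        p -= step   # underfitting: both losses worsening
    # normal (both improving): leave p unchanged
    return min(max(p, lo), hi)   # clamp to [0.10, 0.50]
```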

## Model Architecture

```
EfficientNet-B4 (pretrained)
    └── Custom Classifier Head
        ├── BatchNorm1d (1792)
        ├── Dropout
        ├── Linear (1792 → 512)
        ├── GELU
        ├── BatchNorm1d (512)
        ├── Dropout
        └── Linear (512 → 102)
```

**Total Parameters:** ~19M (all trainable)
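
The head in the diagram corresponds to roughly the following PyTorch module, where 1792 is EfficientNet-B4's pooled feature width. A sketch derived from the diagram, not the author's exact code; `make_classifier_head` is an illustrative name:

```python
import torch
from torch import nn

def make_classifier_head(p_drop: float = 0.2) -> nn.Sequential:
    """Custom head: 1792-d EfficientNet-B4 features -> 102 flower classes."""
    return nn.Sequential(
        nn.BatchNorm1d(1792),
        nn.Dropout(p_drop),
        nn.Linear(1792, 512),
        nn.GELU(),
        nn.BatchNorm1d(512),
        nn.Dropout(p_drop),
        nn.Linear(512, 102),
    )
```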

## Supported Flower Classes

102 flower species including: Rose, Sunflower, Tulip, Orchid, Lily, Daisy, Hibiscus, Lotus, Magnolia, and 93 more.

## Limitations

- Trained only on Oxford Flowers-102 dataset
- Best performance at 400×400 resolution
- May not generalize well to flowers outside the 102 trained classes

## Citation

```bibtex
@misc{efficientnet-b4-flowers102,
  title={EfficientNet-B4 Flower Classifier with 6-Phase Progressive Training},
  author={fth2745},
  year={2024},
  url={https://huggingface.co/fth2745/efficientnet-b4-flowers102}
}
```