---
license: mit
language:
- en
tags:
- image-classification
- pytorch
- efficientnet
- flowers
- computer-vision
datasets:
- oxford-flowers-102
metrics:
- accuracy
- f1
pipeline_tag: image-classification
library_name: pytorch
base_model:
- google/efficientnet-b4
---

# 🌸 EfficientNet-B4 Flower Classifier

A high-accuracy image classification model for identifying **102 flower species** from the Oxford Flowers-102 dataset.

## Model Details

### Model Description

This model is built on the **EfficientNet-B4** backbone with a custom classifier head, trained using a **6-Phase Progressive Training** strategy. Training progressively increases image resolution (280px → 400px) and augmentation difficulty (None → MixUp → CutMix → Hybrid).

- **Developed by:** fth2745
- **Model type:** Image Classification (CNN)
- **License:** MIT
- **Finetuned from:** EfficientNet-B4 (ImageNet pretrained)

## Performance

| Metric | Test Set | Validation Set |
|--------|----------|----------------|
| **Top-1 Accuracy** | 94.49% | 97.45% |
| **Top-3 Accuracy** | 97.61% | 98.82% |
| **Top-5 Accuracy** | 98.49% | 99.31% |
| **Macro F1-Score** | 94.75% | 97.13% |

## Training Details

### Training Data

Oxford Flowers-102 dataset with offline data augmentation (tier-based augmentation for class balancing).

### Training Procedure

#### 6-Phase Progressive Training

| Phase | Epochs | Resolution | Augmentation | Dropout |
|-------|--------|------------|--------------|---------|
| 1. Basic | 1-5 | 280×280 | Basic Preprocessing | 0.4 |
| 2. MixUp Soft | 6-10 | 320×320 | MixUp α=0.2 | 0.2 |
| 3. MixUp Hard | 11-15 | 320×320 | MixUp α=0.4 | 0.2 |
| 4. CutMix Soft | 16-20 | 380×380 | CutMix α=0.2 | 0.2 |
| 5. CutMix Hard | 21-30 | 380×380 | CutMix α=0.5 | 0.2 |
| 6. Grand Finale | 31-40 | 400×400 | Hybrid | 0.2 |

#### Preprocessing

- Resize → RandomCrop → HorizontalFlip → Rotation (±20°) → Affine → ColorJitter → Normalize (ImageNet)

#### Training Hyperparameters

- **Optimizer:** AdamW
- **Learning Rate:** 1e-3
- **Weight Decay:** 1e-4
- **Scheduler:** CosineAnnealingWarmRestarts (T_0=5, T_mult=2)
- **Loss:** CrossEntropyLoss (label_smoothing=0.1)
- **Batch Size:** 8
- **Training Regime:** fp16 mixed precision (AMP)

---

## 🎯 6-Phase Progressive Training

```
Phase 1 ──→ Phase 2 ──→ Phase 3 ──→ Phase 4 ──→ Phase 5 ──→ Phase 6
 280px       320px       320px       380px       380px       400px
 None        MixUp       MixUp       CutMix      CutMix      Hybrid
             α=0.2       α=0.4       α=0.2       α=0.5       MixUp+Cut
```

### Phase Details

| Phase | Epochs | Resolution | Technique | Alpha | Dropout | Purpose |
|-------|--------|------------|-----------|-------|---------|---------|
| 1️⃣ **Basic** | 1-5 | 280×280 | Basic Preprocessing | - | 0.4 | Learn fundamental features |
| 2️⃣ **MixUp Soft** | 6-10 | 320×320 | MixUp | 0.2 | 0.2 | Gentle texture blending |
| 3️⃣ **MixUp Hard** | 11-15 | 320×320 | MixUp | 0.4 | 0.2 | Strong texture mixing |
| 4️⃣ **CutMix Soft** | 16-20 | 380×380 | CutMix | 0.2 | 0.2 | Learn partial structures |
| 5️⃣ **CutMix Hard** | 21-30 | 380×380 | CutMix | 0.5 | 0.2 | Handle occlusions |
| 6️⃣ **Grand Finale** | 31-40 | 400×400 | Hybrid | 0.1-0.3 | 0.2 | Final polish with both |

> **💡 Why Progressive Training?** Starting with low resolution helps the model learn general shapes first. Gradually increasing augmentation difficulty builds robustness incrementally.

---

## 🖼️ Preprocessing Pipeline (All Phases)

> **⚠️ Note:** These preprocessing steps are applied in **all phases**. Only `img_size` changes per phase.
### Complete Training Flow

```
┌────────────────────────────────────────────────────────────┐
│ 📷 RAW IMAGE INPUT                                         │
└────────────────────────────────────────────────────────────┘
                             ↓
┌────────────────────────────────────────────────────────────┐
│ 🔄 STEP 1: IMAGE-LEVEL PREPROCESSING (per image)           │
├────────────────────────────────────────────────────────────┤
│ 1️⃣ Resize          │ (img_size + 32) × (img_size + 32)     │
│ 2️⃣ RandomCrop      │ img_size × img_size                   │
│ 3️⃣ HorizontalFlip  │ p=0.5                                 │
│ 4️⃣ RandomRotation  │ ±20°                                  │
│ 5️⃣ RandomAffine    │ scale=(0.8, 1.2)                      │
│ 6️⃣ ColorJitter     │ brightness, contrast, saturation=0.2  │
│ 7️⃣ ToTensor        │ [0-255] → [0.0-1.0]                   │
│ 8️⃣ Normalize       │ ImageNet mean/std                     │
└────────────────────────────────────────────────────────────┘
                             ↓
┌────────────────────────────────────────────────────────────┐
│ 🎲 STEP 2: BATCH-LEVEL AUGMENTATION (phase-specific)       │
├────────────────────────────────────────────────────────────┤
│ Phase 1:   None (preprocessing only)                       │
│ Phase 2-3: MixUp  (λ×ImageA + (1-λ)×ImageB)                │
│ Phase 4-5: CutMix (patch swap between images)              │
│ Phase 6:   Hybrid (MixUp + CutMix combined)                │
└────────────────────────────────────────────────────────────┘
                             ↓
┌────────────────────────────────────────────────────────────┐
│ 🎯 READY FOR MODEL TRAINING                                │
└────────────────────────────────────────────────────────────┘
```

### Phase-Specific Image Sizes

| Phase | img_size | Resize To | RandomCrop To |
|-------|----------|-----------|---------------|
| 1️⃣ Basic | 280 | 312×312 | 280×280 |
| 2️⃣ MixUp Soft | 320 | 352×352 | 320×320 |
| 3️⃣ MixUp Hard | 320 | 352×352 | 320×320 |
| 4️⃣ CutMix Soft | 380 | 412×412 | 380×380 |
| 5️⃣ CutMix Hard | 380 | 412×412 | 380×380 |
| 6️⃣ Grand Finale | 400 | 432×432 | 400×400 |

### Preprocessing Details (All Phases)

| Step | Transform | Parameters | Purpose |
|------|-----------|------------|---------|
| 1️⃣ | **Resize** | (size+32, size+32) | Prepare for random crop |
| 2️⃣ | **RandomCrop** | (size, size) | Random position augmentation |
| 3️⃣ | **RandomHorizontalFlip** | p=0.5 | Left-right invariance |
| 4️⃣ | **RandomRotation** | degrees=20 | Rotation invariance |
| 5️⃣ | **RandomAffine** | scale=(0.8, 1.2) | Scale variation |
| 6️⃣ | **ColorJitter** | (0.2, 0.2, 0.2) | Brightness/Contrast/Saturation |
| 7️⃣ | **ToTensor** | - | Convert to PyTorch tensor |
| 8️⃣ | **Normalize** | mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] | ImageNet normalization |

### Test/Validation Preprocessing

| Step | Transform | Parameters |
|------|-----------|------------|
| 1️⃣ | **Resize** | (size, size) |
| 2️⃣ | **ToTensor** | - |
| 3️⃣ | **Normalize** | ImageNet mean/std |

> **💡 Key Insight:** Preprocessing (8 steps) is applied per image in every phase. MixUp/CutMix is applied **after** preprocessing as batch-level augmentation.

---

## 🔀 Batch-Level Augmentation Techniques (Phase-Specific)

### MixUp

```
Image A (Rose) + Image B (Sunflower)
          ↓
λ ~ Beta(α, α) → New Image = λ×A + (1-λ)×B
          ↓
Blended Image (70% Rose + 30% Sunflower features)
```

**Benefits:**
- ✅ Smoother decision boundaries
- ✅ Reduces overconfidence
- ✅ Better generalization

### CutMix

```
Image A (Rose) + Random BBox from Image B (Sunflower)
          ↓
Paste B's region onto A
          ↓
Composite Image (Rose background + Sunflower patch)
```

**Benefits:**
- ✅ Object completion ability
- ✅ Occlusion robustness
- ✅ Localization skills

### Hybrid (Grand Finale)

1. Apply MixUp (blend two images)
2. Apply CutMix (cut a patch into the blended image)
3. Result: maximum augmentation challenge

---

## 🛡️ Smart Training Features

### Two-Layer Early Stopping

| Layer | Condition | Patience | Action |
|-------|-----------|----------|--------|
| **Phase-level** | Train↓ + Val↑ (overfitting) | 2 epochs | Skip to next phase |
| **Global** | Val loss not improving | 8 epochs | Stop training |

### Smart Dropout Mechanism

| Signal | Condition | Action |
|--------|-----------|--------|
| ⚠️ **Overfitting** | Train↓ + Val↑ | Dropout += 0.05 |
| 🚑 **Underfitting** | Train↑ + Val↑ | Dropout -= 0.05 |
| ✅ **Normal** | Train↓ + Val↓ | No change |

**Bounds:** min=0.10, max=0.50

## Model Architecture

```
EfficientNet-B4 (pretrained)
└── Custom Classifier Head
    ├── BatchNorm1d (1792)
    ├── Dropout
    ├── Linear (1792 → 512)
    ├── GELU
    ├── BatchNorm1d (512)
    ├── Dropout
    └── Linear (512 → 102)
```

**Total Parameters:** ~19M (all trainable)

## Supported Flower Classes

102 flower species including: Rose, Sunflower, Tulip, Orchid, Lily, Daisy, Hibiscus, Lotus, Magnolia, and 93 more.
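The batch-level MixUp step described above can be sketched as follows. This is a minimal sketch, not the original training code; the function and variable names are illustrative. CutMix differs only in that a random bounding box from the paired image is pasted in, with λ set to the pasted-area ratio.

```python
import torch

def mixup(images: torch.Tensor, labels: torch.Tensor, alpha: float = 0.2):
    """Blend each image in the batch with a randomly paired image.

    Returns the mixed batch, both label sets, and lambda, so the loss can be
    computed as lam * loss(pred, labels_a) + (1 - lam) * loss(pred, labels_b).
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()  # λ ~ Beta(α, α)
    perm = torch.randperm(images.size(0))                         # random pairing within the batch
    mixed = lam * images + (1.0 - lam) * images[perm]             # λ×A + (1-λ)×B
    return mixed, labels, labels[perm], lam
```

Per the phase table, Phase 2 would call `mixup(x, y, alpha=0.2)` and Phase 3 `mixup(x, y, alpha=0.4)`; smaller α concentrates λ near 0 or 1, giving gentler blends.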
## Limitations

- Trained only on the Oxford Flowers-102 dataset
- Best performance at 400×400 resolution
- May not generalize well to flowers outside the 102 trained classes

## Citation

```bibtex
@misc{efficientnet-b4-flowers102,
  title={EfficientNet-B4 Flower Classifier with 6-Phase Progressive Training},
  author={fth2745},
  year={2024},
  url={https://huggingface.co/fth2745/efficientnet-b4-flowers102}
}
```