# DeepMoE EfficientNet-B0 fine-tuned on iNaturalist 2019
This model is a Mixture-of-Experts (DeepMoE) variant of EfficientNet-B0, fine-tuned on the iNaturalist 2019 dataset to optimize both accuracy and computational efficiency (FLOP reduction).
## Training Results
- Final Score (Acc/FLOPs composite): 83.2947
- Final Validation Accuracy: 67.7%
- Expert Activation Ratio: 27.7%
- FLOPs Usage: 53.3% (compared to baseline B0)
- Baseline B0 Reference FLOPs: 388,184,000
- Total Runtime: 5404.17 seconds (≈ 1.5 hours)
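In absolute terms, 53.3% of the 388,184,000-FLOP baseline works out to roughly $0.533 \times 388{,}184{,}000 \approx 2.07 \times 10^{8}$ FLOPs per forward pass.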
## Hyperparameters
- Batch Size: 256
- Gradient Accumulation Steps: 4 (effective batch size 1024; see the sketch after this list)
- Weight Decay: 0.005
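With a per-step batch of 256 and 4 accumulation steps, each optimizer update sees an effective batch of 1024 samples. Below is a minimal sketch of that accumulation pattern in PyTorch; the model, data, and loop bounds are placeholders rather than the actual training code.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the real model and data pipeline (not part of this card).
model = nn.Linear(32, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=2e-3, weight_decay=0.005)
criterion = nn.CrossEntropyLoss()

accum_steps = 4    # Gradient Accumulation Steps
batch_size = 256   # per-step batch; effective batch = 256 * 4 = 1024

optimizer.zero_grad()
for step in range(2 * accum_steps):                # dummy loop over micro-batches
    images = torch.randn(batch_size, 32)           # placeholder inputs
    labels = torch.randint(0, 10, (batch_size,))   # placeholder targets
    loss = criterion(model(images), labels) / accum_steps  # scale loss for accumulation
    loss.backward()
    if (step + 1) % accum_steps == 0:              # one optimizer step per 4 micro-batches
        optimizer.step()
        optimizer.zero_grad()
```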
## Epochs
- Total Epochs: 10
- Joint Training Epochs: 10
- Routing-Frozen Finetuning Epochs: 0
## DeepMoE Architecture & Routing
- MoE Start Stage: 1
- Latent Dimension: 32
- Sparsity Penalty ($\lambda_g$): 0.0003 (see the routing sketch after this list)
- Target Sparsity ($\mu$): 0.5
- ReLU Init (Value / Std): 1 / 1
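These settings correspond to a DeepMoE-style shallow-embedding router: a small latent vector is projected to per-channel ReLU gates from the configured start stage onward, and a sparsity term pushes gate activity toward the target $\mu$. The sketch below is a minimal, assumed formulation; the module names, initialization reading, and the exact penalty are illustrative and may differ from this model's actual code.

```python
import torch
from torch import nn

latent_dim = 32   # Latent Dimension
lambda_g = 3e-4   # Sparsity Penalty (lambda_g)
mu = 0.5          # Target Sparsity (mu)

class StageGate(nn.Module):
    """Per-stage ReLU gate driven by a shared shallow embedding (sketch only)."""
    def __init__(self, num_channels):
        super().__init__()
        self.proj = nn.Linear(latent_dim, num_channels)
        # Assumed reading of "ReLU Init (Value / Std): 1 / 1": gates start roughly open.
        nn.init.normal_(self.proj.weight, std=1.0)
        nn.init.constant_(self.proj.bias, 1.0)

    def forward(self, z):
        return torch.relu(self.proj(z))   # non-negative per-channel expert gates

def sparsity_penalty(gates):
    """Assumed penalty: pressure applied when mean gate activation exceeds mu."""
    return lambda_g * torch.relu(gates.mean() - mu)

# Usage sketch: gate a 64-channel stage for a batch of 8 images.
gate = StageGate(num_channels=64)
z = torch.randn(8, latent_dim)        # shallow embedding of the batch
g = gate(z)                           # (8, 64) gates
loss_reg = sparsity_penalty(g)        # added to the task loss during joint training
```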
## Learning Rates
- MoE Routing Parameters: 4.00e-02 (see the parameter-group sketch after this list)
- Classification Head: 2.00e-02
- Base Model (Body): 2.00e-03
- Finetune Phase (Frozen Routing): 0.00e+00
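A minimal sketch of how these per-group learning rates could be passed to a PyTorch optimizer. The module names (`body`, `router`, `classifier`) and the choice of SGD are placeholders, since the card does not specify the optimizer.

```python
import torch
from torch import nn

# Hypothetical module layout standing in for the real DeepMoE EfficientNet-B0;
# the attribute names in the actual training code may differ.
model = nn.ModuleDict({
    "body": nn.Linear(32, 32),        # base model (backbone)
    "router": nn.Linear(32, 4),       # MoE routing parameters
    "classifier": nn.Linear(32, 10),  # classification head
})

# One parameter group per learning rate listed above.
optimizer = torch.optim.SGD(
    [
        {"params": model["router"].parameters(), "lr": 4e-2},
        {"params": model["classifier"].parameters(), "lr": 2e-2},
        {"params": model["body"].parameters(), "lr": 2e-3},
    ],
    momentum=0.9,        # assumed; the card does not name the optimizer
    weight_decay=0.005,  # from the Hyperparameters section
)

# For the (here unused) routing-frozen finetune phase, the routing group's lr
# would simply be set to 0.0, matching the 0.00e+00 entry above.
```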
Training was tracked using Weights & Biases.
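A minimal logging sketch with the Weights & Biases client; the project name and logged keys are hypothetical, not the run's actual configuration.

```python
import wandb

# Illustrative only: project and metric names are assumptions.
run = wandb.init(project="deepmoe-efficientnet-inat2019")
run.log({"val_acc": 0.677, "expert_activation_ratio": 0.277, "flops_usage": 0.533})
run.finish()
```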