Upload README.md with huggingface_hub
README.md
@@ -1,10 +1,47 @@
---
datasets:
- inaturalist2019
language: en
tags:
- image-classification
- pytorch
- efficientnet
- mixture-of-experts
- deepmoe
---

# DeepMoE EfficientNet-B0 fine-tuned on iNaturalist 2019

This model is a Deep Mixture-of-Experts (DeepMoE) variant of EfficientNet-B0, fine-tuned on the iNaturalist 2019 dataset. Training optimizes classification accuracy and computational cost (FLOP reduction) together.

## Training Results
- **Final Score (Acc/FLOPs composite)**: 83.2947
- **Expert Activation Ratio**: 27.7%
- **FLOPs Usage**: 53.3% *(compared to baseline B0)*
- **Baseline B0 Reference FLOPs**: 388,184,000
- **Total Runtime**: 5404.17 seconds

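The relative figures above can be converted to absolute numbers with simple arithmetic. The sketch below assumes that FLOPs Usage is a fraction of the listed baseline reference FLOPs; the function and constant names are illustrative and not part of any training script.

```python
# Values copied from the Training Results list.
BASELINE_FLOPS = 388_184_000   # Baseline B0 reference FLOPs
FLOPS_USAGE = 0.533            # FLOPs Usage: 53.3% of the baseline
RUNTIME_SECONDS = 5404.17      # Total Runtime


def estimated_flops(baseline: int, usage: float) -> int:
    """FLOPs implied by the usage ratio (assumed interpretation of the card)."""
    return round(baseline * usage)


def runtime_minutes(seconds: float) -> float:
    """Convert a runtime in seconds to minutes."""
    return seconds / 60.0


if __name__ == "__main__":
    flops = estimated_flops(BASELINE_FLOPS, FLOPS_USAGE)
    print(f"Estimated FLOPs: {flops:,}")
    print(f"FLOPs saved vs. baseline: {BASELINE_FLOPS - flops:,}")
    print(f"Runtime: {runtime_minutes(RUNTIME_SECONDS):.1f} minutes")
```

Under that assumption the model uses roughly 207 million FLOPs, and the run took about 90 minutes.
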
## Hyperparameters
- **Batch Size**: 256
- **Gradient Accumulation Steps**: 4
- **Weight Decay**: 0.005

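Batch size and gradient accumulation together determine how many samples contribute to each optimizer update. The card does not say whether 256 is the per-step (micro-batch) size or the effective size; the sketch below assumes it is the per-step size on a single device. The sample count in the example is a placeholder, not the dataset size.

```python
import math


def effective_batch_size(batch_size: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update (assumes 256 is the micro-batch size)."""
    return batch_size * accumulation_steps * num_devices


def optimizer_steps_per_epoch(num_samples: int, batch_size: int, accumulation_steps: int) -> int:
    """Optimizer updates per epoch when gradients are accumulated over several micro-batches."""
    micro_batches = math.ceil(num_samples / batch_size)
    return math.ceil(micro_batches / accumulation_steps)


if __name__ == "__main__":
    print(effective_batch_size(256, 4))
    # 100_000 is a hypothetical sample count used only for illustration.
    print(optimizer_steps_per_epoch(100_000, 256, 4))
```

With these assumptions each update sees 1,024 samples.
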
### Epochs
- **Total Epochs**: 10
  - Joint Training Epochs: 10
  - Routing-Frozen Finetuning Epochs: 0

### DeepMoE Architecture & Routing
- **MoE Start Stage**: 1
- **Latent Dimension**: 32
- **Sparsity Penalty ($\lambda_g$)**: 0.0003
- **Target Sparsity ($\mu$)**: 0.5
- **ReLU Init (Val / Std)**: 1 / 1

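The card does not spell out how the gates or the sparsity term are computed, so the following is only a rough illustration. It assumes one ReLU gate per channel computed from a 32-dimensional latent vector, measures the activation ratio as the share of non-zero gates, and applies a hypothetical L1-style penalty scaled by $\lambda_g$. How the target sparsity $\mu$ and the ReLU init values enter training is not stated, so they are left out. All function names, the layer width, and the random weights are assumptions.

```python
import random

LATENT_DIM = 32      # Latent Dimension from the list above
LAMBDA_G = 0.0003    # Sparsity Penalty (lambda_g) from the list above


def relu_gates(latent, weights, biases):
    """One ReLU gate per channel: max(0, w . e + b)."""
    return [
        max(0.0, sum(w_i * e_i for w_i, e_i in zip(w, latent)) + b)
        for w, b in zip(weights, biases)
    ]


def activation_ratio(gates):
    """Share of channels whose gate is non-zero, i.e. channels that are computed."""
    return sum(1 for g in gates if g > 0.0) / len(gates)


def l1_gate_penalty(gates, lambda_g=LAMBDA_G):
    """Hypothetical L1-style sparsity term; gates are non-negative after ReLU."""
    return lambda_g * sum(gates)


if __name__ == "__main__":
    rng = random.Random(0)
    num_channels = 16  # illustrative layer width, not taken from the model card
    latent = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    weights = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(num_channels)]
    biases = [0.0] * num_channels

    gates = relu_gates(latent, weights, biases)
    print(f"activation ratio: {activation_ratio(gates):.3f}")
    print(f"sparsity penalty: {l1_gate_penalty(gates):.6f}")
```

A gate that evaluates to zero lets the corresponding channel be skipped, which is how a lower activation ratio would translate into fewer FLOPs.
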
### Learning Rates
- **MoE Routing Parameters**: 4.00e-02
- **Classification Head**: 2.00e-02
- **Base Model (Body)**: 2.00e-03
- **Finetune Phase (Frozen Routing)**: 0.00e+00

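Separate learning rates like these are usually expressed as optimizer parameter groups. The sketch below builds such groups in plain Python, assuming parameters can be told apart by name. The name patterns (`gate`, `router`, `classifier`), the helper names, and applying the listed weight decay to every group are all assumptions; the card does not name the optimizer or the module layout.

```python
# Learning rates and weight decay copied from the card.
LEARNING_RATES = {"routing": 4.00e-02, "head": 2.00e-02, "body": 2.00e-03}
WEIGHT_DECAY = 0.005


def group_for(name: str) -> str:
    """Map a parameter name to a group using hypothetical naming patterns."""
    if "gate" in name or "router" in name:
        return "routing"
    if name.startswith("classifier"):
        return "head"
    return "body"


def build_param_groups(named_params):
    """Return optimizer-style param groups from (name, parameter) pairs."""
    buckets = {key: [] for key in LEARNING_RATES}
    for name, param in named_params:
        buckets[group_for(name)].append(param)
    return [
        {"params": params, "lr": LEARNING_RATES[key], "weight_decay": WEIGHT_DECAY}
        for key, params in buckets.items()
        if params  # skip groups that received no parameters
    ]


if __name__ == "__main__":
    # Strings stand in for real parameter tensors in this illustration.
    demo = [("features.0.weight", "w0"), ("blocks.2.gate.weight", "g0"), ("classifier.1.weight", "c0")]
    for group in build_param_groups(demo):
        print(group)
```

With PyTorch, the result of `build_param_groups(model.named_parameters())` could be passed to an optimizer such as `torch.optim.SGD`, which accepts a list of per-group dictionaries.
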
*Training was tracked using [Weights & Biases](https://wandb.ai).*