Add detailed training summary: GPUs, epochs, curves, LR schedule
README.md
CHANGED
|
@@ -192,6 +192,27 @@ print(f"Predicted class: {pred}")
|
|
| 192 |
|
| 193 |
## Training Procedure
|
| 194 |
|
| 195 |
### Hyperparameters
|
| 196 |
|
| 197 |
- **Optimizer:** Adam (lr=1e-4)
|
|
@@ -202,6 +223,16 @@ print(f"Predicted class: {pred}")
|
|
| 202 |
- **Class balancing:** WeightedRandomSampler (inverse class frequency)
|
| 203 |
- **Input resolution:** 528x528 (resize to 572, then center crop for val/test)
|
| 204 |
|
| 205 |
### Data Augmentation (training only)
|
| 206 |
|
| 207 |
- RandomResizedCrop (528, scale 0.6-1.0)
|
|
@@ -211,11 +242,6 @@ print(f"Predicted class: {pred}")
|
|
| 211 |
- ColorJitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2)
|
| 212 |
- ImageNet normalization
|
| 213 |
|
| 214 |
-
### Hardware
|
| 215 |
-
|
| 216 |
-
- NVIDIA GPU via SLURM HPC cluster
|
| 217 |
-
- CUDA 11.8
|
| 218 |
-
|
| 219 |
## Limitations
|
| 220 |
|
| 221 |
- Models were trained on a specific dataset of cereal pest images and may not generalize well to pest species not in the training set.
|
|
|
|
| 192 |
|
| 193 |
## Training Procedure
|
| 194 |
|
| 195 |
+
### Hardware
|
| 196 |
+
|
| 197 |
+
All models were trained on the University of Idaho RCDS HPC cluster using SLURM, with CUDA 11.8:
|
| 198 |
+
|
| 199 |
+
| Model | GPU | Epochs Completed | Wall Time | Best Checkpoint Epoch |
|
| 200 |
+
|-------|-----|:----------------:|:---------:|:---------------------:|
|
| 201 |
+
| EfficientNet-B6 | NVIDIA RTX 3090 (24 GB) | 10 | ~5h 47m | 1 |
|
| 202 |
+
| InceptionV3 | NVIDIA RTX 4090 (24 GB) | 51 | ~6h 02m | 4 |
|
| 203 |
+
| MobileNetV3-Large | NVIDIA RTX 4090 (24 GB) | 79 | ~6h 04m | 5 |
|
| 204 |
+
|
| 205 |
+
All three training runs were terminated by the SLURM scheduler's wall-time limit. The best checkpoint for each model was the one with the lowest validation loss, which occurred early in training in every case (epochs 1, 4, and 5). Later epochs kept reducing training loss while validation loss rose (overfitting), which supports selecting the early checkpoints.
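
The checkpoint rule described above can be sketched in a few lines. This is a minimal illustration, not the project's training loop: the function name `select_best_checkpoint`, the 1-based epoch numbering, and the loss values are all assumptions made for the example.

```python
def select_best_checkpoint(val_losses):
    """Return (best_epoch, best_loss) for a list of per-epoch validation losses.

    Epochs are numbered from 1 here (an assumption for this sketch).
    """
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # A real training loop would save the model weights at this point,
            # overwriting the previously saved "best" checkpoint.
            best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss


# Hypothetical losses: validation loss bottoms out early, then rises.
example_losses = [0.90, 0.50, 0.60, 0.70]
best_epoch, best_loss = select_best_checkpoint(example_losses)
```

Because only strict improvements replace the saved checkpoint, epochs that overfit after the minimum never overwrite it, even if the job is later cut off by the scheduler.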
|
| 206 |
+
|
| 207 |
+
### Training Curves
|
| 208 |
+
|
| 209 |
+

|
| 210 |
+
|
| 211 |
+
**Key observations:**
|
| 212 |
+
- **EfficientNet-B6** converged in the fewest epochs, reaching its best validation loss (0.433) at epoch 1 with 88.8% validation accuracy. Training accuracy reached 98.2% by epoch 10, while validation accuracy peaked at 92.2% at epoch 7.
|
| 213 |
+
- **InceptionV3** achieved its best validation loss (0.578) at epoch 4. Validation accuracy plateaued around 89-90% while training accuracy continued to climb to ~99.3%, indicating overfitting after ~15 epochs.
|
| 214 |
+
- **MobileNetV3-Large** achieved its best validation loss (0.474) at epoch 5. It completed the most epochs (79), reaching 99.7% training accuracy while validation accuracy stabilized around 90-92%.
|
| 215 |
+
|
| 216 |
### Hyperparameters
|
| 217 |
|
| 218 |
- **Optimizer:** Adam (lr=1e-4)
|
|
|
|
| 223 |
- **Class balancing:** WeightedRandomSampler (inverse class frequency)
|
| 224 |
- **Input resolution:** 528x528 (resize to 572, then center crop for val/test)
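
As an illustration of the class-balancing entry above, inverse-class-frequency sample weights might be computed as follows. This is a sketch under assumptions: the helper name `inverse_frequency_weights` and the class labels are hypothetical, and the README does not show how the weights were actually derived.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Give each sample a weight of 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Hypothetical, imbalanced label list (class names are made up).
labels = ["aphid", "aphid", "aphid", "weevil"]
weights = inverse_frequency_weights(labels)

# With PyTorch installed, these weights could be passed to the sampler, e.g.:
#   torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights),
#                                          replacement=True)
```

With these weights the total weight of each class is equal, so a weighted sampler draws rare and common classes with roughly the same probability.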
|
| 225 |
|
| 226 |
+
### Learning Rate Decay
|
| 227 |
+
|
| 228 |
+
The ReduceLROnPlateau scheduler multiplied the learning rate by 0.9 (a 10% reduction) each time validation loss failed to improve for 5 consecutive epochs:
|
| 229 |
+
|
| 230 |
+
| Model | Starting LR | Final LR | LR Reductions |
|
| 231 |
+
|-------|:-----------:|:--------:|:-------------:|
|
| 232 |
+
| EfficientNet-B6 | 1e-4 | 9e-5 | 1 |
|
| 233 |
+
| InceptionV3 | 1e-4 | 4.8e-5 | 8 |
|
| 234 |
+
| MobileNetV3-Large | 1e-4 | 2.8e-5 | 13 |
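
The decay rule can be sketched in plain Python to make the arithmetic concrete. This is a simplified model of the behaviour described in this section, not PyTorch's implementation: the function names are hypothetical, and the real scheduler has extra options (threshold, cooldown) and may count patience slightly differently.

```python
def simulate_plateau_decay(val_losses, base_lr=1e-4, factor=0.9, patience=5):
    """Return the learning rate in effect after each epoch.

    Simplified rule: after `patience` consecutive epochs without a new best
    validation loss, multiply the learning rate by `factor` and reset the count.
    """
    lr = base_lr
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


def lr_after(n_reductions, base_lr=1e-4, factor=0.9):
    """Closed form: the learning rate after n multiplicative reductions."""
    return base_lr * factor ** n_reductions


# Hypothetical losses: two improving epochs, then five without improvement.
history = simulate_plateau_decay([1.0, 0.9, 0.95, 0.95, 0.95, 0.95, 0.95])
```

Under these assumptions one reduction gives 9e-5, matching the EfficientNet-B6 row. The same formula gives roughly 4.3e-5 after 8 reductions and 2.5e-5 after 13, while 7 and 12 reductions give roughly 4.8e-5 and 2.8e-5, so the tabulated final rates may have been logged before the last reduction or the reductions counted differently.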
|
| 235 |
+
|
| 236 |
### Data Augmentation (training only)
|
| 237 |
|
| 238 |
- RandomResizedCrop (528, scale 0.6-1.0)
|
|
|
|
| 242 |
- ColorJitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2)
|
| 243 |
- ImageNet normalization
|
| 244 |
|
| 245 |
## Limitations
|
| 246 |
|
| 247 |
- Models were trained on a specific dataset of cereal pest images and may not generalize well to pest species not in the training set.
|