Commit b7fc6e3 (verified) by sheneman · 1 parent: 2907d3f

Add detailed training summary: GPUs, epochs, curves, LR schedule

Files changed (1)
  1. README.md +31 -5
README.md CHANGED
@@ -192,6 +192,27 @@ print(f"Predicted class: {pred}")
192
 
193
  ## Training Procedure
194
 
195
  ### Hyperparameters
196
 
197
  - **Optimizer:** Adam (lr=1e-4)
@@ -202,6 +223,16 @@ print(f"Predicted class: {pred}")
202
  - **Class balancing:** WeightedRandomSampler (inverse class frequency)
203
  - **Input resolution:** 528x528 (resize to 572, then center crop for val/test)
204
 
205
  ### Data Augmentation (training only)
206
 
207
  - RandomResizedCrop (528, scale 0.6-1.0)
@@ -211,11 +242,6 @@ print(f"Predicted class: {pred}")
211
  - ColorJitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2)
212
  - ImageNet normalization
213
 
214
- ### Hardware
215
-
216
- - NVIDIA GPU via SLURM HPC cluster
217
- - CUDA 11.8
218
-
219
  ## Limitations
220
 
221
  - Models were trained on a specific dataset of cereal pest images and may not generalize well to pest species not in the training set.
 
192
 
193
  ## Training Procedure
194
 
195
+ ### Hardware
196
+
197
+ All models were trained on the University of Idaho RCDS HPC cluster using SLURM, with CUDA 11.8:
198
+
199
+ | Model | GPU | Epochs Completed | Wall Time | Best Checkpoint Epoch |
200
+ |-------|-----|:----------------:|:---------:|:---------------------:|
201
+ | EfficientNet-B6 | NVIDIA RTX 3090 (24 GB) | 10 | ~5h 47m | 1 |
202
+ | InceptionV3 | NVIDIA RTX 4090 (24 GB) | 51 | ~6h 02m | 4 |
203
+ | MobileNetV3-Large | NVIDIA RTX 4090 (24 GB) | 79 | ~6h 04m | 5 |
204
+
205
+ All three training runs were terminated by the SLURM scheduler's wall-time limit. The best checkpoint for each model was selected by minimum validation loss, which was reached early in training in every case. In later epochs training loss kept falling while validation loss rose (overfitting), which supports selecting the early checkpoints.
206
+
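Reviewer sketch: the checkpoint rule described above (keep the epoch with the lowest validation loss) can be expressed in a few lines. This is a minimal illustration, not the training script from this repository; the function name `best_checkpoint_epoch` and the loss values are hypothetical and are not the logged numbers.

```python
from typing import Sequence


def best_checkpoint_epoch(val_losses: Sequence[float]) -> int:
    """Return the 1-indexed epoch with the lowest validation loss.

    On ties the earliest epoch wins, matching a "save only on strict
    improvement" checkpointing rule.
    """
    if not val_losses:
        raise ValueError("need at least one epoch of validation loss")
    best_idx = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_idx + 1


# Made-up per-epoch validation losses, for illustration only.
illustrative_val_losses = [0.62, 0.55, 0.51, 0.49, 0.53, 0.58, 0.64]
print(best_checkpoint_epoch(illustrative_val_losses))
```

With a curve shaped like this one, every epoch after the minimum is wasted compute as far as the saved checkpoint is concerned, which is consistent with the early best-checkpoint epochs reported in the table.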
207
+ ### Training Curves
208
+
209
+ ![Training Curves](training_curves.png)
210
+
211
+ **Key observations:**
212
+ - **EfficientNet-B6** converged fastest, reaching its best validation loss (0.433) at epoch 1 with 88.8% validation accuracy. Training accuracy reached 98.2% by epoch 10, while validation accuracy peaked at 92.2% at epoch 7.
213
+ - **InceptionV3** achieved its best validation loss (0.578) at epoch 4. Validation accuracy plateaued around 89-90% while training accuracy continued to climb to ~99.3%, indicating overfitting after ~15 epochs.
214
+ - **MobileNetV3-Large** achieved its best validation loss (0.474) at epoch 5. It completed the most epochs (79) within the wall-time limit, reaching 99.7% training accuracy while validation accuracy stabilized around 90-92%.
215
+
216
  ### Hyperparameters
217
 
218
  - **Optimizer:** Adam (lr=1e-4)
 
223
  - **Class balancing:** WeightedRandomSampler (inverse class frequency)
224
  - **Input resolution:** 528x528 (resize to 572, then center crop for val/test)
225
 
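Reviewer sketch: "inverse class frequency" balancing usually means each sample gets weight 1 / (number of samples in its class), so every class carries the same total sampling mass. The snippet below is a minimal pure-Python illustration under that assumption; the helper name and the integer labels are hypothetical, and the README does not show the actual weighting code.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def inverse_frequency_weights(labels: Sequence[Hashable]) -> List[float]:
    """Per-sample weights equal to 1 / count of the sample's class."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Hypothetical imbalanced label list: class 0 is six times as common as class 2.
labels = [0] * 6 + [1] * 3 + [2] * 1
weights = inverse_frequency_weights(labels)

# Total weight per class is (approximately) equal, so each class is
# equally likely to be drawn when sampling proportionally to weight.
for cls in sorted(set(labels)):
    total = sum(w for w, y in zip(weights, labels) if y == cls)
    print(cls, round(total, 6))
```

A list like `weights` is the kind of per-sample sequence that `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)` accepts.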
226
+ ### Learning Rate Decay
227
+
228
+ The ReduceLROnPlateau scheduler multiplied the learning rate by 0.9 each time validation loss failed to improve for 5 consecutive epochs:
229
+
230
+ | Model | Starting LR | Final LR | LR Reductions |
231
+ |-------|:-----------:|:--------:|:-------------:|
232
+ | EfficientNet-B6 | 1e-4 | 9e-5 | 1 |
233
+ | InceptionV3 | 1e-4 | 4.8e-5 | 8 |
234
+ | MobileNetV3-Large | 1e-4 | 2.8e-5 | 13 |
235
+
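Reviewer sketch: with a multiplicative factor of 0.9, the learning rate after n reductions is 1e-4 × 0.9^n, which gives a quick consistency check on the table above. The helper name below is hypothetical. Under this formula the listed final rates of 4.8e-5 and 2.8e-5 correspond to 7 and 12 reductions, so the reduction counts for InceptionV3 and MobileNetV3-Large may be off by one, or may be counted differently (for example, if the final rate was logged before the last reduction took effect).

```python
def lr_after_reductions(start_lr: float, factor: float, reductions: int) -> float:
    """Learning rate after `reductions` multiplicative decays by `factor`."""
    return start_lr * factor ** reductions


# Compare the reduction counts in the table with their neighbours.
for n in (1, 7, 8, 12, 13):
    print(f"{n:2d} reductions -> {lr_after_reductions(1e-4, 0.9, n):.2e}")
```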
236
  ### Data Augmentation (training only)
237
 
238
  - RandomResizedCrop (528, scale 0.6-1.0)
 
242
  - ColorJitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2)
243
  - ImageNet normalization
244
 
245
  ## Limitations
246
 
247
  - Models were trained on a specific dataset of cereal pest images and may not generalize well to pest species not in the training set.