---
license: apache-2.0
---
# Baseline Model Results (TimeSformer on HMDB51)
The standard TimeSformer model was trained on the HMDB51 dataset for 15 epochs.
Training was performed in multiple stages due to runtime limits and resumed using saved checkpoints.
## Training Strategy
- Training conducted on a Kaggle GPU
- Interrupted by the 12-hour session limit
- Resumed from a saved `.safetensors` checkpoint
- Completed the full 15 epochs
- Early stopping applied at the final stage
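The interrupt-and-resume workflow above can be sketched with a small bookkeeping file. This is a minimal stdlib-only illustration, not the actual training script: the model weights would be written separately (e.g. via `safetensors.torch.save_file`), and `train_state.json` is a hypothetical filename.

```python
import json
import os

STATE_FILE = "train_state.json"  # hypothetical path; the weights themselves
                                 # would live in a separate .safetensors file
TOTAL_EPOCHS = 15

def load_state():
    """Resume bookkeeping from a previous session, if a state file exists."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return {"next_epoch": 1}

def save_state(state):
    """Persist after every epoch so a session cut-off loses at most one epoch."""
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def run_session(epoch_budget):
    """Run up to `epoch_budget` epochs, then stop (simulating the 12h limit)."""
    state = load_state()
    for _ in range(epoch_budget):
        epoch = state["next_epoch"]
        if epoch > TOTAL_EPOCHS:
            break
        # ... one epoch of training + validation would run here ...
        state["next_epoch"] = epoch + 1
        save_state(state)
    return state
```

A first session of, say, 9 epochs writes `next_epoch = 10`; the next session picks up from there and finishes the remaining epochs.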
## Training Results (Epochs 1–15)
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | F1 |
|---|---|---|---|---|---|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 | 0.1100 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 | 0.3959 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 | 0.4867 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 | 0.5720 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 | 0.6143 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 | 0.6415 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 | 0.6561 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 | 0.6485 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 | 0.6518 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 | 0.6570 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 | 0.6699 |
| 12 | 1.6813 | 0.7660 | 1.6980 | 0.6895 | 0.6846 |
| 13 | 1.6731 | 0.7693 | 1.7023 | 0.6882 | 0.6824 |
| 14 | 1.6384 | 0.7778 | 1.7110 | 0.6850 | 0.6806 |
| 15 | 1.6045 | 0.7884 | 1.7131 | 0.6850 | 0.6805 |
## Best Performance
- Validation Accuracy: 68.95%
- F1 Score: 0.6846
- Achieved at Epoch 12
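The table's F1 column is assumed here to be a macro-averaged F1 over the 51 action classes (the averaging mode is not stated in the source). A minimal pure-Python sketch of that metric:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average with equal
    class weight. (Assumed to match the table's F1 column.)"""
    classes = set(y_true) | set(y_pred)
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because macro averaging weights every class equally, rare HMDB51 classes pull the F1 below plain accuracy, which is consistent with the F1 column sitting slightly under the accuracy column in every epoch.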
## Training Details
- Peak GPU Memory: ~9.3 GB
- Training Time per Epoch: ~55 minutes
- Evaluation Time: ~8 minutes
- Mixed Precision Training used
- Early stopping applied after convergence
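The early-stopping rule can be illustrated by replaying the validation accuracies from the table above. The patience value of 3 is an assumption (the actual setting is not reported); with it, the monitor flags epoch 12 as best and halts at epoch 15, matching the reported run.

```python
class EarlyStopping:
    """Stop when validation accuracy fails to improve for `patience`
    consecutive epochs. patience=3 is an assumed value."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = 0
        self.stale = 0  # epochs since the last improvement

    def step(self, epoch, val_acc):
        """Record one epoch's result; return True when training should stop."""
        if val_acc > self.best:
            self.best, self.best_epoch, self.stale = val_acc, epoch, 0
        else:
            self.stale += 1
        return self.stale >= self.patience

# Validation accuracies for epochs 1-15, copied from the results table:
val_accs = [0.1248, 0.4353, 0.5340, 0.6033, 0.6327, 0.6536, 0.6667,
            0.6627, 0.6627, 0.6680, 0.6765, 0.6895, 0.6882, 0.6850, 0.6850]

stopper = EarlyStopping(patience=3)
for epoch, acc in enumerate(val_accs, start=1):
    if stopper.step(epoch, acc):
        break  # triggers at epoch 15; best checkpoint is epoch 12
```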
## Observations
- Strong and stable learning curve
- Performance improves steadily until Epoch 12
- After Epoch 12:
  - Validation accuracy plateaus
  - Validation loss increases slightly, indicating the onset of overfitting
## Key Insight
- TimeSformer achieves high validation accuracy (~68.95%) on HMDB51
- However, it requires:
  - High GPU memory (~9.3 GB peak)
  - High computational cost (~55 minutes per training epoch)
## Conclusion (Baseline)
The TimeSformer model delivers strong performance on HMDB51 but at a significantly higher computational cost, highlighting the need for more efficient architectures such as the proposed RetNet-based hybrid model.