---
license: apache-2.0
---

📊 Baseline Model Results (TimeSformer on HMDB51)

The standard TimeSformer model was trained on the HMDB51 dataset for 15 epochs.

Training was performed in multiple stages due to runtime limits and resumed using saved checkpoints.
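
For context, here is a minimal sketch of what such a fine-tuning setup looks like with the Hugging Face Transformers implementation of TimeSformer. The starting checkpoint, clip length, and resolution below are assumptions for illustration, not the exact recipe used for this run:

```python
import torch
from transformers import TimesformerForVideoClassification

# HMDB51 has 51 action classes; the pretrained head is replaced accordingly.
model = TimesformerForVideoClassification.from_pretrained(
    "facebook/timesformer-base-finetuned-k400",  # assumed starting checkpoint
    num_labels=51,
    ignore_mismatched_sizes=True,  # re-initialise the classification head
)

# TimeSformer expects clips shaped (batch, frames, channels, height, width).
dummy_clip = torch.randn(1, 8, 3, 224, 224)
logits = model(pixel_values=dummy_clip).logits
print(logits.shape)  # torch.Size([1, 51])
```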


🔍 Training Strategy

  • Training conducted on a Kaggle GPU
  • Interrupted by Kaggle's 12-hour session limit
  • Resumed from a .safetensors checkpoint (see the sketch after this list)
  • Completed the full 15 epochs
  • Early stopping applied at the final stage
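
A minimal sketch of resuming from a .safetensors checkpoint, assuming the Transformers and safetensors APIs; the file path is hypothetical:

```python
from safetensors.torch import load_file
from transformers import TimesformerForVideoClassification

# Rebuild the same architecture as before the interruption.
model = TimesformerForVideoClassification.from_pretrained(
    "facebook/timesformer-base-finetuned-k400",  # assumed base checkpoint
    num_labels=51,
    ignore_mismatched_sizes=True,
)

# Restore the weights saved when the 12-hour session was cut off.
state_dict = load_file("checkpoints/timesformer_last.safetensors")  # hypothetical path
model.load_state_dict(state_dict)

# .safetensors stores tensors only, so optimizer/scheduler state and the
# epoch counter must be restored from a separate file.
```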

📈 Training Results (Epochs 1–15)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | F1 |
|-------|-----------|-----------|----------|---------|--------|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 | 0.1100 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 | 0.3959 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 | 0.4867 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 | 0.5720 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 | 0.6143 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 | 0.6415 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 | 0.6561 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 | 0.6485 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 | 0.6518 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 | 0.6570 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 | 0.6699 |
| 12 | 1.6813 | 0.7660 | 1.6980 | 0.6895 | 0.6846 |
| 13 | 1.6731 | 0.7693 | 1.7023 | 0.6882 | 0.6824 |
| 14 | 1.6384 | 0.7778 | 1.7110 | 0.6850 | 0.6806 |
| 15 | 1.6045 | 0.7884 | 1.7131 | 0.6850 | 0.6805 |

πŸ† Best Performance

  • Validation Accuracy: 68.95%
  • F1 Score: 0.6846 (see the metric sketch after this list)
  • Achieved at Epoch 12
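
For reference, a minimal sketch of how these metrics can be computed with scikit-learn, assuming top-1 accuracy and a macro-averaged F1 over the 51 classes (the averaging mode is an assumption, not stated above):

```python
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred):
    """Top-1 accuracy and macro-averaged F1 from per-clip predictions."""
    return accuracy_score(y_true, y_pred), f1_score(y_true, y_pred, average="macro")

# Toy example with dummy labels:
acc, f1 = evaluate([0, 1, 2, 2], [0, 1, 1, 2])
print(f"val_acc={acc:.4f}  f1={f1:.4f}")
```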

⚙️ Training Details

  • Peak GPU Memory: ~9.3 GB
  • Training Time per Epoch: ~55 minutes
  • Evaluation Time: ~8 minutes
  • Mixed precision training used (see the sketch after this list)
  • Early stopping applied after convergence
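
A minimal sketch of a mixed-precision training step with PyTorch AMP; `model`, `optimizer`, and `train_loader` are placeholders for this run's objects, not the actual training code:

```python
import torch
import torch.nn.functional as F

scaler = torch.cuda.amp.GradScaler()

for clips, labels in train_loader:  # hypothetical DataLoader over HMDB51 clips
    clips, labels = clips.cuda(), labels.cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():     # forward pass in float16 where safe
        logits = model(pixel_values=clips).logits
        loss = F.cross_entropy(logits, labels)
    scaler.scale(loss).backward()       # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)              # unscales gradients, then steps
    scaler.update()
```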

📌 Observations

  • Strong and stable learning curve
  • Performance improves steadily until Epoch 12
  • After Epoch 12:
    • Validation accuracy plateaus
    • Validation loss increases slightly → overfitting begins (see the sketch below)
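
This plateau is exactly what a simple patience-based early-stopping check detects. A minimal sketch, where `val_accuracies` stands in for the per-epoch values in the table above and the patience value is illustrative:

```python
best_acc, best_epoch, patience = 0.0, 0, 3  # patience is an assumed setting

for epoch, val_acc in enumerate(val_accuracies, start=1):
    if val_acc > best_acc:
        best_acc, best_epoch = val_acc, epoch   # new best: keep these weights
    elif epoch - best_epoch >= patience:
        print(f"Stopping at epoch {epoch}; best val_acc {best_acc:.4f} at epoch {best_epoch}")
        break
```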

⚡ Key Insight

  • TimeSformer achieves higher accuracy (~68.95%) on HMDB51
  • However, it requires:
    • ❌ Higher memory (~9.3 GB; see the measurement sketch below)
    • ❌ Higher computational cost
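
Peak-memory figures like the ~9.3 GB above are typically read from the CUDA allocator's high-water mark; a minimal sketch in PyTorch:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one training epoch here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gb:.1f} GB")
```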

🔍 Conclusion (Baseline)

The TimeSformer model delivers strong performance on HMDB51 but at a significantly higher computational cost, highlighting the need for more efficient architectures such as the proposed RetNet-based hybrid model.