---
license: apache-2.0
---

## 📊 Baseline Model Results (TimeSformer on HMDB51)

The standard **TimeSformer model** was trained on the **HMDB51 dataset** for 15 epochs. Training was performed in multiple stages due to runtime limits and resumed from saved checkpoints.

---

## 🔁 Training Strategy

- Training conducted on a Kaggle GPU
- Interrupted by the 12-hour session limit
- Resumed from a `.safetensors` checkpoint
- Completed the full **15 epochs**
- Early stopping applied at the final stage

---

## 📈 Training Results (Epochs 1–15)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | F1 |
|-------|------------|-----------|----------|---------|-----|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 | 0.1100 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 | 0.3959 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 | 0.4867 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 | 0.5720 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 | 0.6143 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 | 0.6415 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 | 0.6561 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 | 0.6485 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 | 0.6518 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 | 0.6570 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 | 0.6699 |
| 12 | 1.6813 | 0.7660 | 1.6980 | **0.6895** | **0.6846** |
| 13 | 1.6731 | 0.7693 | 1.7023 | 0.6882 | 0.6824 |
| 14 | 1.6384 | 0.7778 | 1.7110 | 0.6850 | 0.6806 |
| 15 | 1.6045 | 0.7884 | 1.7131 | 0.6850 | 0.6805 |

---

## 🏆 Best Performance

- **Validation Accuracy:** **68.95%**
- **F1 Score:** 0.6846
- Achieved at **Epoch 12**

---

## ⚙️ Training Details

- Peak GPU memory: **~9.3 GB**
- Training time per epoch: ~55 minutes
- Evaluation time per epoch: ~8 minutes
- Mixed-precision training used
- Early stopping applied after convergence

---

## 📌 Observations

- Strong, stable learning curve
- Performance improves steadily through **Epoch 12**
- After Epoch 12:
  - Validation accuracy plateaus
  - Validation loss increases slightly → **overfitting begins**

---

## ⚡ Key Insight

- TimeSformer
achieves **higher accuracy (~68.95%)** on HMDB51
- However, it requires:
  - ❌ Higher memory (~9.3 GB)
  - ❌ Higher computational cost

---

## 🔍 Conclusion (Baseline)

The TimeSformer model delivers strong performance on HMDB51, but at a significantly higher computational cost, highlighting the need for more efficient architectures such as the proposed RetNet-based hybrid model.
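The early-stopping behaviour reported above (best validation accuracy at Epoch 12, training halted after Epoch 15) can be reproduced with a simple patience-based monitor. This is a sketch, not the actual training code: the patience value of 3 is an assumption inferred from the logged curve, since the README does not state it.

```python
# Patience-based early stopping, consistent with the logged run.
# NOTE: patience=3 is an assumption; the actual value is not reported.

class EarlyStopping:
    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = None
        self.counter = 0  # epochs since last improvement

    def step(self, epoch: int, val_acc: float) -> bool:
        """Record one epoch's validation accuracy; return True to stop."""
        if val_acc > self.best_score:
            self.best_score = val_acc
            self.best_epoch = epoch
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience


# Validation accuracies from the results table (Epochs 1-15).
val_accs = [0.1248, 0.4353, 0.5340, 0.6033, 0.6327, 0.6536, 0.6667,
            0.6627, 0.6627, 0.6680, 0.6765, 0.6895, 0.6882, 0.6850, 0.6850]

stopper = EarlyStopping(patience=3)
for epoch, acc in enumerate(val_accs, start=1):
    if stopper.step(epoch, acc):
        break

print(stopper.best_epoch, epoch)  # best epoch 12, stopped after epoch 15
```

With patience 3, the two-epoch dip at Epochs 8–9 does not trigger a stop (accuracy recovers at Epoch 10), while the three flat epochs after the Epoch-12 peak do, matching the logged run exactly.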
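The multi-session training strategy follows directly from the timings above. A back-of-the-envelope check, using only the per-epoch figures reported in this card (~55 min training + ~8 min evaluation, 15 epochs), shows why a single 12-hour Kaggle session could not cover the full run:

```python
# Rough wall-clock estimate from the per-epoch timings reported above.

TRAIN_MIN_PER_EPOCH = 55   # ~55 min training per epoch
EVAL_MIN_PER_EPOCH = 8     # ~8 min evaluation per epoch
EPOCHS = 15
SESSION_LIMIT_H = 12       # Kaggle GPU session limit (hours)

total_min = EPOCHS * (TRAIN_MIN_PER_EPOCH + EVAL_MIN_PER_EPOCH)
total_h = total_min / 60
sessions = -(-total_h // SESSION_LIMIT_H)  # ceiling division

print(f"~{total_h:.2f} h total -> at least {int(sessions)} sessions")
# ~15.75 h total -> at least 2 sessions
```

At roughly 15.75 hours of total wall-clock time, the run needed at least two sessions, hence the checkpoint-and-resume workflow via `.safetensors`.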