---
license: apache-2.0
---

## Baseline Model Results (TimeSformer on HMDB51)

The standard **TimeSformer model** was trained on the **HMDB51 dataset** for 15 epochs. Training was performed in multiple stages due to runtime limits and resumed from saved checkpoints.
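
For context, the sketch below shows one way to instantiate this baseline with the Hugging Face `transformers` library. The exact training script is not part of this card, so the pretrained checkpoint name, the random demo clip, and the 51-class head are assumptions based on the setup described above.

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, TimesformerForVideoClassification

NUM_CLASSES = 51  # HMDB51 has 51 action classes

# Assumed starting point: TimeSformer-base pretrained on Kinetics-400.
checkpoint = "facebook/timesformer-base-finetuned-k400"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = TimesformerForVideoClassification.from_pretrained(
    checkpoint,
    num_labels=NUM_CLASSES,
    ignore_mismatched_sizes=True,  # swap the 400-way Kinetics head for a 51-way head
)

# TimeSformer-base consumes 8 frames of 224x224; a clip is passed as a list of frames.
video = list(np.random.randint(0, 256, (8, 3, 224, 224), dtype=np.uint8))
inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 51)
```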

---

## Training Strategy

- Training conducted on a Kaggle GPU
- Interrupted by the 12-hour session limit
- Resumed from a `.safetensors` checkpoint (see the sketch after this list)
- Completed the full **15 epochs**
- Early stopping applied at the final stage
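
The resume step can look roughly like the following, continuing from the `model` in the previous sketch. The checkpoint filename is hypothetical, and the `safetensors` calls shown (`save_file`/`load_file`) store weights only.

```python
from safetensors.torch import load_file, save_file
from transformers import TimesformerForVideoClassification

CKPT = "timesformer_hmdb51.safetensors"  # hypothetical filename

# End of a session: persist the current weights before the runtime is reclaimed.
save_file(model.state_dict(), CKPT)

# Next session: rebuild the architecture, then restore the saved weights.
model = TimesformerForVideoClassification.from_pretrained(
    "facebook/timesformer-base-finetuned-k400",
    num_labels=51,
    ignore_mismatched_sizes=True,
)
model.load_state_dict(load_file(CKPT, device="cpu"))

# Note: .safetensors holds tensors only; optimizer/scheduler state and the
# epoch counter must be saved separately (e.g. via torch.save) for an exact resume.
```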

---

## Training Results (Epochs 1–15)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | F1 |
|-------|------------|-----------|----------|---------|--------|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 | 0.1100 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 | 0.3959 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 | 0.4867 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 | 0.5720 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 | 0.6143 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 | 0.6415 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 | 0.6561 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 | 0.6485 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 | 0.6518 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 | 0.6570 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 | 0.6699 |
| 12 | 1.6813 | 0.7660 | 1.6980 | **0.6895** | **0.6846** |
| 13 | 1.6731 | 0.7693 | 1.7023 | 0.6882 | 0.6824 |
| 14 | 1.6384 | 0.7778 | 1.7110 | 0.6850 | 0.6806 |
| 15 | 1.6045 | 0.7884 | 1.7131 | 0.6850 | 0.6805 |
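
The accuracy and F1 columns above can be computed with scikit-learn as sketched below. The batch keys (`pixel_values`, `labels`) and the macro averaging for F1 are assumptions, since the card does not state them.

```python
import torch
from sklearn.metrics import accuracy_score, f1_score

@torch.no_grad()
def evaluate(model, loader, device="cuda"):
    """Collect predictions over the validation set and score them."""
    model.eval()
    preds, labels = [], []
    for batch in loader:
        logits = model(pixel_values=batch["pixel_values"].to(device)).logits
        preds.extend(logits.argmax(-1).cpu().tolist())
        labels.extend(batch["labels"].tolist())
    acc = accuracy_score(labels, preds)
    f1 = f1_score(labels, preds, average="macro")  # averaging method is an assumption
    return acc, f1
```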

---

## Best Performance

- **Validation Accuracy:** **68.95%**
- **F1 Score:** 0.6846
- Achieved at **Epoch 12**

---

## Training Details

- Peak GPU memory: **~9.3 GB**
- Training time per epoch: ~55 minutes
- Evaluation time: ~8 minutes
- Mixed-precision training used (see the sketch after this list)
- Early stopping applied after convergence
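
A minimal sketch of the mixed-precision loop, using PyTorch's `torch.cuda.amp` utilities. The optimizer choice and learning rate are assumptions, and `train_loader` is assumed to yield dicts with `pixel_values` and `labels`.

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # optimizer/lr are assumptions
scaler = torch.cuda.amp.GradScaler()

model.cuda().train()
for batch in train_loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # run the forward pass in fp16 where safe
        out = model(pixel_values=batch["pixel_values"].cuda(),
                    labels=batch["labels"].cuda())
    scaler.scale(out.loss).backward()  # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)             # unscale gradients and apply the update
    scaler.update()
```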

---

## Observations

- Strong and stable learning curve
- Performance improves steadily up to **Epoch 12**
- After Epoch 12:
  - Validation accuracy plateaus
  - Validation loss increases slightly, indicating the onset of overfitting (see the sketch after this list)
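
This plateau is what the early stopping mentioned above guards against. A patience-based check could look like the sketch below; the patience value and the `train_one_epoch`/`validate` helpers are hypothetical, as the actual criterion is not documented here.

```python
from safetensors.torch import save_file

best_val_loss = float("inf")
patience, bad_epochs = 3, 0  # patience value is an assumption

for epoch in range(15):
    train_one_epoch(model, train_loader)           # hypothetical helper
    val_loss, val_acc = validate(model, val_loader)  # hypothetical helper
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
        save_file(model.state_dict(), "best.safetensors")  # keep the best weights
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Stopping early at epoch {epoch + 1}")
            break
```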

---

## Key Insight

- TimeSformer achieves **higher accuracy (~68.95%)** on HMDB51
- However, it requires:
  - Higher memory usage (~9.3 GB)
  - Higher computational cost

---

## Conclusion (Baseline)

The TimeSformer model delivers strong performance on HMDB51, but at a significantly higher computational cost, highlighting the need for more efficient architectures such as the proposed RetNet-based hybrid model.