---
license: apache-2.0
---
## πŸ“Š Baseline Model Results (TimeSformer on HMDB51)
The standard **TimeSformer model** was trained on the **HMDB51 dataset** for 15 epochs.
Because of session runtime limits, training was performed in multiple stages and resumed from saved checkpoints.
---
## πŸ” Training Strategy
- Training conducted on Kaggle GPU
- Interrupted due to 12-hour session limit
- Resumed using `.safetensors` checkpoint
- Completed full **15 epochs**
- Early stopping applied at final stage
---
## πŸ“ˆ Training Results (Epoch 1–15)
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | F1 |
|------|------------|-----------|----------|---------|-----|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 | 0.1100 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 | 0.3959 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 | 0.4867 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 | 0.5720 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 | 0.6143 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 | 0.6415 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 | 0.6561 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 | 0.6485 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 | 0.6518 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 | 0.6570 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 | 0.6699 |
| 12 | 1.6813 | 0.7660 | 1.6980 | **0.6895** | **0.6846** |
| 13 | 1.6731 | 0.7693 | 1.7023 | 0.6882 | 0.6824 |
| 14 | 1.6384 | 0.7778 | 1.7110 | 0.6850 | 0.6806 |
| 15 | 1.6045 | 0.7884 | 1.7131 | 0.6850 | 0.6805 |
---
## πŸ† Best Performance
- **Validation Accuracy:** **68.95%**
- **F1 Score:** 0.6846
- Achieved at **Epoch 12**
---
## βš™οΈ Training Details
- Peak GPU Memory: **~9.3 GB**
- Training Time per Epoch: ~55 minutes
- Evaluation Time: ~8 minutes
- Mixed Precision Training used
- Early stopping applied after convergence
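A mixed-precision training step of the kind used above can be sketched with `torch.autocast` and a gradient scaler. All names here (the linear stand-in model, batch shapes, learning rate) are illustrative assumptions, not the project's actual code; on a CPU-only machine the sketch falls back to full precision.

```python
# Sketch of one mixed-precision (AMP) training step.
# The model, batch, and hyperparameters are illustrative stand-ins.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(768, 51).to(device)   # stand-in for TimeSformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
criterion = torch.nn.CrossEntropyLoss()

inputs = torch.randn(4, 768, device=device)   # fake batch of features
labels = torch.randint(0, 51, (4,), device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = criterion(model(inputs), labels)   # forward in reduced precision on GPU
scaler.scale(loss).backward()                 # scaled backward to avoid underflow
scaler.step(optimizer)                        # unscales grads, then steps
scaler.update()
```

The scaler multiplies the loss before the backward pass so small float16 gradients do not underflow, then unscales before the optimizer step; this is the standard pattern for roughly halving activation memory on GPU.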
---
## πŸ“Œ Observations
- Strong and stable learning curve
- Performance improves steadily until **Epoch 12**
- After Epoch 12:
- Validation accuracy plateaus
- Validation loss increases slightly β†’ **overfitting begins**
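The plateau described above is what patience-based early stopping detects. A minimal sketch, run over the per-epoch validation accuracies from the table (the `patience=3` value is an assumption, not the actual setting):

```python
# Patience-based early stopping over the validation accuracies from
# the results table; patience=3 is an assumed value for illustration.
val_acc = [0.1248, 0.4353, 0.5340, 0.6033, 0.6327, 0.6536, 0.6667, 0.6627,
           0.6627, 0.6680, 0.6765, 0.6895, 0.6882, 0.6850, 0.6850]

patience = 3
best_acc, best_epoch, wait = 0.0, 0, 0
for epoch, acc in enumerate(val_acc, start=1):
    if acc > best_acc:
        best_acc, best_epoch, wait = acc, epoch, 0
    else:
        wait += 1
        if wait >= patience:   # no improvement for `patience` epochs
            break

print(best_epoch, best_acc)    # best checkpoint: epoch 12, acc 0.6895
```

Under this assumed patience the loop halts at epoch 15 with the best checkpoint at epoch 12, matching the behavior reported above.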
---
## ⚑ Key Insight
- TimeSformer reaches a **strong validation accuracy (~68.95%)** on HMDB51
- However, this comes at a cost:
  - ❌ High GPU memory (~9.3 GB peak)
  - ❌ High computational cost (~55 min per epoch)
---
## πŸ” Conclusion (Baseline)
The TimeSformer model delivers strong performance on HMDB51 but at a significantly higher computational cost, highlighting the need for more efficient architectures such as the proposed RetNet-based hybrid model.