---
license: apache-2.0
---

## πŸ“Š Baseline Model Results (TimeSformer on HMDB51)

The standard **TimeSformer model** was trained on the **HMDB51 dataset** for 15 epochs.

Training was performed in multiple stages because of the 12-hour runtime limit; each stage resumed from a saved checkpoint.

---

## πŸ” Training Strategy

- Training conducted on Kaggle GPU
- Interrupted due to 12-hour session limit
- Resumed using `.safetensors` checkpoint
- Completed full **15 epochs**
- Early stopping applied at final stage
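
The resume logic above can be sketched as follows. The actual runs stored model weights in a `.safetensors` checkpoint; this minimal stdlib-only sketch shows just the bookkeeping pattern for picking up at the right epoch after a session cut-off. The file name `train_state.json` and the helper names are illustrative, not the actual implementation.

```python
import json
import tempfile
from pathlib import Path


def save_progress(state_path: Path, epoch: int, best_val_acc: float) -> None:
    """Record the last completed epoch so an interrupted session can resume.

    In the real runs the model weights live in a separate `.safetensors`
    checkpoint; only the lightweight sidecar metadata is shown here.
    """
    state_path.write_text(json.dumps({"epoch": epoch, "best_val_acc": best_val_acc}))


def resume_epoch(state_path: Path, total_epochs: int = 15) -> int:
    """Return the 1-based epoch to start from, or 1 for a fresh run."""
    if not state_path.exists():
        return 1
    state = json.loads(state_path.read_text())
    return min(state["epoch"] + 1, total_epochs)


# Demo: simulate a session interrupted after finishing epoch 8.
state_file = Path(tempfile.mkdtemp()) / "train_state.json"
fresh_start = resume_epoch(state_file)            # no checkpoint yet -> epoch 1
save_progress(state_file, epoch=8, best_val_acc=0.6627)
resumed_start = resume_epoch(state_file)          # continue at epoch 9
```

The same pattern works with any checkpoint format: the sidecar only needs to agree with whichever weights file was last written.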

---

## πŸ“ˆ Training Results (Epoch 1–15)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | F1 |
|------|------------|-----------|----------|---------|-----|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 | 0.1100 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 | 0.3959 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 | 0.4867 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 | 0.5720 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 | 0.6143 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 | 0.6415 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 | 0.6561 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 | 0.6485 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 | 0.6518 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 | 0.6570 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 | 0.6699 |
| 12 | 1.6813 | 0.7660 | 1.6980 | **0.6895** | **0.6846** |
| 13 | 1.6731 | 0.7693 | 1.7023 | 0.6882 | 0.6824 |
| 14 | 1.6384 | 0.7778 | 1.7110 | 0.6850 | 0.6806 |
| 15 | 1.6045 | 0.7884 | 1.7131 | 0.6850 | 0.6805 |
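
As a sanity check, the best epoch reported below can be recomputed directly from the table. The snippet transcribes the validation-accuracy column and takes the argmax; the numbers come from the table above, not from a new run.

```python
# Validation accuracy per epoch, transcribed from the table above (epochs 1-15).
val_acc = [0.1248, 0.4353, 0.5340, 0.6033, 0.6327, 0.6536, 0.6667,
           0.6627, 0.6627, 0.6680, 0.6765, 0.6895, 0.6882, 0.6850, 0.6850]

# Epochs are 1-based; pick the epoch with the highest validation accuracy.
best_epoch = max(range(1, len(val_acc) + 1), key=lambda e: val_acc[e - 1])
best_acc = val_acc[best_epoch - 1]
```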

---

## πŸ† Best Performance

- **Validation Accuracy:** **68.95%**
- **F1 Score:** 0.6846  
- Achieved at **Epoch 12**

---

## βš™οΈ Training Details

- Peak GPU Memory: **~9.3 GB**
- Training Time per Epoch: ~55 minutes (β‰ˆ14 hours for 15 epochs, which exceeds the 12-hour session limit and necessitated the checkpoint resume)
- Evaluation Time: ~8 minutes
- Mixed Precision Training used
- Early stopping applied after convergence

---

## πŸ“Œ Observations

- Strong and stable learning curve
- Performance improves steadily until **Epoch 12**
- After Epoch 12:
  - Validation accuracy plateaus
  - Validation loss increases slightly β†’ **overfitting begins**
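
The plateau described above is exactly what a patience-based early-stopping rule detects. The sketch below replays the validation losses from the table through such a rule; the patience value of 3 is an assumption (the card does not state it), chosen because it reproduces the observed behavior of a best checkpoint at epoch 12 with training ending at epoch 15.

```python
# Validation losses from the table above (epochs 1-15).
val_loss = [3.7800, 3.2530, 2.6567, 2.2347, 2.0135, 1.8659, 1.8017,
            1.7705, 1.7355, 1.7340, 1.7234, 1.6980, 1.7023, 1.7110, 1.7131]


def early_stop_epoch(losses, patience=3):
    """Return (stop_epoch, best_epoch) under a simple patience rule.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; `best_epoch` is where it was lowest.
    """
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(losses), best_epoch


stop_epoch, best_epoch = early_stop_epoch(val_loss)
```

With these losses, the loss improves monotonically through epoch 12, then fails to improve at epochs 13, 14, and 15, so the rule halts at epoch 15 with epoch 12 as the best checkpoint.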

---

## ⚑ Key Insight

- TimeSformer achieves **high accuracy (~68.95%)** on HMDB51
- However, this comes at the cost of:
  - ❌ High GPU memory usage (~9.3 GB peak)
  - ❌ High computational cost (~55 minutes per training epoch)

---

## πŸ” Conclusion (Baseline)

The TimeSformer model delivers strong performance on HMDB51 but at a significantly higher computational cost, highlighting the need for more efficient architectures such as the proposed RetNet-based hybrid model.