---
license: apache-2.0
---
# Baseline Model Results (TimeSformer on HMDB51)
The standard TimeSformer model was trained on the HMDB51 dataset for 15 epochs.
Training was performed in multiple stages due to runtime limits and resumed using saved checkpoints.
## Training Strategy
- Training conducted on a Kaggle GPU
- Interrupted by the 12-hour session limit
- Resumed from a saved `.safetensors` checkpoint
- Completed the full 15 epochs
- Early stopping applied at the final stage
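The interrupt-and-resume workflow above can be sketched with a small bookkeeping file. This is a minimal stdlib-only illustration, not the actual training script: the model weights would be written separately (e.g. via `safetensors.torch.save_file`), and `train_state.json` is a hypothetical filename.

```python
import json
import os

STATE_FILE = "train_state.json"  # hypothetical path; the weights themselves
                                 # would live in a separate .safetensors file
TOTAL_EPOCHS = 15

def load_state():
    """Resume bookkeeping from a previous session, if a state file exists."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return {"next_epoch": 1}

def save_state(state):
    """Persist after every epoch so a session cut-off loses at most one epoch."""
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def run_session(epoch_budget):
    """Run up to `epoch_budget` epochs, then stop (simulating the 12h limit)."""
    state = load_state()
    for _ in range(epoch_budget):
        epoch = state["next_epoch"]
        if epoch > TOTAL_EPOCHS:
            break
        # ... one epoch of training + validation would run here ...
        state["next_epoch"] = epoch + 1
        save_state(state)
    return state
```

A first session of, say, 9 epochs writes `next_epoch = 10`; the next session picks up from there and finishes the remaining epochs.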
## Training Results (Epochs 1–15)
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc | F1 |
|---|---|---|---|---|---|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 | 0.1100 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 | 0.3959 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 | 0.4867 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 | 0.5720 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 | 0.6143 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 | 0.6415 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 | 0.6561 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 | 0.6485 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 | 0.6518 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 | 0.6570 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 | 0.6699 |
| 12 | 1.6813 | 0.7660 | 1.6980 | 0.6895 | 0.6846 |
| 13 | 1.6731 | 0.7693 | 1.7023 | 0.6882 | 0.6824 |
| 14 | 1.6384 | 0.7778 | 1.7110 | 0.6850 | 0.6806 |
| 15 | 1.6045 | 0.7884 | 1.7131 | 0.6850 | 0.6805 |
## Best Performance
- Validation Accuracy: 68.95%
- F1 Score: 0.6846
- Achieved at Epoch 12
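The table's F1 column is assumed here to be a macro-averaged F1 over the 51 action classes (the averaging mode is not stated in the source). A minimal pure-Python sketch of that metric:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average with equal
    class weight. (Assumed to match the table's F1 column.)"""
    classes = set(y_true) | set(y_pred)
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the ground-truth labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because macro averaging weights every class equally, rare HMDB51 classes pull the F1 below plain accuracy, which is consistent with the F1 column sitting slightly under the accuracy column in every epoch.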
## Training Details
- Peak GPU Memory: ~9.3 GB
- Training Time per Epoch: ~55 minutes
- Evaluation Time: ~8 minutes
- Mixed Precision Training used
- Early stopping applied after convergence
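The early-stopping rule can be illustrated by replaying the validation accuracies from the table above. The patience value of 3 is an assumption (the actual setting is not reported); with it, the monitor flags epoch 12 as best and halts at epoch 15, matching the reported run.

```python
class EarlyStopping:
    """Stop when validation accuracy fails to improve for `patience`
    consecutive epochs. patience=3 is an assumed value."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = 0
        self.stale = 0  # epochs since the last improvement

    def step(self, epoch, val_acc):
        """Record one epoch's result; return True when training should stop."""
        if val_acc > self.best:
            self.best, self.best_epoch, self.stale = val_acc, epoch, 0
        else:
            self.stale += 1
        return self.stale >= self.patience

# Validation accuracies for epochs 1-15, copied from the results table:
val_accs = [0.1248, 0.4353, 0.5340, 0.6033, 0.6327, 0.6536, 0.6667,
            0.6627, 0.6627, 0.6680, 0.6765, 0.6895, 0.6882, 0.6850, 0.6850]

stopper = EarlyStopping(patience=3)
for epoch, acc in enumerate(val_accs, start=1):
    if stopper.step(epoch, acc):
        break  # triggers at epoch 15; best checkpoint is epoch 12
```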
## Observations
- Strong and stable learning curve
- Performance improves steadily until Epoch 12
- After Epoch 12:
  - Validation accuracy plateaus
  - Validation loss increases slightly, indicating the onset of overfitting
## Key Insight
- TimeSformer achieves high validation accuracy (~68.95%) on HMDB51
- However, it requires:
  - High GPU memory (~9.3 GB peak)
  - High computational cost (~55 minutes per training epoch)
## Conclusion (Baseline)
The TimeSformer model delivers strong performance on HMDB51 but at a significantly higher computational cost, highlighting the need for more efficient architectures such as the proposed RetNet-based hybrid model.