---
license: mit
tags:
  - video-classification
  - timesformer
  - action-recognition
  - hmdb51
  - pytorch
datasets:
  - hmdb51
---

# 🎬 TimeSformer Fine-Tuned on HMDB51

This model is a fine-tuned version of TimeSformer (Time-Space Transformer) for video action recognition, trained on the HMDB51 dataset.

## 📌 Model Overview

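The model splits each sampled video frame into fixed-size patches and applies self-attention across space (patches within the same frame) and across time (patches at the same position in different frames), so it learns spatial and temporal features jointly.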
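TimeSformer's default attention scheme, "divided space-time attention", applies temporal self-attention and then spatial self-attention inside each block. The sketch below illustrates that idea in plain PyTorch. The class name and sizes are hypothetical, and it is not the exact layer used by this checkpoint: it omits the classification token, the MLP sub-layer, and other details of the real implementation.

```python
import torch
import torch.nn as nn


class DividedSpaceTimeBlock(nn.Module):
    """Hypothetical, simplified TimeSformer-style block: temporal attention, then spatial attention."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.norm_t = nn.LayerNorm(dim)
        self.attn_t = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.attn_s = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, patches_per_frame, dim)
        b, t, n, d = x.shape

        # Temporal attention: each patch position attends to itself across all frames.
        xt = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
        h = self.norm_t(xt)
        xt = xt + self.attn_t(h, h, h, need_weights=False)[0]  # residual connection
        x = xt.reshape(b, n, t, d).permute(0, 2, 1, 3)

        # Spatial attention: patches within the same frame attend to each other.
        xs = x.reshape(b * t, n, d)
        h = self.norm_s(xs)
        xs = xs + self.attn_s(h, h, h, need_weights=False)[0]  # residual connection
        return xs.reshape(b, t, n, d)
```

For a TimeSformer-base style input (8 frames of 224×224 pixels, 16×16 patches, 768-dimensional embeddings), `x` would have shape `(batch, 8, 196, 768)`, since each frame yields (224 / 16)² = 196 patches. Factoring attention this way lets each token attend to T + N tokens instead of the T × N tokens of joint space-time attention.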

- Epochs: 15
- Training strategy:
  - Mixed precision training (`torch.cuda.amp`)
  - Early stopping based on validation performance
  - Best model checkpoint saved during training
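The training strategy listed above can be combined in a single loop. This is a minimal sketch, not the script used to train this model: the function name, the AdamW optimizer, the learning rate, the patience of 3 epochs, and the use of validation accuracy as the early-stopping criterion are all assumptions, since the card does not state them. Mixed precision is only enabled when the model sits on a CUDA device.

```python
import copy

import torch
import torch.nn as nn


def train_with_early_stopping(model, train_batches, val_batches, max_epochs=15, patience=3, lr=1e-3):
    """Hypothetical loop: AMP on GPU, keep the best weights, stop after `patience` epochs without improvement."""
    device = next(model.parameters()).device
    use_amp = device.type == "cuda"  # torch.cuda.amp only helps on a GPU
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    loss_fn = nn.CrossEntropyLoss()

    best_acc, best_state, epochs_without_improvement = -1.0, None, 0
    history = []
    for epoch in range(1, max_epochs + 1):
        model.train()
        for x, y in train_batches:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad(set_to_none=True)
            # Forward pass in reduced precision when AMP is enabled.
            with torch.autocast(device_type=device.type, enabled=use_amp):
                loss = loss_fn(model(x), y)
            scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
            scaler.step(optimizer)         # unscales gradients, then steps the optimizer
            scaler.update()

        # Validation accuracy for this epoch.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in val_batches:
                x, y = x.to(device), y.to(device)
                correct += (model(x).argmax(dim=-1) == y).sum().item()
                total += y.numel()
        val_acc = correct / total
        history.append(val_acc)

        if val_acc > best_acc:
            # New best: snapshot the weights (the "best model checkpoint").
            best_acc, epochs_without_improvement = val_acc, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break

    model.load_state_dict(best_state)  # restore the best weights before returning
    return best_acc, history
```

In practice, `best_state` would also be written to disk (for example with `torch.save`) so the best checkpoint survives the run.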

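Per-epoch training and validation metrics (accuracies are fractions between 0 and 1):

| Epoch | Train Loss | Train Accuracy | Val Loss | Val Accuracy |
|------:|-----------:|---------------:|---------:|-------------:|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 |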
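The logged metrics can also be inspected programmatically. The sketch below hard-codes the validation accuracies from the table above, finds the best epoch, and simulates a patience-based early-stopping rule. Monitoring validation accuracy and the patience values tried are assumptions; the card only says early stopping was based on "validation performance".

```python
# Validation accuracy per epoch, copied from the table above (epoch 1 first).
VAL_ACC = [0.1248, 0.4353, 0.5340, 0.6033, 0.6327, 0.6536,
           0.6667, 0.6627, 0.6627, 0.6680, 0.6765]


def best_epoch(val_acc):
    """Return (1-based epoch, accuracy) of the highest validation accuracy."""
    idx = max(range(len(val_acc)), key=val_acc.__getitem__)
    return idx + 1, val_acc[idx]


def stop_epoch(val_acc, patience):
    """Epoch at which patience-based early stopping would halt, or None if it never triggers."""
    best, waited = float("-inf"), 0
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best:
            best, waited = acc, 0  # strict improvement resets the counter
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


print(best_epoch(VAL_ACC))     # (11, 0.6765)
print(stop_epoch(VAL_ACC, 2))  # 9
print(stop_epoch(VAL_ACC, 3))  # None
```

On these values the best validation accuracy (0.6765) is reached at the last logged epoch, 11. A patience of 3 would not have triggered within the logged epochs, while a patience of 2 would have stopped training at epoch 9, after two epochs without improving on epoch 7's 0.6667.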

HMDB51 is a widely used human action recognition benchmark covering 51 action classes.

Install the dependencies:

```bash
pip install torch torchvision transformers
```
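With these packages installed, inference could look roughly like the sketch below. It assumes the checkpoint is stored in the Hugging Face `transformers` format (`TimesformerForVideoClassification` plus an image processor). The repository id passed to `load_from_hub` is a placeholder, not this model's actual id, and the helper names are hypothetical. `tiny_random_model` builds a small, randomly initialized model purely so the code path can be exercised offline; its predictions are meaningless.

```python
import torch
from transformers import AutoImageProcessor, TimesformerConfig, TimesformerForVideoClassification


def load_from_hub(repo_id: str):
    """Load the preprocessor and fine-tuned weights (repo_id is a placeholder you must supply)."""
    processor = AutoImageProcessor.from_pretrained(repo_id)
    model = TimesformerForVideoClassification.from_pretrained(repo_id)
    return processor, model


def tiny_random_model(num_labels: int = 51) -> TimesformerForVideoClassification:
    """Hypothetical helper: a very small random TimeSformer for smoke-testing only (51 = HMDB51 classes)."""
    config = TimesformerConfig(
        image_size=32, patch_size=16, num_frames=2, hidden_size=32,
        num_hidden_layers=1, num_attention_heads=2, intermediate_size=64,
        num_labels=num_labels,
    )
    return TimesformerForVideoClassification(config)


def predict(model: TimesformerForVideoClassification, pixel_values: torch.Tensor) -> list[str]:
    """Return the predicted label name for each clip.

    pixel_values: (batch, num_frames, channels, height, width), already preprocessed.
    """
    model.eval()
    with torch.no_grad():
        logits = model(pixel_values=pixel_values).logits  # (batch, num_labels)
    return [model.config.id2label[i] for i in logits.argmax(dim=-1).tolist()]
```

For a real clip, the processor typically turns a list of sampled frames into the expected tensor, e.g. `processor(frames, return_tensors="pt")["pixel_values"]`; the number of frames should match `model.config.num_frames`.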