# TimeSformer Fine-Tuned on HMDB51
This model is a fine-tuned version of TimeSformer (Time-Space Transformer) for video action recognition, trained on the HMDB51 dataset.
## Model Overview
- Architecture: TimeSformer (Transformer-based video model)
- Task: Video Classification / Action Recognition
- Dataset: HMDB51 (51 human action classes)
- Framework: PyTorch + Hugging Face Transformers
- Training Device: CUDA (GPU)
This model learns spatial and temporal features from video frames using transformer-based attention mechanisms.
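The divided space-time attention used by TimeSformer can be sketched in a few lines: temporal attention across frames at each patch position, then spatial attention across patches within each frame. This is a minimal illustration with arbitrary layer sizes, not the model's actual configuration.

```python
import torch
from torch import nn

class DividedSpaceTimeAttention(nn.Module):
    """Minimal sketch of TimeSformer-style divided space-time attention."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, patches, dim)
        b, t, n, d = x.shape
        # Temporal attention: attend across frames at each patch position.
        xt = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
        attn_t, _ = self.temporal(xt, xt, xt)
        x = x + attn_t.reshape(b, n, t, d).permute(0, 2, 1, 3)
        # Spatial attention: attend across patches within each frame.
        xs = x.reshape(b * t, n, d)
        attn_s, _ = self.spatial(xs, xs, xs)
        return x + attn_s.reshape(b, t, n, d)

# Toy input: batch of 2 clips, 8 frames, 16 patches per frame, 64-dim tokens.
x = torch.randn(2, 8, 16, 64)
out = DividedSpaceTimeAttention()(x)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```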
## Training Details
- Epochs: 15 (training stopped at epoch 11 via early stopping; see results below)
- Training Strategy:
  - Mixed Precision Training (`torch.cuda.amp`)
  - Early stopping based on validation performance
  - Best model checkpoint saved during training
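The training strategy above can be sketched as follows. The tiny linear model, random batches, and the `validate` helper are placeholders standing in for the real TimeSformer and HMDB51 data loaders; the AMP, early-stopping, and best-checkpoint pattern is the point.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 51).to(device)          # stand-in for TimeSformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# GradScaler/autocast are no-ops when disabled, so this also runs on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

def validate(model):
    # Placeholder validation: accuracy on one random batch.
    model.eval()
    with torch.no_grad():
        x = torch.randn(32, 16, device=device)
        y = torch.randint(0, 51, (32,), device=device)
        return (model(x).argmax(dim=1) == y).float().mean().item()

best_val_acc, patience, bad_epochs = 0.0, 3, 0
for epoch in range(15):
    model.train()
    x = torch.randn(8, 16, device=device)                 # stand-in batch
    y = torch.randint(0, 51, (8,), device=device)
    optimizer.zero_grad()
    # Forward pass in float16 on GPU to save memory and speed up training.
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()   # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()

    val_acc = validate(model)
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        torch.save(model.state_dict(), "best_model.pt")   # keep best checkpoint
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # stop when validation stops improving
            break
```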
## Training Results
| Epoch | Train Loss | Train Accuracy | Val Loss | Val Accuracy |
|---|---|---|---|---|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 |
**Best Validation Accuracy: 67.65%**
- Achieved at epoch 11
- Model checkpoint saved automatically
## Dataset
HMDB51 is a widely used human action recognition dataset:
- 51 action classes
- Real-world videos
- Diverse motion patterns and scenes
## Usage

Install dependencies:

```bash
pip install torch torchvision transformers
```
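A minimal inference sketch follows. To run the actual fine-tuned model you would call `TimesformerForVideoClassification.from_pretrained()` with this checkpoint's path; here a small randomly initialised model is built from a config (with layer sizes scaled down for speed) so the example runs without downloading weights.

```python
import torch
from transformers import TimesformerConfig, TimesformerForVideoClassification

# Scaled-down config for illustration; swap in
# TimesformerForVideoClassification.from_pretrained("<checkpoint-path>")
# to load the fine-tuned HMDB51 weights.
config = TimesformerConfig(
    num_labels=51,        # HMDB51 classes
    num_frames=8,
    image_size=224,
    hidden_size=192,      # smaller than the base model, sketch only
    num_hidden_layers=2,
    num_attention_heads=3,
    intermediate_size=384,
)
model = TimesformerForVideoClassification(config)
model.eval()

# A clip is a tensor of shape (batch, frames, channels, height, width).
video = torch.randn(1, 8, 3, 224, 224)
with torch.no_grad():
    logits = model(pixel_values=video).logits

print(logits.shape)              # torch.Size([1, 51])
print(logits.argmax(-1).item())  # predicted class index
```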