🎬 TimeSformer Fine-Tuned on HMDB51

This model is a fine-tuned version of TimeSformer (Time-Space Transformer) for video action recognition, trained on the HMDB51 dataset.


📌 Model Overview

  • Architecture: TimeSformer (Transformer-based video model)
  • Task: Video Classification / Action Recognition
  • Dataset: HMDB51 (51 human action classes)
  • Framework: PyTorch + Hugging Face Transformers
  • Training Device: CUDA (GPU)
  • Model Size: ~0.1B parameters

This model learns spatial and temporal features from video frames using transformer-based attention mechanisms.
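The core idea behind TimeSformer's attention is "divided" space-time attention: temporal attention across frames at each patch position, followed by spatial attention among patches within each frame. The sketch below is a simplified illustration of that factorization (dimensions and the residual layout are illustrative assumptions, not the actual model configuration):

```python
# Minimal sketch of TimeSformer-style divided space-time attention.
# Embedding dim, head count, and residual layout are illustrative only.
import torch
import torch.nn as nn

class DividedSpaceTimeAttention(nn.Module):
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, patches, dim)
        b, t, p, d = x.shape
        # Temporal attention: each patch position attends across frames.
        xt = x.permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt, _ = self.temporal_attn(xt, xt, xt)
        x = x + xt.reshape(b, p, t, d).permute(0, 2, 1, 3)
        # Spatial attention: each frame's patches attend to one another.
        xs = x.reshape(b * t, p, d)
        xs, _ = self.spatial_attn(xs, xs, xs)
        return x + xs.reshape(b, t, p, d)

blk = DividedSpaceTimeAttention()
out = blk(torch.randn(2, 8, 16, 64))  # 2 videos, 8 frames, 16 patches
print(out.shape)
```

Factorizing attention this way keeps the cost linear in frames × patches rather than quadratic in their product, which is what makes full-video attention tractable.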


📊 Training Details

  • Epochs: up to 15 (early stopping ended training at epoch 11)
  • Training Strategy:
    • Mixed Precision Training (torch.cuda.amp)
    • Early stopping based on validation performance
    • Best model checkpoint saved during training
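The three strategies above can be combined in a single loop. This is a minimal sketch with a stand-in model and synthetic data, not the actual fine-tuning script; the patience value and file name are assumptions:

```python
# Illustrative loop: mixed precision (torch.cuda.amp), early stopping,
# and best-checkpoint saving. Model and data are tiny stand-ins.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 51).to(device)  # stand-in for TimeSformer (51 classes)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
criterion = nn.CrossEntropyLoss()

best_val_acc, patience, bad_epochs = -1.0, 3, 0
for epoch in range(15):
    model.train()
    x = torch.randn(32, 10, device=device)           # synthetic batch
    y = torch.randint(0, 51, (32,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()   # scaled gradients for fp16 stability
    scaler.step(optimizer)
    scaler.update()

    # Validation (stand-in): track accuracy for early stopping.
    model.eval()
    with torch.no_grad():
        val_acc = (model(x).argmax(1) == y).float().mean().item()
    if val_acc > best_val_acc:
        best_val_acc, bad_epochs = val_acc, 0
        torch.save(model.state_dict(), "best_model.pt")  # best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```

On CPU the `enabled=False` flags make AMP a no-op, so the same loop runs with or without a GPU.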

📈 Training Results

| Epoch | Train Loss | Train Accuracy | Val Loss | Val Accuracy |
|-------|-----------|----------------|----------|--------------|
| 1     | 3.9314    | 0.0323         | 3.7800   | 0.1248       |
| 2     | 3.5892    | 0.1976         | 3.2530   | 0.4353       |
| 3     | 3.0235    | 0.4354         | 2.6567   | 0.5340       |
| 4     | 2.5720    | 0.5246         | 2.2347   | 0.6033       |
| 5     | 2.2836    | 0.6009         | 2.0135   | 0.6327       |
| 6     | 2.0897    | 0.6419         | 1.8659   | 0.6536       |
| 7     | 1.9678    | 0.6807         | 1.8017   | 0.6667       |
| 8     | 1.8847    | 0.7030         | 1.7705   | 0.6627       |
| 9     | 1.8213    | 0.7237         | 1.7355   | 0.6627       |
| 10    | 1.7576    | 0.7414         | 1.7340   | 0.6680       |
| 11    | 1.7098    | 0.7549         | 1.7234   | 0.6765       |

✅ Best Validation Accuracy: 67.65%

  • Achieved at Epoch 11
  • Model checkpoint saved automatically

🧠 Dataset

HMDB51 is a widely used human action recognition dataset:

  • 51 action classes
  • ≈7,000 real-world video clips sourced from movies and web videos
  • Diverse motion patterns, viewpoints, and scenes

🚀 Usage

Install Dependencies

pip install torch torchvision transformers
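Once the dependencies are installed, inference follows the standard Transformers video-classification pattern. The repo id `your-username/timesformer-hmdb51` is a placeholder for this checkpoint, and the frame-sampling helper is an illustrative assumption (TimeSformer-base expects 8 frames per clip):

```python
# Sketch of inference with the Hugging Face Transformers API.
# "your-username/timesformer-hmdb51" is a PLACEHOLDER repo id.
import numpy as np
import torch
from transformers import AutoImageProcessor, TimesformerForVideoClassification

def sample_frame_indices(num_frames_in_video, num_frames_needed=8):
    """Pick evenly spaced frame indices spanning the whole video."""
    return np.linspace(0, num_frames_in_video - 1,
                       num_frames_needed).astype(int).tolist()

def classify_video(video, repo_id="your-username/timesformer-hmdb51"):
    """`video` is a list of HxWx3 uint8 frames (e.g. decoded with OpenCV)."""
    processor = AutoImageProcessor.from_pretrained(repo_id)
    model = TimesformerForVideoClassification.from_pretrained(repo_id)
    frames = [video[i] for i in sample_frame_indices(len(video))]
    inputs = processor(frames, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[logits.argmax(-1).item()]

# Example: classify_video(frames) -> one of the 51 HMDB51 action labels.
```

Evenly spaced sampling is a common choice because it covers the full duration of the clip rather than only its beginning.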