---
license: mit
tags:
  - video-classification
  - timesformer
  - action-recognition
  - hmdb51
  - pytorch
datasets:
  - hmdb51
---

🎬 TimeSformer Fine-Tuned on HMDB51

This model is TimeSformer (Time-Space Transformer) fine-tuned on the HMDB51 dataset for video action recognition.


📌 Model Overview

  • Architecture: TimeSformer (Transformer-based video model)
  • Task: Video Classification / Action Recognition
  • Dataset: HMDB51 (51 human action classes)
  • Framework: PyTorch + Hugging Face Transformers
  • Training Device: CUDA (GPU)

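The model splits each video frame into non-overlapping patches and learns spatial and temporal features with attention. TimeSformer's standard divided space-time attention runs two steps inside every transformer block: temporal attention, where each patch attends to the patches at the same spatial location in the other frames, followed by spatial attention, where each patch attends to the other patches of the same frame.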
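TimeSformer's divided space-time attention separates each block's attention into a temporal step and a spatial step instead of letting every patch attend to every other patch in the clip (joint space-time attention). As a rough illustration of why that is cheaper, the arithmetic below counts patch tokens and the number of keys each query patch attends to. The input size is an assumption (8 frames of 224x224 pixels with 16x16 patches, the defaults of Hugging Face's `TimesformerConfig`); this card does not state the clip length used for fine-tuning, and the classification token is ignored.

```python
def attention_cost(num_frames: int = 8, image_size: int = 224, patch_size: int = 16) -> dict:
    """Count patch tokens and keys per query patch (classification token ignored).

    The default sizes are assumptions, not settings confirmed for this model.
    """
    patches_per_frame = (image_size // patch_size) ** 2
    total_patches = num_frames * patches_per_frame
    return {
        "patches_per_frame": patches_per_frame,
        "total_patches": total_patches,
        # Joint space-time attention: each patch attends to every patch in the clip.
        "joint_keys_per_query": total_patches,
        # Divided attention: a temporal step over the same location in every frame,
        # then a spatial step over every patch in the same frame.
        "divided_keys_per_query": num_frames + patches_per_frame,
    }


if __name__ == "__main__":
    for name, value in attention_cost().items():
        print(f"{name}: {value}")
```

With these assumed sizes, each patch attends to 8 + 196 = 204 keys under divided attention instead of 1,568 under joint attention.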


📊 Training Details

  • Epochs: 15
  • Training Strategy:
    • Mixed Precision Training (torch.cuda.amp)
    • Early stopping based on validation performance
    • Best model checkpoint saved during training
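The early-stopping and best-checkpoint logic listed above can be sketched in plain Python. This is an illustration, not the training script used for this model: the `EarlyStopping` class, its `patience` and `min_delta` values, and the choice to monitor validation accuracy rather than validation loss are all assumptions, and mixed-precision handling is omitted. The usage block replays the validation accuracies from the Training Results table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyStopping:
    """Track the best validation accuracy and signal when to stop.

    `patience` and `min_delta` are hypothetical settings; the values used
    for this model are not stated.
    """
    patience: int = 3
    min_delta: float = 0.0
    best_score: Optional[float] = None
    best_epoch: Optional[int] = None
    epochs_without_improvement: int = 0

    def step(self, epoch: int, val_accuracy: float) -> bool:
        """Record one epoch; return True if it is a new best (time to save a checkpoint)."""
        if self.best_score is None or val_accuracy > self.best_score + self.min_delta:
            self.best_score = val_accuracy
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            return True
        self.epochs_without_improvement += 1
        return False

    @property
    def should_stop(self) -> bool:
        """True once `patience` consecutive epochs have passed without improvement."""
        return self.epochs_without_improvement >= self.patience


if __name__ == "__main__":
    # Validation accuracies for epochs 1-11, copied from the results table.
    val_acc = [0.1248, 0.4353, 0.5340, 0.6033, 0.6327, 0.6536,
               0.6667, 0.6627, 0.6627, 0.6680, 0.6765]
    stopper = EarlyStopping(patience=3)
    for epoch, acc in enumerate(val_acc, start=1):
        if stopper.step(epoch, acc):
            pass  # in a real loop: torch.save(model.state_dict(), "best.pt")
        if stopper.should_stop:
            break
    print(stopper.best_epoch, stopper.best_score)
```

With `patience=3`, the two flat epochs (8 and 9) do not trigger a stop, and the tracker ends with epoch 11 as the best checkpoint.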

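📈 Training Results

| Epoch | Train Loss | Train Accuracy | Val Loss | Val Accuracy |
|------:|-----------:|---------------:|---------:|-------------:|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 |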

✅ Best Validation Accuracy: 67.65%

  • Achieved at Epoch 11
  • Model checkpoint saved automatically

🧠 Dataset

HMDB51 is a widely used human action recognition dataset:

  • 51 action classes
  • Realistic clips collected mostly from movies, with the rest from public web sources such as YouTube
  • Diverse motion patterns and scenes

🚀 Usage

Install Dependencies

pip install torch torchvision transformers
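After installing the dependencies, inference could look roughly like the sketch below. It uses the Hugging Face `TimesformerForVideoClassification` and `AutoImageProcessor` classes, assuming the repository ships a preprocessor config. `MODEL_ID` is a hypothetical placeholder for this repository's id or a local checkpoint directory, and the 8-frame, evenly spaced sampling in `sample_frame_indices` is an assumption, since this card does not state how many frames the fine-tuned model expects.

```python
import numpy as np
import torch
from transformers import AutoImageProcessor, TimesformerForVideoClassification

# Hypothetical placeholder: replace with the real repo id or a local checkpoint path.
MODEL_ID = "path/to/this-checkpoint"


def load(model_id: str = MODEL_ID):
    """Load the image processor and the fine-tuned video classifier."""
    processor = AutoImageProcessor.from_pretrained(model_id)
    model = TimesformerForVideoClassification.from_pretrained(model_id)
    return processor, model


def sample_frame_indices(total_frames: int, num_frames: int = 8) -> list:
    """Pick `num_frames` evenly spaced frame indices (assumed sampling scheme)."""
    return np.linspace(0, total_frames - 1, num_frames).round().astype(int).tolist()


def predict_top_k(model, pixel_values: torch.Tensor, k: int = 5) -> list:
    """Return the top-k (label, probability) pairs for a single clip.

    `pixel_values` has shape (1, num_frames, 3, height, width).
    """
    model.eval()
    with torch.no_grad():
        logits = model(pixel_values=pixel_values).logits
    probs = logits.softmax(dim=-1)[0]
    top = torch.topk(probs, k)
    return [(model.config.id2label[int(i)], float(p)) for p, i in zip(top.values, top.indices)]
```

A typical call sequence: decode a clip, keep the frames at `sample_frame_indices(len(video))` as a list of HxWx3 `uint8` arrays, run `inputs = processor(frames, return_tensors="pt")`, then call `predict_top_k(model, inputs["pixel_values"])`. The returned labels come from the checkpoint's `id2label` mapping.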