# TimeSformer Fine-Tuned on HMDB51
This model is a fine-tuned version of TimeSformer (Time-Space Transformer) for video action recognition, trained on the HMDB51 dataset.
## Model Overview
- Architecture: TimeSformer (Transformer-based video model)
- Task: Video Classification / Action Recognition
- Dataset: HMDB51 (51 human action classes)
- Framework: PyTorch + Hugging Face Transformers
- Training Device: CUDA (GPU)
This model learns spatial and temporal features from video frames using transformer-based attention mechanisms.
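The divided space-time attention used by TimeSformer can be sketched in a few lines: temporal attention across frames at each patch position, then spatial attention across patches within each frame. This is a minimal illustration with arbitrary layer sizes, not the model's actual configuration.

```python
import torch
from torch import nn

class DividedSpaceTimeAttention(nn.Module):
    """Minimal sketch of TimeSformer-style divided space-time attention."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, patches, dim)
        b, t, n, d = x.shape
        # Temporal attention: attend across frames at each patch position.
        xt = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
        attn_t, _ = self.temporal(xt, xt, xt)
        x = x + attn_t.reshape(b, n, t, d).permute(0, 2, 1, 3)
        # Spatial attention: attend across patches within each frame.
        xs = x.reshape(b * t, n, d)
        attn_s, _ = self.spatial(xs, xs, xs)
        return x + attn_s.reshape(b, t, n, d)

# Toy input: batch of 2 clips, 8 frames, 16 patches per frame, 64-dim tokens.
x = torch.randn(2, 8, 16, 64)
out = DividedSpaceTimeAttention()(x)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```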
## Training Details
- Epochs: 15 (training stopped at epoch 11 via early stopping; see results below)
- Training Strategy:
  - Mixed Precision Training (`torch.cuda.amp`)
  - Early stopping based on validation performance
  - Best model checkpoint saved during training
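The training strategy above can be sketched as follows. The tiny linear model, random batches, and the `validate` helper are placeholders standing in for the real TimeSformer and HMDB51 data loaders; the AMP, early-stopping, and best-checkpoint pattern is the point.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 51).to(device)          # stand-in for TimeSformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# GradScaler/autocast are no-ops when disabled, so this also runs on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

def validate(model):
    # Placeholder validation: accuracy on one random batch.
    model.eval()
    with torch.no_grad():
        x = torch.randn(32, 16, device=device)
        y = torch.randint(0, 51, (32,), device=device)
        return (model(x).argmax(dim=1) == y).float().mean().item()

best_val_acc, patience, bad_epochs = 0.0, 3, 0
for epoch in range(15):
    model.train()
    x = torch.randn(8, 16, device=device)                 # stand-in batch
    y = torch.randint(0, 51, (8,), device=device)
    optimizer.zero_grad()
    # Forward pass in float16 on GPU to save memory and speed up training.
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        loss = criterion(model(x), y)
    scaler.scale(loss).backward()   # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()

    val_acc = validate(model)
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        torch.save(model.state_dict(), "best_model.pt")   # keep best checkpoint
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # stop when validation stops improving
            break
```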
## Training Results
| Epoch | Train Loss | Train Accuracy | Val Loss | Val Accuracy |
|---|---|---|---|---|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 |
**Best Validation Accuracy: 67.65%**
- Achieved at epoch 11
- Model checkpoint saved automatically
## Dataset
HMDB51 is a widely used human action recognition dataset:
- 51 action classes
- Real-world videos
- Diverse motion patterns and scenes
## Usage

Install dependencies:

```bash
pip install torch torchvision transformers
```
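A minimal inference sketch follows. To run the actual fine-tuned model you would call `TimesformerForVideoClassification.from_pretrained()` with this checkpoint's path; here a small randomly initialised model is built from a config (with layer sizes scaled down for speed) so the example runs without downloading weights.

```python
import torch
from transformers import TimesformerConfig, TimesformerForVideoClassification

# Scaled-down config for illustration; swap in
# TimesformerForVideoClassification.from_pretrained("<checkpoint-path>")
# to load the fine-tuned HMDB51 weights.
config = TimesformerConfig(
    num_labels=51,        # HMDB51 classes
    num_frames=8,
    image_size=224,
    hidden_size=192,      # smaller than the base model, sketch only
    num_hidden_layers=2,
    num_attention_heads=3,
    intermediate_size=384,
)
model = TimesformerForVideoClassification(config)
model.eval()

# A clip is a tensor of shape (batch, frames, channels, height, width).
video = torch.randn(1, 8, 3, 224, 224)
with torch.no_grad():
    logits = model(pixel_values=video).logits

print(logits.shape)              # torch.Size([1, 51])
print(logits.argmax(-1).item())  # predicted class index
```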