---
license: mit
tags:
- video-classification
- timesformer
- action-recognition
- hmdb51
- pytorch
datasets:
- hmdb51
---
# 🎬 TimeSformer Fine-Tuned on HMDB51
This model is a fine-tuned version of **TimeSformer (Time-Space Transformer)** for **video action recognition**, trained on the **HMDB51 dataset**.

---
## 📌 Model Overview
- **Architecture:** TimeSformer (Transformer-based video model)
- **Task:** Video Classification / Action Recognition
- **Dataset:** HMDB51 (51 human action classes)
- **Framework:** PyTorch + Hugging Face Transformers
- **Training Device:** CUDA (GPU)

TimeSformer applies self-attention over both space (patches within a frame) and time (patches across frames), so the model learns spatial and temporal features directly from sampled video frames.
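
To make the space and time attention concrete, here is a rough token-count sketch. The clip shape (8 frames, 224x224 pixels, 16x16 patches) is an assumption based on common TimeSformer defaults, not a value stated in this card, and `token_counts` is a hypothetical helper. It compares how many patch tokens each query attends to under joint space-time attention versus the divided variant (temporal then spatial), ignoring the classification token.

```python
def token_counts(num_frames=8, image_size=224, patch_size=16):
    """Count patch tokens for a clip (all default values are assumptions)."""
    patches_per_frame = (image_size // patch_size) ** 2
    total_patches = num_frames * patches_per_frame
    return {
        "patches_per_frame": patches_per_frame,
        "total_patches": total_patches,
        # Joint attention: every patch attends to every patch in the clip.
        "joint_keys_per_query": total_patches,
        # Divided attention: one temporal pass (same location across frames)
        # followed by one spatial pass (all patches in the same frame).
        "divided_keys_per_query": num_frames + patches_per_frame,
    }


counts = token_counts()
for name, value in counts.items():
    print(f"{name}: {value}")
```

Under these assumed settings the divided scheme compares each patch with far fewer tokens than joint attention, which is the main reason it scales better to longer clips.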
---
## 📊 Training Details
- **Epochs:** 15 configured (the results table reports epochs 1 through 11)
- **Training Strategy:**
  - Mixed Precision Training (`torch.cuda.amp`)
  - Early stopping based on validation performance
  - Best model checkpoint saved during training
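
The early-stopping and best-checkpoint logic listed above could look roughly like the sketch below. This is not the training script used for this model: the `EarlyStopping` class, the `patience` value, and the accuracy history are all hypothetical, and the checkpoint save is left as a comment so the example runs without a GPU or a model.

```python
class EarlyStopping:
    """Track the best validation accuracy and signal when to stop."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum gain that counts as improvement
        self.best_acc = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_acc):
        """Return (is_best, should_stop) for this epoch."""
        if val_acc > self.best_acc + self.min_delta:
            self.best_acc = val_acc
            self.best_epoch = epoch
            self.bad_epochs = 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience


# Made-up validation accuracies, for illustration only.
history = [0.40, 0.55, 0.60, 0.59, 0.58, 0.60]
stopper = EarlyStopping(patience=3)

for epoch, val_acc in enumerate(history, start=1):
    is_best, should_stop = stopper.step(epoch, val_acc)
    if is_best:
        # A real loop would save here, e.g. torch.save(model.state_dict(), path).
        print(f"epoch {epoch}: new best {val_acc:.4f}")
    if should_stop:
        print(f"stopping at epoch {epoch}; best was epoch {stopper.best_epoch}")
        break
```

Keeping the stop decision and the "is this the best so far" decision in one place means the saved checkpoint always corresponds to the epoch the stopper reports as best.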
---
## 📈 Training Results
| Epoch | Train Loss | Train Accuracy | Val Loss | Val Accuracy |
|------|------------|----------------|----------|--------------|
| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 |
| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 |
| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 |
| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 |
| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 |
| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 |
| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 |
| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 |
| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 |
| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 |
| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 |
### ✅ Best Validation Accuracy: **67.65%**
- Achieved at **Epoch 11**
- Model checkpoint saved automatically
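
As a quick sanity check, the best epoch can be recomputed from the accuracy columns of the table. The sketch below copies those values by hand; the variable names are illustrative. It also reports the first epoch where training accuracy overtakes validation accuracy and compares the result with uniform random guessing over 51 classes.

```python
# Accuracy columns copied from the training results table (epochs 1-11).
train_acc = [0.0323, 0.1976, 0.4354, 0.5246, 0.6009, 0.6419,
             0.6807, 0.7030, 0.7237, 0.7414, 0.7549]
val_acc = [0.1248, 0.4353, 0.5340, 0.6033, 0.6327, 0.6536,
           0.6667, 0.6627, 0.6627, 0.6680, 0.6765]

# Epoch with the highest validation accuracy (epochs are 1-indexed).
best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1
best_val = val_acc[best_epoch - 1]

# First epoch where training accuracy exceeds validation accuracy.
crossover_epoch = next(
    i + 1 for i, (t, v) in enumerate(zip(train_acc, val_acc)) if t > v
)

# Train/validation gap at the best epoch, and the chance-level baseline.
gap_at_best = round(train_acc[best_epoch - 1] - best_val, 4)
chance = 1 / 51

print(f"best epoch: {best_epoch} (val acc {best_val:.4f})")
print(f"train > val from epoch {crossover_epoch}")
print(f"train/val gap at best epoch: {gap_at_best}")
print(f"chance level for 51 classes: {chance:.4f}")
```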
---
## 🧠 Dataset
**HMDB51** is a widely used human action recognition dataset:
- 51 action classes
- Real-world videos
- Diverse motion patterns and scenes
---
## 🚀 Usage
### Install Dependencies
```bash
pip install torch torchvision transformers