---
license: mit
tags:
- video-classification
- timesformer
- action-recognition
- ucf101
- pytorch
datasets:
- ucf101
---
# 🎬 TimeSformer Fine-Tuned for Video Action Recognition
This model is a fine-tuned version of **TimeSformer (Time-Space Transformer)** for **video action recognition**, trained on the **UCF101 dataset**.

---
## 📌 Model Overview
- **Base Model:** facebook/timesformer-base-finetuned-k400
- **Task:** Video Classification / Action Recognition
- **Dataset:** UCF101 (101 action classes)
- **Framework:** PyTorch + Hugging Face Transformers
- **Training Environment:** Kaggle (GPU)
---
## 🧠 Training Strategy
Due to Kaggle’s **12-hour session limit**, training was performed in **multiple stages**:
1. Initial training run
2. Checkpoint saving (best model)
3. Resume training from best checkpoint
4. Further fine-tuning across sessions

This approach lets a long training run continue across sessions without losing progress when a session reaches its time limit.

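
A minimal sketch of the save/resume cycle described above, assuming a plain PyTorch loop in which the model and optimizer state are stored together in one file. The function names `save_checkpoint` and `load_checkpoint` and the dictionary keys are hypothetical, not taken from the author's notebook. When no checkpoint exists yet, the loader reports epoch 1, so the same code serves the first session and every resumed one.

```python
import os

import torch


# Hypothetical helpers: the names and dictionary keys are illustrative only.
def save_checkpoint(path, model, optimizer, epoch, best_val_acc):
    """Store everything needed to continue training in the next session."""
    torch.save(
        {
            "epoch": epoch,                # last completed epoch
            "best_val_acc": best_val_acc,  # later epochs only overwrite on improvement
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),  # e.g. AdamW moment estimates
        },
        path,
    )


def load_checkpoint(path, model, optimizer, device="cpu"):
    """Restore state in place and return (next_epoch, best_val_acc).

    With no checkpoint on disk this is the first session, so start at epoch 1.
    """
    if not os.path.exists(path):
        return 1, 0.0
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"] + 1, ckpt["best_val_acc"]
```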
---
## 📊 Training Results
### 🔹 Initial Training
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|------|------------|-----------|----------|---------|
| 1 | 4.5066 | 0.0622 | 4.1089 | 0.4245 |
| 2 | 3.5721 | 0.4711 | 2.5276 | 0.8007 |
| 3 | 2.3239 | 0.7323 | 1.4321 | 0.8993 |
---
### 🔹 Continued Training (Checkpoint Resume)
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|------|------------|-----------|----------|---------|
| 4 | 1.8289 | 0.7991 | 1.1802 | 0.9199 |
| 5 | 1.7119 | 0.8094 | 1.1372 | 0.9128 |
| 6 | 1.6365 | 0.8153 | 1.1085 | 0.9191 |
| 7 | 1.5982 | 0.8139 | 1.0868 | 0.9218 |
| 8 | 1.5053 | 0.8194 | 1.0763 | **0.9262** |
| 9 | 1.4673 | 0.8201 | 1.0824 | 0.9225 |
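
The session limit explains the staged schedule. Using the approximate per-epoch timings from Training Details (about 2.5 h of training plus about 20 min of evaluation, so roughly 170 min per epoch), one 12-hour session holds at most four full epochs. The three initial epochs fit in a single session, while the six resumed epochs (4 to 9) would take about 17 h and therefore needed at least two. A small calculation that ignores setup and data-loading overhead:

```python
import math

# Approximate timings from "Training Details" and the Kaggle session limit.
TRAIN_MIN_PER_EPOCH = 150      # ~2.5 h of training
EVAL_MIN_PER_EPOCH = 20        # ~20 min of evaluation
SESSION_LIMIT_MIN = 12 * 60    # 12-hour session


def epochs_per_session(limit=SESSION_LIMIT_MIN,
                       train=TRAIN_MIN_PER_EPOCH,
                       evaluation=EVAL_MIN_PER_EPOCH):
    """Whole train+eval epochs that fit in one session (no setup overhead)."""
    return limit // (train + evaluation)


def sessions_needed(n_epochs):
    """Minimum number of sessions to run n_epochs at the budget above."""
    return math.ceil(n_epochs / epochs_per_session())


print(epochs_per_session())   # 4
print(sessions_needed(3))     # 1  (initial stage, epochs 1-3)
print(sessions_needed(6))     # 2  (resumed stage, epochs 4-9)
```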
---
## 🏆 Best Performance
- **Best Validation Accuracy:** **92.62%**
- **F1 Score:** 0.9244
- **Precision:** 0.9315
- **Recall:** 0.9262
- Achieved at **Epoch 8**
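
The card says the best checkpoint is saved automatically but does not name the selection criterion. Assuming it is validation accuracy, the sketch below replays the validation numbers from the training tables and lists the epochs at which such a rule would overwrite the saved checkpoint. Epoch 8 also has the lowest validation loss (1.0763), so a loss-based rule would keep the same checkpoint. `VAL_HISTORY` and `checkpoint_epochs` are illustrative names.

```python
# Validation results copied from the training tables: epoch -> (val_loss, val_acc).
VAL_HISTORY = {
    1: (4.1089, 0.4245),
    2: (2.5276, 0.8007),
    3: (1.4321, 0.8993),
    4: (1.1802, 0.9199),
    5: (1.1372, 0.9128),
    6: (1.1085, 0.9191),
    7: (1.0868, 0.9218),
    8: (1.0763, 0.9262),
    9: (1.0824, 0.9225),
}


def checkpoint_epochs(history):
    """Epochs at which validation accuracy beats the best seen so far.

    Under a "keep the best by val accuracy" rule (an assumption), these are the
    epochs where the saved checkpoint is overwritten; the last one is the final best.
    """
    best_acc = float("-inf")
    saved = []
    for epoch in sorted(history):
        _, acc = history[epoch]
        if acc > best_acc:
            best_acc = acc
            saved.append(epoch)
    return saved


print(checkpoint_epochs(VAL_HISTORY))  # [1, 2, 3, 4, 7, 8]
```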
---
## 📈 Additional Metrics
| Metric | Value |
|-------|------|
| Precision | 0.9315 |
| Recall | 0.9262 |
| F1 Score | 0.9244 |
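
The card does not state how precision, recall and F1 were averaged over the 101 classes. Recall being identical to the best validation accuracy (0.9262) is consistent with support-weighted averaging, because weighted recall always equals accuracy. With per-class averaging, F1 also need not equal the harmonic mean of the averaged precision and recall (about 0.929 here). A dependency-free sketch under that assumption; `weighted_prf` is a hypothetical name, and scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="weighted")` computes the same quantities.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over the classes in y_true.

    Each class's score is weighted by its number of true samples, matching
    scikit-learn's average="weighted"; undefined per-class scores count as 0.
    """
    n = len(y_true)
    support = Counter(y_true)                 # true samples per class
    predicted = Counter(y_pred)               # predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for c, n_c in support.items():
        p_c = tp[c] / predicted[c] if predicted[c] else 0.0
        r_c = tp[c] / n_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        w = n_c / n
        precision += w * p_c
        recall += w * r_c                     # sums to (total correct) / n = accuracy
        f1 += w * f_c                         # per-class F1, then weighted
    return precision, recall, f1


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
p, r, f1 = weighted_prf(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.722 0.667 0.656
```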
---
## ⚙️ Training Details
- Mixed Precision Training (`torch.cuda.amp`)
- GPU Memory Usage: ~9.3–9.8 GB
- Training Time per Epoch: ~2.5 hours
- Evaluation Time per Epoch: ~20 minutes
- Best model checkpoint saved automatically
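
A minimal sketch of one mixed-precision epoch using `torch.autocast` and a gradient scaler from `torch.cuda.amp`, the mechanism named above. The cross-entropy loss, the `(videos, labels)` batch format and the function name `train_one_epoch` are assumptions for illustration, not the author's training script. AMP is active only when the scaler is enabled (on CUDA); otherwise the same code runs in full precision. Keeping one scaler for the whole run lets its `state_dict()` be saved alongside the checkpoint.

```python
import torch
from torch import nn


def train_one_epoch(model, loader, optimizer, scaler, device):
    """Run one training epoch with automatic mixed precision (AMP).

    `loader` yields (videos, labels); for Hugging Face TimeSformer, videos are
    shaped (batch, frames, channels, height, width). Returns (mean_loss, accuracy).
    """
    use_amp = scaler.is_enabled()          # AMP only when the scaler is enabled
    loss_fn = nn.CrossEntropyLoss()        # assumed loss for 101-way classification
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for videos, labels in loader:
        videos, labels = videos.to(device), labels.to(device)
        optimizer.zero_grad(set_to_none=True)
        # Forward pass and loss under autocast: eligible ops run in float16 on CUDA.
        with torch.autocast(device_type=device.type, enabled=use_amp):
            out = model(videos)
            # Hugging Face models return an output object; plain modules return a tensor.
            logits = out.logits if hasattr(out, "logits") else out
            loss = loss_fn(logits, labels)
        # Scale the loss so small float16 gradients do not underflow.
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # unscales, and skips the step if inf/NaN gradients appear
        scaler.update()          # adjusts the scale factor for the next iteration
        total_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=-1) == labels).sum().item()
        seen += labels.size(0)
    return total_loss / seen, correct / seen


# Typical setup (one scaler for the whole run):
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# scaler = torch.cuda.amp.GradScaler(enabled=device.type == "cuda")
```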
---
## 🚀 Usage
### Install Dependencies
```bash
pip install torch torchvision transformers