---
license: mit
tags:
- video-classification
- timesformer
- action-recognition
- ucf101
- pytorch
datasets:
- ucf101
---
# TimeSformer Fine-Tuned for Video Action Recognition

This model is a fine-tuned version of **TimeSformer (Time-Space Transformer)** for **video action recognition**, trained on the **UCF101** dataset.

---
## Model Overview
- **Base Model:** `facebook/timesformer-base-finetuned-k400` (TimeSformer-base trained on Kinetics-400)
- **Task:** Video Classification / Action Recognition
- **Dataset:** UCF101 (101 action classes)
- **Framework:** PyTorch + Hugging Face Transformers
- **Training Environment:** Kaggle (GPU)
---
## Training Strategy
Because Kaggle sessions are limited to **12 hours**, training was split into **multiple stages** so that a long run could continue across sessions without losing progress:

1. Initial training run
2. Best-model checkpoint saved
3. Training resumed from the best checkpoint
4. Further fine-tuning across later sessions
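
The save/resume cycle described above can be sketched as follows. This is a minimal, framework-agnostic sketch under stated assumptions, not the author's training script: `run_session`, `train_and_evaluate` and the file name `best_checkpoint.json` are hypothetical, and the JSON `state` dict stands in for what a PyTorch run would persist with `torch.save` (model, optimizer and AMP `GradScaler` state dicts). Because only improvements are written, a session that ends after a non-improving epoch resumes from the last best epoch rather than the last completed one.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "best_val_acc": 0.0}


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a session killed mid-write
    cannot leave a corrupted checkpoint behind."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_session(path, train_and_evaluate, epochs_this_session):
    """One hypothetical session: resume from the best checkpoint, train for a
    fixed number of epochs, and overwrite the checkpoint only when
    validation accuracy improves."""
    state = load_checkpoint(path)
    start = state["epoch"] + 1  # continue after the best saved epoch
    for epoch in range(start, start + epochs_this_session):
        val_acc = train_and_evaluate(epoch)  # one full train + validation pass
        if val_acc > state["best_val_acc"]:
            # In PyTorch, model/optimizer/scaler state dicts would be saved here too.
            state = {"epoch": epoch, "best_val_acc": val_acc}
            save_checkpoint(path, state)
    return state
```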
---
## Training Results
### Initial Training
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|------|------------|-----------|----------|---------|
| 1 | 4.5066 | 0.0622 | 4.1089 | 0.4245 |
| 2 | 3.5721 | 0.4711 | 2.5276 | 0.8007 |
| 3 | 2.3239 | 0.7323 | 1.4321 | 0.8993 |
---
### Continued Training (Checkpoint Resume)
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|------|------------|-----------|----------|---------|
| 4 | 1.8289 | 0.7991 | 1.1802 | 0.9199 |
| 5 | 1.7119 | 0.8094 | 1.1372 | 0.9128 |
| 6 | 1.6365 | 0.8153 | 1.1085 | 0.9191 |
| 7 | 1.5982 | 0.8139 | 1.0868 | 0.9218 |
| 8 | 1.5053 | 0.8194 | 1.0763 | **0.9262** |
| 9 | 1.4673 | 0.8201 | 1.0824 | 0.9225 |
---
## Best Performance
- **Best Validation Accuracy:** **92.62%** (epoch 8)
- **F1 Score:** 0.9244
- **Precision:** 0.9315
- **Recall:** 0.9262
---
## Additional Metrics
| Metric | Value |
|-------|------|
| Precision | 0.9315 |
| Recall | 0.9262 |
| F1 Score | 0.9244 |
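
The reported recall (0.9262) is identical to the best validation accuracy. That is what support-weighted averaging over classes produces, since weighted recall always reduces to overall accuracy; the card does not state the averaging mode, so this is an inference rather than a documented fact. Below is a stdlib-only sketch of weighted precision, recall and F1 (intended to match scikit-learn's `average="weighted"`), with hypothetical toy labels standing in for UCF101 predictions:

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall and F1, plus overall accuracy."""
    support = Counter(y_true)    # true examples per class
    predicted = Counter(y_pred)  # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    n = len(y_true)
    precision = recall = f1 = 0.0
    for cls, n_cls in support.items():
        tp = correct[cls]
        p = tp / predicted[cls] if predicted[cls] else 0.0
        r = tp / n_cls
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n_cls / n  # each class counts in proportion to its support
        precision += weight * p
        recall += weight * r  # sums to (total true positives) / n, i.e. accuracy
        f1 += weight * f
    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```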
---
## Training Details
- Mixed Precision Training (`torch.cuda.amp`)
- GPU Memory Usage: ~9.3 to 9.8 GB
- Training Time per Epoch: ~2.5 hours
- Evaluation Time per Epoch: ~20 minutes
- Best model checkpoint saved automatically
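
As a rough budget, ~2.5 hours of training plus ~20 minutes of evaluation comes to about 170 minutes per epoch, so at most four full epochs (680 of 720 minutes) fit into one 12-hour session, possibly fewer once setup and data loading are counted. The helper below is a small planning sketch; `epochs_per_session` and its `overhead_min` margin are hypothetical, not figures from the training logs:

```python
def epochs_per_session(train_min, eval_min, session_hours=12.0, overhead_min=0.0):
    """Number of full train + eval epochs that fit in one session.

    overhead_min reserves time for setup, data loading and checkpoint
    writes (a hypothetical margin; choose it from your own logs).
    """
    per_epoch = train_min + eval_min
    if per_epoch <= 0:
        raise ValueError("epoch duration must be positive")
    budget = session_hours * 60 - overhead_min
    return max(0, int(budget // per_epoch))
```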
---
## Usage
### Install Dependencies
```bash
pip install torch torchvision transformers
```