| --- |
| license: mit |
| tags: |
| - video-classification |
| - timesformer |
| - action-recognition |
| - ucf101 |
| - pytorch |
| datasets: |
| - ucf101 |
| --- |
| |
| # TimeSformer Fine-Tuned for Video Action Recognition |
|
|
| This model is a fine-tuned version of **TimeSformer (Time-Space Transformer)** for **video action recognition**, trained on the **UCF101 dataset**. |
|
|
| --- |
|
|
| ## Model Overview |
|
|
| - **Base Model:** facebook/timesformer-base-finetuned-k400 |
| - **Task:** Video Classification / Action Recognition |
| - **Dataset:** UCF101 (101 action classes) |
| - **Framework:** PyTorch + Hugging Face Transformers |
| - **Training Environment:** Kaggle (GPU) |
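
The base checkpoint was fine-tuned on Kinetics-400, so its classification head has 400 outputs, while UCF101 has 101 classes. The sketch below shows one common way to load such a checkpoint with a freshly initialised 101-way head. The helper name `load_base_model` is hypothetical, and passing `ignore_mismatched_sizes=True` is an assumption about how the head was replaced; the original training script is not part of this card.

```python
BASE_CHECKPOINT = "facebook/timesformer-base-finetuned-k400"
NUM_CLASSES = 101  # UCF101 action classes


def load_base_model(checkpoint: str = BASE_CHECKPOINT, num_labels: int = NUM_CLASSES):
    """Hypothetical helper: load the K400 checkpoint with a new classification head.

    Downloads the weights from the Hugging Face Hub on first call.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import TimesformerForVideoClassification

    return TimesformerForVideoClassification.from_pretrained(
        checkpoint,
        num_labels=num_labels,
        # The pretrained head has 400 outputs; allow it to be re-initialised
        # with `num_labels` outputs instead of raising a shape error.
        ignore_mismatched_sizes=True,
    )
```

Calling `load_base_model()` returns a model whose backbone weights come from the Kinetics-400 checkpoint and whose final layer is randomly initialised, which is why the first epoch in the results starts from a low training accuracy.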
|
|
| --- |
|
|
| ## Training Strategy |
|
|
| Due to Kaggle's **12-hour session limit**, training was performed in **multiple stages**: |
|
|
| 1. Initial training run |
| 2. Checkpoint saving (best model) |
| 3. Resume training from best checkpoint |
| 4. Further fine-tuning across sessions |
|
|
| Resuming from the best saved checkpoint lets a long training run span several sessions without losing progress. |
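
The save-and-resume cycle can be sketched as follows. This is a minimal illustration, not the original training code: the helper names `save_checkpoint` and `load_checkpoint`, the checkpoint file name, the AdamW optimizer, and the learning rate are all assumptions, and a small linear layer stands in for the TimeSformer model. The epoch number and accuracy are taken from the last row of the initial training table.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(path, model, optimizer, epoch, best_val_acc):
    """Store everything needed to continue training in a later session."""
    torch.save(
        {
            "epoch": epoch,
            "best_val_acc": best_val_acc,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer; return the next epoch and best accuracy so far."""
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model_state"])
    optimizer.load_state_dict(state["optimizer_state"])
    return state["epoch"] + 1, state["best_val_acc"]


# Session 1: train, then save the best model (stand-in model and optimizer).
model = nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
ckpt_path = os.path.join(tempfile.mkdtemp(), "best_model.pt")
save_checkpoint(ckpt_path, model, optimizer, epoch=3, best_val_acc=0.8993)

# Session 2: rebuild the objects, then resume from the saved state.
resumed_model = nn.Linear(4, 2)
resumed_optimizer = torch.optim.AdamW(resumed_model.parameters(), lr=1e-4)
start_epoch, best_val_acc = load_checkpoint(ckpt_path, resumed_model, resumed_optimizer)
```

Saving the optimizer state alongside the weights keeps momentum-style statistics intact across sessions; on Kaggle the checkpoint file would also need to be written somewhere that survives the session.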
|
|
| --- |
|
|
| ## Training Results |
|
|
| ### Initial Training |
|
|
| | Epoch | Train Loss | Train Acc | Val Loss | Val Acc | |
| |------|------------|-----------|----------|---------| |
| | 1 | 4.5066 | 0.0622 | 4.1089 | 0.4245 | |
| | 2 | 3.5721 | 0.4711 | 2.5276 | 0.8007 | |
| | 3 | 2.3239 | 0.7323 | 1.4321 | 0.8993 | |
|
|
| --- |
|
|
| ### Continued Training (Checkpoint Resume) |
|
|
| | Epoch | Train Loss | Train Acc | Val Loss | Val Acc | |
| |------|------------|-----------|----------|---------| |
| | 4 | 1.8289 | 0.7991 | 1.1802 | 0.9199 | |
| | 5 | 1.7119 | 0.8094 | 1.1372 | 0.9128 | |
| | 6 | 1.6365 | 0.8153 | 1.1085 | 0.9191 | |
| | 7 | 1.5982 | 0.8139 | 1.0868 | 0.9218 | |
| | 8 | 1.5053 | 0.8194 | 1.0763 | **0.9262** | |
| | 9 | 1.4673 | 0.8201 | 1.0824 | 0.9225 | |
|
|
| --- |
|
|
| ## Best Performance |
|
|
| - **Best Validation Accuracy:** **92.62%** |
| - **F1 Score:** 0.9244 |
| - **Precision:** 0.9315 |
| - **Recall:** 0.9262 |
| - Achieved at **Epoch 8** |
|
|
| --- |
|
|
| ## Additional Metrics |
|
|
| | Metric | Value | |
| |-------|------| |
| | Precision | 0.9315 | |
| | Recall | 0.9262 | |
| | F1 Score | 0.9244 | |
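
The card does not state how precision, recall, and F1 were averaged over the 101 classes. The reported recall (0.9262) equals the best validation accuracy, which is what support-weighted averaging always yields, so the sketch below assumes weighted averaging. The function name `weighted_metrics` and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) using support-weighted averaging."""
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / total         # each class counts in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / total
    return accuracy, precision, recall, f1


# Toy example with three classes.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
```

With weighted averaging, recall reduces to the number of correct predictions divided by the number of samples, so it always matches accuracy, while precision and F1 can differ from it.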
|
|
| --- |
|
|
| ## Training Details |
|
|
| - Mixed Precision Training (`torch.cuda.amp`) |
| - GPU Memory Usage: ~9.3–9.8 GB |
| - Training Time per Epoch: ~2.5 hours |
| - Evaluation Time per Epoch: ~20 minutes |
| - Best model checkpoint saved automatically |
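
A single mixed-precision training step with `torch.cuda.amp` typically looks like the sketch below. It is an illustration under stated assumptions: a small linear layer and random tensors stand in for TimeSformer and the video batches, and the optimizer and learning rate are hypothetical. Mixed precision is switched off automatically when no GPU is present, so the snippet also runs on CPU.

```python
import torch
from torch import nn

use_amp = torch.cuda.is_available()  # AMP is only enabled on a CUDA device
device = torch.device("cuda" if use_amp else "cpu")

# Stand-in model and data; the real model is a TimeSformer fed with video clips.
model = nn.Linear(8, 101).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # assumed optimizer
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(4, 8, device=device)
labels = torch.randint(0, 101, (4,), device=device)

optimizer.zero_grad()
# The forward pass and loss run in reduced precision where it is safe to do so.
with torch.cuda.amp.autocast(enabled=use_amp):
    logits = model(inputs)
    loss = criterion(logits, labels)

# Scale the loss to avoid gradient underflow in float16, then step and update.
scaler.scale(loss).backward()
scaler.step(optimizer)  # skips the step if scaled gradients contain inf/NaN
scaler.update()
```

Recent PyTorch releases expose the same functionality under `torch.amp` and may emit a deprecation warning for the `torch.cuda.amp` names used here.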
|
|
| --- |
|
|
| ## Usage |
|
|
| ### Install Dependencies |
|
|
| ```bash |
| pip install torch torchvision transformers |
| |