---
license: mit
tags:
- video-classification
- timesformer
- action-recognition
- ucf101
- pytorch
datasets:
- ucf101
---

# 🎬 TimeSformer Fine-Tuned for Video Action Recognition

This model is a fine-tuned version of **TimeSformer (Time-Space Transformer)** for **video action recognition**, trained on the **UCF101 dataset**.

---

## 📌 Model Overview

- **Base Model:** facebook/timesformer-base-finetuned-k400
- **Task:** Video Classification / Action Recognition
- **Dataset:** UCF101 (101 action classes)
- **Framework:** PyTorch + Hugging Face Transformers
- **Training Environment:** Kaggle (GPU)

---

## 🧠 Training Strategy

Due to Kaggle's **12-hour session limit**, training was performed in **multiple stages**:

1. Initial training run
2. Checkpoint saving (best model)
3. Resuming training from the best checkpoint
4. Further fine-tuning across sessions

This approach allows long training runs without losing progress.

---

## 📊 Training Results

### 🔹 Initial Training

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|-------|------------|-----------|----------|---------|
| 1 | 4.5066 | 0.0622 | 4.1089 | 0.4245 |
| 2 | 3.5721 | 0.4711 | 2.5276 | 0.8007 |
| 3 | 2.3239 | 0.7323 | 1.4321 | 0.8993 |

---

### 🔹 Continued Training (Checkpoint Resume)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|-------|------------|-----------|----------|---------|
| 4 | 1.8289 | 0.7991 | 1.1802 | 0.9199 |
| 5 | 1.7119 | 0.8094 | 1.1372 | 0.9128 |
| 6 | 1.6365 | 0.8153 | 1.1085 | 0.9191 |
| 7 | 1.5982 | 0.8139 | 1.0868 | 0.9218 |
| 8 | 1.5053 | 0.8194 | 1.0763 | **0.9262** |
| 9 | 1.4673 | 0.8201 | 1.0824 | 0.9225 |

---

## 🏆 Best Performance

- **Best Validation Accuracy:** **92.62%**
- **F1 Score:** 0.9244
- **Precision:** 0.9315
- **Recall:** 0.9262
- Achieved at **Epoch 8**

---

## 📈 Additional Metrics

| Metric | Value |
|--------|-------|
| Precision | 0.9315 |
| Recall | 0.9262 |
| F1 Score | 0.9244 |

---

## ⚙️ Training Details

- Mixed-precision training (`torch.cuda.amp`)
- GPU memory usage: ~9.3–9.8 GB
- Training time per epoch: ~2.5 hours
- Evaluation time per epoch: ~20 minutes
- Best model checkpoint saved automatically

---

## 🚀 Usage

### Install Dependencies

```bash
pip install torch torchvision transformers
```
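### Run Inference

A typical inference sketch using the Transformers API. The repo id below is a placeholder pointing at the base checkpoint; replace it with this model's Hub id. The video input here is dummy data for illustration (8 frames of 224×224 RGB):

```python
import torch
from transformers import AutoImageProcessor, TimesformerForVideoClassification

# Placeholder repo id -- substitute this fine-tuned model's Hub id.
repo_id = "facebook/timesformer-base-finetuned-k400"
processor = AutoImageProcessor.from_pretrained(repo_id)
model = TimesformerForVideoClassification.from_pretrained(repo_id)

# Dummy clip: a list of 8 frames, each (channels, height, width).
video = [torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8) for _ in range(8)]
inputs = processor(images=video, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(-1).item()
print(model.config.id2label[pred])
```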
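### Checkpoint Save/Resume (Sketch)

The staged training strategy described above relies on saving the best checkpoint and resuming from it in a later session. A minimal sketch of that pattern is shown below; the model, optimizer, and file path are illustrative stand-ins, not the actual training code:

```python
import torch
import torch.nn as nn

CKPT_PATH = "best_model.pt"  # hypothetical path

def save_checkpoint(model, optimizer, epoch, val_acc, path=CKPT_PATH):
    # Persist everything needed to resume training in a fresh session.
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "val_acc": val_acc,
    }, path)

def load_checkpoint(model, optimizer, path=CKPT_PATH):
    # Restore model and optimizer state, and return where training left off.
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"], ckpt["val_acc"]

if __name__ == "__main__":
    model = nn.Linear(4, 2)  # stand-in for the TimeSformer model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    save_checkpoint(model, optimizer, epoch=8, val_acc=0.9262)
    start_epoch, best_acc = load_checkpoint(model, optimizer)
```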
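### Mixed-Precision Step (Sketch)

Training used mixed precision via `torch.cuda.amp`. The snippet below sketches one such training step with a tiny placeholder model and random data (on CPU, autocast and scaling are simply disabled); it is not the actual training loop:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(8, 101).to(device)  # 101 = number of UCF101 classes
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(4, 8, device=device)
y = torch.randint(0, 101, (4,), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    loss = criterion(model(x), y)   # forward pass in reduced precision
scaler.scale(loss).backward()       # backward on the scaled loss
scaler.step(optimizer)              # unscales gradients, then steps
scaler.update()                     # adjusts the scale factor
```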