---
license: mit
tags:
- video-classification
- timesformer
- action-recognition
- hmdb51
- pytorch
datasets:
- hmdb51
---

# 🎬 TimeSformer Fine-Tuned on HMDB51

This model is a fine-tuned version of **TimeSformer (Time-Space Transformer)** for **video action recognition**, trained on the **HMDB51 dataset**.

---

## 📌 Model Overview

- **Architecture:** TimeSformer (Transformer-based video model)
- **Task:** Video Classification / Action Recognition
- **Dataset:** HMDB51 (51 human action classes)
- **Framework:** PyTorch + Hugging Face Transformers
- **Training Device:** CUDA (GPU)

This model learns spatial and temporal features from video frames using transformer-based attention mechanisms.

---

## 📊 Training Details

- **Epochs:** 15
- **Training Strategy:**
  - Mixed Precision Training (`torch.cuda.amp`)
  - Early stopping based on validation performance
  - Best model checkpoint saved during training

---

## 📈 Training Results

| Epoch | Train Loss | Train Accuracy | Val Loss | Val Accuracy |
|-------|------------|----------------|----------|--------------|
| 1     | 3.9314     | 0.0323         | 3.7800   | 0.1248       |
| 2     | 3.5892     | 0.1976         | 3.2530   | 0.4353       |
| 3     | 3.0235     | 0.4354         | 2.6567   | 0.5340       |
| 4     | 2.5720     | 0.5246         | 2.2347   | 0.6033       |
| 5     | 2.2836     | 0.6009         | 2.0135   | 0.6327       |
| 6     | 2.0897     | 0.6419         | 1.8659   | 0.6536       |
| 7     | 1.9678     | 0.6807         | 1.8017   | 0.6667       |
| 8     | 1.8847     | 0.7030         | 1.7705   | 0.6627       |
| 9     | 1.8213     | 0.7237         | 1.7355   | 0.6627       |
| 10    | 1.7576     | 0.7414         | 1.7340   | 0.6680       |
| 11    | 1.7098     | 0.7549         | 1.7234   | 0.6765       |

### ✅ Best Validation Accuracy: **67.65%**

- Achieved at **Epoch 11**
- Model checkpoint saved automatically

---

## 🧠 Dataset

**HMDB51** is a widely used human action recognition dataset:

- 51 action classes
- Real-world videos
- Diverse motion patterns and scenes

---

## 🚀 Usage

### Install Dependencies

```bash
pip install torch torchvision transformers