---
license: mit
tags:
- video-classification
- timesformer
- action-recognition
- ucf101
- pytorch
datasets:
- ucf101
---

# 🎬 TimeSformer Fine-Tuned for Video Action Recognition

This model is a fine-tuned version of **TimeSformer (Time-Space Transformer)** for **video action recognition**, trained on the **UCF101 dataset**.

---

## 📌 Model Overview

- **Base Model:** facebook/timesformer-base-finetuned-k400
- **Task:** Video Classification / Action Recognition
- **Dataset:** UCF101 (101 action classes)
- **Framework:** PyTorch + Hugging Face Transformers
- **Training Environment:** Kaggle (GPU)

---

## 🧠 Training Strategy

Due to Kaggle's **12-hour session limit**, training was performed in **multiple stages**:

1. Initial training run
2. Checkpoint saving (best model)
3. Resuming training from the best checkpoint
4. Further fine-tuning across sessions

This approach allows long training runs without losing progress.

---

## 📊 Training Results

### 🔹 Initial Training

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|-------|------------|-----------|----------|---------|
| 1 | 4.5066 | 0.0622 | 4.1089 | 0.4245 |
| 2 | 3.5721 | 0.4711 | 2.5276 | 0.8007 |
| 3 | 2.3239 | 0.7323 | 1.4321 | 0.8993 |

---

### 🔹 Continued Training (Checkpoint Resume)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|-------|------------|-----------|----------|---------|
| 4 | 1.8289 | 0.7991 | 1.1802 | 0.9199 |
| 5 | 1.7119 | 0.8094 | 1.1372 | 0.9128 |
| 6 | 1.6365 | 0.8153 | 1.1085 | 0.9191 |
| 7 | 1.5982 | 0.8139 | 1.0868 | 0.9218 |
| 8 | 1.5053 | 0.8194 | 1.0763 | **0.9262** |
| 9 | 1.4673 | 0.8201 | 1.0824 | 0.9225 |

---

## 🏆 Best Performance

- **Best Validation Accuracy:** **92.62%**
- **F1 Score:** 0.9244
- **Precision:** 0.9315
- **Recall:** 0.9262
- Achieved at **Epoch 8**

---

## 📈 Additional Metrics

| Metric | Value |
|--------|-------|
| Precision | 0.9315 |
| Recall | 0.9262 |
| F1 Score | 0.9244 |

---

## ⚙️ Training Details

- Mixed-precision training (`torch.cuda.amp`)
- GPU memory usage: ~9.3–9.8 GB
- Training time per epoch: ~2.5 hours
- Evaluation time per epoch: ~20 minutes
- Best model checkpoint saved automatically

---

## 🚀 Usage

### Install Dependencies

```bash
pip install torch torchvision transformers
```
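### Run Inference

A typical inference sketch using the Transformers API. The repo id below is a placeholder pointing at the base checkpoint; replace it with this model's Hub id. The video input here is dummy data for illustration (8 frames of 224×224 RGB):

```python
import torch
from transformers import AutoImageProcessor, TimesformerForVideoClassification

# Placeholder repo id -- substitute this fine-tuned model's Hub id.
repo_id = "facebook/timesformer-base-finetuned-k400"
processor = AutoImageProcessor.from_pretrained(repo_id)
model = TimesformerForVideoClassification.from_pretrained(repo_id)

# Dummy clip: a list of 8 frames, each (channels, height, width).
video = [torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8) for _ in range(8)]
inputs = processor(images=video, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(-1).item()
print(model.config.id2label[pred])
```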
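### Checkpoint Save/Resume (Sketch)

The staged training strategy described above relies on saving the best checkpoint and resuming from it in a later session. A minimal sketch of that pattern is shown below; the model, optimizer, and file path are illustrative stand-ins, not the actual training code:

```python
import torch
import torch.nn as nn

CKPT_PATH = "best_model.pt"  # hypothetical path

def save_checkpoint(model, optimizer, epoch, val_acc, path=CKPT_PATH):
    # Persist everything needed to resume training in a fresh session.
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "val_acc": val_acc,
    }, path)

def load_checkpoint(model, optimizer, path=CKPT_PATH):
    # Restore model and optimizer state, and return where training left off.
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"], ckpt["val_acc"]

if __name__ == "__main__":
    model = nn.Linear(4, 2)  # stand-in for the TimeSformer model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    save_checkpoint(model, optimizer, epoch=8, val_acc=0.9262)
    start_epoch, best_acc = load_checkpoint(model, optimizer)
```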
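### Mixed-Precision Step (Sketch)

Training used mixed precision via `torch.cuda.amp`. The snippet below sketches one such training step with a tiny placeholder model and random data (on CPU, autocast and scaling are simply disabled); it is not the actual training loop:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(8, 101).to(device)  # 101 = number of UCF101 classes
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(4, 8, device=device)
y = torch.randint(0, 101, (4,), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=(device == "cuda")):
    loss = criterion(model(x), y)   # forward pass in reduced precision
scaler.scale(loss).backward()       # backward on the scaled loss
scaler.step(optimizer)              # unscales gradients, then steps
scaler.update()                     # adjusts the scale factor
```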