---
license: mit
tags:
- video-classification
- timesformer
- action-recognition
- hmdb51
- pytorch
datasets:
- hmdb51
---

# 🎬 TimeSformer Fine-Tuned on HMDB51

This model is a fine-tuned version of **TimeSformer (Time-Space Transformer)** for **video action recognition**, trained on the **HMDB51 dataset**.

---

## 📌 Model Overview

- **Architecture:** TimeSformer (Transformer-based video model)
- **Task:** Video Classification / Action Recognition
- **Dataset:** HMDB51 (51 human action classes)
- **Framework:** PyTorch + Hugging Face Transformers
- **Training Device:** CUDA (GPU)

This model learns spatial and temporal features from video frames using transformer-based attention mechanisms.
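Video transformers of this kind take a fixed number of frames per clip, so a video is usually reduced to a small set of frame indices before it reaches the model. A minimal sketch of uniform frame sampling follows; the function name `sample_frame_indices` and the clip length of 8 frames are illustrative assumptions, not settings confirmed for this checkpoint.

```python
def sample_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Pick `num_frames` indices spread evenly across a video.

    `num_frames=8` is an assumed clip length, not a value taken from this card.
    If the video has fewer frames than requested, some indices repeat.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames == 1:
        # A single frame: take the middle of the video.
        return [total_frames // 2]
    # Evenly spaced positions from the first frame to the last frame.
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


indices = sample_frame_indices(total_frames=64, num_frames=8)
print(indices)
```

The selected frames would then be decoded, resized, and normalized by whatever preprocessing the checkpoint expects before being stacked into a single clip tensor.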

---

## 📊 Training Details

- **Epochs:** 15 configured (results are reported for epochs 1 to 11)
- **Training Strategy:**
  - Mixed Precision Training (`torch.cuda.amp`)
  - Early stopping based on validation performance
  - Best model checkpoint saved during training
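The early stopping and best-checkpoint logic can be sketched in plain Python. This is a sketch under stated assumptions rather than the training script used here: the `EarlyStopping` class and the `patience` value of 3 are hypothetical, and the validation accuracies are the ones reported for epochs 1 to 11 in this card.

```python
class EarlyStopping:
    """Track the best validation accuracy and signal when to stop.

    `patience` (epochs tolerated without improvement) is an assumed setting.
    """

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best_acc = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch: int, val_acc: float) -> bool:
        """Record one epoch; return True if training should stop."""
        if val_acc > self.best_acc:
            # New best: this is where a checkpoint would be saved.
            self.best_acc = val_acc
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Validation accuracies for epochs 1-11, as reported in this card.
val_accuracies = [0.1248, 0.4353, 0.5340, 0.6033, 0.6327, 0.6536,
                  0.6667, 0.6627, 0.6627, 0.6680, 0.6765]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, acc in enumerate(val_accuracies, start=1):
    if stopper.step(epoch, acc):
        stopped_at = epoch
        break

print(stopper.best_epoch, stopper.best_acc, stopped_at)
```

With the assumed patience of 3, the two flat epochs (8 and 9) are not enough to trigger a stop, and the tracker ends with epoch 11 as the best checkpoint.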

---

## 📈 Training Results

| Epoch | Train Loss | Train Accuracy | Val Loss | Val Accuracy |
|------|------------|----------------|----------|--------------|
| 1    | 3.9314     | 0.0323         | 3.7800   | 0.1248       |
| 2    | 3.5892     | 0.1976         | 3.2530   | 0.4353       |
| 3    | 3.0235     | 0.4354         | 2.6567   | 0.5340       |
| 4    | 2.5720     | 0.5246         | 2.2347   | 0.6033       |
| 5    | 2.2836     | 0.6009         | 2.0135   | 0.6327       |
| 6    | 2.0897     | 0.6419         | 1.8659   | 0.6536       |
| 7    | 1.9678     | 0.6807         | 1.8017   | 0.6667       |
| 8    | 1.8847     | 0.7030         | 1.7705   | 0.6627       |
| 9    | 1.8213     | 0.7237         | 1.7355   | 0.6627       |
| 10   | 1.7576     | 0.7414         | 1.7340   | 0.6680       |
| 11   | 1.7098     | 0.7549         | 1.7234   | 0.6765       |

### ✅ Best Validation Accuracy: **67.65%**
- Achieved at **Epoch 11**
- Model checkpoint saved automatically
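To read the table beyond the headline number, it can help to compare training and validation accuracy per epoch. The short sketch below copies the accuracy columns from the table and reports where training accuracy first exceeds validation accuracy, plus the gap at the final epoch. The variable names are illustrative, and a growing gap is only a rough signal of overfitting, not proof of it.

```python
# epoch -> (train_accuracy, val_accuracy), copied from the results table.
results = {
    1: (0.0323, 0.1248), 2: (0.1976, 0.4353), 3: (0.4354, 0.5340),
    4: (0.5246, 0.6033), 5: (0.6009, 0.6327), 6: (0.6419, 0.6536),
    7: (0.6807, 0.6667), 8: (0.7030, 0.6627), 9: (0.7237, 0.6627),
    10: (0.7414, 0.6680), 11: (0.7549, 0.6765),
}

# Train-minus-validation accuracy gap for every epoch.
gaps = {epoch: train - val for epoch, (train, val) in results.items()}

# First epoch in which the model scores higher on training data than on validation data.
first_crossover = min(epoch for epoch, gap in gaps.items() if gap > 0)

# Gap at the last reported epoch, rounded to the table's precision.
final_gap = round(gaps[11], 4)

print(first_crossover, final_gap)
```

In these numbers, validation accuracy leads through epoch 6, training accuracy overtakes it at epoch 7, and the gap reaches about 7.8 percentage points by epoch 11.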

---

## 🧠 Dataset

**HMDB51** is a widely used human action recognition dataset:
- 51 action classes
- Real-world videos
- Diverse motion patterns and scenes

---

## 🚀 Usage

### Install Dependencies

```bash
pip install torch torchvision transformers