---
license: mit
tags:
- video-classification
- timesformer
- action-recognition
- ucf101
- pytorch
datasets:
- ucf101
---

# 🎬 TimeSformer Fine-Tuned for Video Action Recognition

This model is a fine-tuned version of **TimeSformer (Time-Space Transformer)** for **video action recognition**, trained on the **UCF101 dataset**.

---

## 📌 Model Overview

- **Base Model:** facebook/timesformer-base-finetuned-k400  
- **Task:** Video Classification / Action Recognition  
- **Dataset:** UCF101 (101 action classes)  
- **Framework:** PyTorch + Hugging Face Transformers  
- **Training Environment:** Kaggle (GPU)

---

## 🧠 Training Strategy

Due to Kaggle's **12-hour session limit**, training was performed in **multiple stages**:

1. Initial training run
2. Checkpoint saving (best model)
3. Resume training from best checkpoint
4. Further fine-tuning across sessions

This approach lets a long training run span several sessions: when a session expires, training resumes from the best saved checkpoint instead of starting over.
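
The save-and-resume cycle can be sketched as follows. This is a minimal illustration, not the training script used for this model: the helper names `save_checkpoint` and `load_checkpoint`, the file name `best_model.pt`, the AdamW optimizer, and the tiny `nn.Linear` stand-in model are all assumptions made for the example. The epoch number and accuracy are taken from the Initial Training table.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(path, model, optimizer, epoch, best_val_acc):
    """Write everything needed to resume: weights, optimizer state, progress."""
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "epoch": epoch,
            "best_val_acc": best_val_acc,
        },
        path,
    )


def load_checkpoint(path, model, optimizer):
    """Restore state in place and return (next_epoch, best_val_acc)."""
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1, state["best_val_acc"]


# Stand-in model for illustration; the real run would use the TimeSformer classifier.
model = nn.Linear(4, 101)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # optimizer choice is assumed

ckpt_path = os.path.join(tempfile.mkdtemp(), "best_model.pt")  # hypothetical file name

# Session 1: validation accuracy improved at epoch 3, so the checkpoint is saved.
save_checkpoint(ckpt_path, model, optimizer, epoch=3, best_val_acc=0.8993)

# Session 2: rebuild the objects, restore their state, and continue from epoch 4.
resumed_model = nn.Linear(4, 101)
resumed_optimizer = torch.optim.AdamW(resumed_model.parameters(), lr=1e-4)
start_epoch, best_val_acc = load_checkpoint(ckpt_path, resumed_model, resumed_optimizer)
print(start_epoch, best_val_acc)
```

Saving the optimizer state alongside the weights matters for adaptive optimizers, since their running statistics would otherwise restart from zero in each session.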

---

## 📊 Training Results

### 🔹 Initial Training

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|------|------------|-----------|----------|---------|
| 1 | 4.5066 | 0.0622 | 4.1089 | 0.4245 |
| 2 | 3.5721 | 0.4711 | 2.5276 | 0.8007 |
| 3 | 2.3239 | 0.7323 | 1.4321 | 0.8993 |

---

### 🔹 Continued Training (Checkpoint Resume)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|------|------------|-----------|----------|---------|
| 4 | 1.8289 | 0.7991 | 1.1802 | 0.9199 |
| 5 | 1.7119 | 0.8094 | 1.1372 | 0.9128 |
| 6 | 1.6365 | 0.8153 | 1.1085 | 0.9191 |
| 7 | 1.5982 | 0.8139 | 1.0868 | 0.9218 |
| 8 | 1.5053 | 0.8194 | 1.0763 | **0.9262** |
| 9 | 1.4673 | 0.8201 | 1.0824 | 0.9225 |

---

## 🏆 Best Performance

- **Best Validation Accuracy:** **92.62%**
- **F1 Score:** 0.9244  
- **Precision:** 0.9315  
- **Recall:** 0.9262  
- Achieved at **Epoch 8**

---

## 📈 Additional Metrics

| Metric | Value |
|-------|------|
| Precision | 0.9315 |
| Recall | 0.9262 |
| F1 Score | 0.9244 |
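
The reported recall (0.9262) equals the best validation accuracy (0.9262), which is what support-weighted averaging over classes produces. The sketch below assumes that averaging mode; the card does not state it. It shows how such metrics can be computed from label lists. The function name `weighted_metrics` and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    n = len(y_true)
    precision = recall = f1 = 0.0
    for cls, n_cls in support.items():
        p = correct[cls] / predicted[cls] if predicted[cls] else 0.0
        r = correct[cls] / n_cls
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n_cls / n         # each class counts in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (not UCF101 predictions).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

With weighted averaging, recall always matches accuracy, while precision and F1 can differ from it, as they do in the table above.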

---

## ⚙️ Training Details

- Mixed Precision Training (`torch.cuda.amp`)
- GPU Memory Usage: ~9.3–9.8 GB
- Training Time per Epoch: ~2.5 hours
- Evaluation Time per Epoch: ~20 minutes
- Best model checkpoint saved automatically
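
A mixed-precision training step with `torch.cuda.amp` typically looks like the sketch below. It is an illustration under assumptions, not the original training loop: the `nn.Linear` stand-in model, the AdamW optimizer, the learning rate, and the `train_step` helper are all hypothetical. AMP is switched off automatically when no GPU is available, so the snippet also runs on CPU.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # AMP is only enabled when a GPU is present

# Stand-in classifier for illustration; the real model is the TimeSformer network.
model = nn.Linear(16, 101).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # assumed optimizer
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(inputs, labels):
    """Run one optimizer step with autocast and loss scaling."""
    optimizer.zero_grad(set_to_none=True)
    # The forward pass runs selected ops in float16 when AMP is enabled.
    with torch.autocast(device_type=device, enabled=use_amp):
        logits = model(inputs)
        loss = criterion(logits, labels)
    # Scale the loss so small float16 gradients do not underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # unscales gradients; skips the step on inf/NaN
    scaler.update()         # adjusts the scale factor for the next step
    return loss.item()


# Random batch for illustration: 8 feature vectors, 101 possible classes.
inputs = torch.randn(8, 16, device=device)
labels = torch.randint(0, 101, (8,), device=device)
loss_value = train_step(inputs, labels)
print(f"loss: {loss_value:.4f}")
```

Recent PyTorch releases mark `torch.cuda.amp.GradScaler` as deprecated in favor of `torch.amp.GradScaler("cuda")`; the older spelling is used here to match the API named in the list above.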

---

## 🚀 Usage

### Install Dependencies

```bash
pip install torch torchvision transformers