sumit7488 / TimesFormer_Baseline


---
license: mit
tags:
- video-classification
- timesformer
- action-recognition
- ucf101
- pytorch
datasets:
- ucf101
---

# 🎬 TimeSformer Fine-Tuned for Video Action Recognition

This model is TimeSformer (Time-Space Transformer) fine-tuned for video action recognition on the UCF101 dataset, starting from a checkpoint that was already fine-tuned on Kinetics-400.

---

## 📌 Model Overview

- Base Model: `facebook/timesformer-base-finetuned-k400`
- Task: Video Classification / Action Recognition
- Dataset: UCF101 (101 action classes)
- Framework: PyTorch + Hugging Face Transformers
- Training Environment: Kaggle (GPU)
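
The card does not include the model-setup code, so the following is only a minimal sketch, assuming the usual Transformers approach: load the Kinetics-400 checkpoint and let `from_pretrained` re-initialise the 400-way classifier as a 101-way UCF101 head (`ignore_mismatched_sizes=True` permits the shape change). The names `load_base_model`, `dummy_clip`, and `NUM_CLASSES` are hypothetical.

```python
import torch
from transformers import TimesformerForVideoClassification

BASE_CHECKPOINT = "facebook/timesformer-base-finetuned-k400"
NUM_CLASSES = 101  # UCF101 action classes


def load_base_model(num_labels: int = NUM_CLASSES):
    """Hypothetical helper: Kinetics-400 TimeSformer backbone with a fresh num_labels-way head."""
    # Backbone weights are reused; the 400-way classifier no longer matches
    # num_labels, so it is dropped and randomly re-initialised.
    return TimesformerForVideoClassification.from_pretrained(
        BASE_CHECKPOINT,
        num_labels=num_labels,
        ignore_mismatched_sizes=True,
    )


def dummy_clip(model, batch_size: int = 1) -> torch.Tensor:
    """Random input in the (batch, frames, channels, height, width) layout TimeSformer expects."""
    cfg = model.config
    return torch.randn(batch_size, cfg.num_frames, cfg.num_channels, cfg.image_size, cfg.image_size)
```

With the real checkpoint (8 frames of 224×224 pixels per its config), `model(pixel_values=dummy_clip(model)).logits` should have shape `(1, 101)`; the new head still has to be trained before the predictions mean anything.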

---

## 🧠 Training Strategy

Due to Kaggle’s 12-hour session limit, training was performed in multiple stages:

1. Initial training run (epochs 1–3)
2. Checkpoint saving (best model)
3. Resume training from the best checkpoint
4. Further fine-tuning across sessions (epochs 4–9)

Because the best checkpoint is saved and reloaded between sessions, training can continue well beyond a single session's time budget without losing progress.
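
With the per-epoch timings listed under Training Details (about 2.5 hours of training plus about 20 minutes of evaluation, so roughly 170 minutes per epoch), about four full epochs fit into one 12-hour session (4 × 170 = 680 of 720 minutes). The staged workflow above can be sketched in PyTorch as follows. This is an illustrative sketch, not the author's code: the checkpoint layout and the names `CKPT_PATH`, `save_checkpoint`, `load_checkpoint`, and `maybe_save_best` are hypothetical.

```python
import os

import torch

CKPT_PATH = "best_checkpoint.pt"  # hypothetical path; it must persist between sessions


def save_checkpoint(path, model, optimizer, scaler, epoch, best_val_acc):
    """Write everything needed to continue training in a later session."""
    torch.save(
        {
            "epoch": epoch,                       # epoch that produced this checkpoint
            "best_val_acc": best_val_acc,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),  # e.g. momentum / Adam moment buffers
            "scaler": scaler.state_dict(),        # AMP loss-scale state
        },
        path,
    )


def load_checkpoint(path, model, optimizer, scaler, device="cpu"):
    """Restore state in place; return (next_epoch, best_val_acc), or (1, 0.0) if no checkpoint exists."""
    if not os.path.exists(path):
        return 1, 0.0
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    scaler.load_state_dict(ckpt["scaler"])
    return ckpt["epoch"] + 1, ckpt["best_val_acc"]


def maybe_save_best(path, model, optimizer, scaler, epoch, val_acc, best_val_acc):
    """Overwrite the checkpoint only when validation accuracy improves; return the new best."""
    if val_acc > best_val_acc:
        save_checkpoint(path, model, optimizer, scaler, epoch, val_acc)
        return val_acc
    return best_val_acc
```

In each new session, `load_checkpoint` would be called once before the epoch loop, training would start from the returned epoch, and `maybe_save_best` would run after every validation pass. Saving the optimizer and scaler state alongside the weights keeps the resumed run from restarting its optimiser statistics.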

---

## 📊 Training Results

### 🔹 Initial Training

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|-------|------------|-----------|----------|---------|
| 1 | 4.5066 | 0.0622 | 4.1089 | 0.4245 |
| 2 | 3.5721 | 0.4711 | 2.5276 | 0.8007 |
| 3 | 2.3239 | 0.7323 | 1.4321 | 0.8993 |

---

### 🔹 Continued Training (Checkpoint Resume)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|-------|------------|-----------|----------|---------|
| 4 | 1.8289 | 0.7991 | 1.1802 | 0.9199 |
| 5 | 1.7119 | 0.8094 | 1.1372 | 0.9128 |
| 6 | 1.6365 | 0.8153 | 1.1085 | 0.9191 |
| 7 | 1.5982 | 0.8139 | 1.0868 | 0.9218 |
| 8 | 1.5053 | 0.8194 | 1.0763 | 0.9262 |
| 9 | 1.4673 | 0.8201 | 1.0824 | 0.9225 |
---

## 🏆 Best Performance

The best validation results were reached at epoch 8:

| Metric | Value |
|--------|-------|
| Validation Accuracy | 0.9262 (92.62%) |
| Precision | 0.9315 |
| Recall | 0.9262 |
| F1 Score | 0.9244 |
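
The card does not state how precision, recall, and F1 were averaged over the 101 classes. The reported recall equals the validation accuracy (0.9262), which is consistent with support-weighted averaging: each class recall TP_c / n_c is weighted by n_c / N, so the sum collapses to ΣTP_c / N, i.e. plain accuracy. Weighted F1 averages per-class F1 scores instead of taking the harmonic mean of the averaged precision and recall, which is why it can sit below both. A minimal pure-Python sketch, assuming weighted averaging was used (it mirrors scikit-learn's `precision_recall_fscore_support(..., average="weighted")` with zero scores for classes that are never predicted); `weighted_prf` is a hypothetical name.

```python
from collections import Counter


def weighted_prf(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Support-weighted precision, recall and F1 over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)                                    # true samples per class
    predicted = Counter(y_pred)                                  # predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    n = len(y_true)
    precision = recall = f1 = 0.0
    for cls, n_true in support.items():
        tp = hits[cls]
        p = tp / predicted[cls] if predicted[cls] else 0.0  # never-predicted class scores 0
        r = tp / n_true
        f = 2 * p * r / (p + r) if p + r else 0.0
        w = n_true / n  # class weight = its share of the samples
        precision += w * p
        recall += w * r
        f1 += w * f
    return precision, recall, f1
```

For `y_true = [0, 0, 0, 1, 1, 2]` and `y_pred = [0, 0, 1, 1, 1, 0]`, the weighted recall is 2/3, exactly the accuracy of 4 correct out of 6.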

---

## ⚙️ Training Details

- Mixed Precision Training (`torch.cuda.amp`)
- GPU Memory Usage: ~9.3–9.8 GB
- Training Time per Epoch: ~2.5 hours
- Evaluation Time per Epoch: ~20 minutes
- Best model checkpoint saved automatically
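
The mixed-precision setup can be sketched as one training step with autocast and a gradient scaler. This is an illustrative sketch rather than the author's loop: `make_scaler` and `train_step` are hypothetical names, and it assumes the model takes the clip tensor as its first positional argument (true for `TimesformerForVideoClassification`, whose first argument is `pixel_values`). It falls back to full precision when no GPU is available.

```python
import torch
from torch import nn


def make_scaler(device: torch.device):
    """Loss scaling is only needed for float16 autocast on CUDA, so disable it elsewhere."""
    return torch.cuda.amp.GradScaler(enabled=device.type == "cuda")


def train_step(model, clips, labels, optimizer, scaler, device):
    """One mixed-precision optimisation step; returns the unscaled loss as a float."""
    model.train()
    clips, labels = clips.to(device), labels.to(device)
    optimizer.zero_grad(set_to_none=True)
    # On CUDA, eligible ops run in float16; autocast is off whenever the scaler is disabled.
    with torch.autocast(device_type=device.type, enabled=scaler.is_enabled()):
        out = model(clips)
        logits = out.logits if hasattr(out, "logits") else out  # HF models wrap logits in an output object
        loss = nn.functional.cross_entropy(logits, labels)
    scaler.scale(loss).backward()  # multiply the loss so small fp16 gradients do not underflow
    scaler.step(optimizer)         # unscale gradients; skip the step if they contain inf/NaN
    scaler.update()                # adapt the scale factor for the next step
    return loss.item()
```

For the fine-tuned TimeSformer, `model` would be the video-classification model, `clips` a `(batch, frames, channels, height, width)` tensor, and `device` would be `torch.device("cuda")` on a Kaggle GPU.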

---

## 🚀 Usage

### Install Dependencies

```bash
pip install torch torchvision transformers