sumit7488/TimesFormerRefineOnHMDB51

	---
	license: mit
	tags:
	- video-classification
	- timesformer
	- action-recognition
	- hmdb51
	- pytorch
	datasets:
	- hmdb51
	---

	# 🎬 TimeSformer Fine-Tuned on HMDB51

	This model is TimeSformer (Time-Space Transformer) fine-tuned for video action recognition on the HMDB51 dataset.

	---

	## 📌 Model Overview

	- Architecture: TimeSformer (Transformer-based video model)
	- Task: Video Classification / Action Recognition
	- Dataset: HMDB51 (51 human action classes)
	- Framework: PyTorch + Hugging Face Transformers
	- Training Device: CUDA (GPU)

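	The model learns spatial (appearance) and temporal (motion) features directly from video frames using self-attention. Each frame is split into patches, and in the standard TimeSformer design every block applies divided space-time attention: temporal attention among patches at the same position across frames, followed by spatial attention among patches within the same frame.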
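As a rough illustration of divided space-time attention, the NumPy sketch below applies temporal attention (each patch position attends across frames) and then spatial attention (patches attend to each other within a frame) to a toy token grid shaped `(frames, patches, dim)`. It is a simplification, not this model's code: a single head, no learned query/key/value projections, and no LayerNorm, MLP, or classification token. The function names and sizes are hypothetical.

```python
import numpy as np


def attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention over the second-to-last axis.

    Simplification: queries, keys and values are the inputs themselves
    (no learned projections), so this only shows which tokens attend to which.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)         # (..., L, L)
    scores = scores - scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ x                                       # (..., L, D)


def divided_space_time_block(tokens: np.ndarray) -> np.ndarray:
    """Toy divided space-time attention on tokens shaped (T, N, D).

    T = frames, N = patches per frame, D = embedding size. Residual
    connections are kept; LayerNorm, MLP and the CLS token are omitted.
    """
    # Temporal step: regroup as (N, T, D) so each patch position
    # attends to itself across the T frames.
    per_position = np.swapaxes(tokens, 0, 1)
    tokens = tokens + np.swapaxes(attention(per_position), 0, 1)
    # Spatial step: with shape (T, N, D), attention runs over the
    # N patches of each frame independently.
    tokens = tokens + attention(tokens)
    return tokens


# Hypothetical sizes: 8 frames, 14 x 14 = 196 patches
# (224-pixel frames, 16-pixel patches), 64-dim tokens.
rng = np.random.default_rng(0)
out = divided_space_time_block(rng.standard_normal((8, 196, 64)))
print(out.shape)  # (8, 196, 64)
```

Splitting attention this way means each token attends to T + N others instead of all T × N tokens in the clip, which is the main efficiency argument for the divided scheme.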

	---

	## 📊 Training Details

	- Epochs: 15 (the results table lists epochs 1 to 11)
	- Training strategy:
	  - Mixed-precision training (`torch.cuda.amp`)
	  - Early stopping based on validation performance
	  - Best model checkpoint saved during training
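The early-stopping and best-checkpoint logic listed above can be sketched in plain Python. This is a hypothetical helper, not the training script behind this model: the card does not state the patience or the improvement threshold, so `patience=3` and `min_delta=0.0` are assumptions. The demo replays the validation accuracies reported in the Training Results table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyStopping:
    """Tracks the best validation accuracy and decides when to stop.

    Hypothetical helper: patience and min_delta are assumptions, since the
    card does not report the values used in training.
    """
    patience: int = 3           # epochs without improvement before stopping
    min_delta: float = 0.0      # minimum gain that counts as an improvement
    best_score: Optional[float] = None
    best_epoch: Optional[int] = None
    epochs_without_improvement: int = 0

    def step(self, epoch: int, val_acc: float) -> bool:
        """Record one epoch; return True when it is a new best (save a checkpoint)."""
        if self.best_score is None or val_acc > self.best_score + self.min_delta:
            self.best_score, self.best_epoch = val_acc, epoch
            self.epochs_without_improvement = 0
            return True
        self.epochs_without_improvement += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.epochs_without_improvement >= self.patience


# Replay the validation accuracies reported in the Training Results table.
val_accs = [0.1248, 0.4353, 0.5340, 0.6033, 0.6327, 0.6536,
            0.6667, 0.6627, 0.6627, 0.6680, 0.6765]
stopper = EarlyStopping(patience=3)
for epoch, val_acc in enumerate(val_accs, start=1):
    if stopper.step(epoch, val_acc):
        # In a real loop, save the weights here, e.g. torch.save(model.state_dict(), ...).
        print(f"epoch {epoch}: new best val accuracy {val_acc:.4f}")
    if stopper.should_stop:
        print(f"early stop after epoch {epoch}")
        break

print(stopper.best_epoch, stopper.best_score)  # 11 0.6765
```

With `patience=3`, the dips at epochs 8 and 9 only raise the no-improvement counter to 2, so training continues and epoch 11 ends up as the best checkpoint.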

	---

	## 📈 Training Results

	| Epoch | Train Loss | Train Accuracy | Val Loss | Val Accuracy |
	|-------|------------|----------------|----------|--------------|
	| 1 | 3.9314 | 0.0323 | 3.7800 | 0.1248 |
	| 2 | 3.5892 | 0.1976 | 3.2530 | 0.4353 |
	| 3 | 3.0235 | 0.4354 | 2.6567 | 0.5340 |
	| 4 | 2.5720 | 0.5246 | 2.2347 | 0.6033 |
	| 5 | 2.2836 | 0.6009 | 2.0135 | 0.6327 |
	| 6 | 2.0897 | 0.6419 | 1.8659 | 0.6536 |
	| 7 | 1.9678 | 0.6807 | 1.8017 | 0.6667 |
	| 8 | 1.8847 | 0.7030 | 1.7705 | 0.6627 |
	| 9 | 1.8213 | 0.7237 | 1.7355 | 0.6627 |
	| 10 | 1.7576 | 0.7414 | 1.7340 | 0.6680 |
	| 11 | 1.7098 | 0.7549 | 1.7234 | 0.6765 |

	### ✅ Best Validation Accuracy: 67.65%
	- Achieved at Epoch 11
	- Checkpoint from this epoch saved as the best model

	---

	## 🧠 Dataset

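	HMDB51 is a widely used human action recognition benchmark:
	- 51 action classes (e.g., brush hair, ride bike, kiss)
	- Roughly 7,000 short real-world clips, drawn mostly from movies and web videos
	- Diverse motion patterns, camera viewpoints, and scenes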
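HMDB51 clips differ in length, but a batch of video inputs to TimeSformer needs the same number of frames per clip, so frames are usually sampled first. The card does not say how this was done; the sketch below shows one common choice, uniform sampling of segment centres. `sample_frame_indices` is a hypothetical helper, and `num_samples=8` is an assumption based on the default `num_frames` of Hugging Face's `TimesformerConfig`.

```python
from typing import List


def sample_frame_indices(num_video_frames: int, num_samples: int = 8) -> List[int]:
    """Return `num_samples` frame indices spread evenly across a clip.

    The clip is cut into `num_samples` equal segments and the centre frame of
    each segment is taken. Clips shorter than `num_samples` repeat frames.
    """
    if num_video_frames <= 0 or num_samples <= 0:
        raise ValueError("num_video_frames and num_samples must be positive")
    segment = num_video_frames / num_samples
    return [min(int(segment * i + segment / 2), num_video_frames - 1)
            for i in range(num_samples)]


print(sample_frame_indices(80))  # [5, 15, 25, 35, 45, 55, 65, 75]
print(sample_frame_indices(4))   # [0, 0, 1, 1, 2, 2, 3, 3]
```

Taking segment centres keeps evaluation deterministic; picking a random offset inside each segment is a common training-time alternative that acts as augmentation.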

	---

	## 🚀 Usage

	### Install Dependencies

	```bash
	pip install torch torchvision transformers