---
title: GenVidBench - Video Action Recognition
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.47.2
app_file: app.py
pinned: false
license: apache-2.0
short_description: Video action recognition with MMAction2 (TSN, Kinetics-400)
---

# GenVidBench - Video Action Recognition

A video analysis tool that uses deep learning to recognize actions and activities in videos. Built on the MMAction2 framework with a user-friendly Gradio interface.

## πŸš€ Features

- **Action Recognition**: Identify actions and activities in videos using TSN (Temporal Segment Networks)
- **Top-5 Predictions**: Get the most likely actions with confidence scores
- **Multiple Formats**: Support for MP4, AVI, MOV, and other video formats
- **Fast Processing**: Inference tuned for web deployment, typically a few seconds per clip
- **User-friendly Interface**: Clean and intuitive Gradio web interface

## 🎯 Model Details

This demo uses:
- **Model**: TSN (Temporal Segment Networks) with ResNet-50 backbone
- **Dataset**: Trained on Kinetics-400 dataset (400 action classes)
- **Framework**: MMAction2 (OpenMMLab)
- **Input**: RGB video frames
- **Output**: Top-5 action predictions with confidence scores (see the inference sketch below)
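
A minimal inference sketch using MMAction2's high-level API. This assumes MMAction2 1.x, where `inference_recognizer` returns an `ActionDataSample`; the config and checkpoint paths are placeholders, not necessarily what this Space loads:

```python
import torch
from mmaction.apis import init_recognizer, inference_recognizer

# Placeholder paths: substitute the TSN ResNet-50 Kinetics-400 config and
# checkpoint from the MMAction2 model zoo.
CONFIG = "tsn_r50_kinetics400_config.py"
CHECKPOINT = "tsn_r50_kinetics400.pth"

model = init_recognizer(CONFIG, CHECKPOINT, device="cpu")

# In MMAction2 1.x the result is an ActionDataSample whose `pred_score`
# holds per-class scores over the 400 Kinetics classes.
result = inference_recognizer(model, "demo.mp4")
top5 = torch.topk(result.pred_score, k=5)
for score, idx in zip(top5.values, top5.indices):
    print(f"class {idx.item()}: {score.item():.3f}")
```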

## πŸ› οΈ Technical Stack

- **Backend**: Python, PyTorch, MMAction2
- **Frontend**: Gradio (app skeleton sketched after this list)
- **Video Processing**: OpenCV, Decord
- **Deployment**: Hugging Face Spaces
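
A rough sketch of how these pieces fit together. The function body and component labels here are illustrative, not the Space's actual `app.py`:

```python
import gradio as gr

def recognize(video_path: str) -> dict:
    # Placeholder: the real app would call the MMAction2 recognizer
    # (see the inference sketch above) and map class indices to names.
    return {"placeholder action": 1.0}

demo = gr.Interface(
    fn=recognize,
    inputs=gr.Video(label="Upload a video"),
    outputs=gr.Label(num_top_classes=5, label="Top-5 predictions"),
    title="GenVidBench - Video Action Recognition",
)

if __name__ == "__main__":
    demo.launch()
```

`gr.Label` with `num_top_classes=5` renders the returned `{name: confidence}` dictionary directly as the top-5 result view.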

## πŸ“– How to Use

1. **Upload Video**: Click the upload area or drag and drop your video file
2. **Wait for Processing**: The model will analyze your video (usually takes a few seconds)
3. **View Results**: See the top 5 predicted actions with confidence scores

## πŸ’‘ Tips for Best Results

- **Video Length**: Shorter videos (under 30 seconds) process faster
- **Video Quality**: Clear, well-lit videos work best
- **Action Clarity**: Videos with clear, distinct actions yield better results
- **Supported Formats**: MP4, AVI, MOV, and other common video formats

## πŸ”¬ Supported Actions

The model can recognize 400 different action classes from the Kinetics-400 dataset, including:
- Sports activities (basketball, soccer, tennis, etc.)
- Daily activities (cooking, cleaning, reading, etc.)
- Physical exercises (push-ups, jumping jacks, etc.)
- Musical activities (playing instruments, singing, etc.)
- And many more (the sketch below shows how to list all 400 classes)
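
To turn predicted class indices into human-readable names, one can load the Kinetics-400 label map; the file name and layout below reflect the MMAction2 repository (`tools/data/kinetics/label_map_k400.txt`) and are an assumption, not something this Space necessarily bundles:

```python
# Assumed file: the Kinetics-400 label map from the MMAction2 repo,
# one class name per line, in index order.
with open("label_map_k400.txt") as f:
    labels = [line.strip() for line in f]

print(len(labels))  # expected: 400
print(labels[:3])   # a few example class names
```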

## πŸ—οΈ Architecture

```
Video Input β†’ Frame Sampling β†’ Feature Extraction β†’ Classification β†’ Top-5 Predictions
```
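
As an illustration of the frame-sampling stage, a TSN-style uniform sampler using Decord might look like this (a sketch assuming center-of-segment sampling; the app's actual sampling is defined by its MMAction2 pipeline config):

```python
import numpy as np
from decord import VideoReader, cpu

def sample_segment_frames(video_path: str, num_segments: int = 8) -> np.ndarray:
    """TSN-style sampling: split the video into equal segments and take
    the center frame of each; returns an array of shape [N, H, W, 3]."""
    vr = VideoReader(video_path, ctx=cpu(0))
    total = len(vr)
    seg_len = total / num_segments
    indices = [min(int(i * seg_len + seg_len / 2), total - 1)
               for i in range(num_segments)]
    return vr.get_batch(indices).asnumpy()
```

TSN averages the per-segment predictions at inference time, which is why a handful of uniformly spaced frames is enough to cover the whole clip.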

## πŸ“Š Performance

- **Accuracy**: Competitive top-1/top-5 accuracy on Kinetics-400 for a TSN ResNet-50 model (see the MMAction2 model zoo for exact figures)
- **Speed**: Inference is fast enough for interactive web use
- **Memory**: Runs on CPU, with GPU acceleration when available

## 🀝 Contributing

This project is part of the GenVidBench framework. Contributions are welcome!

## πŸ“„ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

## πŸ™ Acknowledgments

- [MMAction2](https://github.com/open-mmlab/mmaction2) - The underlying framework
- [OpenMMLab](https://openmmlab.com/) - For the excellent computer vision tools
- [Hugging Face](https://huggingface.co/) - For the deployment platform

---

**Note**: This is a demonstration of video action recognition capabilities. For production use, consider additional validation and error handling.