|
|
--- |
|
|
title: GenVidBench - Video Action Recognition |
|
|
emoji: 🎬
|
|
colorFrom: blue |
|
|
colorTo: purple |
|
|
sdk: gradio |
|
|
sdk_version: 5.47.2 |
|
|
app_file: app.py |
|
|
pinned: false |
|
|
license: apache-2.0 |
|
|
short_description: Video action recognition using MMAction2
|
|
--- |
|
|
|
|
|
# GenVidBench - Video Action Recognition |
|
|
|
|
|
A video analysis tool that uses deep learning models to recognize actions and activities in videos, built on the MMAction2 framework with a user-friendly Gradio interface.
|
|
|
|
|
## Features
|
|
|
|
|
- **Action Recognition**: Identify actions and activities in videos using TSN (Temporal Segment Networks) |
|
|
- **Top-5 Predictions**: Get the most likely actions with confidence scores |
|
|
- **Multiple Formats**: Support for MP4, AVI, MOV, and other video formats |
|
|
- **Real-time Processing**: Fast inference optimized for web deployment |
|
|
- **User-friendly Interface**: Clean and intuitive Gradio web interface |
|
|
|
|
|
## Model Details
|
|
|
|
|
This demo uses: |
|
|
- **Model**: TSN (Temporal Segment Networks) with ResNet-50 backbone |
|
|
- **Dataset**: Kinetics-400 (400 action classes)
|
|
- **Framework**: MMAction2 (OpenMMLab) |
|
|
- **Input**: RGB video frames |
|
|
- **Output**: Top-5 action predictions with confidence scores |
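
For reference, inference with the model described above boils down to a few calls to MMAction2's high-level API. The following is a minimal sketch assuming the MMAction2 1.x API; the config and checkpoint paths are placeholders, not the exact files this Space ships with.

```python
# Minimal TSN inference sketch using MMAction2's high-level API (1.x).
# Config/checkpoint paths are placeholders for the files this Space uses.
from mmaction.apis import init_recognizer, inference_recognizer

config = "tsn_r50_kinetics400_config.py"   # placeholder path
checkpoint = "tsn_r50_kinetics400.pth"     # placeholder path

model = init_recognizer(config, checkpoint, device="cpu")
result = inference_recognizer(model, "demo.mp4")

# In MMAction2 1.x the result is an ActionDataSample whose pred_score
# holds one confidence value per Kinetics-400 class.
scores = result.pred_score.tolist()
top5 = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)[:5]
for class_idx, score in top5:
    print(f"class {class_idx}: {score:.3f}")
```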
|
|
|
|
|
## Technical Stack
|
|
|
|
|
- **Backend**: Python, PyTorch, MMAction2 |
|
|
- **Frontend**: Gradio |
|
|
- **Video Processing**: OpenCV, Decord |
|
|
- **Deployment**: Hugging Face Spaces |
|
|
|
|
|
## How to Use
|
|
|
|
|
1. **Upload Video**: Click the upload area or drag and drop your video file |
|
|
2. **Wait for Processing**: The model will analyze your video (usually takes a few seconds) |
|
|
3. **View Results**: See the top 5 predicted actions with confidence scores |
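
Under the hood, the Gradio side of an app like this can be very small. The sketch below is illustrative rather than a copy of app.py: `classify_video` is a hypothetical stand-in for the MMAction2 inference shown earlier and should return a label-to-confidence mapping.

```python
# Illustrative Gradio wiring; classify_video is a hypothetical stand-in
# for the MMAction2 inference above and should return {label: confidence}.
import gradio as gr

def classify_video(video_path: str) -> dict:
    # ... run the recognizer on video_path and collect top-5 scores ...
    return {"playing guitar": 0.72, "strumming guitar": 0.11}  # dummy output

demo = gr.Interface(
    fn=classify_video,
    inputs=gr.Video(label="Upload a video"),
    outputs=gr.Label(num_top_classes=5, label="Top-5 actions"),
    title="GenVidBench - Video Action Recognition",
)

if __name__ == "__main__":
    demo.launch()
```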
|
|
|
|
|
## Tips for Best Results
|
|
|
|
|
- **Video Length**: Shorter videos (under 30 seconds) process faster |
|
|
- **Video Quality**: Clear, well-lit videos work best |
|
|
- **Action Clarity**: Videos with clear, distinct actions yield better results |
|
|
- **Supported Formats**: MP4, AVI, MOV, and other common video formats |
|
|
|
|
|
## Supported Actions
|
|
|
|
|
The model can recognize 400 different action classes from the Kinetics-400 dataset, including: |
|
|
- Sports activities (basketball, soccer, tennis, etc.) |
|
|
- Daily activities (cooking, cleaning, reading, etc.) |
|
|
- Physical exercises (push-ups, jumping jacks, etc.) |
|
|
- Musical activities (playing instruments, singing, etc.) |
|
|
- And many more! |
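
Predicted class indices are turned into these human-readable names with a label map. A small sketch, assuming a plain text file with one label per line in class-index order (for example, MMAction2's label_map_k400.txt):

```python
# Map predicted class indices to Kinetics-400 action names.
# Assumes one label per line, ordered by class index
# (e.g. MMAction2's label_map_k400.txt).
with open("label_map_k400.txt", encoding="utf-8") as f:
    labels = [line.strip() for line in f]

assert len(labels) == 400  # Kinetics-400 defines exactly 400 classes
print(labels[0])  # action name for class index 0
```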
|
|
|
|
|
## Architecture
|
|
|
|
|
``` |
|
|
Video Input → Frame Sampling → Feature Extraction → Classification → Top-5 Predictions
|
|
``` |
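
The frame-sampling step follows TSN's segment scheme: the video is divided into a fixed number of equal segments and one frame is drawn from each, so a clip of any length is summarized at constant cost. A rough sketch with Decord follows (center-of-segment sampling; the number of segments is illustrative, and training normally samples randomly within each segment):

```python
# TSN-style segment sampling with Decord: split the video into
# num_segments equal spans and take the center frame of each.
import numpy as np
from decord import VideoReader

def sample_frames(video_path: str, num_segments: int = 3) -> np.ndarray:
    vr = VideoReader(video_path)
    seg_len = len(vr) / num_segments
    # Center index of each segment; training typically randomizes this.
    indices = [int(seg_len * (i + 0.5)) for i in range(num_segments)]
    return vr.get_batch(indices).asnumpy()  # (num_segments, H, W, 3)

frames = sample_frames("demo.mp4")
print(frames.shape)
```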
|
|
|
|
|
## Performance
|
|
|
|
|
- **Accuracy**: Competitive top-1/top-5 accuracy on Kinetics-400 for a TSN ResNet-50 model; see the MMAction2 model zoo for the exact numbers


- **Speed**: Fast inference; short clips typically finish in a few seconds


- **Memory**: Runs on CPU or GPU with modest memory requirements
|
|
|
|
|
## Contributing
|
|
|
|
|
This project is part of the GenVidBench framework. Contributions are welcome! |
|
|
|
|
|
## License
|
|
|
|
|
This project is licensed under the Apache License 2.0 - see the LICENSE file for details. |
|
|
|
|
|
## Acknowledgments
|
|
|
|
|
- [MMAction2](https://github.com/open-mmlab/mmaction2) - The underlying framework |
|
|
- [OpenMMLab](https://openmmlab.com/) - For the excellent computer vision tools |
|
|
- [Hugging Face](https://huggingface.co/) - For the deployment platform |
|
|
|
|
|
--- |
|
|
|
|
|
**Note**: This is a demonstration of video action recognition capabilities. For production use, consider additional validation and error handling. |