---
title: GenVidBench - Video Action Recognition
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.47.2
app_file: app.py
pinned: false
license: apache-2.0
short_description: State-of-the-art video action recognition using MMAction2
---
GenVidBench - Video Action Recognition
A video analysis tool that uses deep learning models to recognize actions and activities in videos. It is built on the MMAction2 framework and served through a Gradio web interface.
Features
- Action Recognition: Identify actions and activities in videos using TSN (Temporal Segment Networks)
- Top-5 Predictions: Get the most likely actions with confidence scores
- Multiple Formats: Support for MP4, AVI, MOV, and other video formats
- Fast Processing: Inference tuned for web deployment; a typical clip is analyzed in a few seconds
- User-friendly Interface: Clean and intuitive Gradio web interface
🎯 Model Details
This demo uses:
- Model: TSN (Temporal Segment Networks) with ResNet-50 backbone
- Dataset: Trained on the Kinetics-400 dataset (400 action classes)
- Framework: MMAction2 (OpenMMLab)
- Input: RGB video frames
- Output: Top-5 action predictions with confidence scores
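The "Top-5 predictions with confidence scores" output can be illustrated with a minimal sketch. It assumes the classifier produces one raw score (logit) per class and that the confidence scores are softmax probabilities; the function name, labels, and numbers below are hypothetical and are not taken from this app's code.

```python
import math

def top_k_predictions(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs, highest first."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, descending, and keep the first k.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]

# Toy example with six made-up classes (the real model has 400).
labels = ["archery", "bowling", "cooking", "dancing", "juggling", "yoga"]
logits = [2.0, 0.5, 3.1, -1.0, 0.0, 1.2]
for name, prob in top_k_predictions(logits, labels):
    print(f"{name}: {prob:.3f}")
```

Because the probabilities are normalized over all classes, the five scores shown to the user generally sum to less than 1.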
🛠️ Technical Stack
- Backend: Python, PyTorch, MMAction2
- Frontend: Gradio
- Video Processing: OpenCV, Decord
- Deployment: Hugging Face Spaces
How to Use
- Upload Video: Click the upload area or drag and drop your video file
- Wait for Processing: The model analyzes your video, which usually takes a few seconds
- View Results: See the top 5 predicted actions with confidence scores
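The upload, process, and view flow above maps naturally onto a Gradio `Interface` with a video input and a label output. The following is a sketch under stated assumptions, not the contents of this Space's `app.py`: `classify_video` and `build_demo` are hypothetical names, and the scores are fixed placeholders standing in for the real model call.

```python
def classify_video(video_path):
    """Placeholder for the model call: maps a video file path to {label: confidence}."""
    # Hypothetical fixed scores; a real app would run the recognizer on video_path.
    return {
        "archery": 0.61,
        "bowling": 0.22,
        "yoga": 0.09,
        "cooking": 0.05,
        "dancing": 0.03,
    }

def build_demo():
    """Build the web UI. Gradio is imported lazily so the rest runs without it."""
    import gradio as gr

    return gr.Interface(
        fn=classify_video,
        # gr.Video passes the uploaded file's path to fn.
        inputs=gr.Video(label="Upload Video"),
        # gr.Label renders a {label: confidence} dict as ranked bars.
        outputs=gr.Label(num_top_classes=5, label="Top-5 Predictions"),
        title="GenVidBench - Video Action Recognition",
    )

# To serve the interface locally: build_demo().launch()
```

Keeping the prediction function separate from the UI construction makes it possible to test the model code without starting a web server.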
💡 Tips for Best Results
- Video Length: Shorter videos (under 30 seconds) process faster
- Video Quality: Clear, well-lit videos work best
- Action Clarity: Videos with clear, distinct actions yield better results
- Supported Formats: MP4, AVI, MOV, and other common video formats
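These tips could be turned into a lightweight pre-check before inference. The sketch below is illustrative only: the helper name and constants are hypothetical, the extension set covers just the three formats named above, and the caller is assumed to supply the clip duration (for example, frame count divided by frame rate from a video reader).

```python
from pathlib import Path

# Assumption: only the formats explicitly named above; others may also decode.
KNOWN_EXTENSIONS = {".mp4", ".avi", ".mov"}
# From the tip above: clips under 30 seconds process faster.
RECOMMENDED_MAX_SECONDS = 30.0

def check_video(path, duration_seconds):
    """Return a list of human-readable warnings; empty when nothing looks off."""
    warnings = []
    suffix = Path(path).suffix.lower()
    if suffix not in KNOWN_EXTENSIONS:
        warnings.append(f"Unrecognized extension '{suffix}'; decoding may fail.")
    if duration_seconds > RECOMMENDED_MAX_SECONDS:
        warnings.append(
            f"Clip is {duration_seconds:.0f}s long; "
            f"clips under {RECOMMENDED_MAX_SECONDS:.0f}s process faster."
        )
    return warnings

print(check_video("clip.MP4", 12.0))
print(check_video("clip.mkv", 45.0))
```

The function returns warnings rather than raising, since the tips describe what works best and are not hard limits.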
🎬 Supported Actions
The model can recognize 400 different action classes from the Kinetics-400 dataset, including:
- Sports activities (basketball, soccer, tennis, etc.)
- Daily activities (cooking, cleaning, reading, etc.)
- Physical exercises (push-ups, jumping jacks, etc.)
- Musical activities (playing instruments, singing, etc.)
- And many more!
🏗️ Architecture
Video Input → Frame Sampling → Feature Extraction → Classification → Top-5 Predictions
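The frame sampling stage can be sketched in the spirit of TSN, which splits a video into equal-length segments and takes one frame from each. This is a minimal illustration, assuming 8 segments and center-of-segment sampling (a common test-time choice); the function name is hypothetical, and the sampler actually configured in MMAction2 for this demo may differ.

```python
def segment_frame_indices(num_frames, num_segments=8):
    """Pick one frame index per equal-length segment (the center of each segment)."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    # Segment length may be fractional; short videos then repeat frames.
    seg_len = num_frames / num_segments
    return [
        min(num_frames - 1, int(seg_len * i + seg_len / 2))
        for i in range(num_segments)
    ]

print(segment_frame_indices(80))  # one index from each block of 10 frames
print(segment_frame_indices(4))   # fewer frames than segments: indices repeat
```

Sampling a fixed number of frames keeps the cost of feature extraction roughly constant regardless of video length, which is one reason sparse segment sampling suits a web demo.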
Performance
- Accuracy: TSN with a ResNet-50 backbone is a widely used Kinetics-400 baseline; no benchmark figures are reported here
- Speed: Optimized for real-time inference
- Memory: Efficient GPU/CPU utilization
🤝 Contributing
This project is part of the GenVidBench framework. Contributions are welcome!
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Acknowledgments
- MMAction2 - The underlying framework
- OpenMMLab - For the excellent computer vision tools
- Hugging Face - For the deployment platform
Note: This is a demonstration of video action recognition capabilities. For production use, consider additional validation and error handling.