---
title: GenVidBench - Video Action Recognition
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.47.2
app_file: app.py
pinned: false
license: apache-2.0
short_description: State-of-the-art video action recognition using MMAction2
---

# GenVidBench - Video Action Recognition

A video analysis tool that uses deep learning models to recognize actions and activities in videos. Built on top of the MMAction2 framework with a user-friendly Gradio interface.

## 🚀 Features

- **Action Recognition**: Identify actions and activities in videos using TSN (Temporal Segment Networks)
- **Top-5 Predictions**: Get the most likely actions with confidence scores
- **Multiple Formats**: Supports MP4, AVI, MOV, and other video formats
- **Real-time Processing**: Fast inference optimized for web deployment
- **User-friendly Interface**: Clean and intuitive Gradio web interface

## 🎯 Model Details

This demo uses:

- **Model**: TSN (Temporal Segment Networks) with a ResNet-50 backbone
- **Dataset**: Trained on the Kinetics-400 dataset (400 action classes)
- **Framework**: MMAction2 (OpenMMLab)
- **Input**: RGB video frames
- **Output**: Top-5 action predictions with confidence scores

## 🛠️ Technical Stack

- **Backend**: Python, PyTorch, MMAction2
- **Frontend**: Gradio
- **Video Processing**: OpenCV, Decord
- **Deployment**: Hugging Face Spaces

## 📖 How to Use

1. **Upload Video**: Click the upload area or drag and drop your video file
2. **Wait for Processing**: The model will analyze your video (usually takes a few seconds)
3. **View Results**: See the top 5 predicted actions with confidence scores

## 💡 Tips for Best Results

- **Video Length**: Shorter videos (under 30 seconds) process faster
- **Video Quality**: Clear, well-lit videos work best
- **Action Clarity**: Videos with clear, distinct actions yield better results
- **Supported Formats**: MP4, AVI, MOV, and other common video formats

## 🔬 Supported Actions

The model can recognize 400 different action classes from the Kinetics-400 dataset, including:

- Sports activities (basketball, soccer, tennis, etc.)
- Daily activities (cooking, cleaning, reading, etc.)
- Physical exercises (push-ups, jumping jacks, etc.)
- Musical activities (playing instruments, singing, etc.)
- And many more!

## 🏗️ Architecture

```
Video Input → Frame Sampling → Feature Extraction → Classification → Top-5 Predictions
```

## 📊 Performance

- **Accuracy**: State-of-the-art performance on Kinetics-400
- **Speed**: Optimized for real-time inference
- **Memory**: Efficient GPU/CPU utilization

## 🤝 Contributing

This project is part of the GenVidBench framework. Contributions are welcome!

## 📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

## 🙏 Acknowledgments

- [MMAction2](https://github.com/open-mmlab/mmaction2) - The underlying framework
- [OpenMMLab](https://openmmlab.com/) - For the excellent computer vision tools
- [Hugging Face](https://huggingface.co/) - For the deployment platform

---

**Note**: This is a demonstration of video action recognition capabilities. For production use, consider additional validation and error handling.
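
To make the Architecture pipeline more concrete, here is a minimal, dependency-free sketch of its first and last stages: TSN-style sparse frame sampling and turning class scores into top-5 predictions. It is an illustration only, not the code in `app.py`, and it does not call MMAction2. The function names, the choice of 8 segments, the center-frame sampling rule, and the example labels and scores are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def sample_segment_indices(num_frames: int, num_segments: int = 8) -> List[int]:
    """Split the video into equal segments and pick the center frame of each.

    Assumption: center-frame sampling with 8 segments; the demo's actual
    sampling settings are not stated in this README.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    seg_len = num_frames / num_segments
    return [
        min(num_frames - 1, int(seg_len * i + seg_len / 2))
        for i in range(num_segments)
    ]


def average_scores(per_segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average class scores over segments (a simple segment consensus)."""
    count = len(per_segment_scores)
    return [sum(column) / count for column in zip(*per_segment_scores)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities, shifted by the max for stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores: Sequence[float], labels: Sequence[str], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most probable (label, confidence) pairs, best first."""
    ranked = sorted(zip(labels, softmax(scores)), key=lambda p: p[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical labels and scores, for illustration only.
    labels = ["playing guitar", "cooking", "push up", "tennis", "reading", "singing"]
    per_segment = [
        [2.0, 0.1, 0.3, 0.2, 0.5, 1.5],
        [2.4, 0.0, 0.1, 0.4, 0.3, 1.1],
    ]
    print(sample_segment_indices(num_frames=80))
    for label, confidence in top_k(average_scores(per_segment), labels):
        print(f"{label}: {confidence:.3f}")
```

In the real demo, the feature extraction and classification steps between these two stages are handled by the TSN model running under MMAction2; the sketch only shows how sampled frame indices and final confidence scores relate to the pipeline.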