---
title: GenVidBench - Video Action Recognition
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.47.2
app_file: app.py
pinned: false
license: apache-2.0
short_description: State-of-the-art video action recognition using MMAction2
---

# GenVidBench - Video Action Recognition

A video analysis tool that uses a deep learning model to recognize human actions and activities in videos. It is built on the MMAction2 framework and exposes a user-friendly Gradio interface.

## Features

- **Action Recognition**: Identify actions and activities in videos using TSN (Temporal Segment Networks)
- **Top-5 Predictions**: Get the five most likely actions with confidence scores
- **Multiple Formats**: Support for MP4, AVI, MOV, and other video formats
- **Fast Processing**: Inference usually takes a few seconds per video
- **User-friendly Interface**: Clean and intuitive Gradio web interface

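As a rough illustration of how raw model outputs can become a top-5 list with confidence scores, here is a minimal sketch. It assumes the model produces one raw score (logit) per class and that the displayed confidence is a softmax probability; neither detail is documented for this app, and `softmax` and `top_k` are hypothetical helper names, not part of MMAction2 or Gradio.

```python
import math


def softmax(logits):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Return the k (class_index, probability) pairs with the highest probability."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy logits for 8 hypothetical classes; the real model scores 400 classes.
    logits = [0.5, 2.0, -1.0, 3.5, 0.0, 1.2, 2.8, -0.3]
    for idx, p in top_k(softmax(logits), k=5):
        print(f"class {idx}: {p:.3f}")
```

With these toy logits the printed ranking is class 3, then 6, 1, 5 and 0. Softmax preserves the ordering of the logits, so ranking by logits or by probabilities gives the same top-5.
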
## Model Details

This demo uses:

- **Model**: TSN (Temporal Segment Networks) with a ResNet-50 backbone
- **Dataset**: Trained on the Kinetics-400 dataset (400 action classes)
- **Framework**: MMAction2 (OpenMMLab)
- **Input**: RGB video frames
- **Output**: Top-5 action predictions with confidence scores

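TSN does not process every frame: it splits a video into equal-length segments and samples one frame (snippet) per segment. The sketch below illustrates that idea in plain Python. It is not MMAction2's `SampleFrames` transform, `tsn_sample_indices` is a hypothetical name, and the default of three segments is an illustrative assumption rather than this demo's configuration.

```python
import random


def tsn_sample_indices(num_frames, num_segments=3, test_mode=True, rng=None):
    """Pick one frame index per segment, TSN-style (hypothetical helper).

    The video is split into `num_segments` equal-length segments. In test mode
    the centre frame of each segment is taken; in training mode a random frame
    inside each segment is taken.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    rng = rng or random.Random()
    seg_len = num_frames / num_segments  # may be fractional
    indices = []
    for k in range(num_segments):
        start = seg_len * k
        offset = seg_len / 2 if test_mode else rng.random() * seg_len
        # Clamp so short videos or rounding never index past the last frame.
        indices.append(min(int(start + offset), num_frames - 1))
    return indices


if __name__ == "__main__":
    # A 10-second clip at 30 fps has 300 frames.
    print(tsn_sample_indices(300, num_segments=3))
```

For a 300-frame clip with three segments, test mode returns the segment centres 50, 150 and 250. Sampling only a handful of frames is what keeps TSN inference cheap compared with dense, frame-by-frame models.
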
## Technical Stack

- **Backend**: Python, PyTorch, MMAction2
- **Frontend**: Gradio
- **Video Processing**: OpenCV, Decord
- **Deployment**: Hugging Face Spaces

## How to Use

1. **Upload Video**: Click the upload area or drag and drop your video file
2. **Wait for Processing**: The model will analyze your video (usually takes a few seconds)
3. **View Results**: See the top 5 predicted actions with confidence scores

## Tips for Best Results

- **Video Length**: Shorter videos (under 30 seconds) process faster
- **Video Quality**: Clear, well-lit videos work best
- **Action Clarity**: Videos with clear, distinct actions yield better results
- **Supported Formats**: MP4, AVI, MOV, and other common video formats

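The tips above can be checked before uploading with a small helper. `check_video` is hypothetical: it takes a frame count and frame rate that you obtain yourself (for example with OpenCV or Decord) and applies only the extension list and 30-second guideline stated in this README, so files in other formats the app may accept will still be flagged.

```python
import os

# Extensions named explicitly in this README; "other common formats" are not
# enumerated, so this set is deliberately conservative.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov"}
MAX_RECOMMENDED_SECONDS = 30.0


def check_video(path, frame_count, fps):
    """Return a list of warnings for a video before upload (hypothetical helper).

    `frame_count` and `fps` must be supplied by the caller; this function does
    not open or decode the file.
    """
    warnings = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        warnings.append(f"extension {ext or '(none)'} is not one of MP4/AVI/MOV")
    if fps <= 0:
        warnings.append("frame rate unknown; cannot estimate duration")
    elif frame_count / fps > MAX_RECOMMENDED_SECONDS:
        warnings.append(f"video is {frame_count / fps:.1f} s; under 30 s processes faster")
    return warnings


if __name__ == "__main__":
    print(check_video("clip.mkv", 1800, 30.0))
```
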
## Supported Actions

The model recognizes the 400 action classes of the Kinetics-400 dataset, including:

- Sports activities (basketball, soccer, tennis, etc.)
- Daily activities (cooking, cleaning, reading, etc.)
- Physical exercises (push-ups, jumping jacks, etc.)
- Musical activities (playing instruments, singing, etc.)
- And many more!

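Internally, a classifier predicts class indices (0 to 399 for Kinetics-400), which must be mapped back to action names for display. A minimal sketch, assuming a plain-text label file with one class name per line in index order; the label source this app actually uses is not documented here, and both helper names are hypothetical.

```python
def load_label_map(text):
    """Map class indices to action names (hypothetical helper).

    Line i (0-based) of `text` is assumed to hold the name of class i.
    """
    return {i: line.strip() for i, line in enumerate(text.splitlines())}


def name_predictions(predictions, label_map):
    """Replace class indices in (index, score) pairs with action names."""
    # Unknown indices fall back to a placeholder instead of raising KeyError.
    return [(label_map.get(idx, f"class_{idx}"), score) for idx, score in predictions]


if __name__ == "__main__":
    # Toy three-line excerpt; a full Kinetics-400 label file has 400 lines.
    labels = load_label_map("abseiling\nair drumming\nanswering questions\n")
    print(name_predictions([(2, 0.7), (0, 0.2)], labels))
```
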
## Architecture

```
Video Input → Frame Sampling → Feature Extraction → Classification → Top-5 Predictions
```

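In TSN, the Classification stage combines the class scores of the individual sampled segments with a segmental consensus function, and averaging is the usual choice. The sketch below assumes average consensus and works on plain lists of scores rather than tensors; `segmental_consensus` is a hypothetical name, not an MMAction2 API.

```python
def segmental_consensus(segment_scores):
    """Average per-segment class scores into one video-level score vector.

    `segment_scores` holds one list of class scores per sampled segment; every
    list must have the same length (the number of classes).
    """
    if not segment_scores:
        raise ValueError("need at least one segment")
    num_classes = len(segment_scores[0])
    if any(len(s) != num_classes for s in segment_scores):
        raise ValueError("all segments must score the same number of classes")
    n = len(segment_scores)
    # Element-wise mean over segments, one value per class.
    return [sum(s[c] for s in segment_scores) / n for c in range(num_classes)]


if __name__ == "__main__":
    # Two segments, three toy classes.
    print(segmental_consensus([[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]]))
```

The averaged vector would then go through softmax and top-5 selection to produce the displayed predictions. Averaging lets a confident segment compensate for a segment where the action is not visible.
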
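## Performance

- **Accuracy**: Reported Kinetics-400 top-1 and top-5 accuracy for TSN ResNet-50 checkpoints is listed in the MMAction2 model zoo
- **Speed**: TSN classifies a video from a few sparsely sampled frames rather than every frame, which keeps inference fast
- **Memory**: Designed to run on either GPU or CPU
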
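Kinetics-400 results are conventionally reported as top-1 and top-5 accuracy: the fraction of test videos whose true label is the highest-scoring class, or among the five highest-scoring classes. This minimal sketch shows how the metric is computed; `top_k_accuracy` is a hypothetical helper, not MMAction2's evaluation code.

```python
def top_k_accuracy(score_rows, true_labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if not score_rows or len(score_rows) != len(true_labels):
        raise ValueError("need one non-empty score row per label")
    hits = 0
    for scores, label in zip(score_rows, true_labels):
        # Class indices of the k largest scores for this sample.
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        hits += label in top
    return hits / len(true_labels)


if __name__ == "__main__":
    rows = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(top_k_accuracy(rows, labels, k=1), top_k_accuracy(rows, labels, k=2))
```
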
## Contributing

This project is part of the GenVidBench framework. Contributions are welcome!

## License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

## Acknowledgments

- [MMAction2](https://github.com/open-mmlab/mmaction2) - The underlying framework
- [OpenMMLab](https://openmmlab.com/) - For the excellent computer vision tools
- [Hugging Face](https://huggingface.co/) - For the deployment platform

---

**Note**: This is a demonstration of video action recognition capabilities. For production use, consider additional validation and error handling.