---
title: GenVidBench - Video Action Recognition
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.47.2
app_file: app.py
pinned: false
license: apache-2.0
short_description: Video action recognition with TSN and MMAction2
---

# GenVidBench - Video Action Recognition

A video analysis demo that recognizes human actions and activities in uploaded videos. It runs a TSN (Temporal Segment Networks) model built with the MMAction2 framework and exposes it through a user-friendly Gradio interface.

## Features

- **Action Recognition**: Identify actions and activities in videos using TSN (Temporal Segment Networks)
- **Top-5 Predictions**: See the five most likely actions, each with a confidence score
- **Multiple Formats**: Upload MP4, AVI, MOV, and other common video formats
- **Fast Processing**: Results typically arrive within a few seconds per video
- **User-friendly Interface**: Clean, intuitive Gradio web interface

## 🎯 Model Details

This demo uses:

- **Model**: TSN (Temporal Segment Networks) with a ResNet-50 backbone
- **Dataset**: Trained on the Kinetics-400 dataset (400 action classes)
- **Framework**: MMAction2 (OpenMMLab)
- **Input**: RGB frames sampled from the uploaded video
- **Output**: Top-5 action predictions with confidence scores

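
The **Output** line above can be illustrated with a short sketch. In TSN, every sampled segment yields a score per class; those scores are averaged across segments (the "segmental consensus"), converted to probabilities with a softmax, and the five highest are reported. The code below is a framework-free sketch under that assumption, using plain Python lists in place of tensors. The function names are hypothetical, not MMAction2's API, and this Space's own post-processing may differ in detail.

```python
import math
from typing import List, Sequence, Tuple


def average_consensus(segment_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average each class score over all segments (TSN-style average consensus)."""
    num_segments = len(segment_scores)
    num_classes = len(segment_scores[0])
    return [
        sum(scores[c] for scores in segment_scores) / num_segments
        for c in range(num_classes)
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_predictions(
    segment_scores: Sequence[Sequence[float]],
    labels: Sequence[str],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return up to k (label, confidence) pairs, most likely first."""
    probs = softmax(average_consensus(segment_scores))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Made-up scores for 3 segments x 6 classes, purely for illustration.
    labels = ["class_0", "class_1", "class_2", "class_3", "class_4", "class_5"]
    scores = [
        [2.0, 1.0, 0.1, 0.0, -1.0, 0.5],
        [1.8, 1.2, 0.0, 0.2, -0.5, 0.4],
        [2.2, 0.8, 0.3, 0.1, -0.8, 0.6],
    ]
    for label, p in top_k_predictions(scores, labels):
        print(f"{label}: {p:.1%}")
```

Running it prints the made-up classes ranked by confidence, capped at five entries. Averaging raw scores before the softmax keeps the confidences summing to 1 across all classes.
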
## 🛠️ Technical Stack

- **Backend**: Python, PyTorch, MMAction2
- **Frontend**: Gradio
- **Video Processing**: OpenCV, Decord
- **Deployment**: Hugging Face Spaces

## How to Use

1. **Upload Video**: Click the upload area, or drag and drop your video file
2. **Wait for Processing**: The model analyzes your video, which usually takes a few seconds
3. **View Results**: See the top-5 predicted actions with their confidence scores

## 💡 Tips for Best Results

- **Video Length**: Shorter videos (under 30 seconds) process faster
- **Video Quality**: Clear, well-lit videos work best
- **Action Clarity**: Videos with clear, distinct actions yield better results
- **Supported Formats**: MP4, AVI, MOV, and other common video formats

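
The tips above can be turned into simple pre-checks on an upload before the model runs. This is a hypothetical helper, not code from this Space: the extension list covers only the formats named above, the 30-second threshold comes from the tip on video length, and the duration is assumed to be measured by the caller (for example from OpenCV's frame count and FPS properties).

```python
from pathlib import Path
from typing import List

# Formats named in this README (assumption: other containers may also decode fine).
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov"}
# Threshold taken from the "Video Length" tip.
RECOMMENDED_MAX_SECONDS = 30.0


def check_upload(filename: str, duration_s: float) -> List[str]:
    """Return human-readable warnings; an empty list means the upload looks fine."""
    warnings = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        warnings.append(
            f"Unrecognized extension '{ext or '(none)'}'; MP4, AVI or MOV are safest."
        )
    if duration_s <= 0:
        warnings.append("Video has no measurable duration; the file may be corrupt.")
    elif duration_s > RECOMMENDED_MAX_SECONDS:
        warnings.append(
            f"Video is {duration_s:.0f} s long; clips under "
            f"{RECOMMENDED_MAX_SECONDS:.0f} s process faster."
        )
    return warnings
```

Returning warnings instead of raising lets the interface still attempt a prediction on, say, an MKV file that the decoder may well handle.
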
## 🎬 Supported Actions

The model can recognize the 400 action classes of the Kinetics-400 dataset, including:

- Sports activities (basketball, soccer, tennis, etc.)
- Daily activities (cooking, cleaning, reading, etc.)
- Physical exercises (push-ups, jumping jacks, etc.)
- Musical activities (playing instruments, singing, etc.)
- And many more!

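
The classifier itself only outputs class indices from 0 to 399; turning them into names like the examples above requires a label map. The sketch below assumes the usual MMAction2 convention of a plain-text file with one class name per line, where the 0-based line number is the class index (as in the Kinetics-400 label map found in MMAction2's data tools). Which file this Space actually loads is not stated here, and all function names are hypothetical.

```python
from pathlib import Path
from typing import List, Sequence


def parse_label_map(text: str) -> List[str]:
    """One class name per line; the 0-based line number is the class index."""
    labels = [line.strip() for line in text.splitlines()]
    while labels and not labels[-1]:  # tolerate trailing blank lines
        labels.pop()
    if "" in labels:
        # A blank line mid-file would shift every later class index by one.
        raise ValueError("blank line inside label map would misalign class indices")
    return labels


def load_label_map(path: str) -> List[str]:
    """Read a label-map file from disk (UTF-8) and parse it."""
    return parse_label_map(Path(path).read_text(encoding="utf-8"))


def indices_to_names(indices: Sequence[int], labels: Sequence[str]) -> List[str]:
    """Translate predicted class indices into human-readable action names."""
    names = []
    for i in indices:
        if not 0 <= i < len(labels):
            raise IndexError(f"class index {i} is outside a label map of size {len(labels)}")
        names.append(labels[i])
    return names
```

Rejecting blank lines in the middle of the file is deliberate: silently skipping them would make every later prediction point at the wrong action name.
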
## 🏗️ Architecture

```
Video Input → Frame Sampling → Feature Extraction → Classification → Top-5 Predictions
```

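
The **Frame Sampling** stage is what keeps TSN cheap on long videos: the video is split into a fixed number of equal-length segments and one frame is taken from each. A random frame per segment is typical during training, and a common test-time choice is the middle frame of each segment. The sketch below computes those frame indices only; `num_segments=3` is an assumption about this Space's config, the function name is hypothetical, and decoding the chosen frames (e.g. with Decord or OpenCV) is left out.

```python
import random
from typing import List, Optional


def tsn_sample_indices(
    num_frames: int,
    num_segments: int = 3,
    test_mode: bool = True,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick one frame index per equal-length segment (TSN-style sparse sampling).

    test_mode=True  -> deterministic middle frame of each segment.
    test_mode=False -> one random frame per segment (training-time augmentation).
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    rng = rng or random.Random()
    seg_len = num_frames / num_segments  # may be fractional
    indices = []
    for s in range(num_segments):
        start = seg_len * s
        if test_mode:
            idx = int(start + seg_len / 2)
        else:
            idx = int(start + rng.random() * seg_len)
        # Clamp against float rounding; very short videos may repeat frames.
        indices.append(min(idx, num_frames - 1))
    return indices
```

For example, a 30-second clip at 30 fps has 900 frames; with three segments in test mode, only frames 150, 450, and 750 reach the classifier.
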
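## Performance

- **Accuracy**: TSN with a ResNet-50 backbone is a widely used, well-established baseline on Kinetics-400; the MMAction2 model zoo lists the reported top-1 and top-5 accuracy of each TSN checkpoint
- **Speed**: TSN classifies a few sparsely sampled frames instead of every frame, which keeps inference fast
- **Memory**: The ResNet-50 backbone is relatively lightweight, so inference can run on either GPU or CPU
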
## 🤝 Contributing

This project is part of the GenVidBench framework. Contributions are welcome!

## License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.

## Acknowledgments

- [MMAction2](https://github.com/open-mmlab/mmaction2) - The underlying framework
- [OpenMMLab](https://openmmlab.com/) - For the excellent computer vision tools
- [Hugging Face](https://huggingface.co/) - For the deployment platform

---

**Note**: This is a demonstration of video action recognition capabilities. For production use, consider additional validation and error handling.