---
title: VideoMaMa - Video Matting with Mask Guidance
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---
# 🎬 VideoMaMa: Video Matting with Mask Guidance

An interactive demo for high-quality video matting using sparse mask guidance. This demo combines SAM2 for automatic object tracking with our VideoMaMa model for generating alpha mattes.
## Features

- **Single-Click Object Selection**: Simply click on the object you want to extract in the first frame
- **Automatic Tracking**: SAM2 automatically tracks your selected object through all frames
- **High-Quality Matting**: VideoMaMa generates smooth, temporally consistent alpha mattes
- **Flexible Input**: Upload your own video or try the provided samples
- **Customizable**: Adjust augmentation settings for different scenarios
## How to Use

1. **Upload a video** or **select from samples**
2. **Click on the object** you want to extract in the first frame (displayed in the interface)
3. Optionally adjust **augmentation settings** in the advanced options
4. Click **"Generate Matting"** and wait for processing
5. View your results: output video, comparison images, and mask track
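The same workflow can also be driven programmatically through the Space's API. The sketch below uses `gradio_client`; the Space ID, click coordinates, and endpoint name are assumptions, so check the Space's "Use via API" page for the actual signature:

```python
from gradio_client import Client

# Space ID is a placeholder; replace with this Space's actual ID.
client = Client("your-username/VideoMaMa")

result = client.predict(
    "samples/example.mp4",          # path to the input video
    240, 135,                       # hypothetical (x, y) click coordinates
    api_name="/generate_matting",   # hypothetical endpoint name
)
print(result)
```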
## Installation (Local Setup)

If you want to run this demo locally:
```bash
# Install dependencies
pip install -r requirements.txt

# Add sample videos to the samples/ directory (optional)

# Run the demo
python app.py
```
## Tips for Best Results

- **Click Precisely**: Click on the center of the object you want to extract
- **Clear Objects**: Works best with distinct foreground objects
- **Video Length**: For faster processing, use shorter videos (under 5 seconds)
- **Augmentations**:
  - Use "polygon" for cleaner geometric masks
  - Enable temporal augmentation for challenging videos
  - Try "bounding box" for very simple selections
## Technical Details

### Model Architecture

- **Base Model**: Stable Video Diffusion (SVD-XT)
- **Conditioning**: RGB frames + VAE-encoded masks
- **UNet**: Fine-tuned with additional mask conditioning channels
- **Processing**: Chunked inference (16 frames per chunk), as sketched below
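The chunking scheme amounts to a simple fixed-stride loop. A minimal sketch, where `pipeline` stands in for the VideoMaMa model and its call signature is hypothetical:

```python
import torch

CHUNK_SIZE = 16  # frames per chunk, as noted above

def run_chunked(pipeline, frames: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Run matting over a long clip in fixed-size chunks.

    frames, masks: (T, C, H, W) tensors. `pipeline` is a stand-in for the
    VideoMaMa model, so its call signature here is illustrative only.
    """
    alphas = []
    for start in range(0, frames.shape[0], CHUNK_SIZE):
        with torch.inference_mode():
            alphas.append(pipeline(
                frames[start : start + CHUNK_SIZE],
                masks[start : start + CHUNK_SIZE],
            ))
    return torch.cat(alphas, dim=0)  # (T, 1, H, W) alpha mattes
```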
### SAM2 Integration

- Uses the SAM2 video predictor for mask tracking
- Propagates the mask from a single click point through the entire video (see the sketch below)
- Generates temporally consistent segmentation masks
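A minimal sketch of this flow, following the public `sam2` repository's video predictor API. The config and checkpoint paths, the frames directory, and the click coordinate are placeholders; use whatever the demo ships with:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt"
)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # The sam2 example loads a directory of JPEG frames.
    state = predictor.init_state(video_path="samples/example_frames/")

    # Seed tracking with a single positive click on the first frame.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[480, 270]], dtype=np.float32),  # (x, y) of the click
        labels=np.array([1], dtype=np.int32),             # 1 = positive click
    )

    # Propagate the mask through the rest of the video.
    masks = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
```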
## Contributing

If you encounter issues or have suggestions:

1. Check that all model checkpoints are correctly placed
2. Ensure your GPU has sufficient VRAM (a quick check is shown below)
3. Try reducing video length or resolution for testing
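For step 2, plain PyTorch can report what your GPU offers before you launch the demo:

```python
import torch

# Report GPU availability and total VRAM.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA device detected; the demo needs a GPU.")
```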
## Acknowledgments

- **SAM2**: Meta AI's Segment Anything 2
- **Stable Video Diffusion**: Stability AI's video generation model
- **Gradio**: For the amazing UI framework
## Contact

For questions or issues, please open an issue on our GitHub repository.
---

**Note**: This demo is for research purposes. Processing times may vary based on video length and available compute resources.