---
title: VideoMaMa - Video Matting with Mask Guidance
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---
# 🎬 VideoMaMa: Video Matting with Mask Guidance
An interactive demo for high-quality video matting using sparse mask guidance. This demo combines SAM2 for automatic object tracking with our VideoMaMa model for generating alpha mattes.
## 🌟 Features
- **Single-Click Object Selection**: Simply click on the object you want to extract in the first frame
- **Automatic Tracking**: SAM2 automatically tracks your selected object through all frames
- **High-Quality Matting**: VideoMaMa generates smooth, temporally-consistent alpha mattes
- **Flexible Input**: Upload your own video or try our provided samples
- **Customizable**: Adjust augmentation settings for different scenarios
## πŸš€ How to Use
1. **Upload a video** or **select from samples**
2. **Click on the object** you want to extract in the first frame (displayed in the interface; a sketch of how the click is captured follows these steps)
3. Optionally adjust **augmentation settings** in the advanced options
4. Click **"Generate Matting"** and wait for processing
5. View your results: output video, comparison images, and mask track
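The demo wires this flow up in `app.py`. As a rough illustration of how a first-frame click can be captured in Gradio, here is a minimal, hypothetical sketch; the component and function names are ours, not necessarily the app's:

```python
import gradio as gr

# Hypothetical sketch: Gradio reports the clicked pixel via
# gr.SelectData.index as (x, y); app.py's actual wiring may differ.
def on_click(evt: gr.SelectData):
    x, y = evt.index
    return f"Selected point: ({x}, {y})"

with gr.Blocks() as demo:
    frame = gr.Image(label="First frame")    # first frame of the uploaded video
    point = gr.Textbox(label="Click point")  # echoed back to the user
    frame.select(on_click, outputs=point)

demo.launch()
```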
## πŸ”§ Installation (Local Setup)
If you want to run this demo locally:
```bash
# Install dependencies
pip install -r requirements.txt
# Add sample videos to samples/ directory (optional)
# Run the demo
python app.py
```
## 🎯 Tips for Best Results
- **Click Precisely**: Click on the center of the object you want to extract
- **Clear Objects**: Works best with distinct foreground objects
- **Video Length**: For faster processing, use shorter videos (< 5 seconds)
- **Augmentations** (see the sketch after this list):
- Use "polygon" for cleaner geometric masks
- Enable temporal augmentation for challenging videos
- Try "bounding box" for very simple selections
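The exact augmentation code lives in the demo, but as a hedged illustration of what "polygon" and "bounding box" simplification can mean for a binary mask, here is a sketch using OpenCV (the helper name and parameters are our assumptions):

```python
import cv2
import numpy as np

def simplify_mask(mask: np.ndarray, mode: str = "polygon") -> np.ndarray:
    """Illustrative mask simplification; not the demo's actual code."""
    binary = (mask > 127).astype(np.uint8)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    out = np.zeros_like(binary)
    for c in contours:
        if mode == "polygon":
            # Replace the contour with a coarser polygon outline.
            eps = 0.01 * cv2.arcLength(c, True)
            cv2.fillPoly(out, [cv2.approxPolyDP(c, eps, True)], 1)
        elif mode == "bounding_box":
            # Collapse the region to its axis-aligned bounding box.
            x, y, w, h = cv2.boundingRect(c)
            out[y:y + h, x:x + w] = 1
    return out * 255
```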
## πŸ“š Technical Details
### Model Architecture
- **Base Model**: Stable Video Diffusion (SVD-XT)
- **Conditioning**: RGB frames + VAE-encoded masks
- **UNet**: Fine-tuned with additional mask conditioning channels
- **Processing**: Chunked inference, 16 frames per chunk (see the sketch after this list)
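For intuition, a schematic of the chunked inference loop: the frame sequence is split into 16-frame chunks, each chunk is matted independently, and the results are concatenated. How the real pipeline stitches chunk boundaries is not specified here, so treat this as an assumption-laden sketch:

```python
import numpy as np

CHUNK = 16  # frames per chunk, per the notes above

def iter_chunks(frames: np.ndarray):
    """Yield consecutive 16-frame chunks of a (T, H, W, C) array."""
    for start in range(0, len(frames), CHUNK):
        yield frames[start:start + CHUNK]

def matte_video(frames: np.ndarray, model) -> np.ndarray:
    """Hypothetical driver: matte each chunk, then reassemble."""
    mattes = [model(chunk) for chunk in iter_chunks(frames)]
    return np.concatenate(mattes, axis=0)
```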
### SAM2 Integration
- Uses the SAM2 video predictor for mask tracking (see the sketch after this list)
- Propagates mask from single click point through entire video
- Generates temporally consistent segmentation masks
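Following SAM2's published video-predictor API, single-click propagation looks roughly like this; the config and checkpoint paths are placeholders, and the reference predictor expects a directory of JPEG frames rather than a video file:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Placeholder paths; use the config/checkpoint shipped with the demo.
predictor = build_sam2_video_predictor("sam2_hiera_l.yaml",
                                       "checkpoints/sam2_hiera_large.pt")

with torch.inference_mode():
    state = predictor.init_state(video_path="frames_dir/")  # JPEG frames
    # One positive click (label 1) on frame 0 seeds object id 1.
    predictor.add_new_points_or_box(
        inference_state=state, frame_idx=0, obj_id=1,
        points=np.array([[320, 240]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )
    # Propagate the mask through the whole video.
    masks = {}
    for frame_idx, obj_ids, logits in predictor.propagate_in_video(state):
        masks[frame_idx] = (logits[0] > 0.0).cpu().numpy()
```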
## 🀝 Contributing
If you encounter issues or have suggestions:
1. Check that all model checkpoints are correctly placed
2. Ensure your GPU has sufficient VRAM
3. Try reducing video length or resolution for testing
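A quick pre-flight VRAM check can save a failed run; the ~16 GB threshold below is our assumption based on typical SVD memory usage, not a measured requirement of this demo:

```python
import torch

# Assumed threshold, not an official requirement of VideoMaMa.
MIN_VRAM_GB = 16

if not torch.cuda.is_available():
    raise SystemExit("A CUDA GPU is required to run this demo.")

props = torch.cuda.get_device_properties(0)
total_gb = props.total_memory / 1024**3
print(f"{props.name}: {total_gb:.1f} GB VRAM")
if total_gb < MIN_VRAM_GB:
    print("Warning: consider shorter clips or lower resolution.")
```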
## πŸ™ Acknowledgments
- **SAM2**: Meta AI's Segment Anything 2
- **Stable Video Diffusion**: Stability AI's video generation model
- **Gradio**: For the amazing UI framework
## πŸ“§ Contact
For questions or issues, please open an issue on our GitHub repository.
---
**Note**: This demo is for research purposes. Processing times may vary based on video length and available compute resources.