---
title: VideoMaMa - Video Matting with Mask Guidance
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---
# 🎬 VideoMaMa: Video Matting with Mask Guidance

An interactive demo for high-quality video matting using sparse mask guidance. It combines SAM2 for automatic object tracking with our VideoMaMa model to generate alpha mattes.
## Features

- **Single-Click Object Selection**: Simply click on the object you want to extract in the first frame
- **Automatic Tracking**: SAM2 automatically tracks your selected object through all frames
- **High-Quality Matting**: VideoMaMa generates smooth, temporally consistent alpha mattes
- **Flexible Input**: Upload your own video or try our provided samples
- **Customizable**: Adjust augmentation settings for different scenarios
## How to Use

1. Upload a video or select from samples
2. Click on the object you want to extract in the first frame (displayed in the interface)
3. Optionally adjust augmentation settings in the advanced options
4. Click "Generate Matting" and wait for processing
5. View your results: output video, comparison images, and mask track
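Under the hood, the click from step 2 arrives as pixel coordinates; a minimal sketch of turning it into a SAM2-style point prompt (the function name is illustrative, not the demo's actual code — SAM2 point prompts are an `(N, 2)` float array of `(x, y)` coordinates plus an `(N,)` label array, where `1` marks a foreground click):

```python
import numpy as np

def click_to_point_prompt(x, y):
    """Convert a single foreground click into SAM2-style prompt arrays."""
    points = np.array([[x, y]], dtype=np.float32)  # (N, 2) pixel coords
    labels = np.array([1], dtype=np.int32)         # 1 = positive (foreground) click
    return points, labels

points, labels = click_to_point_prompt(320, 180)
print(points.shape, labels.tolist())  # (1, 2) [1]
```

Negative clicks (label `0`) could be appended the same way to exclude regions, though this demo only uses a single positive click.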
## Installation (Local Setup)

If you want to run this demo locally:

```bash
# Install dependencies
pip install -r requirements.txt

# Add sample videos to samples/ directory (optional)

# Run the demo
python app.py
```
## Tips for Best Results

- **Click Precisely**: Click on the center of the object you want to extract
- **Clear Objects**: Works best with distinct foreground objects
- **Video Length**: For faster processing, use shorter videos (< 5 seconds)
- **Augmentations**:
  - Use "polygon" for cleaner geometric masks
  - Enable temporal augmentation for challenging videos
  - Try "bounding box" for very simple selections
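For intuition, the "bounding box" augmentation can be thought of as replacing a detailed mask with its filled axis-aligned box; a hedged numpy sketch (not the demo's exact implementation):

```python
import numpy as np

def mask_to_bbox_mask(mask):
    """Replace a binary mask with its filled axis-aligned bounding box."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:                  # empty mask: nothing to box
        return np.zeros_like(mask)
    box = np.zeros_like(mask)
    box[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = 1
    return box

m = np.zeros((6, 6), dtype=np.uint8)
m[2, 2] = m[4, 3] = 1                 # two sparse foreground pixels
print(mask_to_bbox_mask(m).sum())     # 6: the 3x2 box covering both pixels
```

Coarsening the guidance mask like this makes the matting model rely less on exact mask boundaries, which is why it helps for simple selections.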
## Technical Details

### Model Architecture

- **Base Model**: Stable Video Diffusion (SVD-XT)
- **Conditioning**: RGB frames + VAE-encoded masks
- **UNet**: Fine-tuned with additional mask conditioning channels
- **Processing**: Chunked inference (16 frames per chunk)
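The chunked inference above amounts to splitting the frame sequence into fixed-size windows and running the model on each in order; a minimal sketch (function name and chunking scheme assumed for illustration):

```python
def chunk_indices(num_frames, chunk_size=16):
    """Yield (start, end) index pairs covering all frames in order."""
    for start in range(0, num_frames, chunk_size):
        yield start, min(start + chunk_size, num_frames)

# A 40-frame clip is processed as three chunks: 16 + 16 + 8 frames.
print(list(chunk_indices(40)))  # [(0, 16), (16, 32), (32, 40)]
```

Chunking keeps peak VRAM bounded by the chunk size rather than the full video length, at the cost of multiple model passes.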
### SAM2 Integration

- Uses SAM2 video predictor for mask tracking
- Propagates mask from single click point through entire video
- Generates temporally consistent segmentation masks
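Temporal consistency of the propagated masks can be spot-checked by measuring IoU between consecutive frames; a minimal numpy sketch (the helper name is illustrative, not part of the demo):

```python
import numpy as np

def consecutive_iou(masks):
    """IoU between each consecutive pair of binary masks, shape (T, H, W)."""
    ious = []
    for a, b in zip(masks[:-1], masks[1:]):
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        ious.append(float(inter / union) if union else 1.0)
    return ious

masks = np.ones((3, 4, 4), dtype=bool)
masks[2, :, :2] = False            # object shrinks in the last frame
print(consecutive_iou(masks))      # [1.0, 0.5]
```

Values near 1.0 indicate a stable track; a sudden drop usually means the tracker lost or swapped the object in that frame.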
## Contributing

If you encounter issues or have suggestions:

- Check that all model checkpoints are correctly placed
- Ensure your GPU has sufficient VRAM
- Try reducing video length or resolution for testing
## Acknowledgments

- **SAM2**: Meta AI's Segment Anything 2
- **Stable Video Diffusion**: Stability AI's video generation model
- **Gradio**: For the amazing UI framework
## Contact

For questions or issues, please open an issue on our GitHub repository.

**Note**: This demo is for research purposes. Processing times may vary based on video length and available compute resources.