---
title: VideoMaMa - Video Matting with Mask Guidance
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---

# 🎬 VideoMaMa: Video Matting with Mask Guidance

An interactive demo for high-quality video matting from sparse mask guidance. It combines SAM2 for automatic object tracking with our VideoMaMa model, which converts the tracked masks into alpha mattes.

## 🌟 Features

- **Single-Click Object Selection**: Simply click on the object you want to extract in the first frame
- **Automatic Tracking**: SAM2 automatically tracks your selected object through all frames
- **High-Quality Matting**: VideoMaMa generates smooth, temporally consistent alpha mattes
- **Flexible Input**: Upload your own video or try the provided samples
- **Customizable**: Adjust augmentation settings for different scenarios

## 🚀 How to Use

1. Upload a video or select from the samples
2. Click on the object you want to extract in the first frame (displayed in the interface)
3. Optionally adjust augmentation settings in the advanced options
4. Click "Generate Matting" and wait for processing
5. View your results: the output video, comparison images, and mask track

## 🔧 Installation (Local Setup)

If you want to run this demo locally:

```bash
# Install dependencies
pip install -r requirements.txt

# Add sample videos to the samples/ directory (optional)

# Run the demo
python app.py
```

## 🎯 Tips for Best Results

- **Click Precisely**: Click the center of the object you want to extract
- **Clear Objects**: Works best with distinct foreground objects
- **Video Length**: Shorter videos (under 5 seconds) process faster
- **Augmentations** (a sketch of the mask shapes follows this list):
  - Use "polygon" for cleaner geometric masks
  - Enable temporal augmentation for challenging videos
  - Try "bounding box" for very simple selections
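
To make the "polygon" and "bounding box" options concrete, here is a minimal sketch of how a dense mask can be coarsened into either shape. The helper name and the OpenCV-based approach are our illustration, not the demo's actual augmentation code:

```python
import cv2
import numpy as np

def simplify_mask(mask: np.ndarray, mode: str = "polygon") -> np.ndarray:
    """Coarsen a dense binary mask into a polygon or bounding-box mask.

    Hypothetical helper for illustration; the demo's real augmentation
    pipeline may differ.
    """
    binary = (mask > 0).astype(np.uint8)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    out = np.zeros_like(binary)
    for contour in contours:
        if mode == "bounding box":
            # Replace the region with its axis-aligned bounding box.
            x, y, w, h = cv2.boundingRect(contour)
            cv2.rectangle(out, (x, y), (x + w, y + h), 1, thickness=-1)
        else:
            # Approximate the contour with a coarse polygon and fill it.
            eps = 0.01 * cv2.arcLength(contour, closed=True)
            poly = cv2.approxPolyDP(contour, eps, closed=True)
            cv2.fillPoly(out, [poly], 1)
    return out
```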

## 📚 Technical Details

### Model Architecture

- **Base Model**: Stable Video Diffusion (SVD-XT)
- **Conditioning**: RGB frames + VAE-encoded masks
- **UNet**: Fine-tuned with additional mask-conditioning channels
- **Processing**: Chunked inference, 16 frames per chunk (see the sketch after this list)
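
As a rough illustration of the chunked inference, the snippet below splits a frame sequence into 16-frame chunks and runs the matting model on each one. `run_matting_pipeline` is a placeholder name, and the real app may overlap chunks or blend chunk boundaries for temporal consistency:

```python
def chunk_indices(num_frames: int, chunk_size: int = 16):
    """Yield (start, end) index pairs covering all frames in fixed-size chunks."""
    for start in range(0, num_frames, chunk_size):
        yield start, min(start + chunk_size, num_frames)

# Hypothetical usage; run_matting_pipeline stands in for the actual model call.
# alpha_frames = []
# for start, end in chunk_indices(len(frames)):
#     alpha_frames.extend(run_matting_pipeline(frames[start:end], masks[start:end]))
```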

### SAM2 Integration

- Uses the SAM2 video predictor for mask tracking
- Propagates the mask from a single click point through the entire video (see the sketch below)
- Generates temporally consistent segmentation masks
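
For readers who want to reproduce the tracking step outside the UI, here is a minimal sketch using the SAM2 video predictor API. The config and checkpoint paths, the click coordinates, and the frames directory are assumptions; check the SAM2 repository for the exact setup:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Assumed paths -- adjust to wherever the SAM2 config/checkpoint live locally.
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt"
)

with torch.inference_mode():
    # init_state expects a directory of JPEG frames (or a video, in recent versions).
    state = predictor.init_state(video_path="samples/example_frames")

    # Seed tracking with a single positive click on the first frame.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[320, 180]], dtype=np.float32),  # (x, y) click location
        labels=np.array([1], dtype=np.int32),             # 1 = positive click
    )

    # Propagate the mask through the whole video.
    masks = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks[frame_idx] = (mask_logits[0] > 0).cpu().numpy()
```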

## 🤝 Contributing

If you encounter issues or have suggestions:

1. Check that all model checkpoints are correctly placed
2. Ensure your GPU has sufficient VRAM (a quick check is sketched below)
3. Try reducing the video length or resolution for testing
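
A quick way to check available VRAM before running the demo (standard PyTorch calls, not part of this repo):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA device found; the demo will be extremely slow on CPU.")
```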

πŸ™ Acknowledgments

- **SAM2**: Meta AI's Segment Anything 2
- **Stable Video Diffusion**: Stability AI's video generation model
- **Gradio**: For the amazing UI framework

## 📧 Contact

For questions or issues, please open an issue on our GitHub repository.


**Note**: This demo is for research purposes. Processing times may vary based on video length and available compute resources.