---
title: VideoMaMa - Video Matting with Mask Guidance
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: apache-2.0
---

# 🎬 VideoMaMa: Video Matting with Mask Guidance

An interactive demo for high-quality video matting from sparse mask guidance. It combines SAM2 for automatic object tracking with our VideoMaMa model, which converts the tracked masks into alpha mattes.

## 🌟 Features

- **Single-Click Object Selection**: Simply click on the object you want to extract in the first frame
- **Automatic Tracking**: SAM2 automatically tracks your selected object through all frames
- **High-Quality Matting**: VideoMaMa generates smooth, temporally consistent alpha mattes
- **Flexible Input**: Upload your own video or try the provided samples
- **Customizable**: Adjust augmentation settings for different scenarios

## 🚀 How to Use

1. Upload a video or select from the samples
2. Click on the object you want to extract in the first frame (displayed in the interface)
3. Optionally adjust augmentation settings in the advanced options
4. Click "Generate Matting" and wait for processing
5. View your results: the output video, comparison images, and mask track

## 🔧 Installation (Local Setup)

If you want to run this demo locally:

```bash
# Install dependencies
pip install -r requirements.txt

# Add sample videos to the samples/ directory (optional)

# Run the demo
python app.py
```

## 🎯 Tips for Best Results

- **Click Precisely**: Click the center of the object you want to extract
- **Clear Objects**: Works best with distinct foreground objects
- **Video Length**: Shorter videos (under 5 seconds) process faster
- **Augmentations** (a sketch of the mask shapes follows this list):
  - Use "polygon" for cleaner geometric masks
  - Enable temporal augmentation for challenging videos
  - Try "bounding box" for very simple selections
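
To make the "polygon" and "bounding box" options concrete, here is a minimal sketch of how a dense mask can be coarsened into either shape. The helper name and the OpenCV-based approach are our illustration, not the demo's actual augmentation code:

```python
import cv2
import numpy as np

def simplify_mask(mask: np.ndarray, mode: str = "polygon") -> np.ndarray:
    """Coarsen a dense binary mask into a polygon or bounding-box mask.

    Hypothetical helper for illustration; the demo's real augmentation
    pipeline may differ.
    """
    binary = (mask > 0).astype(np.uint8)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    out = np.zeros_like(binary)
    for contour in contours:
        if mode == "bounding box":
            # Replace the region with its axis-aligned bounding box.
            x, y, w, h = cv2.boundingRect(contour)
            cv2.rectangle(out, (x, y), (x + w, y + h), 1, thickness=-1)
        else:
            # Approximate the contour with a coarse polygon and fill it.
            eps = 0.01 * cv2.arcLength(contour, closed=True)
            poly = cv2.approxPolyDP(contour, eps, closed=True)
            cv2.fillPoly(out, [poly], 1)
    return out
```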

## 📚 Technical Details

### Model Architecture

- **Base Model**: Stable Video Diffusion (SVD-XT)
- **Conditioning**: RGB frames + VAE-encoded masks
- **UNet**: Fine-tuned with additional mask-conditioning channels
- **Processing**: Chunked inference, 16 frames per chunk (see the sketch after this list)
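
As a rough illustration of the chunked inference, the snippet below splits a frame sequence into 16-frame chunks and runs the matting model on each one. `run_matting_pipeline` is a placeholder name, and the real app may overlap chunks or blend chunk boundaries for temporal consistency:

```python
def chunk_indices(num_frames: int, chunk_size: int = 16):
    """Yield (start, end) index pairs covering all frames in fixed-size chunks."""
    for start in range(0, num_frames, chunk_size):
        yield start, min(start + chunk_size, num_frames)

# Hypothetical usage; run_matting_pipeline stands in for the actual model call.
# alpha_frames = []
# for start, end in chunk_indices(len(frames)):
#     alpha_frames.extend(run_matting_pipeline(frames[start:end], masks[start:end]))
```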

### SAM2 Integration

- Uses the SAM2 video predictor for mask tracking
- Propagates the mask from a single click point through the entire video (see the sketch below)
- Generates temporally consistent segmentation masks
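
For readers who want to reproduce the tracking step outside the UI, here is a minimal sketch using the SAM2 video predictor API. The config and checkpoint paths, the click coordinates, and the frames directory are assumptions; check the SAM2 repository for the exact setup:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Assumed paths -- adjust to wherever the SAM2 config/checkpoint live locally.
predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt"
)

with torch.inference_mode():
    # init_state expects a directory of JPEG frames (or a video, in recent versions).
    state = predictor.init_state(video_path="samples/example_frames")

    # Seed tracking with a single positive click on the first frame.
    predictor.add_new_points_or_box(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[320, 180]], dtype=np.float32),  # (x, y) click location
        labels=np.array([1], dtype=np.int32),             # 1 = positive click
    )

    # Propagate the mask through the whole video.
    masks = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks[frame_idx] = (mask_logits[0] > 0).cpu().numpy()
```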

## 🤝 Contributing

If you encounter issues or have suggestions:

1. Check that all model checkpoints are correctly placed
2. Ensure your GPU has sufficient VRAM (a quick check is sketched below)
3. Try reducing the video length or resolution for testing
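
A quick way to check available VRAM before running the demo (standard PyTorch calls, not part of this repo):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA device found; the demo will be extremely slow on CPU.")
```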

πŸ™ Acknowledgments

- **SAM2**: Meta AI's Segment Anything 2
- **Stable Video Diffusion**: Stability AI's video generation model
- **Gradio**: For the amazing UI framework

## 📧 Contact

For questions or issues, please open an issue on our GitHub repository.


**Note**: This demo is for research purposes. Processing times may vary based on video length and available compute resources.