---
license: other
license_name: stabilityai-community-license
license_link: >-
  https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/blob/main/LICENSE.md
library_name: diffusers
pipeline_tag: image-to-image
---

# VideoMaMa: Mask-Guided Video Matting via Generative Prior

Sangbeom Lim, Seoung Wug Oh, Jiahui Huang, Heeji Yoon, Seungryong Kim, Joon-Young Lee

[Paper] [Project Page] [GitHub] [Gradio Demo]

VideoMaMa (Video Mask-to-Matte Model) is a framework that converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. It demonstrates strong zero-shot generalization to real-world footage, even though it is trained solely on synthetic data.

## Inference

To run inference with VideoMaMa, use the script provided in the official repository:

```bash
python inference_onestep_folder.py \
    --base_model_path "stabilityai/stable-video-diffusion-img2vid-xt" \
    --unet_checkpoint_path "SammyLim/VideoMaMa" \
    --image_root_path "/path/to/your/images" \
    --mask_root_path "/path/to/your/masks" \
    --output_dir "./output" \
    --keep_aspect_ratio
```
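
Because the checkpoints are published in the diffusers layout, the fine-tuned UNet can also be loaded directly and attached to the Stable Video Diffusion base model. The snippet below is a minimal sketch, not the official inference path (the script above performs the one-step mask-to-matte prediction); the `subfolder="unet"` layout and the use of `StableVideoDiffusionPipeline` are assumptions based on the checkpoint structure described in the License section.

```python
# Minimal sketch: load the VideoMaMa UNet weights and attach them to the
# Stable Video Diffusion base pipeline. This is NOT the official inference
# script; the "unet" subfolder layout is an assumption.
import torch
from diffusers import StableVideoDiffusionPipeline, UNetSpatioTemporalConditionModel
from huggingface_hub import hf_hub_download

# Fine-tuned UNet published in this repository.
unet = UNetSpatioTemporalConditionModel.from_pretrained(
    "SammyLim/VideoMaMa", subfolder="unet", torch_dtype=torch.float16
)

# Attach it to the SVD base model that VideoMaMa was fine-tuned from.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    unet=unet,
    torch_dtype=torch.float16,
).to("cuda")

# The DINO projection MLP weights mentioned in the License section can be
# fetched separately; the official script handles how they are consumed.
mlp_path = hf_hub_download("SammyLim/VideoMaMa", "dino_projection_mlp.pth")
```

For actual mask-to-matte prediction, prefer the official `inference_onestep_folder.py` script, which wires these components together with the mask conditioning.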

## License

The VideoMaMa model checkpoints (specifically `unet/*` and `dino_projection_mlp.pth`) are subject to the Stability AI Community License. By using this model, you agree to the terms outlined in the license agreement.

## Citation

```bibtex
@article{lim2026videomama,
  title={VideoMaMa: Mask-Guided Video Matting via Generative Prior},
  author={Lim, Sangbeom and Oh, Seoung Wug and Huang, Jiahui and Yoon, Heeji and Kim, Seungryong and Lee, Joon-Young},
  journal={arXiv preprint arXiv:2601.14255},
  year={2026}
}
```