---
license: other
license_name: stabilityai-community-license
license_link: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/blob/main/LICENSE.md
library_name: diffusers
pipeline_tag: image-to-image
---

# VideoMaMa: Mask-Guided Video Matting via Generative Prior

[**Sangbeom Lim**](https://sites.google.com/view/sangbeomlim/home) · [**Seoung Wug Oh**](https://sites.google.com/view/seoungwugoh) · [**Jiahui Huang**](https://gabriel-huang.github.io/) · [**Heeji Yoon**](https://yoon-heez.github.io/) · [**Seungryong Kim**](https://cvlab.kaist.ac.kr/members/faculty) · [**Joon-Young Lee**](https://joonyoung-cv.github.io)

[[Paper](https://huggingface.co/papers/2601.14255)] [[Project Page](https://cvlab-kaist.github.io/VideoMaMa/)] [[GitHub](https://github.com/cvlab-kaist/VideoMaMa)] [[Gradio Demo](https://huggingface.co/spaces/SammyLim/VideoMaMa)]

VideoMaMa (Video Mask-to-Matte Model) is a framework that converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. It demonstrates strong zero-shot generalization to real-world footage, even though it is trained solely on synthetic data.
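
For readers new to matting, the difference between a coarse mask and an alpha matte can be stated with the standard compositing model. This is the general textbook definition, not a formula taken from the VideoMaMa paper, and the symbols $I$, $F$, $B$, and $\alpha$ are introduced here only for illustration:

$$
I_p = \alpha_p F_p + (1 - \alpha_p)\, B_p, \qquad \alpha_p \in [0, 1]
$$

Here $I_p$ is the observed color at pixel $p$, $F_p$ and $B_p$ are the foreground and background colors, and $\alpha_p$ is the foreground opacity. A binary segmentation mask corresponds to the special case $\alpha_p \in \{0, 1\}$, whereas a matte allows fractional values at soft boundaries such as hair or motion blur.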

## Inference

To run inference, use the `inference_onestep_folder.py` script from the [official repository](https://github.com/cvlab-kaist/VideoMaMa):

```bash
python inference_onestep_folder.py \
  --base_model_path "stabilityai/stable-video-diffusion-img2vid-xt" \
  --unet_checkpoint_path "SammyLim/VideoMaMa" \
  --image_root_path "/path/to/your/images" \
  --mask_root_path "/path/to/your/masks" \
  --output_dir "./output" \
  --keep_aspect_ratio
```
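
The command above can be wrapped in a small shell function that fails early when an input folder is missing. This is a minimal sketch, not part of the VideoMaMa repository: the function name `run_videomama` and its positional arguments are hypothetical, it assumes the current directory is a clone of the repository, and it passes only the flags shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with VideoMaMa): validates the input
# folders, then calls the inference script with the flags shown above.
run_videomama() {
    local image_root="$1"
    local mask_root="$2"
    local output_dir="${3:-./output}"   # defaults to ./output, as in the command above

    # Fail early if either input folder is missing.
    if [ ! -d "$image_root" ]; then
        echo "image folder not found: $image_root" >&2
        return 1
    fi
    if [ ! -d "$mask_root" ]; then
        echo "mask folder not found: $mask_root" >&2
        return 1
    fi

    # Creating the output folder up front is a precaution; the script may
    # already do this itself.
    mkdir -p "$output_dir"

    python inference_onestep_folder.py \
        --base_model_path "stabilityai/stable-video-diffusion-img2vid-xt" \
        --unet_checkpoint_path "SammyLim/VideoMaMa" \
        --image_root_path "$image_root" \
        --mask_root_path "$mask_root" \
        --output_dir "$output_dir" \
        --keep_aspect_ratio
}
```

For example, `run_videomama /path/to/your/images /path/to/your/masks ./output` runs the same command once both folders are confirmed to exist.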

## License

The VideoMaMa model checkpoints (specifically `unet/*` and `dino_projection_mlp.pth`) are subject to the **Stability AI Community License**. By using this model, you agree to the terms outlined in the [license agreement](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/blob/main/LICENSE.md).

## Citation

```bibtex
@article{lim2026videomama,
  title={VideoMaMa: Mask-Guided Video Matting via Generative Prior},
  author={Lim, Sangbeom and Oh, Seoung Wug and Huang, Jiahui and Yoon, Heeji and Kim, Seungryong and Lee, Joon-Young},
  journal={arXiv preprint arXiv:2601.14255},
  year={2026}
}
```