	---
	license: other
	license_name: stabilityai-community-license
	license_link: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/blob/main/LICENSE.md
	library_name: diffusers
	pipeline_tag: image-to-image
	---

	# VideoMaMa: Mask-Guided Video Matting via Generative Prior

	[Sangbeom Lim](https://sites.google.com/view/sangbeomlim/home) · [Seoung Wug Oh](https://sites.google.com/view/seoungwugoh) · [Jiahui Huang](https://gabriel-huang.github.io/) · [Heeji Yoon](https://yoon-heez.github.io/) · [Seungryong Kim](https://cvlab.kaist.ac.kr/members/faculty) · [Joon-Young Lee](https://joonyoung-cv.github.io)

	[[Paper](https://huggingface.co/papers/2601.14255)] [[Project Page](https://cvlab-kaist.github.io/VideoMaMa/)] [[GitHub](https://github.com/cvlab-kaist/VideoMaMa)] [[Gradio Demo](https://huggingface.co/spaces/SammyLim/VideoMaMa)]

	VideoMaMa (Video Mask-to-Matte Model) is a framework that converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. It demonstrates strong zero-shot generalization to real-world footage, even though it is trained solely on synthetic data.
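
To illustrate what an alpha matte is used for downstream, the sketch below applies the standard compositing equation `C = alpha * F + (1 - alpha) * B` per pixel. It is not part of VideoMaMa: the function names `composite_pixel` and `composite_frame` are hypothetical, and frames are represented as plain nested lists of RGB tuples with values in `[0, 1]` purely for illustration.

```python
def composite_pixel(alpha, fg, bg):
    """Blend one RGB pixel: alpha * fg + (1 - alpha) * bg, channel by channel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def composite_frame(alpha_matte, foreground, background):
    """Composite a foreground frame over a background frame.

    alpha_matte: rows of floats in [0, 1] (1 = fully foreground).
    foreground, background: rows of (r, g, b) tuples with the same layout.
    """
    return [
        [composite_pixel(a, f, b) for a, f, b in zip(a_row, f_row, b_row)]
        for a_row, f_row, b_row in zip(alpha_matte, foreground, background)
    ]


# A 1x3 toy frame: opaque, half-transparent, and fully transparent pixels.
alpha_matte = [[1.0, 0.5, 0.0]]
red = (1.0, 0.0, 0.0)
blue = (0.0, 0.0, 1.0)
foreground = [[red, red, red]]
background = [[blue, blue, blue]]

result = composite_frame(alpha_matte, foreground, background)
print(result)
```

A binary segmentation mask only allows alpha values of 0 or 1, whereas a matte's fractional values (the middle pixel here) let soft boundaries such as hair or motion blur blend into the new background.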

	## Inference

	To run inference, use the `inference_onestep_folder.py` script from the [official repository](https://github.com/cvlab-kaist/VideoMaMa):

	```bash
	python inference_onestep_folder.py \
	  --base_model_path "stabilityai/stable-video-diffusion-img2vid-xt" \
	  --unet_checkpoint_path "SammyLim/VideoMaMa" \
	  --image_root_path "/path/to/your/images" \
	  --mask_root_path "/path/to/your/masks" \
	  --output_dir "./output" \
	  --keep_aspect_ratio
	```
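
If you want to launch the same command from Python, for example to loop over several image and mask folders, a minimal sketch might look like the following. The helper name `build_inference_command` is hypothetical, and the sketch assumes it is run from a checkout of the repository so that `inference_onestep_folder.py` is in the working directory. The flags are exactly those in the command above.

```python
import shlex
import sys


def build_inference_command(image_root, mask_root, output_dir="./output",
                            script="inference_onestep_folder.py"):
    """Return the inference command as an argument list for subprocess.run."""
    return [
        sys.executable, script,
        "--base_model_path", "stabilityai/stable-video-diffusion-img2vid-xt",
        "--unet_checkpoint_path", "SammyLim/VideoMaMa",
        "--image_root_path", str(image_root),
        "--mask_root_path", str(mask_root),
        "--output_dir", str(output_dir),
        "--keep_aspect_ratio",
    ]


cmd = build_inference_command("/path/to/your/images", "/path/to/your/masks")
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# To actually run it (requires the repository and its dependencies):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.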

	## License

	The VideoMaMa model checkpoints (specifically `unet/` and `dino_projection_mlp.pth`) are subject to the Stability AI Community License. By using this model, you agree to the terms outlined in the [license agreement](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/blob/main/LICENSE.md).

	## Citation

	```bibtex
	@article{lim2026videomama,
	  title={VideoMaMa: Mask-Guided Video Matting via Generative Prior},
	  author={Lim, Sangbeom and Oh, Seoung Wug and Huang, Jiahui and Yoon, Heeji and Kim, Seungryong and Lee, Joon-Young},
	  journal={arXiv preprint arXiv:2601.14255},
	  year={2026}
	}
	```