Improve model card and add metadata
Hi! I'm Niels from the Hugging Face community science team. I've noticed this repository is missing some key metadata and documentation.
This PR:
- Adds the `library_name: diffusers` tag based on the `config.json` and model architecture.
- Adds the `pipeline_tag: image-to-image`.
- Links the model to the original paper, project page, and GitHub repository.
- Adds a sample usage command derived from the official documentation.
- Updates the `license_link` to point to the Stability AI Community License of the underlying Stable Video Diffusion model.
Merging this will make the model more discoverable on the Hub and provide users with immediate context on how to use it.
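Once merged, the effect of the new tags can be verified programmatically. The snippet below is only a quick sanity check, not part of the card itself; it assumes the model lives at `SammyLim/VideoMaMa`, the repo id used in the inference command in the diff below:

```python
from huggingface_hub import HfApi

# Sanity check (assumed repo id; adjust if the model lives elsewhere):
# after merging, the Hub should expose the metadata added in this PR.
info = HfApi().model_info("SammyLim/VideoMaMa")
print(info.pipeline_tag)   # expected: "image-to-image"
print(info.library_name)   # expected: "diffusers"
print(info.tags)           # should include the license and library tags
```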
README.md (CHANGED)

````diff
@@ -1,9 +1,44 @@
 ---
 license: other
 license_name: stabilityai-community-license
-license_link: https://huggingface.co/stabilityai/stable-diffusion-
+license_link: https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/blob/main/LICENSE.md
+library_name: diffusers
+pipeline_tag: image-to-image
 ---
 
+# VideoMaMa: Mask-Guided Video Matting via Generative Prior
+
+[**Sangbeom Lim**](https://sites.google.com/view/sangbeomlim/home) · [**Seoung Wug Oh**](https://sites.google.com/view/seoungwugoh) · [**Jiahui Huang**](https://gabriel-huang.github.io/) · [**Heeji Yoon**](https://yoon-heez.github.io/) · [**Seungryong Kim**](https://cvlab.kaist.ac.kr/members/faculty) · [**Joon-Young Lee**](https://joonyoung-cv.github.io)
+
+[[Paper](https://huggingface.co/papers/2601.14255)] [[Project Page](https://cvlab-kaist.github.io/VideoMaMa/)] [[GitHub](https://github.com/cvlab-kaist/VideoMaMa)] [[Gradio Demo](https://huggingface.co/spaces/SammyLim/VideoMaMa)]
+
+VideoMaMa (Video Mask-to-Matte Model) is a framework that converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. It demonstrates strong zero-shot generalization to real-world footage, even though it is trained solely on synthetic data.
+
+## Inference
+
+To use VideoMaMa for inference, you can use the script provided in the [official repository](https://github.com/cvlab-kaist/VideoMaMa):
+
+```bash
+python inference_onestep_folder.py \
+  --base_model_path "stabilityai/stable-video-diffusion-img2vid-xt" \
+  --unet_checkpoint_path "SammyLim/VideoMaMa" \
+  --image_root_path "/path/to/your/images" \
+  --mask_root_path "/path/to/your/masks" \
+  --output_dir "./output" \
+  --keep_aspect_ratio
+```
+
 ## License
 
-
+The VideoMaMa model checkpoints (specifically `unet/*` and `dino_projection_mlp.pth`) are subject to the **Stability AI Community License**. By using this model, you agree to the terms outlined in the [license agreement](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid/blob/main/LICENSE.md).
+
+## Citation
+
+```bibtex
+@article{lim2026videomama,
+  title={VideoMaMa: Mask-Guided Video Matting via Generative Prior},
+  author={Lim, Sangbeom and Oh, Seoung Wug and Huang, Jiahui and Yoon, Heeji and Kim, Seungryong and Lee, Joon-Young},
+  journal={arXiv preprint arXiv:2601.14255},
+  year={2026}
+}
+```
````
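Since the card now declares `library_name: diffusers`, it may help reviewers to see how the published weights map onto the `diffusers` API. The sketch below is written under assumptions, not as the official inference path: it assumes the repo stores a fine-tuned Stable Video Diffusion UNet under the `unet/` subfolder (the License section mentions `unet/*`), and it deliberately skips VideoMaMa's mask conditioning and the `dino_projection_mlp.pth` projection head, which the official script handles:

```python
import torch
from diffusers import StableVideoDiffusionPipeline, UNetSpatioTemporalConditionModel

# Sketch only: load the fine-tuned UNet from the model repo (assumed to live
# under the `unet/` subfolder, per the License section of the card) ...
unet = UNetSpatioTemporalConditionModel.from_pretrained(
    "SammyLim/VideoMaMa",
    subfolder="unet",
    torch_dtype=torch.float16,
)

# ... and drop it into the stock SVD image-to-video pipeline named by
# --base_model_path in the inference command above. This does NOT reproduce
# VideoMaMa's mask conditioning; use inference_onestep_folder.py for that.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    unet=unet,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")
```

For actual matting results, the official `inference_onestep_folder.py` script shown in the diff remains the supported entry point.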