---
pipeline_tag: image-to-video
library_name: diffusers
license: mit
---

# VIRES model card

**Model Page**: [VIRES](https://hjzheng.net/projects/VIRES/)

## Model Information

This section summarizes the model and defines its inputs and outputs.

### Description

VIRES is a video instance repainting method guided by sketches and text. It supports video instance repainting, replacement, generation, and removal. VIRES builds on the generative priors of text-to-video models to keep its results temporally consistent and visually pleasing. Its key components are:

- a Sequential ControlNet, which extracts the structure layout and captures fine details from the sketch sequence;
- sketch attention, which injects fine-grained sketch semantics into the generation;
- a sketch-aware encoder, which keeps the repainted results aligned with the provided sketch sequence.

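To make the idea of injecting sketch features into video features more concrete, the following minimal NumPy sketch shows generic single-head cross-attention: video tokens act as queries, sketch tokens act as keys and values, and the attended result is added back to the video features residually. This is a common conditioning pattern used here only as an illustration. It is an assumption, not the VIRES sketch attention, and the token counts, feature width, and random projection weights are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(video_tokens, sketch_tokens, w_q, w_k, w_v):
    """Generic single-head cross-attention (illustrative, NOT the VIRES module).

    video_tokens:  (N_video, d) features that receive sketch information
    sketch_tokens: (N_sketch, d) features that provide sketch information
    """
    q = video_tokens @ w_q                      # queries from video, (N_video, d)
    k = sketch_tokens @ w_k                     # keys from sketch, (N_sketch, d)
    v = sketch_tokens @ w_v                     # values from sketch, (N_sketch, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # scaled dot products, (N_video, N_sketch)
    weights = softmax(scores, axis=-1)          # each video token's weights sum to 1
    # Residual injection: add the attended sketch features to the video features.
    return video_tokens + weights @ v

rng = np.random.default_rng(0)
d = 64
video_tokens = rng.standard_normal((16, d))   # 16 hypothetical video patch tokens
sketch_tokens = rng.standard_normal((16, d))  # 16 hypothetical sketch patch tokens
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

out = cross_attention(video_tokens, sketch_tokens, w_q, w_k, w_v)
print(out.shape)  # (16, 64)
```

The residual form keeps the output the same shape as the video features, so a block like this can be inserted into a backbone without changing the surrounding layer sizes.
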
### Inputs and outputs

- **Input:**
  - A text string describing the desired changes.
  - A mask sequence (51 frames at 512 × 512 resolution).
  - A sketch sequence (51 frames at 512 × 512 resolution).

- **Output:**
  - A repainted video.

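The mask and sketch sequences can be held as arrays of shape `(frames, height, width)`. The sketch below assumes NumPy and assumes that a mask value of 1 marks the region to repaint; that convention, and the helpers `make_box_mask` and `check_sequence`, are hypothetical and not taken from the VIRES code. It builds a placeholder box mask and checks both sequences against the 51 × 512 × 512 shape listed above. The exact dtype and value range VIRES expects may differ, so check the GitHub repository.

```python
import numpy as np

# Assumed layout (frames, height, width), matching "51 x 512 x 512" above.
NUM_FRAMES, HEIGHT, WIDTH = 51, 512, 512

def make_box_mask(top, left, bottom, right,
                  frames=NUM_FRAMES, height=HEIGHT, width=WIDTH):
    """Hypothetical helper: a binary mask sequence with the same box in every frame.

    Assumption: 1 = region to repaint, 0 = region to keep.
    """
    mask = np.zeros((frames, height, width), dtype=np.uint8)
    mask[:, top:bottom, left:right] = 1
    return mask

def check_sequence(name, seq):
    """Hypothetical helper: raise if a mask or sketch sequence has the wrong shape."""
    expected = (NUM_FRAMES, HEIGHT, WIDTH)
    if seq.shape != expected:
        raise ValueError(f"{name} has shape {seq.shape}, expected {expected}")

mask = make_box_mask(128, 128, 384, 384)
sketch = np.zeros((NUM_FRAMES, HEIGHT, WIDTH), dtype=np.uint8)  # blank placeholder sketch
check_sequence("mask", mask)
check_sequence("sketch", sketch)
print(mask.shape, int(mask[0].sum()))  # (51, 512, 512) 65536
```

Validating shapes before inference turns a mismatched clip length or resolution into an immediate, readable error instead of a failure deep inside the model.
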
### Usage

A basic example using the `diffusers` library (requires the model weights and their dependencies to be installed):

```python
import torch  # needed for torch.float16 below
from diffusers import DiffusionPipeline

# Load the model (replace with your actual paths)
pipe = DiffusionPipeline.from_pretrained("suimu/VIRES", torch_dtype=torch.float16).to("cuda")

# Prepare inputs: text prompt, mask sequence, and sketch sequence
prompt = "A cat replaces the dog in this video"
mask = ...  # Load your mask sequence (51 x 512 x 512)
sketch = ...  # Load your sketch sequence (51 x 512 x 512)

# Generate the video
# (argument order and the `.videos` attribute follow this example;
#  see the GitHub repository for the exact interface)
video = pipe(prompt, mask, sketch).videos[0]

# Save or display the video
...
```

For complete usage instructions and advanced options, refer to our GitHub page: https://github.com/suimuc/VIRES/

## Citation

```BibTeX
@article{vires,
  title={VIRES: Video Instance Repainting via Sketch and Text Guided Generation},
  author={Weng, Shuchen and Zheng, Haojie and Zhang, Peixuan and Hong, Yuchen and Jiang, Han and Li, Si and Shi, Boxin},
  journal={arXiv preprint arXiv:2411.16199},
  year={2024}
}
```