Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Paper: arXiv:2401.10226
How to use jianzongwu/lgvi with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("jianzongwu/lgvi", dtype=torch.bfloat16, device_map="cuda")
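# Generic text-to-image placeholder prompt; LGVI is a language-driven video
# inpainting model, so see the project page for its actual inputs.
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"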
image = pipe(prompt).images[0]

The LGVI model is trained on ROVI and Inst-Inpaint for the referring inpainting task. Please check our project page for more details.
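
The usage snippet hard-codes "cuda" and suggests switching to "mps" on Apple devices. As a minimal sketch, the device string could be chosen at runtime instead. The helper names `pick_device` and `detect_device` are hypothetical and not part of Diffusers or LGVI; the "cpu" fallback is also an assumption that the snippet does not mention.

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Hypothetical helper: prefer CUDA, then Apple MPS, then CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Hypothetical helper: query PyTorch for available backends."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to query.
        return "cpu"
    # torch.backends.mps is missing on older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


device = detect_device()
print(f"Selected device: {device}")
```

The resulting string can stand in for "cuda" in the snippet above. Whether `device_map` accepts every value depends on the installed Diffusers version, so loading the pipeline first and then calling `pipe.to(device)` may be the safer route.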
@article{wu2024lgvi,
title={Towards language-driven video inpainting via multimodal large language models},
author={Wu, Jianzong and Li, Xiangtai and Si, Chenyang and Zhou, Shangchen and Yang, Jingkang and Zhang, Jiangning and Li, Yining and Chen, Kai and Tong, Yunhai and Liu, Ziwei and others},
journal={arXiv preprint arXiv:2401.10226},
year={2024}
}