Mirror linyq/kiwi-edit-5b-reference-only-diffusers

	---
	library_name: diffusers
	pipeline_tag: image-to-video
	---

	# Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

	Kiwi-Edit is a versatile video editing framework built on an MLLM encoder and a video Diffusion Transformer (DiT). It supports:
	- Instruction Video Editing: Modify video content through text prompts.
	- Reference Image Guidance: Use a reference image to guide editing for higher visual fidelity and precise control.

	The model combines learnable queries with latent visual features to extract semantic guidance from the reference image, which improves both instruction following and reference fidelity.

	- Project Page: [https://showlab.github.io/Kiwi-Edit/](https://showlab.github.io/Kiwi-Edit/)
	- Paper: [Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance](https://huggingface.co/papers/2603.02175)
	- Repository: [https://github.com/showlab/Kiwi-Edit](https://github.com/showlab/Kiwi-Edit)

	## Usage

	To use Kiwi-Edit for inference, follow the installation instructions in the [official repository](https://github.com/showlab/Kiwi-Edit). You can run a quick test on a demo video using the following command:

	```bash
	python diffusers_demo.py \
	  --video_path ./demo_data/video/source/0005e4ad9f49814db1d3f2296b911abf.mp4 \
	  --prompt "Remove the monkey." \
	  --save_path output.mp4 \
	  --model_path linyq/kiwi-edit-5b-instruct-only-diffusers
	```
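
To try several prompts on the same clip, the command above can be wrapped in a small Python helper. The following is a minimal sketch, not part of the official repository: it assumes it is run from the repository root where `diffusers_demo.py` lives, and it uses only the four flags shown above. The helper names `build_command` and `run_edits`, the `outputs` directory, and the `edit_NN.mp4` file naming are hypothetical choices made for illustration.

```python
import shlex
import subprocess
import sys
from pathlib import Path

# Model path copied from the demo command; swap in another checkpoint if needed.
MODEL_PATH = "linyq/kiwi-edit-5b-instruct-only-diffusers"


def build_command(video_path, prompt, save_path, model_path=MODEL_PATH,
                  script="diffusers_demo.py"):
    """Return the argv list for one diffusers_demo.py run.

    Only the flags shown in the demo command are used; nothing else is
    assumed about the script's interface.
    """
    return [
        sys.executable, script,
        "--video_path", str(video_path),
        "--prompt", prompt,
        "--save_path", str(save_path),
        "--model_path", model_path,
    ]


def run_edits(video_path, prompts, out_dir="outputs", dry_run=True):
    """Build one command per prompt; print it (dry run) or execute it."""
    out_dir = Path(out_dir)
    commands = []
    for i, prompt in enumerate(prompts):
        # Hypothetical naming scheme: outputs/edit_00.mp4, edit_01.mp4, ...
        cmd = build_command(video_path, prompt, out_dir / f"edit_{i:02d}.mp4")
        commands.append(cmd)
        if dry_run:
            # Show a shell-quoted command without running the model.
            print(shlex.join(cmd))
        else:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_edits(
        "./demo_data/video/source/0005e4ad9f49814db1d3f2296b911abf.mp4",
        ["Remove the monkey."],
    )
```

With `dry_run=True` (the default) the helper only prints the commands, which is a cheap way to check paths and quoting; passing `dry_run=False` runs each edit in turn and stops at the first failure because of `check=True`.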

	## Citation

	If you use Kiwi-Edit in your research, please cite the following work:

	```bibtex
	@misc{kiwiedit,
	title={Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance},
	author={Yiqi Lin and Guoqiang Liang and Ziyun Zeng and Zechen Bai and Yanzhe Chen and Mike Zheng Shou},
	year={2026},
	eprint={2603.02175},
	archivePrefix={arXiv},
	primaryClass={cs.CV},
	url={https://arxiv.org/abs/2603.02175},
	}
	```