Add model card for Kiwi-Edit
Hi! I'm Niels, part of the community science team at Hugging Face. I noticed this repository was missing a model card, so I've opened this PR to add one.
The model card includes:
- Metadata for the `diffusers` library and the `image-to-video` pipeline tag.
- Links to the original paper, project page, and GitHub repository.
- A brief description of the model's capabilities (instruction and reference-guided video editing).
- CLI usage instructions for running the model with Diffusers, based on the official repository.
This information helps users discover and use your work more effectively on the Hugging Face Hub.
README.md (ADDED):
---
library_name: diffusers
pipeline_tag: image-to-video
---

# Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Kiwi-Edit is a versatile video editing framework built on a multimodal LLM (MLLM) encoder and a video Diffusion Transformer (DiT). It supports both instruction-based video editing and reference-guided editing (using a reference image together with an instruction).
- **Paper:** [Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance](https://huggingface.co/papers/2603.02175)
- **Project Page:** [https://showlab.github.io/Kiwi-Edit/](https://showlab.github.io/Kiwi-Edit/)
- **Repository:** [https://github.com/showlab/Kiwi-Edit](https://github.com/showlab/Kiwi-Edit)
## Model Description

Kiwi-Edit introduces a unified editing architecture that synergizes learnable queries and latent visual features for reference semantic guidance. It addresses the challenge of precise visual control in instruction-based editing by allowing users to provide a reference image to guide the transformation. The framework achieves significant improvements in instruction following and reference fidelity through a scalable data-generation pipeline and a multi-stage training curriculum.
## Usage

This model is compatible with the `diffusers` library. To run inference, follow the installation instructions in the [official repository](https://github.com/showlab/Kiwi-Edit).

### Quick Test with Diffusers

You can run a quick test on a demo video using the following command from the repository:
```bash
python diffusers_demo.py \
    --video_path ./demo_data/video/source/0005e4ad9f49814db1d3f2296b911abf.mp4 \
    --prompt "Remove the monkey." \
    --save_path output.mp4 \
    --model_path linyq/kiwi-edit-5b-instruct-only-diffusers
```
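If you prefer launching the demo from Python (e.g., to batch several edits), the command above can be wrapped with the standard `subprocess` module. A minimal sketch — the `build_edit_command` helper is illustrative and not part of the repository; only the flags shown in the command above are assumed:

```python
import subprocess
from typing import List


def build_edit_command(
    video_path: str,
    prompt: str,
    save_path: str = "output.mp4",
    model_path: str = "linyq/kiwi-edit-5b-instruct-only-diffusers",
) -> List[str]:
    """Assemble the argv list for diffusers_demo.py using the documented flags."""
    return [
        "python", "diffusers_demo.py",
        "--video_path", video_path,
        "--prompt", prompt,
        "--save_path", save_path,
        "--model_path", model_path,
    ]


if __name__ == "__main__":
    cmd = build_edit_command(
        "./demo_data/video/source/0005e4ad9f49814db1d3f2296b911abf.mp4",
        "Remove the monkey.",
    )
    print(" ".join(cmd))
    # Actually running the demo requires the repository environment and a GPU:
    # subprocess.run(cmd, check=True)
```

This only builds and (optionally) launches the documented CLI; loading the checkpoint directly through a `diffusers` pipeline class is not covered here, since the repository's `diffusers_demo.py` is the supported entry point.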
## Citation

If you find this work useful, please cite:

```bibtex
@misc{kiwiedit,
      title={Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance},
      author={Yiqi Lin and Guoqiang Liang and Ziyun Zeng and Zechen Bai and Yanzhe Chen and Mike Zheng Shou},
      year={2026},
      eprint={2603.02175},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.02175},
}
```