<div align="center">
<h1>X2Edit</h1>
<a href='https://arxiv.org/abs/2508.07607'><img src='https://img.shields.io/badge/arXiv-2508.07607-b31b1b.svg'></a>
<a href='https://huggingface.co/datasets/OPPOer/X2Edit-Dataset'><img src='https://img.shields.io/badge/🤗%20HuggingFace-X2Edit%20Dataset-ffd21f.svg'></a>
<a href='https://huggingface.co/OPPOer/X2Edit'><img src='https://img.shields.io/badge/🤗%20HuggingFace-X2Edit-ffd21f.svg'></a>
<a href='https://www.modelscope.cn/datasets/AIGCer-OPPO/X2Edit-Dataset'><img src='https://img.shields.io/badge/🤖%20ModelScope-X2Edit%20Dataset-purple.svg'></a>
</div>
## Environment
Prepare the environment and install the required libraries:
```shell
$ cd X2Edit
$ conda create --name X2Edit python=3.11
$ conda activate X2Edit
$ pip install -r requirements.txt
```
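If you want to confirm that the environment can see your GPU before moving on, a quick sanity check (assuming PyTorch is installed via `requirements.txt`) is:
```shell
# Prints the installed torch version and whether a CUDA device is visible.
$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```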
## Inference
We provide inference scripts for editing images at resolutions of **1024** and **512**. You can choose the base model for X2Edit from **[FLUX.1-Krea](https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev)**, **[FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)**, **[FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell)**, **[PixelWave](https://huggingface.co/mikeyandfriends/PixelWave_FLUX.1-dev_03)**, and **[shuttle-3-diffusion](https://huggingface.co/shuttleai/shuttle-3-diffusion)**, and optionally pick an extra LoRA to combine with the MoE-LoRA, including **[Turbo-Alpha](https://huggingface.co/alimama-creative/FLUX.1-Turbo-Alpha)**, **[AntiBlur](https://huggingface.co/Shakker-Labs/FLUX.1-dev-LoRA-AntiBlur)**, **[Midjourney-Mix2](https://huggingface.co/strangerzonehf/Flux-Midjourney-Mix2-LoRA)**, **[Super-Realism](https://huggingface.co/strangerzonehf/Flux-Super-Realism-LoRA)**, and **[Chatgpt-Ghibli](https://huggingface.co/openfree/flux-chatgpt-ghibli-lora)**. Choose the models you like and download them; one way to do so is sketched below. For the MoE-LoRA, we will open-source a unified checkpoint that works at both 512 and 1024 resolutions.
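As an illustration, the checkpoints linked above can be fetched with `huggingface-cli`; the repo IDs come from the links above, while the `--local-dir` paths are hypothetical placeholders that only need to match what you later pass to the script:
```shell
# Illustrative downloads; adjust the local directories to your own layout.
$ huggingface-cli download black-forest-labs/FLUX.1-dev --local-dir ./ckpt/FLUX.1-dev
$ huggingface-cli download Qwen/Qwen3-8B --local-dir ./ckpt/Qwen3-8B
$ huggingface-cli download OPPOer/X2Edit --local-dir ./ckpt/X2Edit
$ huggingface-cli download alimama-creative/FLUX.1-Turbo-Alpha --local-dir ./ckpt/FLUX.1-Turbo-Alpha
```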
Before executing the script, download **[Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)**, which selects the task type for the input instruction, along with a base model (**FLUX.1-Krea**, **FLUX.1-dev**, **FLUX.1-schnell**, or **shuttle-3-diffusion**), the **[MLLM](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)**, and **[Alignet](https://huggingface.co/OPPOer/X2I/blob/main/qwen2.5-vl-7b_proj.pt)**. All scripts follow the same command pattern; simply replace the script filename and keep the parameter configuration consistent.
```shell
$ python infer.py --device cuda --pixel 1024 --num_experts 12 --base_path BASE_PATH --qwen_path QWEN_PATH --lora_path LORA_PATH --extra_lora_path EXTRA_LORA_PATH
```
**device:** The device used for inference. default: `cuda`<br>
**pixel:** The resolution of the input image; you can choose from **[512, 1024]**. default: `1024`<br>
**num_experts:** The number of experts in the MoE. default: `12`<br>
**base_path:** The path of the base model.<br>
**qwen_path:** The path of the model used to select the task type for the input instruction. We use **Qwen3-8B** here.<br>
**lora_path:** The path of the MoE-LoRA in X2Edit.<br>
**extra_lora_path:** The path of an extra LoRA for plug-and-play use. default: `None`<br>
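For example, a 512-resolution run using the hypothetical local paths from the download sketch above might look like this (adjust every path to your own layout):
```shell
# Hypothetical paths; --device, --pixel, and --num_experts use the values documented above.
$ python infer.py --device cuda --pixel 512 --num_experts 12 \
    --base_path ./ckpt/FLUX.1-dev \
    --qwen_path ./ckpt/Qwen3-8B \
    --lora_path ./ckpt/X2Edit \
    --extra_lora_path ./ckpt/FLUX.1-Turbo-Alpha
```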
## Citation
⭐ If you find our work helpful, please consider citing our paper and leaving us a star.
```
@misc{ma2025x2editrevisitingarbitraryinstructionimage,
      title={X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning},
      author={Jian Ma and Xujie Zhu and Zihao Pan and Qirong Peng and Xu Guo and Chen Chen and Haonan Lu},
      year={2025},
      eprint={2508.07607},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.07607},
}
```