	---
	license: apache-2.0
	language:
	- en
	tags:
	- image-restoration
	- all-in-one
	- diffusion
	- flow-matching
	- mllm
	- flux
	- qwen2.5-vl
	- siglip2
	- low-level-vision
	pipeline_tag: image-to-image
	---

	<p align="center">
	<img src="https://raw.githubusercontent.com/Programmergg/FAPE-IR/main/figs/logo.png" width="120">
	</p>

	# FAPEIR_Uniworld — Initial Weights for FAPE-IR

	Initial weights for FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration.

	> [📄 Paper (arXiv 2511.14099)](https://arxiv.org/abs/2511.14099) &emsp;
	> [💻 Code](https://github.com/Programmergg/FAPE-IR) &emsp;
	> [🏋️ Trainset](https://huggingface.co/datasets/David0219/FAPE-IR-Training) &emsp;
	> [🧪 Testset](https://huggingface.co/datasets/David0219/FAPE-IR-Testing)

	---

	## 💡 What This Repo Is

	This repository releases the initial weights required to start training FAPE-IR — i.e. all pretrained components consumed by the YAML config

	```
	scripts/denoiser/flux_qwen2p5vl_7b_vlm_512.yaml
	```

	in the FAPE-IR codebase. Concretely it bundles:

	* the UniWorld-V1 initialization (Qwen2.5-VL-7B-Instruct + FLUX.1-dev re-organized weights),
	* the SigLIP-v2 encoder used by the executor,
	* a small set of projection / connector weights (`mlp2`, `mlp3`, SigLIP→FLUX redux),
	* a VGG checkpoint used by the LPIPS loss.

	> ⚠️ This is NOT the post-training checkpoint reported in the paper.

	---

	## 📂 File Layout

	After downloading, place the repository contents under `FAPE-IR/weights/` exactly as shown below:

	```text
	weights/
	├── flux/ # FLUX.1-dev backbone (re-organized)
	├── siglip/ # SigLIP-v2 encoder
	├── uniworld/ # UniWorld-V1 (Qwen2.5-VL-7B-Instruct + denoiser projection)
	├── denoise_projector_params.bin # planner-token → denoiser projector (mlp2)
	├── flux-redux-siglipv2-512.bin # SigLIP-v2 → FLUX redux projector
	├── vae_projector_only.bin # VAE high/low-frequency projector (mlp3)
	└── vgg.pth # VGG weights for LPIPS loss
	```

	These paths map one-to-one onto the fields of the YAML config:

	```yaml
	model_config:
	  pretrained_lvlm_name_or_path: weights/uniworld
	  pretrained_denoiser_name_or_path: weights/flux
	  pretrained_siglip_name_or_path: weights/siglip
	  pretrained_mlp2_path: weights/denoise_projector_params.bin
	  pretrained_mlp3_path: weights/vae_projector_only.bin
	  pretrained_siglip_mlp_path: weights/flux-redux-siglipv2-512.bin

	training_config:
	  lpips_weights_path: weights/vgg.pth

	If you change the layout, remember to update the YAML accordingly.
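
Before launching training, it can help to confirm that the download produced the layout above. The following is a minimal sketch, not part of the FAPE-IR codebase: the helper name `missing_weights` and the `EXPECTED` list are illustrative, and the entries are simply copied from the file tree in this README. It only checks that each path exists, not that the checkpoints are complete or uncorrupted.

```python
from pathlib import Path

# Entries copied from the File Layout tree; a trailing "/" marks a directory.
EXPECTED = [
    "flux/",
    "siglip/",
    "uniworld/",
    "denoise_projector_params.bin",
    "flux-redux-siglipv2-512.bin",
    "vae_projector_only.bin",
    "vgg.pth",
]


def missing_weights(root="weights"):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = []
    for entry in EXPECTED:
        path = root / entry
        # Directories and files are checked separately so a stray file named
        # "flux" does not pass for the flux/ directory.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    gaps = missing_weights()
    print("all weights present" if not gaps else f"missing: {gaps}")
```

Run it from the FAPE-IR project root. If you moved any weights, pass the new root or edit `EXPECTED` to match your YAML.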

	---

	## ⬇️ Download

	```bash
	# inside the FAPE-IR project root
	mkdir -p weights
	huggingface-cli download David0219/FAPEIR_Uniworld --local-dir ./weights
	```

	Or in Python:

	```python
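	from huggingface_hub import snapshot_download

	# Download the full repository into ./weights (run from the FAPE-IR project root).
	snapshot_download(
	    repo_id="David0219/FAPEIR_Uniworld",
	    local_dir="./weights",
	    # Deprecated and ignored by recent huggingface_hub releases;
	    # older versions use it to write real files instead of symlinks.
	    local_dir_use_symlinks=False,
	)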

	---

	## 📝 Intended Use & Limitations

	Intended use. Research on All-in-One image restoration with an MLLM-as-planner + diffusion-as-executor paradigm; reproducing or extending FAPE-IR; ablating individual components (LoRA-MoE routing, frequency regularization, adversarial training).

	Limitations.

	* Training requires substantial GPU memory because the executor is FLUX.1-dev (12B-class) and the planner is Qwen2.5-VL-7B-Instruct.
	* These are initial weights only — running inference with them directly will not reproduce FAPE-IR's reported quality. Train first.
	* The base models (FLUX.1-dev, Qwen2.5-VL, SigLIP-v2) keep their original licenses; in particular, FLUX.1-dev is released under a non-commercial license. Users must comply with each license individually.
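
To put the memory limitation in rough numbers, the sketch below estimates the footprint of the two largest components' weights alone. It is a back-of-envelope calculation under stated assumptions: the parameter counts are the nominal sizes implied by the model names (12B and 7B), not measured values. It also ignores the SigLIP-v2 encoder, the projectors, activations, gradients, and optimizer state, all of which add to the real training footprint.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}

# Nominal parameter counts taken from the model names (assumption, not measured).
NOMINAL_PARAMS = {
    "FLUX.1-dev executor": 12e9,
    "Qwen2.5-VL-7B-Instruct planner": 7e9,
}


def weight_memory_gib(num_params, dtype="bf16"):
    """Memory in GiB needed just to hold `num_params` weights at `dtype`."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for name, count in NOMINAL_PARAMS.items():
    print(f"{name}: ~{weight_memory_gib(count):.1f} GiB in bf16")

total = sum(weight_memory_gib(count) for count in NOMINAL_PARAMS.values())
print(f"total (weights only): ~{total:.1f} GiB in bf16")
```

Under these assumptions the weights alone come to roughly 35 GiB in bf16, before any training overhead.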

	---

	## 🔖 Citation

	```bibtex
	@article{liu2025fape,
	  title   = {FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration},
	  author  = {Liu, Jingren and Xu, Shuning and Yang, Qirui and Wang, Yun and Chen, Xiangyu and Ji, Zhong},
	  journal = {arXiv preprint arXiv:2511.14099},
	  year    = {2025}
	}
	```

	---

	## 📜 License & Acknowledgement

	The connector / projector weights released here are licensed under Apache-2.0. The bundled UniWorld-V1, FLUX.1-dev, Qwen2.5-VL-7B-Instruct, SigLIP-v2, and VGG weights retain their original licenses, which users must respect.

	We thank the teams behind [UniWorld](https://github.com/PKU-YuanGroup/UniWorld), [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev), [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL), and [SigLIP-v2](https://huggingface.co/google/siglip2-so400m-patch14-384) for open-sourcing their work.