HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

Yichen Liu¹,*, Donghao Zhou²,*, Jie Wang³, Xin Gao³, Guisheng Liu³, Jiatong Li³,†, Quanwei Zhang⁴,
Qiang Lyu¹, Lanqing Guo⁵, Shilei Wen³,§, Weiqiang Wang¹,§, Pheng-Ann Heng²,§

¹University of Chinese Academy of Sciences, ²The Chinese University of Hong Kong, ³ByteDance,
⁴Zhejiang University, ⁵UT Austin

*Equal contribution, †Project lead, §Corresponding author

🌍 Useful Links

Project Page: https://correr-zhou.github.io/HiFi-Inpaint/
Paper: https://arxiv.org/pdf/2603.02210
Code: https://github.com/Correr-Zhou/HiFi-Inpaint
Training Dataset: https://huggingface.co/datasets/donghao-zhou/HP-Image-40K

📌 Model Summary

HiFi-Inpaint is a reference-based human-product image inpainting model for generating detail-preserving human-product images. Given a product reference image, a masked condition image, and a text prompt/caption, the model is designed to reconstruct the missing region while preserving fine-grained product appearance.

This repository contains the released model weights for HiFi-Inpaint, intended for research and model development on high-fidelity reference-guided inpainting.

🗂️ Repository Files

HiFi-Inpaint/
├── README.md
├── alpha_blocks.pt
└── pytorch_lora_weights.safetensors

pytorch_lora_weights.safetensors: LoRA weights for the HiFi-Inpaint model.
alpha_blocks.pt: auxiliary alpha-block weights used by the HiFi-Inpaint model pipeline.

🎯 Intended Uses

HiFi-Inpaint is intended for research and model development on reference-based human-product generation and inpainting. Typical use cases include:

Product-reference-guided image inpainting.
Generating detail-preserving human-product images.
Fine-tuning or analyzing reference-conditioned image generation pipelines.
Studying product appearance preservation under masked-image reconstruction settings.

This model is released as research weights and is not intended for deceptive, harmful, privacy-violating, or otherwise unlawful applications.

💻 How to Use

Please refer to the official code repository for installation, pipeline construction, and inference scripts:

https://github.com/Correr-Zhou/HiFi-Inpaint

A typical setup should download this repository's weights and load:

pytorch_lora_weights.safetensors as the LoRA checkpoint.
alpha_blocks.pt as the auxiliary alpha-block checkpoint required by the inference pipeline.
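
The download step described above can be sketched as follows. This is a minimal illustration, not the official inference code: it assumes the model repository id is donghao-zhou/HiFi-Inpaint (mirroring the dataset namespace), that huggingface_hub and safetensors are installed, and the helper names download_weights, locate_checkpoints, and load_lora_state are hypothetical. Pipeline construction and the use of alpha_blocks.pt are left to the official repository.

```python
from pathlib import Path

# Assumption: the model repo id mirrors the dataset namespace (donghao-zhou).
REPO_ID = "donghao-zhou/HiFi-Inpaint"
LORA_FILE = "pytorch_lora_weights.safetensors"
ALPHA_FILE = "alpha_blocks.pt"


def download_weights(local_dir="HiFi-Inpaint"):
    """Fetch the repository snapshot (needs `pip install huggingface_hub`)."""
    from huggingface_hub import snapshot_download  # lazy: only needed here
    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))


def locate_checkpoints(weights_dir):
    """Return paths to both checkpoints, failing early if one is missing."""
    weights_dir = Path(weights_dir)
    paths = {name: weights_dir / name for name in (LORA_FILE, ALPHA_FILE)}
    missing = [name for name, path in paths.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"{weights_dir} is missing: {', '.join(missing)}")
    return paths


def load_lora_state(paths):
    """Read the LoRA tensors into a dict (needs `pip install safetensors torch`)."""
    from safetensors.torch import load_file  # lazy: heavy dependency
    return load_file(str(paths[LORA_FILE]))
```

Calling locate_checkpoints(download_weights()) would return the two checkpoint paths, which can then be handed to the inference scripts in the official repository. Checking for both files up front gives a clear error before any pipeline code runs.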

📚 Training Data

The model is associated with HP-Image-40K, a training dataset for high-fidelity reference-based human-product image inpainting. The dataset contains 43,632 aligned training samples with product reference images, ground-truth target images, masked condition images, binary masks, and captions.

Dataset repository: https://huggingface.co/datasets/donghao-zhou/HP-Image-40K
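
To make the sample structure concrete, here is a toy sketch of how the listed components might relate. It is illustrative only: the class HPSample, its field names, the mask polarity (1 marks the region to inpaint), and the zero fill value are all assumptions, and the dataset ships its masked condition images precomputed rather than deriving them this way. Images are shown as small nested lists so the example has no dependencies.

```python
from dataclasses import dataclass

Image = list[list[int]]  # toy grayscale image: rows of pixel values


@dataclass
class HPSample:
    """Hypothetical container for one sample; field names are illustrative."""
    reference: Image  # product reference image
    target: Image     # ground-truth human-product image
    mask: Image       # binary mask; 1 marks the region to inpaint (assumed)
    caption: str

    def masked_condition(self, fill=0):
        """Blank out the masked region of the target with `fill`."""
        return [
            [fill if m else px for px, m in zip(t_row, m_row)]
            for t_row, m_row in zip(self.target, self.mask)
        ]


sample = HPSample(
    reference=[[9, 9], [9, 9]],
    target=[[1, 2], [3, 4]],
    mask=[[0, 1], [0, 0]],
    caption="a person holding a product",
)
print(sample.masked_condition())  # [[1, 0], [3, 4]]
```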

⚖️ Usage Note

This model is released for research and model development purposes.

Users should ensure that downstream use complies with the model license, dataset license, and applicable regulations.
The model should not be used for deceptive, harmful, or privacy-violating applications.
Generated outputs should be reviewed before public or commercial use.

🔗 Citation

If you find this model useful in your research, please cite:

@article{liu2026hifiinpaint,
  title={HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images},
  author={Liu, Yichen and Zhou, Donghao and Wang, Jie and Gao, Xin and Liu, Guisheng and Li, Jiatong and Zhang, Quanwei and Lyu, Qiang and Guo, Lanqing and Wen, Shilei and Wang, Weiqiang and Heng, Pheng-Ann},
  journal={arXiv preprint arXiv:2603.02210},
  year={2026}
}

📬 Contact

For questions about the model or dataset, please contact Donghao Zhou: dhzhou@link.cuhk.edu.hk.

