---
license: cc-by-nc-4.0
pipeline_tag: image-to-image
library_name: diffusers
tags:
- image-generation
- image-inpainting
- reference-based-inpainting
- human-product-images
- lora
- hifi-inpaint
---

<h1 align="center" style="line-height: 50px;">
  HiFi-Inpaint: High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images
</h1>

<div align="center">
Yichen Liu<sup>1,*</sup>, Donghao Zhou<sup>2,*</sup>, Jie Wang<sup>3</sup>, Xin Gao<sup>3</sup>, Guisheng Liu<sup>3</sup>, Jiatong Li<sup>3,†</sup>, Quanwei Zhang<sup>4</sup>,<br>
Qiang Lyu<sup>1</sup>, Lanqing Guo<sup>5</sup>, Shilei Wen<sup>3,§</sup>, Weiqiang Wang<sup>1,§</sup>, Pheng-Ann Heng<sup>2,§</sup>
</div>

<br>

<div align="center">
<sup>1</sup>University of Chinese Academy of Sciences, <sup>2</sup>The Chinese University of Hong Kong, <sup>3</sup>ByteDance,<br>
<sup>4</sup>Zhejiang University, <sup>5</sup>UT Austin
</div>

<br>

<div align="center">
<sup>*</sup>Equal contribution, <sup>†</sup>Project lead, <sup>§</sup>Corresponding author
</div>

<br>

## 🌍 Useful Links

- Project Page: https://correr-zhou.github.io/HiFi-Inpaint/
- Paper: https://arxiv.org/pdf/2603.02210
- Code: https://github.com/Correr-Zhou/HiFi-Inpaint
- Training Dataset: https://huggingface.co/datasets/donghao-zhou/HP-Image-40K

---

## 📌 Model Summary

**HiFi-Inpaint** is a reference-based human-product image inpainting model for generating detail-preserving human-product images. Given a product reference image, a masked condition image, and a text prompt/caption, the model is designed to reconstruct the missing region while preserving fine-grained product appearance.

This repository contains the released model weights for HiFi-Inpaint, intended for research and model development on high-fidelity reference-guided inpainting.

## 🗂️ Repository Files

```text
HiFi-Inpaint/
├── README.md
├── alpha_blocks.pt
└── pytorch_lora_weights.safetensors
```

- `pytorch_lora_weights.safetensors`: LoRA weights for the HiFi-Inpaint model.
- `alpha_blocks.pt`: auxiliary alpha-block weights used by the HiFi-Inpaint model pipeline.

## 🎯 Intended Uses

HiFi-Inpaint is intended for **research and model development** on reference-based human-product generation and inpainting. Typical use cases include:

- Product-reference-guided image inpainting.
- Generating detail-preserving human-product images.
- Fine-tuning or analyzing reference-conditioned image generation pipelines.
- Studying product appearance preservation under masked-image reconstruction settings.

This model is released as research weights and is not intended for deceptive, harmful, privacy-violating, or otherwise unlawful applications.

## 💻 How to Use

Please refer to the official code repository for installation, pipeline construction, and inference scripts:

https://github.com/Correr-Zhou/HiFi-Inpaint

A typical setup downloads the weights from this repository and loads two files:

- `pytorch_lora_weights.safetensors` as the LoRA checkpoint.
- `alpha_blocks.pt` as the auxiliary alpha-block checkpoint required by the inference pipeline.
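The download step can be sketched as follows. This is a minimal illustration, not the official loading code: `locate_weights`, `download_weights`, and `lora_tensor_names` are hypothetical helper names, and `repo_id` is left as a parameter because this README does not spell out the model repository's Hub id. The sketch only fetches and inspects the files; building the actual inference pipeline (including how `alpha_blocks.pt` is consumed) is left to the official code repository.

```python
from pathlib import Path

# File names as listed in the "Repository Files" section.
EXPECTED_FILES = ("pytorch_lora_weights.safetensors", "alpha_blocks.pt")


def locate_weights(local_dir):
    """Return a {file name: Path} map, raising if a checkpoint is missing."""
    root = Path(local_dir)
    paths = {name: root / name for name in EXPECTED_FILES}
    missing = [name for name, path in paths.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {root}: {', '.join(missing)}")
    return paths


def download_weights(repo_id, local_dir):
    """Fetch only the two checkpoints from the Hub, then verify them.

    `repo_id` is this model repository's Hub id (an input you must supply;
    it is not stated in this README).
    """
    # Imported lazily so locate_weights works without huggingface_hub.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        allow_patterns=list(EXPECTED_FILES),
    )
    return locate_weights(local_dir)


def lora_tensor_names(lora_path):
    """List tensor names stored in the LoRA checkpoint (for inspection)."""
    from safetensors.torch import load_file

    return sorted(load_file(str(lora_path)).keys())
```

Calling `download_weights("<owner>/<model-repo>", "./hifi_inpaint_weights")` with the real repository id should leave both files in the target directory; the returned paths can then be passed to the inference scripts from the official repository.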

## 📚 Training Data

The model is associated with **HP-Image-40K**, a training dataset for high-fidelity reference-based human-product image inpainting. The dataset contains **43,632** aligned training samples with product reference images, ground-truth target images, masked condition images, binary masks, and captions.

Dataset repository: https://huggingface.co/datasets/donghao-zhou/HP-Image-40K
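To illustrate how the listed fields relate to one another, the sketch below derives a masked condition image from a target image and a binary mask. It rests on assumptions that the dataset card does not confirm: that nonzero mask pixels mark the region to inpaint, and that the masked region is filled with a constant value. `make_masked_condition` is a hypothetical helper, not part of the released code.

```python
import numpy as np


def make_masked_condition(target, mask, fill=0):
    """Blank out the masked region of `target`.

    target: uint8 array of shape (H, W, 3).
    mask:   array of shape (H, W); nonzero marks pixels to inpaint
            (an assumed convention, not confirmed by the dataset).
    """
    if target.shape[:2] != mask.shape:
        raise ValueError("target and mask must share height and width")
    hole = mask.astype(bool)
    condition = target.copy()  # leave the ground-truth target untouched
    condition[hole] = fill     # `fill` is broadcast over the channel axis
    return condition


# Toy example: a flat gray 4x4 image with a 2x2 hole in the middle.
target = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
condition = make_masked_condition(target, mask)
```

In this toy example the four central pixels of `condition` are zeroed while `target` is left unchanged, which mirrors the pairing of ground-truth target and masked condition image described above.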

## ⚖️ Usage Note

This model is released for **research and model development** purposes.

- Users should ensure that downstream use complies with the model license, dataset license, and applicable regulations.
- The model should not be used for deceptive, harmful, or privacy-violating applications.
- Generated outputs should be reviewed before public or commercial use.

## 🔗 Citation

If you find this model useful in your research, please cite:

```bibtex
@article{liu2026hifiinpaint,
  title={HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images},
  author={Liu, Yichen and Zhou, Donghao and Wang, Jie and Gao, Xin and Liu, Guisheng and Li, Jiatong and Zhang, Quanwei and Lyu, Qiang and Guo, Lanqing and Wen, Shilei and Wang, Weiqiang and Heng, Pheng-Ann},
  journal={arXiv preprint arXiv:2603.02210},
  year={2026}
}
```

## 📬 Contact

For questions about the model or dataset, please contact Donghao Zhou: [dhzhou@link.cuhk.edu.hk](mailto:dhzhou@link.cuhk.edu.hk).