---
license: apache-2.0
library_name: diffusers
datasets:
- VisualCloze/Graph200K
base_model:
- black-forest-labs/FLUX.1-Fill-dev
pipeline_tag: image-to-image
tags:
- text-to-image
- image-to-image
- flux
- lora
- in-context-learning
- universal-image-generation
- ai-tools
---

# VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning (Implementation with <strong><span style="color:red">Diffusers</span></strong>)

**Note**: <strong><span style="color:hotpink">You still need to install our modified version of</span></strong> [<strong><span style="color:hotpink">diffusers</span></strong>](https://github.com/lzyhha/diffusers).

<div align="center">

[[Paper](https://arxiv.org/abs/2504.07960)] &emsp; [[Project Page](https://visualcloze.github.io/)] &emsp; [[Github](https://github.com/lzyhha/VisualCloze)]

</div>

<div align="center">

[[🤗 <strong><span style="color:hotpink">Diffusers</span></strong> Implementation](https://github.com/lzyhha/diffusers)]

</div>

<div align="center">

[[🤗 Online Demo](https://huggingface.co/spaces/VisualCloze/VisualCloze)] &emsp; [[🤗 Dataset Card](https://huggingface.co/datasets/VisualCloze/Graph200K)]

</div>

## 🌠 Key Features

An in-context learning based universal image generation framework.

1. Supports various in-domain tasks.
2. Generalizes to <strong><span style="color:hotpink">unseen tasks</span></strong> through in-context learning.
3. Unifies multiple tasks into one step, generating both the target image and intermediate results.
4. Supports reverse-engineering a set of conditions from a target image.

🔥 Examples are shown on the [project page](https://visualcloze.github.io/).

## 🔧 Installation

Install diffusers from our modified repository:

```bash
git clone https://github.com/lzyhha/diffusers
cd diffusers
pip install -v -e .
```

### 💻 Diffusers Usage

[![Huggingface VisualCloze](https://img.shields.io/static/v1?label=Demo&message=Huggingface%20Gradio&color=orange)](https://huggingface.co/spaces/VisualCloze/VisualCloze)

Example with Depth-to-Image:

<img src="./visualcloze_diffusers_example_depthtoimage.jpg" width="60%" height="50%" alt="Example with Depth-to-Image"/>

```python
import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image

# Load in-context images (make sure the paths are correct and accessible)
image_paths = [
    # in-context examples
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/93bc1c43af2d6c91ac2fc966bf7725a2/93bc1c43af2d6c91ac2fc966bf7725a2_depth-anything-v2_Large.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/93bc1c43af2d6c91ac2fc966bf7725a2/93bc1c43af2d6c91ac2fc966bf7725a2.jpg'),
    ],
    # query with the target image
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/79f2ee632f1be3ad64210a641c4e201b/79f2ee632f1be3ad64210a641c4e201b_depth-anything-v2_Large.jpg'),
        None,  # No image needed for the query in this case
    ],
]

# Task and content prompts
task_prompt = "Each row outlines a logical process, starting from [IMAGE1] gray-based depth map with detailed object contours, to achieve [IMAGE2] an image with flawless clarity."
content_prompt = """A serene portrait of a young woman with long dark hair, wearing a beige dress with intricate
gold embroidery, standing in a softly lit room. She holds a large bouquet of pale pink roses in a black box,
positioned in the center of the frame. The background features a tall green plant to the left and a framed artwork
on the wall to the right. A window on the left allows natural light to gently illuminate the scene.
The woman gazes down at the bouquet with a calm expression. Soft natural lighting, warm color palette,
high contrast, photorealistic, intimate, elegant, visually balanced, serene atmosphere."""

# Load the VisualClozePipeline
pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-512", resolution=512, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Save some VRAM by offloading the model to CPU

# Run the pipeline
image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_width=1024,
    upsampling_height=1024,
    upsampling_strength=0.4,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0][0]

# Save the resulting image
image_result.save("visualcloze.png")
```
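
The nested list passed via `image=` has a fixed shape: every in-context example row is fully specified, and the final query row marks exactly one position as `None` for the pipeline to generate. As a minimal sketch, a helper that assembles and validates this layout (the helper name and checks are our own illustration, not part of the pipeline API):

```python
def build_visualcloze_rows(example_rows, query_row):
    """Assemble the nested list expected by the pipeline's `image` argument.

    example_rows: complete in-context rows (no None entries).
    query_row: the final row, with exactly one None marking the target slot.
    """
    width = len(query_row)
    for i, row in enumerate(example_rows):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} images, expected {width}")
        if any(img is None for img in row):
            raise ValueError(f"in-context row {i} must be fully specified")
    if query_row.count(None) != 1:
        raise ValueError("query row must leave exactly one position as None")
    return example_rows + [query_row]

# Depth-to-image layout: one example row (depth map, photo),
# then a query row with the photo slot left as None.
rows = build_visualcloze_rows(
    example_rows=[["example_depth.jpg", "example_photo.jpg"]],
    query_row=["query_depth.jpg", None],
)
```

Here plain path strings stand in for the `load_image(...)` results used above; in practice each entry is a loaded PIL image.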

Example with Virtual Try-On:

<img src="./visualcloze_diffusers_example_tryon.jpg" width="60%" height="50%" alt="Example with Virtual Try-On"/>

```python
import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image

# Load in-context images (make sure the paths are correct and accessible)
image_paths = [
    # in-context examples
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/03673_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00_tryon_catvton_0.jpg'),
    ],
    # query with the target image
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00555_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/12265_00.jpg'),
        None,  # No image needed for the query in this case
    ],
]

# Task and content prompts
task_prompt = "Each row shows a virtual try-on process that aims to put [IMAGE2] the clothing onto [IMAGE1] the person, producing [IMAGE3] the person wearing the new clothing."
content_prompt = None

# Load the VisualClozePipeline
pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-512", resolution=512, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Save some VRAM by offloading the model to CPU

# Run the pipeline
image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    upsampling_height=1632,
    upsampling_width=1232,
    upsampling_strength=0.3,
    guidance_scale=30,
    num_inference_steps=30,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0][0]

# Save the resulting image
image_result.save("visualcloze.png")
```
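
Task prompts reference row positions with `[IMAGE1]`, `[IMAGE2]`, … tokens, in the order the images appear in each row. A small convenience function for stitching per-position descriptions into such a prompt (this function and its template are our own sketch, not provided by the library):

```python
def compose_task_prompt(descriptions, template="Each row shows a process from {steps}."):
    """Join per-position descriptions using [IMAGEk] tokens, k starting at 1."""
    steps = ", to ".join(
        f"[IMAGE{i + 1}] {desc}" for i, desc in enumerate(descriptions)
    )
    return template.format(steps=steps)

# A virtual try-on style prompt over three row positions.
prompt = compose_task_prompt([
    "the person",
    "the clothing",
    "the person wearing the new clothing",
])
```

For new tasks, phrasing the prompt as a left-to-right process over the row, as in the examples above, matches how the in-context rows are laid out.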

### Citation

If you find VisualCloze useful for your research and applications, please cite using this BibTeX:

```bibtex
@article{li2025visualcloze,
  title={VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning},
  author={Li, Zhong-Yu and Du, Ruoyi and Yan, Juncheng and Zhuo, Le and Li, Zhen and Gao, Peng and Ma, Zhanyu and Cheng, Ming-Ming},
  journal={arXiv preprint arXiv:2504.07960},
  year={2025}
}
```