lzyhha committed
Commit 053c3cd · verified · 1 Parent(s): f2d9365

Update README.md

Files changed (1): README.md (+179, -4)

README.md CHANGED
---
license: apache-2.0
library_name: diffusers
datasets:
- VisualCloze/Graph200K
base_model:
- black-forest-labs/FLUX.1-Fill-dev
pipeline_tag: image-to-image
tags:
- text-to-image
- image-to-image
- flux
- lora
- in-context-learning
- universal-image-generation
- ai-tools
---

# VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning (Implementation with <strong><span style="color:red">Diffusers</span></strong>)

<div align="center">

[[Paper](https://arxiv.org/abs/2504.07960)] &emsp; [[Project Page](https://visualcloze.github.io/)] &emsp; [[Github](https://github.com/lzyhha/VisualCloze)]

</div>

<div align="center">

[[🤗 <strong><span style="color:hotpink">Diffusers</span></strong> Implementation](https://github.com/lzyhha/diffusers)]

</div>

<div align="center">

[[🤗 Online Demo](https://huggingface.co/spaces/VisualCloze/VisualCloze)] &emsp; [[🤗 Dataset Card](https://huggingface.co/datasets/VisualCloze/Graph200K)]

</div>

## 🌠 Key Features

VisualCloze is an in-context-learning-based universal image generation framework that:

1. Supports a wide range of in-domain tasks.
2. Generalizes to <strong><span style="color:hotpink">unseen tasks</span></strong> through in-context learning.
3. Unifies multiple tasks into one step, generating both the target image and intermediate results.
4. Supports reverse-engineering a set of conditions from a target image.

🔥 Examples are shown on the [project page](https://visualcloze.github.io/).

## 🔧 Installation

Install diffusers from our modified repository:

```bash
git clone https://github.com/lzyhha/diffusers
cd diffusers
pip install -v -e .
```
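
After installing, you can quickly check that the fork's pipeline class is importable (a minimal sketch; `VisualClozePipeline` is the class used in the usage examples below):

```python
# Verify that the modified diffusers fork is installed and exposes the pipeline.
import diffusers
from diffusers import VisualClozePipeline

print(diffusers.__version__)
print(VisualClozePipeline.__name__)
```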

### 💻 Diffusers Usage

[![Huggingface VisualCloze](https://img.shields.io/static/v1?label=Demo&message=Huggingface%20Gradio&color=orange)](https://huggingface.co/spaces/VisualCloze/VisualCloze)

Example with Depth-to-Image:

```python
import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image

# Load in-context images (the raw file URLs are used so that load_image
# receives image bytes; make sure the paths are correct and accessible).
image_paths = [
    # in-context example: a (depth map, photo) pair demonstrating the task
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/5bf755ed9dbb9b3e223e7ba35232b06e/5bf755ed9dbb9b3e223e7ba35232b06e_depth-anything-v2_Large.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/5bf755ed9dbb9b3e223e7ba35232b06e/5bf755ed9dbb9b3e223e7ba35232b06e.jpg'),
    ],
    # query: the depth map of the target image
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/2b74476568f7562a6aa832d423132ed3/2b74476568f7562a6aa832d423132ed3_depth-anything-v2_Large.jpg'),
        None,  # the target image is left empty and will be generated
    ],
]

# Task and content prompts
task_prompt = "Each row outlines a logical process, starting from [IMAGE1] gray-based depth map with detailed object contours, to achieve [IMAGE2] an image with flawless clarity."
content_prompt = """Group photo of five young adults enjoying a rooftop gathering at dusk. The group is positioned in the center, with three women and two men smiling and embracing.
The woman on the far left wears a floral top and holds a drink, looking slightly to the right.
Next to her, a woman in a denim jacket stands close to a woman in a white blouse, both smiling directly at the camera.
The fourth woman, in an orange top, stands close to the man on the far right, who wears a red shirt and blue blazer, smiling broadly.
The background features a cityscape with a tall building and string lights hanging overhead, creating a warm, festive atmosphere.
Soft natural lighting, warm color palette, shallow depth of field, intimate and joyful mood, slightly blurred background, urban rooftop setting, evening ambiance."""

# Load the VisualClozePipeline
pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Save some VRAM by offloading the model to CPU

# Run the pipeline
image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    height=1632,
    width=1232,
    upsampling_strength=0.4,
    guidance_scale=30,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]

# Save the resulting image
image_result.save("visualcloze.png")
```
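
The nested list passed as `image` mirrors the in-context grid: each inner list is one row, with the in-context example row(s) first and the query row last, whose missing target is set to `None`. Local files work too; below is a minimal sketch with placeholder file names (`load_image` also accepts local paths):

```python
from PIL import Image

# Placeholder local paths; replace them with your own depth maps and photos.
image_paths = [
    # in-context example row: (depth map, corresponding photo)
    [Image.open("example_depth.jpg"), Image.open("example_photo.jpg")],
    # query row: the depth map to condition on; the target slot stays None
    [Image.open("query_depth.jpg"), None],
]
```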

Example with Virtual Try-On:

```python
import torch
from diffusers import VisualClozePipeline
from diffusers.utils import load_image

# Load in-context images (the raw file URLs are used so that load_image
# receives image bytes; make sure the paths are correct and accessible).
image_paths = [
    # in-context example: person, clothing, and the person wearing the clothing
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/03673_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00700_00_tryon_catvton_0.jpg'),
    ],
    # query: the person and the clothing to put on them
    [
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/00555_00.jpg'),
        load_image('https://github.com/lzyhha/VisualCloze/raw/main/examples/examples/tryon/12265_00.jpg'),
        None,  # the try-on result is left empty and will be generated
    ],
]

# Task and content prompts
task_prompt = "Each row shows a virtual try-on process that aims to put [IMAGE2] the clothing onto [IMAGE1] the person, producing [IMAGE3] the person wearing the new clothing."
content_prompt = None

# Load the VisualClozePipeline
pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Save some VRAM by offloading the model to CPU

# Run the pipeline
image_result = pipe(
    task_prompt=task_prompt,
    content_prompt=content_prompt,
    image=image_paths,
    height=1632,
    width=1232,
    upsampling_strength=0.4,
    guidance_scale=30,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]

# Save the resulting image
image_result.save("visualcloze.png")
```
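
Both examples offload the model to the CPU to reduce peak VRAM. Since `VisualClozePipeline` is a regular diffusers pipeline, the usual memory/device options should apply; the sketch below lists the common alternatives (not specific to VisualCloze):

```python
# Choose one of the following, depending on available VRAM:

# Option 1: keep the whole pipeline on the GPU (fastest, highest VRAM use).
pipe.to("cuda")

# Option 2: offload idle sub-models to the CPU (used in the examples above).
pipe.enable_model_cpu_offload()

# Option 3: offload at the sub-module level for the lowest VRAM use (slowest).
pipe.enable_sequential_cpu_offload()
```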

### Citation

If you find VisualCloze useful for your research and applications, please cite it using this BibTeX:

```bibtex
@article{li2025visualcloze,
  title={VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning},
  author={Li, Zhong-Yu and Du, Ruoyi and Yan, Juncheng and Zhuo, Le and Li, Zhen and Gao, Peng and Ma, Zhanyu and Cheng, Ming-Ming},
  journal={arXiv preprint arXiv:2504.07960},
  year={2025}
}
```