lzyhha committed
Commit fc9d2e0 · verified · 1 Parent(s): 29f969c

Update README.md

Files changed (1): README.md (+10 -11)
README.md CHANGED
@@ -19,12 +19,6 @@ tags:
 
 # VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning (Implementation with <strong><span style="color:red">Diffusers</span></strong>)
 
-**Note**: <strong><span style="color:hotpink">You still need to install our modified version of</span></strong> [<strong><span style="color:hotpink">diffusers</span></strong>](https://github.com/lzyhha/diffusers).
-
-A model trained with the `resolution` of 512 is released at [Model Card](https://huggingface.co/VisualCloze/VisualClozePipeline-512),
-while this model uses the `resolution` of 384. The `resolution` means that each image will be resized to it before being
-concatenated to avoid the out-of-memory error. To generate high-resolution images, we use the [SDEdit](https://arxiv.org/abs/2108.01073) technology for upsampling the generated results.
-
 <div align="center">
 
 [[Paper](https://arxiv.org/abs/2504.07960)] &emsp; [[Project Page](https://visualcloze.github.io/)] &emsp; [[Github](https://github.com/lzyhha/VisualCloze)]
@@ -57,7 +51,8 @@ An in-context learning based universal image generation framework.
 
 ## 🔧 Installation
 
-Install diffusers from our modified repository.
+<strong><span style="color:hotpink">You still need to install our modified version of</span></strong> [diffusers](https://github.com/lzyhha/diffusers).
+
 ```bash
 git clone https://github.com/lzyhha/diffusers
 
@@ -69,7 +64,11 @@ pip install -v -e .
 
 [![Huggingface VisualCloze](https://img.shields.io/static/v1?label=Demo&message=Huggingface%20Gradio&color=orange)](https://huggingface.co/spaces/VisualCloze/VisualCloze)
 
-Example with Depth-to-Image:
+A model trained with a `resolution` of 512 is released at the [Model Card](https://huggingface.co/VisualCloze/VisualClozePipeline-512),
+while this model uses a `resolution` of 384. The `resolution` means that each image is resized to it before being
+concatenated, to avoid out-of-memory errors. To generate high-resolution images, we use the [SDEdit](https://arxiv.org/abs/2108.01073) technique to upsample the generated results.
+
+#### Example with Depth-to-Image:
 
 <img src="./visualcloze_diffusers_example_depthtoimage.jpg" width="60%" height="50%" alt="Example with Depth-to-Image"/>
 
@@ -103,7 +102,7 @@ The woman gazes down at the bouquet with a calm expression. Soft natural lightin
 high contrast, photorealistic, intimate, elegant, visually balanced, serene atmosphere."""
 
 # Load the VisualClozePipeline
-pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", torch_dtype=torch.bfloat16)
+pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
 pipe.to("cuda")
 
 # Run the pipeline
@@ -125,7 +124,7 @@ image_result.save("visualcloze.png")
 ```
 
-Example with Virtual Try-On:
+#### Example with Virtual Try-On:
 
 <img src="./visualcloze_diffusers_example_tryon.jpg" width="60%" height="50%" alt="Example with Virtual Try-On"/>
 
@@ -157,7 +156,7 @@ task_prompt = "Each row shows a virtual try-on process that aims to put [IMAGE2]
 content_prompt = None
 
 # Load the VisualClozePipeline
-pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", torch_dtype=torch.bfloat16)
+pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
 pipe.to("cuda")
 
 # Run the pipeline
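
The `resolution` resizing and the SDEdit-style upsampling that the README describes can be sketched as below. Both helpers (`fit_to_resolution`, `sdedit_start`) are hypothetical names for illustration only, and the exact resizing rule and noise schedule are assumptions; they are not VisualCloze's actual implementation.

```python
import math
import random

def fit_to_resolution(width, height, resolution=384):
    # Hypothetical sketch: scale so the shorter side matches `resolution`,
    # preserving aspect ratio, before images are concatenated in-context.
    # (The pipeline's exact resizing rule may differ.)
    scale = resolution / min(width, height)
    return round(width * scale), round(height * scale)

def sdedit_start(pixels, strength=0.6, seed=0):
    # Conceptual SDEdit starting point: blend the upscaled image with
    # Gaussian noise, so a diffusion model can denoise from this partial
    # noise level instead of from pure noise.
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - strength)
    mix = math.sqrt(strength)
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]
```

For instance, at `resolution=384` a 1024×512 input would be scaled to 768×384 before concatenation, which is what keeps the concatenated in-context grid within memory.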