@@ -19,13 +19,6 @@ tags:
 # VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning (Implementation with <strong><span style="color:red">Diffusers</span></strong>)
-**Note**: <strong><span style="color:hotpink">You still need to install our modified version of</span></strong> [<strong><span style="color:hotpink">diffusers</span></strong>](https://github.com/lzyhha/diffusers).
-A model trained with the `resolution` of 384 is released at [Model Card](https://huggingface.co/VisualCloze/VisualClozePipeline-384),
-while this model uses the `resolution` of 512. The `resolution` means that each image will be resized to it before being
-concatenated to avoid the out-of-memory error. To generate high-resolution images, we use the [SDEdit](https://arxiv.org/abs/2108.01073) technology for upsampling the generated results.
 <div align="center">
 [[Paper](https://arxiv.org/abs/2504.07960)] &emsp; [[Project Page](https://visualcloze.github.io/)] &emsp; [[Github](https://github.com/lzyhha/VisualCloze)]
@@ -58,7 +51,8 @@ An in-context learning based universal image generation framework.
 ## 🔧 Installation
-Install diffusers from our modified repository.
 ```bash
 git clone https://github.com/lzyhha/diffusers
@@ -70,7 +64,11 @@ pip install -v -e .
 [![Huggingface VisualCloze](https://img.shields.io/static/v1?label=Demo&message=Huggingface%20Gradio&color=orange)](https://huggingface.co/spaces/VisualCloze/VisualCloze)
-Example with Depth-to-Image:
 <img src="./visualcloze_diffusers_example_depthtoimage.jpg" width="60%" height="50%" alt="Example with Depth-to-Image"/>
@@ -125,8 +123,7 @@ image_result = pipe(
 image_result.save("visualcloze.png")
 ```
-Example with Virtual Try-On:
 <img src="./visualcloze_diffusers_example_tryon.jpg" width="60%" height="50%" alt="Example with Virtual Try-On"/>

 # VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning (Implementation with <strong><span style="color:red">Diffusers</span></strong>)
 <div align="center">
 [[Paper](https://arxiv.org/abs/2504.07960)] &emsp; [[Project Page](https://visualcloze.github.io/)] &emsp; [[Github](https://github.com/lzyhha/VisualCloze)]
 ## 🔧 Installation
+<strong><span style="color:hotpink">You still need to install our modified version of</span></strong> [diffusers](https://github.com/lzyhha/diffusers).
 ```bash
 git clone https://github.com/lzyhha/diffusers
 [![Huggingface VisualCloze](https://img.shields.io/static/v1?label=Demo&message=Huggingface%20Gradio&color=orange)](https://huggingface.co/spaces/VisualCloze/VisualCloze)
+A model trained at a `resolution` of 384 is available at [Model Card](https://huggingface.co/VisualCloze/VisualClozePipeline-384),
+while this model uses a `resolution` of 512. Here, `resolution` is the size to which each image is resized before
+concatenation, which avoids out-of-memory errors. To generate high-resolution images, we upsample the generated results with [SDEdit](https://arxiv.org/abs/2108.01073).
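
To make the effect of `resolution` concrete, here is a minimal sketch of how a per-image target size could be derived before concatenation. It is an illustration under stated assumptions, not the pipeline's actual preprocessing: the helper name `fit_to_resolution`, the area-preserving scaling rule, and the snapping to a multiple of 16 are all hypothetical choices made for this example.

```python
import math


def fit_to_resolution(width, height, resolution=512, multiple=16):
    """Hypothetical helper: pick a target size for one image.

    Assumption (not confirmed by the model card): the image is scaled so
    that its area is roughly resolution**2 while keeping the aspect ratio,
    and each side is snapped to a multiple of `multiple`.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    # Scale factor that maps the original area onto resolution**2.
    scale = math.sqrt((resolution * resolution) / (width * height))

    # Snap each side to the nearest multiple, never going below one unit.
    new_width = max(multiple, round(width * scale / multiple) * multiple)
    new_height = max(multiple, round(height * scale / multiple) * multiple)
    return new_width, new_height


if __name__ == "__main__":
    # Compare the two released settings on the same input size.
    for res in (384, 512):
        print(res, fit_to_resolution(1920, 1080, resolution=res))
```

Under these assumptions, a smaller `resolution` shrinks every image in the concatenated grid, which is why the 384 model needs less memory than this 512 model for the same number of in-context examples.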
+#### Example with Depth-to-Image:
 <img src="./visualcloze_diffusers_example_depthtoimage.jpg" width="60%" height="50%" alt="Example with Depth-to-Image"/>
 image_result.save("visualcloze.png")
 ```
+#### Example with Virtual Try-On:
 <img src="./visualcloze_diffusers_example_tryon.jpg" width="60%" height="50%" alt="Example with Virtual Try-On"/>