lzyhha committed
Commit fc9d2e0 · verified · 1 Parent(s): 29f969c

Update README.md

Files changed (1): README.md (+10 -11)
README.md CHANGED
@@ -19,12 +19,6 @@ tags:
 
 # VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning (Implementation with <strong><span style="color:red">Diffusers</span></strong>)
 
-**Note**: <strong><span style="color:hotpink">You still need to install our modified version of</span></strong> [<strong><span style="color:hotpink">diffusers</span></strong>](https://github.com/lzyhha/diffusers).
-
-A model trained with the `resolution` of 512 is released at [Model Card](https://huggingface.co/VisualCloze/VisualClozePipeline-512),
-while this model uses the `resolution` of 384. The `resolution` means that each image will be resized to it before being
-concatenated to avoid the out-of-memory error. To generate high-resolution images, we use the [SDEdit](https://arxiv.org/abs/2108.01073) technology for upsampling the generated results.
-
 <div align="center">
 
 [[Paper](https://arxiv.org/abs/2504.07960)] &emsp; [[Project Page](https://visualcloze.github.io/)] &emsp; [[Github](https://github.com/lzyhha/VisualCloze)]
@@ -57,7 +51,8 @@ An in-context learning based universal image generation framework.
 
 ## 🔧 Installation
 
-Install diffusers from our modified repository.
+<strong><span style="color:hotpink">You still need to install our modified version of</span></strong> [diffusers](https://github.com/lzyhha/diffusers).
+
 ```bash
 git clone https://github.com/lzyhha/diffusers
 
@@ -69,7 +64,11 @@ pip install -v -e .
 
 [![Huggingface VisualCloze](https://img.shields.io/static/v1?label=Demo&message=Huggingface%20Gradio&color=orange)](https://huggingface.co/spaces/VisualCloze/VisualCloze)
 
-Example with Depth-to-Image:
+A model trained with a `resolution` of 512 is released at the [Model Card](https://huggingface.co/VisualCloze/VisualClozePipeline-512),
+while this model uses a `resolution` of 384. The `resolution` means that each image is resized to it before being
+concatenated, to avoid out-of-memory errors. To generate high-resolution images, we use the [SDEdit](https://arxiv.org/abs/2108.01073) technique to upsample the generated results.
+
+#### Example with Depth-to-Image:
 
 <img src="./visualcloze_diffusers_example_depthtoimage.jpg" width="60%" height="50%" alt="Example with Depth-to-Image"/>
 
@@ -103,7 +102,7 @@ The woman gazes down at the bouquet with a calm expression. Soft natural lightin
 high contrast, photorealistic, intimate, elegant, visually balanced, serene atmosphere."""
 
 # Load the VisualClozePipeline
-pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", torch_dtype=torch.bfloat16)
+pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
 pipe.to("cuda")
 
 # Run the pipeline
@@ -125,7 +124,7 @@ image_result.save("visualcloze.png")
 ```
 
-Example with Virtual Try-On:
+#### Example with Virtual Try-On:
 
 <img src="./visualcloze_diffusers_example_tryon.jpg" width="60%" height="50%" alt="Example with Virtual Try-On"/>
 
@@ -157,7 +156,7 @@ task_prompt = "Each row shows a virtual try-on process that aims to put [IMAGE2]
 content_prompt = None
 
 # Load the VisualClozePipeline
-pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", torch_dtype=torch.bfloat16)
+pipe = VisualClozePipeline.from_pretrained("VisualCloze/VisualClozePipeline-384", resolution=384, torch_dtype=torch.bfloat16)
 pipe.to("cuda")
 
 # Run the pipeline
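
The `resolution` resizing and the SDEdit-style upsampling that the README describes can be sketched as below. Both helpers (`fit_to_resolution`, `sdedit_start`) are hypothetical names for illustration only, and the exact resizing rule and noise schedule are assumptions; they are not VisualCloze's actual implementation.

```python
import math
import random

def fit_to_resolution(width, height, resolution=384):
    # Hypothetical sketch: scale so the shorter side matches `resolution`,
    # preserving aspect ratio, before images are concatenated in-context.
    # (The pipeline's exact resizing rule may differ.)
    scale = resolution / min(width, height)
    return round(width * scale), round(height * scale)

def sdedit_start(pixels, strength=0.6, seed=0):
    # Conceptual SDEdit starting point: blend the upscaled image with
    # Gaussian noise, so a diffusion model can denoise from this partial
    # noise level instead of from pure noise.
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - strength)
    mix = math.sqrt(strength)
    return [keep * p + mix * rng.gauss(0.0, 1.0) for p in pixels]
```

For instance, at `resolution=384` a 1024×512 input would be scaled to 768×384 before concatenation, which is what keeps the concatenated in-context grid within memory.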