Burf/DrUM

@@ -1,95 +1,97 @@
----
-license: mit
-language:
-- en
-library_name: diffusers
-tags:
-- text-to-image
-- personalization
-- adapter
-- stable-diffusion
-- flux
-- diffusers
-base_model:
-- runwayml/stable-diffusion-v1-5
-- stabilityai/stable-diffusion-2-1
-- stabilityai/stable-diffusion-xl-base-1.0
-- stabilityai/stable-diffusion-3.5-large
-- black-forest-labs/FLUX.1-dev
-pipeline_tag: text-to-image
----
-# DrUM (**D**raw **You**r **M**ind)
-**DrUM** enables **personalized text-to-image (T2I) generation by integrating reference prompts** into T2I diffusion models. It works with **foundation T2I models such as Stable Diffusion v1/v2/XL/v3 and FLUX**, without requiring additional fine-tuning. DrUM leverages **condition-level modeling in the latent space using a transformer-based adapter**, and integrates seamlessly with **open-source text encoders such as OpenCLIP and Google T5**.
-This repository provides the necessary components to run DrUM for **inference**. For the full source code, training scripts, and detailed documentation, please visit our official **[GitHub repository](https://github.com/Burf/DrUM)** and read the **[research paper](https://arxiv.org/abs/2508.03481)**.
-<p align="center">
-    <img src="teaser.png" width="95%">
-</p>
-## Quickstart
-This model is designed for easy use with the `diffusers` library as a custom pipeline.
-### Installation
-```bash
-pip install torch torchvision diffusers transformers accelerate safetensors huggingface-hub
-```
-### Usage
-```python
-import torch
-from diffusers import DiffusionPipeline
-from pipeline import DrUM
-# Load pipeline and attach DrUM
-#drum = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline = "Burf/DrUM", pipeline = "runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16, device = "cuda")
-pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16).to("cuda")
-drum = DrUM(pipeline)
-# Generate personalized images
-images = drum(
-    prompt = "a photograph of an astronaut riding a horse",
-    ref = ["A retro-futuristic space exploration movie poster with bold, vibrant colors"],
-    weight = [1.0],
-    alpha = 0.3
-)
-images[0].save("personalized_image.png")
-```
-## Supported foundation T2I models
-DrUM works with a wide variety of foundation T2I models that use text encoders with the same weights:
-| Architecture | Pipeline | Text encoder | DrUM weight |
-|--------------|----------------|-|-------------|
-| Stable Diffusion v1 | `runwayml/stable-diffusion-v1-5`, `prompthero/openjourney-v4`,<br>`stablediffusionapi/realistic-vision-v51`, `stablediffusionapi/deliberate-v2`,<br>`stablediffusionapi/anything-v5`, `WarriorMama777/AbyssOrangeMix2`, ... | `openai/clip-vit-large-patch14` | `L.safetensors` |
-| Stable Diffusion v2 | `stabilityai/stable-diffusion-2-1`, ... | `openai/clip-vit-huge-patch14` | `H.safetensors` |
-| Stable Diffusion XL | `stabilityai/stable-diffusion-xl-base-1.0`, ... | `openai/clip-vit-large-patch14`,<br>`laion/CLIP-ViT-bigG-14-laion2B-39B-b160k` | `L.safetensors`,<br>`bigG.safetensors` |
-| Stable Diffusion v3 | `stabilityai/stable-diffusion-3.5-large`,<br>`stabilityai/stable-diffusion-3.5-medium`, ... | `openai/clip-vit-large-patch14`,<br>`laion/CLIP-ViT-bigG-14-laion2B-39B-b160k`,<br>`google/t5-v1_1-xxl` | `L.safetensors`,<br>`bigG.safetensors`,<br>`T5.safetensors` |
-| FLUX | `black-forest-labs/FLUX.1-dev`, ... | `openai/clip-vit-large-patch14`,<br>`google/t5-v1_1-xxl` | `L.safetensors`,<br>`T5.safetensors` |
-## Citation
-```
-@inproceedings{kim2025drum,
-	title={Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models},
-	author={Hyungjin Kim, Seokho Ahn, and Young-Duk Seo},
-	booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
-	year={2025}
-}
-```
-## License
 This project is licensed under the MIT License.

+---
+license: mit
+language:
+- en
+library_name: diffusers
+tags:
+- text-to-image
+- personalization
+- adapter
+- stable-diffusion
+- flux
+- diffusers
+base_model:
+- runwayml/stable-diffusion-v1-5
+- stabilityai/stable-diffusion-2-1
+- stabilityai/stable-diffusion-xl-base-1.0
+- stabilityai/stable-diffusion-3.5-large
+- black-forest-labs/FLUX.1-dev
+pipeline_tag: text-to-image
+---
+# DrUM (**D**raw **You**r **M**ind)
+**DrUM** enables **personalized text-to-image (T2I) generation by integrating reference prompts** into T2I diffusion models. It works with **foundation T2I models such as Stable Diffusion v1/v2/XL/v3 and FLUX**, without requiring additional fine-tuning. DrUM leverages **condition-level modeling in the latent space using a transformer-based adapter**, and integrates seamlessly with **open-source text encoders such as OpenCLIP and Google T5**.
+This repository provides the necessary components to run DrUM for **inference**. For the full source code, training scripts, and detailed documentation, please visit our official **[GitHub repository](https://github.com/Burf/DrUM)** and read the **[research paper](https://openaccess.thecvf.com/content/ICCV2025/papers/Kim_Draw_Your_Mind_Personalized_Generation_via_Condition-Level_Modeling_in_Text-to-Image_ICCV_2025_paper.pdf)**.
+<p align="center">
+    <img src="teaser.png" width="95%">
+</p>
+## Quickstart
+This model is designed for easy use with the `diffusers` library as a custom pipeline.
+### Installation
+```bash
+pip install torch torchvision diffusers transformers accelerate safetensors huggingface-hub
+```
+### Usage
+```python
+import torch
+from diffusers import DiffusionPipeline
+from pipeline import DrUM
+# Load pipeline and attach DrUM
+#drum = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline = "Burf/DrUM", pipeline = "runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16, device = "cuda")
+pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16).to("cuda")
+drum = DrUM(pipeline)
+# Generate personalized images
+images = drum(
+    prompt = "a photograph of an astronaut riding a horse",
+    ref = ["A retro-futuristic space exploration movie poster with bold, vibrant colors"],
+    weight = [1.0],
+    alpha = 0.3
+)
+images[0].save("personalized_image.png")
+```
+## Supported foundation T2I models
+DrUM works with a wide variety of foundation T2I models that use text encoders with the same weights:
+| Architecture | Pipeline | Text encoder | DrUM weight |
+|--------------|----------------|-|-------------|
+| Stable Diffusion v1 | `runwayml/stable-diffusion-v1-5`, `prompthero/openjourney-v4`,<br>`stablediffusionapi/realistic-vision-v51`, `stablediffusionapi/deliberate-v2`,<br>`stablediffusionapi/anything-v5`, `WarriorMama777/AbyssOrangeMix2`, ... | `openai/clip-vit-large-patch14` | `L.safetensors` |
+| Stable Diffusion v2 | `stabilityai/stable-diffusion-2-1`, ... | `openai/clip-vit-huge-patch14` | `H.safetensors` |
+| Stable Diffusion XL | `stabilityai/stable-diffusion-xl-base-1.0`, ... | `openai/clip-vit-large-patch14`,<br>`laion/CLIP-ViT-bigG-14-laion2B-39B-b160k` | `L.safetensors`,<br>`bigG.safetensors` |
+| Stable Diffusion v3 | `stabilityai/stable-diffusion-3.5-large`,<br>`stabilityai/stable-diffusion-3.5-medium`, ... | `openai/clip-vit-large-patch14`,<br>`laion/CLIP-ViT-bigG-14-laion2B-39B-b160k`,<br>`google/t5-v1_1-xxl` | `L.safetensors`,<br>`bigG.safetensors`,<br>`T5.safetensors` |
+| FLUX | `black-forest-labs/FLUX.1-dev`, ... | `openai/clip-vit-large-patch14`,<br>`google/t5-v1_1-xxl` | `L.safetensors`,<br>`T5.safetensors` |
+## Citation
+```
+@InProceedings{kim2025drum,
+    author    = {Kim, Hyungjin and Ahn, Seokho and Seo, Young-Duk},
+    title     = {Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models},
+    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
+    month     = {October},
+    year      = {2025},
+    pages     = {17171-17180}
+}
+```
+## License
 This project is licensed under the MIT License.
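
As a reading aid for the "Supported foundation T2I models" table in the README, the sketch below transcribes its "Architecture" and "DrUM weight" columns into a plain Python lookup. The names `DRUM_WEIGHTS` and `weights_for` are hypothetical and are not part of the DrUM pipeline or of `diffusers`; how DrUM actually selects and loads its weight files is not shown in this README, so treat this only as an illustration of which files each architecture relies on.

```python
# Transcribed from the "Supported foundation T2I models" table.
# DRUM_WEIGHTS and weights_for are illustrative names, not DrUM's API.
DRUM_WEIGHTS = {
    "Stable Diffusion v1": ["L.safetensors"],
    "Stable Diffusion v2": ["H.safetensors"],
    "Stable Diffusion XL": ["L.safetensors", "bigG.safetensors"],
    "Stable Diffusion v3": ["L.safetensors", "bigG.safetensors", "T5.safetensors"],
    "FLUX": ["L.safetensors", "T5.safetensors"],
}


def weights_for(architecture):
    """Return the DrUM weight files the table lists for an architecture."""
    if architecture not in DRUM_WEIGHTS:
        known = ", ".join(sorted(DRUM_WEIGHTS))
        raise ValueError(f"Unknown architecture {architecture!r}; expected one of: {known}")
    # Return a copy so callers cannot mutate the table.
    return list(DRUM_WEIGHTS[architecture])


if __name__ == "__main__":
    for name in DRUM_WEIGHTS:
        print(f"{name}: {', '.join(weights_for(name))}")
```

Because one adapter weight corresponds to one text encoder, architectures that share an encoder (for example `openai/clip-vit-large-patch14` in Stable Diffusion v1, XL, v3, and FLUX) list the same `L.safetensors` file.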