---
license: mit
language:
- en
library_name: diffusers
tags:
- text-to-image
- personalization
- adapter
- stable-diffusion
- flux
- diffusers
base_model:
- runwayml/stable-diffusion-v1-5
- stabilityai/stable-diffusion-2-1
- stabilityai/stable-diffusion-xl-base-1.0
- stabilityai/stable-diffusion-3.5-large
- black-forest-labs/FLUX.1-dev
pipeline_tag: text-to-image
---
|
|
|
|
|
|
|
|
# DrUM (**D**raw yo**U**r **M**ind)
|
|
|
|
|
**DrUM** enables **personalized text-to-image (T2I) generation by integrating reference prompts** into T2I diffusion models. It works with **foundation T2I models such as Stable Diffusion v1/v2/XL/v3 and FLUX**, without requiring additional fine-tuning. DrUM leverages **condition-level modeling in the latent space using a transformer-based adapter**, and integrates seamlessly with **open-source text encoders such as OpenCLIP and Google T5**. |
|
|
|
|
|
This repository provides the necessary components to run DrUM for **inference**. For the full source code, training scripts, and detailed documentation, please visit our official **[GitHub repository](https://github.com/Burf/DrUM)** and read the **research paper [[iccv](https://openaccess.thecvf.com/content/ICCV2025/papers/Kim_Draw_Your_Mind_Personalized_Generation_via_Condition-Level_Modeling_in_Text-to-Image_ICCV_2025_paper.pdf)] [[supp](https://openaccess.thecvf.com/content/ICCV2025/supplemental/Kim_Draw_Your_Mind_ICCV_2025_supplemental.pdf)] [[arXiv](https://arxiv.org/abs/2508.03481)]**. |
|
|
|
|
|
<p align="center">
  <img src="teaser.png" width="95%">
</p>
|
|
|
|
|
|
|
|
## Quickstart |
|
|
|
|
|
This model is designed for easy use with the `diffusers` library as a custom pipeline. |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash
pip install torch torchvision diffusers transformers accelerate safetensors huggingface-hub
```
|
|
|
|
|
### Usage |
|
|
|
|
|
```python
import torch

from diffusers import DiffusionPipeline
from pipeline import DrUM

# Load a foundation T2I pipeline and attach DrUM
# Alternatively, load DrUM in one step as a custom pipeline:
# drum = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline = "Burf/DrUM", pipeline = "runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16, device = "cuda")
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16).to("cuda")
drum = DrUM(pipeline)

# Generate personalized images
images = drum(
    prompt = "a photograph of an astronaut riding a horse",
    ref = ["A retro-futuristic space exploration movie poster with bold, vibrant colors"],
    weight = [1.0],
    alpha = 0.3
)

images[0].save("personalized_image.png")
```
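The `weight` argument sets the relative contribution of each reference prompt, and `alpha` sets the overall personalization strength. As a rough intuition only (the actual DrUM adapter is a learned transformer operating on conditions in latent space, not this formula), the effect can be pictured as a weighted interpolation between the prompt embedding and the reference embeddings:

```python
def blend_conditions(prompt_emb, ref_embs, weights, alpha):
    """Illustrative sketch only: linearly blend a prompt embedding with
    reference embeddings. This is NOT the DrUM adapter, which is a
    learned transformer; it just mimics the roles of `weight` and `alpha`."""
    # Normalize the relative weights of the reference prompts
    total = sum(weights)
    norm = [w / total for w in weights]
    # Element-wise weighted average of the reference embeddings
    ref_mix = [sum(w * ref[i] for w, ref in zip(norm, ref_embs))
               for i in range(len(prompt_emb))]
    # alpha = 0 keeps the original prompt; alpha = 1 uses only the references
    return [(1 - alpha) * p + alpha * r for p, r in zip(prompt_emb, ref_mix)]

out = blend_conditions([0.0] * 4, [[1.0] * 4, [3.0] * 4], weights=[1.0, 1.0], alpha=0.3)
print(out)  # [0.6, 0.6, 0.6, 0.6]
```

With `alpha = 0.3` as in the example above, the generated image stays close to the user prompt while drifting moderately toward the reference style.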
|
|
|
|
|
|
|
|
## Supported foundation T2I models |
|
|
|
|
|
DrUM works with a wide variety of foundation T2I models that use text encoders sharing the same weights:
|
|
|
|
|
| Architecture | Pipeline | Text encoder | DrUM weight |
|--------------|----------|--------------|-------------|
| Stable Diffusion v1 | `runwayml/stable-diffusion-v1-5`, `prompthero/openjourney-v4`,<br>`stablediffusionapi/realistic-vision-v51`, `stablediffusionapi/deliberate-v2`,<br>`stablediffusionapi/anything-v5`, `WarriorMama777/AbyssOrangeMix2`, ... | `openai/clip-vit-large-patch14` | `L.safetensors` |
| Stable Diffusion v2 | `stabilityai/stable-diffusion-2-1`, ... | `laion/CLIP-ViT-H-14-laion2B-s32B-b79K` | `H.safetensors` |
| Stable Diffusion XL | `stabilityai/stable-diffusion-xl-base-1.0`, ... | `openai/clip-vit-large-patch14`,<br>`laion/CLIP-ViT-bigG-14-laion2B-39B-b160k` | `L.safetensors`,<br>`bigG.safetensors` |
| Stable Diffusion v3 | `stabilityai/stable-diffusion-3.5-large`,<br>`stabilityai/stable-diffusion-3.5-medium`, ... | `openai/clip-vit-large-patch14`,<br>`laion/CLIP-ViT-bigG-14-laion2B-39B-b160k`,<br>`google/t5-v1_1-xxl` | `L.safetensors`,<br>`bigG.safetensors`,<br>`T5.safetensors` |
| FLUX | `black-forest-labs/FLUX.1-dev`, ... | `openai/clip-vit-large-patch14`,<br>`google/t5-v1_1-xxl` | `L.safetensors`,<br>`T5.safetensors` |
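The table's mapping from architecture to required adapter weight files can be summarized as a small lookup. This helper is illustrative only and not part of the DrUM API; the file names follow the table above:

```python
# Illustrative lookup (not part of the DrUM API): which adapter weight
# files from this repository pair with each supported architecture.
DRUM_WEIGHTS = {
    "stable-diffusion-v1": ["L.safetensors"],
    "stable-diffusion-v2": ["H.safetensors"],
    "stable-diffusion-xl": ["L.safetensors", "bigG.safetensors"],
    "stable-diffusion-v3": ["L.safetensors", "bigG.safetensors", "T5.safetensors"],
    "flux":                ["L.safetensors", "T5.safetensors"],
}

print(DRUM_WEIGHTS["flux"])  # ['L.safetensors', 'T5.safetensors']
```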
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
```
@InProceedings{kim2025drum,
    author    = {Kim, Hyungjin and Ahn, Seokho and Seo, Young-Duk},
    title     = {Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {17171-17180}
}
```
|
|
|
|
|
## License |
|
|
|
|
|
This project is licensed under the MIT License. |