---
license: mit
language:
- en
library_name: diffusers
tags:
- text-to-image
- personalization
- adapter
- stable-diffusion
- flux
- diffusers
base_model:
- runwayml/stable-diffusion-v1-5
- stabilityai/stable-diffusion-2-1
- stabilityai/stable-diffusion-xl-base-1.0
- stabilityai/stable-diffusion-3.5-large
- black-forest-labs/FLUX.1-dev
pipeline_tag: text-to-image
---
|
|
|
|
|
|
|
|
# DrUM (**D**raw yo**U**r **M**ind)
|
|
|
|
|
**DrUM** enables **personalized text-to-image (T2I) generation by integrating reference prompts** into T2I diffusion models. It works with **foundation T2I models such as Stable Diffusion v1/v2/XL/v3 and FLUX**, without requiring additional fine-tuning. DrUM leverages **condition-level modeling in the latent space using a transformer-based adapter**, and integrates seamlessly with **open-source text encoders such as OpenCLIP and Google T5**. |
|
|
|
|
|
This repository provides the necessary components to run DrUM for **inference**. For the full source code, training scripts, and detailed documentation, please visit our official **[GitHub repository](https://github.com/Burf/DrUM)** and read the **research paper [[iccv](https://openaccess.thecvf.com/content/ICCV2025/papers/Kim_Draw_Your_Mind_Personalized_Generation_via_Condition-Level_Modeling_in_Text-to-Image_ICCV_2025_paper.pdf)] [[supp](https://openaccess.thecvf.com/content/ICCV2025/supplemental/Kim_Draw_Your_Mind_ICCV_2025_supplemental.pdf)] [[arXiv](https://arxiv.org/abs/2508.03481)]**. |
|
|
|
|
|
<p align="center">
  <img src="teaser.png" width="95%">
</p>
|
|
|
|
|
|
|
|
## Quickstart |
|
|
|
|
|
This model is designed for easy use with the `diffusers` library as a custom pipeline. |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash
pip install torch torchvision diffusers transformers accelerate safetensors huggingface-hub
```
|
|
|
|
|
### Usage |
|
|
|
|
|
```python
import torch

from diffusers import DiffusionPipeline
from pipeline import DrUM

# Load a foundation T2I pipeline and attach DrUM
# Alternatively, load DrUM in one step as a custom pipeline:
# drum = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", custom_pipeline = "Burf/DrUM", pipeline = "runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16, device = "cuda")
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype = torch.bfloat16).to("cuda")
drum = DrUM(pipeline)

# Generate personalized images
images = drum(
    prompt = "a photograph of an astronaut riding a horse",
    ref = ["A retro-futuristic space exploration movie poster with bold, vibrant colors"],
    weight = [1.0],
    alpha = 0.3
)

images[0].save("personalized_image.png")
```
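The `weight` argument sets the relative contribution of each reference prompt, and `alpha` sets the overall personalization strength. As a rough intuition only (the actual DrUM adapter is a learned transformer operating on conditions in latent space, not this formula), the effect can be pictured as a weighted interpolation between the prompt embedding and the reference embeddings:

```python
def blend_conditions(prompt_emb, ref_embs, weights, alpha):
    """Illustrative sketch only: linearly blend a prompt embedding with
    reference embeddings. This is NOT the DrUM adapter, which is a
    learned transformer; it just mimics the roles of `weight` and `alpha`."""
    # Normalize the relative weights of the reference prompts
    total = sum(weights)
    norm = [w / total for w in weights]
    # Element-wise weighted average of the reference embeddings
    ref_mix = [sum(w * ref[i] for w, ref in zip(norm, ref_embs))
               for i in range(len(prompt_emb))]
    # alpha = 0 keeps the original prompt; alpha = 1 uses only the references
    return [(1 - alpha) * p + alpha * r for p, r in zip(prompt_emb, ref_mix)]

out = blend_conditions([0.0] * 4, [[1.0] * 4, [3.0] * 4], weights=[1.0, 1.0], alpha=0.3)
print(out)  # [0.6, 0.6, 0.6, 0.6]
```

With `alpha = 0.3` as in the example above, the generated image stays close to the user prompt while drifting moderately toward the reference style.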
|
|
|
|
|
|
|
|
## Supported foundation T2I models |
|
|
|
|
|
DrUM works with a wide variety of foundation T2I models that use text encoders sharing the same weights:
|
|
|
|
|
| Architecture | Pipeline | Text encoder | DrUM weight |
|--------------|----------|--------------|-------------|
| Stable Diffusion v1 | `runwayml/stable-diffusion-v1-5`, `prompthero/openjourney-v4`,<br>`stablediffusionapi/realistic-vision-v51`, `stablediffusionapi/deliberate-v2`,<br>`stablediffusionapi/anything-v5`, `WarriorMama777/AbyssOrangeMix2`, ... | `openai/clip-vit-large-patch14` | `L.safetensors` |
| Stable Diffusion v2 | `stabilityai/stable-diffusion-2-1`, ... | `laion/CLIP-ViT-H-14-laion2B-s32B-b79K` | `H.safetensors` |
| Stable Diffusion XL | `stabilityai/stable-diffusion-xl-base-1.0`, ... | `openai/clip-vit-large-patch14`,<br>`laion/CLIP-ViT-bigG-14-laion2B-39B-b160k` | `L.safetensors`,<br>`bigG.safetensors` |
| Stable Diffusion v3 | `stabilityai/stable-diffusion-3.5-large`,<br>`stabilityai/stable-diffusion-3.5-medium`, ... | `openai/clip-vit-large-patch14`,<br>`laion/CLIP-ViT-bigG-14-laion2B-39B-b160k`,<br>`google/t5-v1_1-xxl` | `L.safetensors`,<br>`bigG.safetensors`,<br>`T5.safetensors` |
| FLUX | `black-forest-labs/FLUX.1-dev`, ... | `openai/clip-vit-large-patch14`,<br>`google/t5-v1_1-xxl` | `L.safetensors`,<br>`T5.safetensors` |
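The table's mapping from architecture to required adapter weight files can be summarized as a small lookup. This helper is illustrative only and not part of the DrUM API; the file names follow the table above:

```python
# Illustrative lookup (not part of the DrUM API): which adapter weight
# files from this repository pair with each supported architecture.
DRUM_WEIGHTS = {
    "stable-diffusion-v1": ["L.safetensors"],
    "stable-diffusion-v2": ["H.safetensors"],
    "stable-diffusion-xl": ["L.safetensors", "bigG.safetensors"],
    "stable-diffusion-v3": ["L.safetensors", "bigG.safetensors", "T5.safetensors"],
    "flux":                ["L.safetensors", "T5.safetensors"],
}

print(DRUM_WEIGHTS["flux"])  # ['L.safetensors', 'T5.safetensors']
```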
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
```
@InProceedings{kim2025drum,
    author    = {Kim, Hyungjin and Ahn, Seokho and Seo, Young-Duk},
    title     = {Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {17171-17180}
}
```
|
|
|
|
|
## License |
|
|
|
|
|
This project is licensed under the MIT License. |