---
license: apache-2.0
base_model:
- stabilityai/stable-diffusion-3-medium
tags:
- Custom-Pipeline
---

# DreamCoil-Diffusion-Mini

*Developed as part of the research at **EngineerG Lab**.* 🔬

**DreamCoil-Diffusion-Mini** (`muverqqw/DreamCoil-Diffusion-Mini`) is a highly optimized, lightweight modification of Stable Diffusion 3 Medium.

We completely removed the heavy T5-XXL text encoder and replaced it with the compact **`Qwen3-Embedding-0.6B`**. This dramatically reduces VRAM usage, RAM requirements, and model loading times, while maintaining a strong level of prompt understanding.

This alignment is made possible by a custom-trained neural network, the **DreamCoil Projector**: an MLP that maps Qwen's 1024-dimensional hidden states into the 4096-dimensional text-embedding space that the SD3 transformer expects from T5. Additionally, this pipeline includes a built-in **Safe VAE Decode** patch to prevent the "black square" (NaN) generation errors common in SD3.

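
To make the projector idea concrete, here is a minimal pure-Python sketch of a token-wise MLP. It is an illustration only: the two-layer layout, the GELU activation, the hidden width, and all function names are assumptions, and toy widths (8, 16, 32) stand in for the real 1024-to-4096 mapping. The actual `DreamCoilProjector` lives in the repository's `pipeline.py` and may be structured differently.

```python
import math
import random


def gelu(x: float) -> float:
    """GELU activation (an assumed choice, not confirmed by the model card)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def make_linear(d_in: int, d_out: int, rng: random.Random):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(d_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(d_in)] for _ in range(d_out)]
    biases = [0.0] * d_out
    return weights, biases


def linear(x, layer):
    """Compute W @ x + b for a single token vector x."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def project_tokens(tokens, layer1, layer2):
    """Apply the same two-layer MLP to every token independently."""
    return [linear([gelu(h) for h in linear(tok, layer1)], layer2) for tok in tokens]


# Toy widths standing in for the real 1024 -> (hidden) -> 4096 mapping.
D_IN, D_HIDDEN, D_OUT = 8, 16, 32
rng = random.Random(0)
layer1 = make_linear(D_IN, D_HIDDEN, rng)
layer2 = make_linear(D_HIDDEN, D_OUT, rng)

# A fake "prompt" of 5 token embeddings, each D_IN wide.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(D_IN)] for _ in range(5)]
projected = project_tokens(tokens, layer1, layer2)
print(len(projected), len(projected[0]))  # prints: 5 32
```

The sequence length is unchanged and only the per-token feature width grows, which is what lets the projected embeddings take the place of the T5 features.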
### Key Features

* **No T5 Required:** Fast loading and low VRAM footprint.
* **Powered by Qwen:** Uses `Qwen3-Embedding-0.6B` as the primary semantic engine.
* **Custom Projector:** Specifically trained to bridge the Qwen language model and the SD3 transformer.
* **NaN-Safe VAE:** The custom pipeline automatically handles VAE NaN outputs, ensuring stable generation.

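
The model card does not describe how the NaN-safe decode works internally. One common mitigation for NaN outputs is to check the decoded result and, if it contains NaN, repeat the decode along a slower, more precise path. The sketch below shows only that control flow; both decoders are hypothetical stand-ins, not the pipeline's real VAE.

```python
import math


def fragile_decode(latents):
    """Stand-in for a low-precision decode that breaks (NaN) on large values."""
    return [v * 2.0 if abs(v) < 100.0 else math.nan for v in latents]


def robust_decode(latents):
    """Stand-in for a slower, higher-precision decode."""
    return [v * 2.0 for v in latents]


def contains_nan(values):
    """Return True if any decoded value is NaN."""
    return any(math.isnan(v) for v in values)


def safe_decode(latents, fast=fragile_decode, fallback=robust_decode):
    """Try the fast path first; redo the decode with the fallback on NaN."""
    pixels = fast(latents)
    if contains_nan(pixels):
        pixels = fallback(latents)
    return pixels


result = safe_decode([0.5, 250.0])
print(result)  # prints: [1.0, 500.0]
```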
### ⚠️ IMPORTANT: Always Use Negative Prompts!

Because the `0.6B` language model is significantly smaller than the original `4.7B` T5 encoder, it might occasionally miss fine details or hallucinate. **Using a negative prompt is highly recommended** to strictly guide the model and achieve the best visual results.

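
For intuition on why the negative prompt matters: in classifier-free guidance, as used by the stock `diffusers` SD3 pipeline, the prediction for the negative prompt takes the place of the "unconditional" prediction, and each step is pushed away from it toward the positive prompt. The sketch below assumes the custom pipeline follows the same rule; `apply_guidance` is a hypothetical helper operating on plain lists rather than tensors.

```python
def apply_guidance(pred_negative, pred_positive, guidance_scale):
    """Classifier-free guidance: start at the negative-prompt prediction and
    move guidance_scale times the distance toward the positive-prompt one."""
    return [
        neg + guidance_scale * (pos - neg)
        for neg, pos in zip(pred_negative, pred_positive)
    ]


# With scale 1.0 the result is just the positive prediction.
print(apply_guidance([0.0, 1.0], [1.0, 1.0], 1.0))  # prints: [1.0, 1.0]

# With scale 7.0, components where the two predictions differ are amplified.
print(apply_guidance([0.0, 1.0], [1.0, 1.0], 7.0))  # prints: [7.0, 1.0]
```

Components on which the two prompts agree are left untouched, so a descriptive negative prompt only changes the directions in which it actually disagrees with the positive prompt.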
## Quick Start (Usage)

Because we fundamentally changed the architecture (replacing T5 with Qwen), the standard `diffusers` loading mechanism might throw key mismatch errors.

To solve this, we provide a custom loading script. This script automatically downloads our custom pipeline logic and uses a helper function (`load_dreamcoil_model`) to correctly initialize the Qwen text encoder and the MLP projector.

---

### Run this script:

```python
import os
import shutil
import sys

from huggingface_hub import hf_hub_download

# --- 1. Settings ---
REPO_ID = "muverqqw/DreamCoil-Diffusion-Mini"
FILENAME = "pipeline.py"
LOCAL_FILENAME = "dreamcoil_pipeline.py"

# --- 2. Download Custom Architecture ---
print(f"📦 Downloading DreamCoil architecture from {REPO_ID}...")
cached_file = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# Copy and rename to avoid conflicts with system modules
shutil.copy(cached_file, LOCAL_FILENAME)
sys.path.append(os.getcwd())

# Import the custom loader
try:
    from dreamcoil_pipeline import load_dreamcoil_model
    print("Architecture imported successfully.")
except ImportError as e:
    print(f"Import error: {e}")
    # A stale copy may already be cached (e.g. in a notebook): reload and retry once
    if "dreamcoil_pipeline" in sys.modules:
        import importlib
        importlib.reload(sys.modules["dreamcoil_pipeline"])
        from dreamcoil_pipeline import load_dreamcoil_model
    else:
        raise  # nothing to reload, so surface the original error

# --- 3. Load the Model ---
print("Loading weights (this might take a minute)...")
pipe = load_dreamcoil_model(model_id=REPO_ID, device="cuda")

# --- 4. Generation ---
prompt = (
    "A high-quality, realistic photography shot of a young woman with long blonde hair, seen from behind. "
    "She is wearing a light, semi-transparent white summer dress. She stands on a sandy beach, "
    "looking at the beautiful turquoise ocean waves with white sea foam. Bright sunny day, "
    "natural lighting, cinematic composition, 8k resolution, highly detailed skin and fabric textures."
)

# A strong negative prompt is highly recommended for this mini-encoder!
negative_prompt = (
    "deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, "
    "extra limb, missing limb, floating limbs, mutated, ugly, blurry, text, watermark"
)

print("🎨 Generating image...")
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]

# --- 5. Save/Display ---
image.save("dreamcoil_output.png")
print("Image saved as dreamcoil_output.png")
```
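
As an alternative to copying the file and editing `sys.path`, the downloaded pipeline file could be imported directly from its cache path with the standard library's `importlib.util`. This is a sketch rather than the authors' method: `load_module_from_path` is a hypothetical helper, and the demo below loads a throwaway module instead of the real `pipeline.py`. In the script above it would be called as `load_module_from_path("dreamcoil_pipeline", cached_file)`, with `load_dreamcoil_model` then read from the returned module.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_path(module_name, file_path):
    """Import a single .py file under the given module name and return it."""
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load {module_name} from {file_path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # register so later imports find it
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway file standing in for the downloaded pipeline.py.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "demo_pipeline.py"
    demo_file.write_text("def load_demo_model():\n    return 'loaded'\n")
    demo_module = load_module_from_path("demo_pipeline", demo_file)

print(demo_module.load_demo_model())  # prints: loaded
```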
---

## Training Details

The creation of **DreamCoil-Diffusion-Mini** was conducted in a strict two-stage process:

* **Projector Alignment:** First, we trained the custom `DreamCoilProjector` (MLP) to map the 1024-dimensional hidden states of `Qwen3-Embedding-0.6B` into the 4096-dimensional text-embedding space expected by the SD3 Medium Transformer. During this stage, the base model weights were frozen.
* **LoRA Fine-Tuning:** Once the text encoder was aligned, we performed LoRA fine-tuning directly on the model to adapt the visual generation capabilities to the new semantic understanding of the Qwen encoder.

*All training artifacts and LoRA weights are included in this repository.*

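
To illustrate why LoRA keeps the second stage cheap: instead of updating a full weight matrix, LoRA trains two thin matrices whose product has the same shape, so the trainable parameter count per adapted matrix drops from `d_in * d_out` to `rank * (d_in + d_out)`. The layer width and rank below are assumptions chosen for illustration; the model card does not state the values used in training.

```python
def full_param_count(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters for a LoRA update B @ A,
    where A is (rank x d_in) and B is (d_out x rank)."""
    return rank * (d_in + d_out)


D_MODEL = 1536  # assumed width of one adapted projection matrix
RANK = 16       # assumed LoRA rank

full = full_param_count(D_MODEL, D_MODEL)
lora = lora_param_count(D_MODEL, D_MODEL, RANK)
print(full, lora, f"{lora / full:.2%}")  # prints: 2359296 49152 2.08%
```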

---

## ⚠️ Limitations

* **Complex Prompts:** Because a `0.6B` text encoder replaces the original `4.7B` T5, the model may struggle with highly complex, multi-subject prompts or precise text rendering compared to the base SD3.
* **Prompt Dependency:** The model relies heavily on negative prompts to steer away from artifacts.

---

## Support the Project

This model was developed as part of the independent research at **EngineerG Lab**. Training custom projectors and fine-tuning requires significant GPU resources.

If you find this model useful and want to support our future developments, consider buying us a coffee! Every donation helps rent GPUs for the next breakthrough. ❤️

<br>
<a href="https://donatello.to/IceL1ghtning" target="_blank">
<img src="https://img.shields.io/badge/Support_Me_on-Donatello-FF5722?style=for-the-badge&logo=buy-me-a-coffee&logoColor=white" alt="Donate with Donatello"/>
</a>

---

## Performance Benchmark (NVIDIA T4 16GB)

We conducted a head-to-head comparison between **DreamCoil-Mini** and the **Original SD3-Medium** on a standard NVIDIA T4 GPU (16GB VRAM) using Kaggle environments.

| Metric | DreamCoil-Mini | Original SD3-Medium | Improvement |
| :--- | :--- | :--- | :--- |
| **Generation Time** | **35.11 s** | 118.92 s | **~3.4x Faster** |
| **Peak VRAM** | **11.53 GB** | 13.66 GB* | **-2.13 GB** |
| **Load Time** | **38.05 s** | 68.84 s | **~1.8x Faster** |
| **Prompt Alignment (CLIP Score)** | 27.37 | **28.81** | -5% difference |

*\*Original SD3 requires CPU offloading to run on a 16GB T4, which significantly slows down generation.*

### Analysis:

* **Speed King:** DreamCoil-Mini generates images about **3.4x faster** than the original model on mid-range hardware (35.11 s vs. 118.92 s) because it avoids slow CPU-to-GPU data transfers.
* **Efficient Semantics:** By replacing the 4.7B T5-XXL with a 0.6B Qwen encoder, we retained about **95% of the original CLIP score** (27.37 vs. 28.81) while drastically reducing the model's footprint.
* **Accessibility:** This model makes SD3-level generation viable for users with older or mid-range GPUs (12GB - 16GB VRAM) without the painful slowness of offloading.
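
The headline ratios can be recomputed directly from the benchmark table. The short script below uses only the numbers reported above; the variable names are ours.

```python
# (DreamCoil-Mini, Original SD3-Medium) values copied from the benchmark table.
generation_time_s = (35.11, 118.92)
load_time_s = (38.05, 68.84)
peak_vram_gb = (11.53, 13.66)
clip_score = (27.37, 28.81)

gen_speedup = generation_time_s[1] / generation_time_s[0]
load_speedup = load_time_s[1] / load_time_s[0]
vram_saved_gb = peak_vram_gb[1] - peak_vram_gb[0]
clip_ratio = clip_score[0] / clip_score[1]
time_reduction = 1.0 - generation_time_s[0] / generation_time_s[1]

print(f"Generation speedup: {gen_speedup:.1f}x")        # 3.4x
print(f"Load speedup:       {load_speedup:.1f}x")       # 1.8x
print(f"VRAM saved:         {vram_saved_gb:.2f} GB")    # 2.13 GB
print(f"CLIP score ratio:   {clip_ratio:.0%}")          # 95%
print(f"Time reduction:     {time_reduction:.1%}")      # 70.5%
```

A 3.4x speedup corresponds to roughly a 70% reduction in generation time.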