πŸŒ€ DreamCoil-Diffusion-Mini

Developed as part of the research at EngineerG Lab. πŸ”¬

DreamCoil-Diffusion-Mini (muverqqw/DreamCoil-Diffusion-Mini) is a highly optimized, lightweight modification of Stable Diffusion 3 Medium.

We completely removed the heavy T5-XXL text encoder and replaced it with the compact Qwen3-Embedding-0.6B. This dramatically reduces VRAM usage, RAM requirements, and model loading times, while maintaining a strong level of prompt understanding.

This alignment is made possible by a custom-trained neural network β€” the DreamCoil Projector (an MLP that maps Qwen's 1024-dimensional hidden states into SD3's 4096-dimensional latent space). Additionally, this pipeline includes a built-in Safe VAE Decode patch to prevent "black square" (NaN) generation errors common in SD3.
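The exact projector architecture is not published in this card; the following is a minimal sketch of the idea, assuming a simple two-layer MLP (the class name `DreamCoilProjector`, the hidden width, and the GELU activation are all assumptions). It shows only the 1024 β†’ 4096 dimensional mapping described above:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the DreamCoil Projector.
# Real layer count, width, and activation may differ.
class DreamCoilProjector(nn.Module):
    def __init__(self, in_dim=1024, out_dim=4096, hidden_dim=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

proj = DreamCoilProjector()
# Qwen-style hidden states: (batch, sequence length, 1024)
qwen_states = torch.randn(1, 77, 1024)
out = proj(qwen_states)
print(out.shape)  # torch.Size([1, 77, 4096])
```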

🌟 Key Features

  • No T5 Required: Fast loading and low VRAM footprint.
  • Powered by Qwen: Uses Qwen3-Embedding-0.6B as the primary semantic engine.
  • Custom Projector: Specifically trained to bridge the Qwen language model and the SD3 transformer.
  • NaN-Safe VAE: The custom pipeline automatically handles VAE NaN outputs, ensuring stable generation.
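The Safe VAE Decode patch ships inside the custom pipeline; a minimal sketch of the underlying idea, assuming a `decode`-style callable (the names `safe_decode` and `toy_decode` are illustrative, not the pipeline's actual API), looks like:

```python
import torch

# Sketch: retry a NaN-producing decode in float32, then scrub any leftovers.
def safe_decode(vae_decode, latents):
    out = vae_decode(latents)
    if torch.isnan(out).any():
        # Half-precision VAE decodes can overflow to NaN ("black squares");
        # retry in float32 and replace any remaining NaNs with zeros.
        out = vae_decode(latents.float())
        out = torch.nan_to_num(out, nan=0.0)
    return out

# Toy decoder that only misbehaves in half precision, to exercise the fallback.
def toy_decode(z):
    if z.dtype == torch.float16:
        y = z.clone()
        y[0, 0] = float("nan")
        return y
    return z

latents = torch.zeros(1, 4, dtype=torch.float16)
img = safe_decode(toy_decode, latents)
print(torch.isnan(img).any().item())  # False
```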

⚠️ IMPORTANT: Always Use Negative Prompts!

Because the 0.6B language model is significantly smaller than the original 4.7B T5 encoder, it might occasionally miss fine details or hallucinate. Using a negative prompt is highly recommended to strictly guide the model and achieve the best visual results.

πŸš€ Quick Start (Usage)

Because the architecture is fundamentally different (T5 replaced by Qwen), the standard diffusers loading mechanism may raise key-mismatch errors when loading the checkpoint.

To solve this, we provide a custom loading script. This script automatically downloads our custom pipeline logic and uses a helper function (load_dreamcoil_model) to correctly initialize the Qwen text encoder and the MLP projector.


Run this script:

import os
import shutil
import sys
from huggingface_hub import hf_hub_download

# --- 1. Settings ---
REPO_ID = "muverqqw/DreamCoil-Diffusion-Mini"
FILENAME = "pipeline.py" 
LOCAL_FILENAME = "dreamcoil_pipeline.py"

# --- 2. Download Custom Architecture ---
print(f"πŸ“¦ Downloading DreamCoil architecture from {REPO_ID}...")
cached_file = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# Copy and rename to avoid conflicts with system modules
shutil.copy(cached_file, LOCAL_FILENAME) 
sys.path.append(os.getcwd())

# Import the custom loader
try:
    from dreamcoil_pipeline import load_dreamcoil_model
    print("βœ… Architecture imported successfully.")
except ImportError as e:
    print(f"❌ Import error: {e}")
    # A stale cached module can shadow the freshly downloaded file;
    # reload it if present, then retry the import.
    import importlib
    if 'dreamcoil_pipeline' in sys.modules:
        importlib.reload(sys.modules['dreamcoil_pipeline'])
    from dreamcoil_pipeline import load_dreamcoil_model

# --- 3. Load the Model ---
print("πŸš€ Loading weights (this might take a minute)...")
pipe = load_dreamcoil_model(model_id=REPO_ID, device="cuda")

# --- 4. Generation ---
prompt = (
    "A high-quality, realistic photography shot of a young woman with long blonde hair, seen from behind. "
    "She is wearing a light, semi-transparent white summer dress. She stands on a sandy beach, "
    "looking at the beautiful turquoise ocean waves with white sea foam. Bright sunny day, "
    "natural lighting, cinematic composition, 8k resolution, highly detailed skin and fabric textures."
)

# A strong negative prompt is highly recommended for this mini-encoder!
negative_prompt = (
    "deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, "
    "extra limb, missing limb, floating limbs, mutated, ugly, blurry, text, watermark"
)

print("🎨 Generating image...")
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=28,
    guidance_scale=7.0
).images[0]

# --- 5. Save/Display ---
image.save("dreamcoil_output.png")
print("βœ… Image saved as dreamcoil_output.png")

πŸ›  Training Details

DreamCoil-Diffusion-Mini was trained in a strict two-stage process:

  • Projector Alignment: First, we trained the custom DreamCoilProjector (MLP) to properly map the 1024-dimensional hidden states of Qwen3-Embedding-0.6B into the 4096-dimensional latent space expected by the SD3 Medium Transformer. During this stage, the base model weights were frozen.
  • LoRA Fine-Tuning: Once the text encoder was aligned, we performed LoRA fine-tuning directly on the model to adapt the visual generation capabilities to the new semantic understanding of the Qwen encoder.

All training artifacts and LoRA weights are included in this repository.
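The projector-alignment stage above can be sketched as a simple regression step: only the projector receives gradients, and its outputs are regressed onto reference SD3 conditioning vectors. Everything here is an assumption for illustration (the single `nn.Linear` stands in for the real MLP, and the random tensors stand in for frozen Qwen outputs and reference targets):

```python
import torch
import torch.nn as nn

# Stand-in for the real MLP projector; only its parameters are trained.
proj = nn.Linear(1024, 4096)
opt = torch.optim.AdamW(proj.parameters(), lr=1e-4)

# Frozen Qwen hidden states and reference SD3 conditioning (both simulated).
qwen_states = torch.randn(8, 77, 1024)
target = torch.randn(8, 77, 4096)

# One alignment step: regress projected states onto the reference vectors.
loss = nn.functional.mse_loss(proj(qwen_states), target)
opt.zero_grad()
loss.backward()
opt.step()
print(f"alignment loss: {loss.item():.4f}")
```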


⚠️ Limitations

  • Complex Prompts: Because a 0.6B text encoder replaces the original 4.7B T5, the model may struggle with highly complex, multi-subject prompts or precise text rendering compared to the base SD3.
  • Prompt Dependency: The model relies heavily on negative prompts to steer away from artifacts.

β˜• Support the Project

This model was developed as part of the independent research at EngineerG Lab. Training custom projectors and fine-tuning requires significant GPU resources.

If you find this model useful and want to support our future developments, consider buying us a coffee! Every donation helps rent GPUs for the next breakthrough. ❀️


Donate with Donatello

πŸ“Š Performance Benchmark (NVIDIA T4 16GB)

We conducted a head-to-head comparison between DreamCoil-Mini and the Original SD3-Medium on a standard NVIDIA T4 GPU (16GB VRAM) using Kaggle environments.

Metric                        | DreamCoil-Mini πŸŒ€ | Original SD3-Medium | Improvement
Generation Time               | 35.11 s           | 118.92 s            | ~3.4x faster
Peak VRAM                     | 11.53 GB          | 13.66 GB*           | -2.13 GB
Load Time                     | 38.05 s           | 68.84 s             | ~1.8x faster
Prompt Alignment (CLIP Score) | 27.37             | 28.81               | ~5% lower

*Original SD3 requires CPU offloading to run on a 16GB T4, which significantly slows down generation.

πŸ“ˆ Analysis:

  • Speed King: DreamCoil-Mini is roughly 3.4x faster than the original model on mid-range hardware, largely because it fits entirely in VRAM and avoids the slow CPU-to-GPU transfers that offloading requires.
  • Efficient Semantics: By replacing the 4.7B T5-XXL with a 0.6B Qwen encoder, we retained roughly 95% of the prompt-following capability (by CLIP score) while drastically reducing the model's footprint.
  • Accessibility: This model makes SD3-level generation viable for users with older or mid-range GPUs (12GB - 16GB VRAM) without the painful slowness of offloading.