---
license: apache-2.0
base_model:
- stabilityai/stable-diffusion-3-medium
tags:
- Custom-Pipeline
---

# 🌀 DreamCoil-Diffusion-Mini

*Developed as part of the research at **EngineerG Lab**.* 🔬

**DreamCoil-Diffusion-Mini** (`muverqqw/DreamCoil-Diffusion-Mini`) is a highly optimized, lightweight modification of Stable Diffusion 3 Medium.

We completely removed the heavy T5-XXL text encoder and replaced it with the compact **`Qwen3-Embedding-0.6B`**. This substantially reduces VRAM usage, system RAM requirements, and model loading time, while retaining most of the original prompt understanding (see the benchmark below).

Qwen's output is aligned with what SD3 expects by a custom-trained neural network, the **DreamCoil Projector**: an MLP that maps Qwen's 1024-dimensional hidden states into the 4096-dimensional text-embedding space that the SD3 transformer normally receives from T5-XXL. Additionally, the pipeline includes a built-in **Safe VAE Decode** patch that prevents the "black square" (NaN) images SD3 can produce during VAE decoding.
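
The exact layer layout of the DreamCoil Projector is not documented in this card. As a rough illustration only, here is a minimal PyTorch sketch of a token-wise MLP with the same input and output widths. The hidden width of 2048, the GELU activation, and the class name `ProjectorSketch` are assumptions, not the contents of `pipeline.py`.

```python
import torch
from torch import nn


class ProjectorSketch(nn.Module):
    """Hypothetical MLP projector: Qwen hidden states (1024) -> SD3 text-embedding width (4096).

    Hidden width and activation are illustrative assumptions,
    not the actual DreamCoilProjector architecture.
    """

    def __init__(self, in_dim: int = 1024, out_dim: int = 4096, hidden_dim: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # nn.Linear acts on the last dimension, so every token is projected independently:
        # (batch, seq_len, 1024) -> (batch, seq_len, 4096)
        return self.net(hidden_states)


if __name__ == "__main__":
    qwen_states = torch.randn(2, 77, 1024)  # dummy batch: 2 prompts x 77 tokens
    projected = ProjectorSketch()(qwen_states)
    print(projected.shape)
```

Because the projection is applied per token, the sequence length produced by the Qwen tokenizer is preserved; only the feature width changes.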

### 🌟 Key Features
* **No T5 Required:** Fast loading and low VRAM footprint.
* **Powered by Qwen:** Uses `Qwen3-Embedding-0.6B` as the primary semantic engine.
* **Custom Projector:** Specifically trained to bridge the Qwen language model and the SD3 transformer.
* **NaN-Safe VAE:** The custom pipeline automatically handles VAE NaN outputs, ensuring stable generation.
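
How the bundled patch works internally is not described here. One common NaN-safe strategy is sketched below under that assumption: decode normally, and if the output contains NaN/Inf (typically from half-precision overflow), upcast the decoder to `float32` and decode again. The helper name `safe_decode` is hypothetical, and `decoder` stands in for any callable VAE decoder module rather than the exact `diffusers` VAE call.

```python
import torch
from torch import nn


def safe_decode(decoder: nn.Module, latents: torch.Tensor) -> torch.Tensor:
    """Decode latents; if the result contains NaN/Inf, retry the decode in float32.

    A minimal sketch of one NaN-safe strategy; the actual DreamCoil patch
    may work differently.
    """
    with torch.no_grad():
        image = decoder(latents)
        if torch.isfinite(image).all():
            return image  # fast path: the normal-precision decode was fine

        # Half-precision overflow is a typical source of NaNs: upcast and retry.
        orig_dtype = next(decoder.parameters()).dtype
        decoder.to(torch.float32)
        try:
            image = decoder(latents.to(torch.float32))
        finally:
            decoder.to(orig_dtype)  # restore the decoder's original precision
        # Last resort: replace any values that are still non-finite.
        return torch.nan_to_num(image)
```

The fast path costs nothing when decoding succeeds; the `float32` retry only runs for the rare latents that overflow.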

### ⚠️ IMPORTANT: Always Use Negative Prompts!
Because the `0.6B` language model is much smaller than the original `4.7B` T5 encoder, it may occasionally miss fine details or hallucinate. **Using a negative prompt is highly recommended**: it steers generation away from common artifacts and gives the best visual results.

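
In the standard `diffusers` SD3 pipeline, the negative prompt is encoded and used in place of the unconditional (empty-prompt) branch of classifier-free guidance; we assume the custom DreamCoil pipeline follows the same scheme. The sketch below shows that combination step in isolation. The function name `guided_prediction` is hypothetical.

```python
import torch


def guided_prediction(pred_cond: torch.Tensor, pred_neg: torch.Tensor, guidance_scale: float) -> torch.Tensor:
    """Classifier-free guidance with the negative prompt as the 'unconditional' branch.

    The model output is pushed away from the negative-prompt prediction and
    toward the positive-prompt prediction, scaled by guidance_scale.
    """
    return pred_neg + guidance_scale * (pred_cond - pred_neg)


# Example: with guidance_scale=7.0 (as in the Quick Start script), the gap between
# the two branches is amplified 7x, starting from the negative-prompt prediction.
cond = torch.tensor([1.0])
neg = torch.tensor([0.5])
print(guided_prediction(cond, neg, 7.0))
```

A more specific negative prompt gives the model a sharper "direction to avoid", which helps compensate for the smaller text encoder.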
## 🚀 Quick Start (Usage)

Because we fundamentally changed the architecture (replacing T5 with Qwen), the standard `diffusers` loading mechanism may fail with state-dict key mismatch errors.

To solve this, we provide a custom loading script. This script automatically downloads our custom pipeline logic and uses a helper function (`load_dreamcoil_model`) to correctly initialize the Qwen text encoder and the MLP projector.
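
To show what a key mismatch means, here is a plain-PyTorch toy example: weights saved from a model with a `projector` module are loaded into a model that expects a T5-style `text_encoder_3` slot (the attribute name SD3 pipelines use for T5). Both toy classes are hypothetical and tiny; this illustrates the error class only and is not how `diffusers` or `load_dreamcoil_model` work internally.

```python
from torch import nn


class CheckpointModel(nn.Module):
    """Toy model whose weights were saved with a projector module (hypothetical)."""

    def __init__(self) -> None:
        super().__init__()
        self.projector = nn.Linear(8, 32)


class ExpectedModel(nn.Module):
    """Toy model that expects a T5-style encoder slot instead (hypothetical)."""

    def __init__(self) -> None:
        super().__init__()
        self.text_encoder_3 = nn.Linear(32, 32)


state = CheckpointModel().state_dict()
target = ExpectedModel()

try:
    target.load_state_dict(state)  # strict=True by default: raises on any key mismatch
except RuntimeError as err:
    print("strict load failed:", str(err).splitlines()[0])

# strict=False loads whatever matches and reports the rest instead of raising
result = target.load_state_dict(state, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```

A custom loader avoids this by building the modified architecture first and then routing each group of weights to the module that actually owns it.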

---

### Run this script:

```python
import importlib
import os
import shutil
import sys
from huggingface_hub import hf_hub_download

# --- 1. Settings ---
REPO_ID = "muverqqw/DreamCoil-Diffusion-Mini"
FILENAME = "pipeline.py"
LOCAL_FILENAME = "dreamcoil_pipeline.py"

# --- 2. Download Custom Architecture ---
print(f"📦 Downloading DreamCoil architecture from {REPO_ID}...")
cached_file = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# Copy and rename to avoid clashing with other modules named "pipeline"
shutil.copy(cached_file, LOCAL_FILENAME)
if os.getcwd() not in sys.path:
    sys.path.append(os.getcwd())

# Import the custom loader. If an older copy is already imported
# (e.g. when re-running a notebook cell), reload it first.
try:
    if "dreamcoil_pipeline" in sys.modules:
        importlib.reload(sys.modules["dreamcoil_pipeline"])
    from dreamcoil_pipeline import load_dreamcoil_model
    print("✅ Architecture imported successfully.")
except ImportError as e:
    print(f"❌ Import error: {e}")
    raise  # without the loader, the steps below cannot run

# --- 3. Load the Model ---
print("🚀 Loading weights (this might take a minute)...")
pipe = load_dreamcoil_model(model_id=REPO_ID, device="cuda")

# --- 4. Generation ---
prompt = (
    "A high-quality, realistic photography shot of a young woman with long blonde hair, seen from behind. "
    "She is wearing a light, semi-transparent white summer dress. She stands on a sandy beach, "
    "looking at the beautiful turquoise ocean waves with white sea foam. Bright sunny day, "
    "natural lighting, cinematic composition, 8k resolution, highly detailed skin and fabric textures."
)

# A strong negative prompt is highly recommended for this mini-encoder!
negative_prompt = (
    "deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, "
    "extra limb, missing limb, floating limbs, mutated, ugly, blurry, text, watermark"
)

print("🎨 Generating image...")
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=28,
    guidance_scale=7.0
).images[0]

# --- 5. Save/Display ---
image.save("dreamcoil_output.png")
print("✅ Image saved as dreamcoil_output.png")
```
---

## 🛠 Training Details

**DreamCoil-Diffusion-Mini** was trained in two strict stages:

* **Projector Alignment:** First, we trained the custom `DreamCoilProjector` (MLP) to map the 1024-dimensional hidden states of `Qwen3-Embedding-0.6B` into the 4096-dimensional text-embedding space expected by the SD3 Medium transformer. During this stage, the base model weights were frozen.
* **LoRA Fine-Tuning:** Once the text encoder was aligned, we fine-tuned the model with LoRA to adapt its image generation to the semantics of the new Qwen encoder.
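
The mechanics of the two stages can be sketched with toy modules. This is not the actual training code: the tiny `nn.Linear` stand-ins, the MSE loss against random targets (the real objective is not documented here), keeping the projector frozen in stage 2, and the names `LoRALinear` and `freeze` are all assumptions. What it does show is which parameters receive updates in each stage and how a standard LoRA layer computes `W x + (alpha / r) * B A x` on top of frozen weights.

```python
import torch
from torch import nn


class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # original weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # B = 0: the wrapped layer starts identical to the base
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))


def freeze(module: nn.Module) -> None:
    for p in module.parameters():
        p.requires_grad_(False)


# Toy stand-ins (hypothetical): a small projector and one "transformer" layer.
torch.manual_seed(0)
projector = nn.Linear(16, 64)  # stands in for the 1024 -> 4096 projector
backbone = nn.Linear(64, 64)   # stands in for the SD3 transformer
x, target = torch.randn(8, 16), torch.randn(8, 64)  # placeholder data and objective

# Stage 1: train only the projector; the base model is frozen.
freeze(backbone)
opt = torch.optim.AdamW(projector.parameters(), lr=1e-3)
loss = nn.functional.mse_loss(backbone(projector(x)), target)
loss.backward()  # gradients flow through the frozen backbone into the projector
opt.step()
opt.zero_grad()

# Stage 2: wrap the backbone with LoRA and train only the low-rank adapters.
freeze(projector)  # assumption: the projector stays fixed in this stage
lora_backbone = LoRALinear(backbone, rank=4)
trainable = [p for p in lora_backbone.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=1e-3)
loss = nn.functional.mse_loss(lora_backbone(projector(x)), target)
loss.backward()
opt.step()
```

Zero-initializing `B` means stage 2 starts exactly from the stage-1 model, so LoRA training cannot immediately undo the projector alignment.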

*All training artifacts and LoRA weights are included in this repository.*

---

## ⚠️ Limitations

* **Complex Prompts:** Because a `0.6B` text encoder replaces the original `4.7B` T5, the model may struggle with highly complex, multi-subject prompts or precise text rendering compared to the base SD3.
* **Prompt Dependency:** The model relies heavily on negative prompts to steer away from artifacts.

---

## ☕ Support the Project

This model was developed as part of the independent research at **EngineerG Lab**. Training custom projectors and fine-tuning requires significant GPU resources. 

If you find this model useful and want to support our future developments, consider buying us a coffee! Every donation helps rent GPUs for the next breakthrough. ❤️

<br>

<a href="https://donatello.to/IceL1ghtning" target="_blank">
  <img src="https://img.shields.io/badge/Support_Me_on-Donatello-FF5722?style=for-the-badge&logo=buy-me-a-coffee&logoColor=white" alt="Donate with Donatello"/>
</a>

---
## 📊 Performance Benchmark (NVIDIA T4 16GB)

We conducted a head-to-head comparison between **DreamCoil-Mini** and the **Original SD3-Medium** on a standard NVIDIA T4 GPU (16GB VRAM) in a Kaggle environment.

| Metric | DreamCoil-Mini 🌀 | Original SD3-Medium | DreamCoil vs. SD3 |
| :--- | :--- | :--- | :--- |
| **Generation Time** | **35.11 s** | 118.92 s | **~3.4x faster** |
| **Peak VRAM** | **11.53 GB** | 13.66 GB* | **2.13 GB less** |
| **Load Time** | **38.05 s** | 68.84 s | **~1.8x faster** |
| **Prompt Alignment (CLIP Score)** | 27.37 | **28.81** | ~5% lower |

*\*Original SD3 requires CPU offloading to run on a 16GB T4, which significantly slows down generation.*
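
The card does not include the benchmarking script. If you want to reproduce timing and peak-memory numbers on your own GPU, a minimal sketch could look like the following; the helper name `measure` is hypothetical. Note that `torch.cuda.max_memory_allocated` only counts memory held by PyTorch tensors, so it will read lower than `nvidia-smi`, and it is not stated which of the two the table above uses.

```python
import time
from typing import Callable, Optional, Tuple

import torch


def measure(fn: Callable[[], object]) -> Tuple[float, Optional[float]]:
    """Return (wall-clock seconds, peak CUDA memory in GiB, or None without a GPU) for one call."""
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    fn()
    if use_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start
    peak_gib = torch.cuda.max_memory_allocated() / 1024**3 if use_cuda else None
    return elapsed, peak_gib
```

For example, `gen_time, peak = measure(lambda: pipe(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=28, guidance_scale=7.0))` times one generation with the Quick Start settings. Running one warm-up generation first keeps one-time CUDA initialization out of the measurement.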

### 📈 Analysis:
* **Speed King:** On this mid-range GPU, DreamCoil-Mini generates images **~3.4x faster** than the original model (about 70% less time per image), because it fits in VRAM without CPU offloading and so avoids slow CPU-to-GPU transfers.
* **Efficient Semantics:** By replacing the 4.7B T5-XXL with a 0.6B Qwen encoder, we retained **~95% of the original CLIP score** (27.37 vs. 28.81), a proxy for prompt following, while drastically reducing the model's footprint.
* **Accessibility:** This model makes SD3-level generation viable on older or mid-range GPUs (12-16 GB VRAM) without the heavy slowdown of CPU offloading.
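
The ratios in the table can be checked directly from the raw measurements. The short script below only recomputes figures already reported above; it adds no new data.

```python
# Figures copied from the benchmark table (NVIDIA T4, Kaggle).
dreamcoil = {"gen_s": 35.11, "vram_gb": 11.53, "load_s": 38.05, "clip": 27.37}
sd3 = {"gen_s": 118.92, "vram_gb": 13.66, "load_s": 68.84, "clip": 28.81}

speedup = sd3["gen_s"] / dreamcoil["gen_s"]         # how many times faster generation is
time_saved = 1 - dreamcoil["gen_s"] / sd3["gen_s"]  # fraction of generation time removed
vram_saved = sd3["vram_gb"] - dreamcoil["vram_gb"]  # absolute peak-VRAM reduction
load_speedup = sd3["load_s"] / dreamcoil["load_s"]
clip_retained = dreamcoil["clip"] / sd3["clip"]     # share of the base model's CLIP score kept

print(f"generation: {speedup:.2f}x faster ({time_saved:.0%} less time)")
print(f"peak VRAM: {vram_saved:.2f} GB lower")
print(f"load: {load_speedup:.2f}x faster")
print(f"CLIP score retained: {clip_retained:.1%}")
```

A speedup of about 3.39x corresponds to roughly 70% less time per image, and 27.37 / 28.81 is about 0.95, which is where the "~5% lower" CLIP figure comes from.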