---
license: apache-2.0
base_model:
- stabilityai/stable-diffusion-3-medium
tags:
- Custom-Pipeline
---
# 🌀 DreamCoil-Diffusion-Mini
*Developed as part of the research at **EngineerG Lab**.* 🔬
**DreamCoil-Diffusion-Mini** (`muverqqw/DreamCoil-Diffusion-Mini`) is a highly optimized, lightweight modification of Stable Diffusion 3 Medium.
We completely removed the heavy T5-XXL text encoder and replaced it with the compact **`Qwen3-Embedding-0.6B`**. This dramatically reduces VRAM usage, RAM requirements, and model loading times, while maintaining a strong level of prompt understanding.
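As a rough illustration of where the savings come from, the weights-only arithmetic below uses the parameter counts quoted in this card and assumes both encoders are stored in fp16 (2 bytes per parameter). It is an estimate, not a measurement: activations, the CLIP encoders, and framework overhead are ignored, and the helper `weight_memory_gb` is hypothetical.

```python
# Weights-only memory estimate (hypothetical helper); assumes 2 bytes/parameter
# for fp16/bf16. Activations, CLIP encoders, and framework overhead are ignored.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Return the memory needed to hold `num_params` weights, in GB (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


t5_xxl_gb = weight_memory_gb(4.7e9)  # original T5-XXL encoder (4.7B parameters)
qwen_gb = weight_memory_gb(0.6e9)    # Qwen3-Embedding-0.6B

print(f"T5-XXL: {t5_xxl_gb:.1f} GB, Qwen: {qwen_gb:.1f} GB, "
      f"saved: {t5_xxl_gb - qwen_gb:.1f} GB")
# → T5-XXL: 9.4 GB, Qwen: 1.2 GB, saved: 8.2 GB
```

The measured peak-VRAM gap in the benchmark is smaller than this estimate; one likely reason is that the original SD3 run relied on CPU offloading, which keeps part of the model out of VRAM.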
Qwen's outputs are connected to SD3 through a custom-trained neural network, the **DreamCoil Projector**: an MLP that maps Qwen's 1024-dimensional hidden states into the 4096-dimensional text-embedding space that SD3's transformer normally receives from T5-XXL. Additionally, this pipeline includes a built-in **Safe VAE Decode** patch that prevents the "black square" (NaN) outputs common in SD3.
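The card does not publish the projector's layer layout, so the following is only a minimal sketch of what such a projector could look like. It assumes a two-layer MLP with a GELU activation and a 2048-wide hidden layer; `ToyProjector`, its hidden width, activation, and random initialization are all hypothetical. It uses NumPy to stay framework-agnostic and only demonstrates the per-token mapping from 1024 to 4096 dimensions.

```python
import numpy as np

QWEN_DIM = 1024  # hidden size of Qwen3-Embedding-0.6B (from this card)
SD3_DIM = 4096   # per-token text embedding width SD3 expects (from this card)


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class ToyProjector:
    """Hypothetical 2-layer MLP: Linear(1024 -> hidden) -> GELU -> Linear(hidden -> 4096)."""

    def __init__(self, in_dim=QWEN_DIM, out_dim=SD3_DIM, hidden_dim=2048, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights for illustration; a trained projector would load learned values.
        self.w1 = rng.normal(0.0, in_dim ** -0.5, size=(in_dim, hidden_dim)).astype(np.float32)
        self.b1 = np.zeros(hidden_dim, dtype=np.float32)
        self.w2 = rng.normal(0.0, hidden_dim ** -0.5, size=(hidden_dim, out_dim)).astype(np.float32)
        self.b2 = np.zeros(out_dim, dtype=np.float32)

    def __call__(self, hidden_states: np.ndarray) -> np.ndarray:
        # hidden_states: (batch, seq_len, 1024) -> (batch, seq_len, 4096)
        h = gelu(hidden_states @ self.w1 + self.b1)
        return h @ self.w2 + self.b2


projector = ToyProjector()
qwen_states = np.random.default_rng(1).normal(size=(2, 16, QWEN_DIM)).astype(np.float32)
print(projector(qwen_states).shape)  # (2, 16, 4096)
```

The projector is applied token by token, so the sequence length is preserved and only the feature width changes.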
### 🌟 Key Features
* **No T5 Required:** Fast loading and low VRAM footprint.
* **Powered by Qwen:** Uses `Qwen3-Embedding-0.6B` as the primary semantic engine.
* **Custom Projector:** Specifically trained to bridge the Qwen language model and the SD3 transformer.
* **NaN-Safe VAE:** The custom pipeline automatically handles VAE NaN outputs, ensuring stable generation.
### ⚠️ IMPORTANT: Always Use Negative Prompts!
Because the `0.6B` language model is much smaller than the original `4.7B` T5 encoder, it can occasionally miss fine details or hallucinate content. **A negative prompt is highly recommended**: it steers the model away from common artifacts and helps achieve the best visual results.
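Assuming the custom pipeline follows the standard SD3 pipeline in `diffusers`, the negative prompt takes the place of the empty "unconditional" prompt in classifier-free guidance: the transformer makes one prediction per prompt, and `guidance_scale` extrapolates from the negative prediction toward the positive one. The NumPy sketch below shows only that combination step; `classifier_free_guidance` is a hypothetical helper and the arrays are stand-ins for per-step model outputs.

```python
import numpy as np


def classifier_free_guidance(pred_negative: np.ndarray,
                             pred_positive: np.ndarray,
                             guidance_scale: float) -> np.ndarray:
    """Combine the negative- and positive-prompt predictions for one denoising step."""
    return pred_negative + guidance_scale * (pred_positive - pred_negative)


pred_neg = np.array([0.0, 1.0, 2.0])
pred_pos = np.array([1.0, 1.0, 1.0])
print(classifier_free_guidance(pred_neg, pred_pos, guidance_scale=7.0).tolist())
# → [7.0, 1.0, -5.0]: wherever the two predictions differ, the gap is scaled by 7
```

With `guidance_scale=1.0` the formula reduces to the positive prediction, so the negative prompt only changes the output when the scale is above 1 (the Quick Start uses 7.0).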
## 🚀 Quick Start (Usage)
Because we fundamentally changed the architecture (replacing T5 with Qwen), the standard `diffusers` loading mechanism might throw key mismatch errors.
To solve this, we provide a custom loading script. This script automatically downloads our custom pipeline logic and uses a helper function (`load_dreamcoil_model`) to correctly initialize the Qwen text encoder and the MLP projector.

---
### Run this script:
```python
import importlib
import os
import shutil
import sys

from huggingface_hub import hf_hub_download

# --- 1. Settings ---
REPO_ID = "muverqqw/DreamCoil-Diffusion-Mini"
FILENAME = "pipeline.py"
LOCAL_FILENAME = "dreamcoil_pipeline.py"

# --- 2. Download Custom Architecture ---
print(f"📦 Downloading DreamCoil architecture from {REPO_ID}...")
cached_file = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# Copy and rename the file so it does not clash with other modules named "pipeline"
shutil.copy(cached_file, LOCAL_FILENAME)
if os.getcwd() not in sys.path:
    sys.path.insert(0, os.getcwd())

# Import the custom loader. If an older copy is already imported (e.g. when a
# notebook cell is re-run), reload it so the freshly downloaded file is used.
if "dreamcoil_pipeline" in sys.modules:
    importlib.reload(sys.modules["dreamcoil_pipeline"])
from dreamcoil_pipeline import load_dreamcoil_model

print("✅ Architecture imported successfully.")
# --- 3. Load the Model ---
print("🚀 Loading weights (this might take a minute)...")
pipe = load_dreamcoil_model(model_id=REPO_ID, device="cuda")
# --- 4. Generation ---
prompt = (
"A high-quality, realistic photography shot of a young woman with long blonde hair, seen from behind. "
"She is wearing a light, semi-transparent white summer dress. She stands on a sandy beach, "
"looking at the beautiful turquoise ocean waves with white sea foam. Bright sunny day, "
"natural lighting, cinematic composition, 8k resolution, highly detailed skin and fabric textures."
)
# A strong negative prompt is highly recommended for this mini-encoder!
negative_prompt = (
"deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, "
"extra limb, missing limb, floating limbs, mutated, ugly, blurry, text, watermark"
)
print("🎨 Generating image...")
image = pipe(
prompt=prompt,
negative_prompt=negative_prompt,
num_inference_steps=28,
guidance_scale=7.0
).images[0]
# --- 5. Save/Display ---
image.save("dreamcoil_output.png")
print("✅ Image saved as dreamcoil_output.png")
```
---
## 🛠 Training Details
The creation of **DreamCoil-Diffusion-Mini** was conducted in a strict two-stage process:
* **Projector Alignment:** First, we trained the custom `DreamCoilProjector` (MLP) to map the 1024-dimensional hidden states of `Qwen3-Embedding-0.6B` into the 4096-dimensional text-embedding space expected by the SD3 Medium Transformer. The base model weights were frozen during this stage.
* **LoRA Fine-Tuning:** Once the text encoder was aligned, we performed LoRA fine-tuning on the SD3 model itself to adapt its visual generation to the new semantic representations produced by the Qwen encoder.
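The card does not state the LoRA rank, scaling factor, or which layers were adapted. For readers unfamiliar with the technique, the sketch below shows the standard LoRA update on a single frozen linear layer: the effective weight is `W + (alpha / r) * B @ A`, only the low-rank factors `A` and `B` are trained, and `B` starts at zero so the adapted layer initially matches the base model. `LoRALinear`, `r=16`, `alpha=16`, and the 1536×1536 layer size are illustrative assumptions, not DreamCoil's settings.

```python
import numpy as np


class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, r: int = 16, alpha: float = 16.0, seed: int = 0):
        out_dim, in_dim = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                   # frozen during fine-tuning
        self.lora_a = rng.normal(0.0, 0.01, size=(r, in_dim))  # trainable, random init
        self.lora_b = np.zeros((out_dim, r))                   # trainable, zero init
        self.scaling = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        base = x @ self.weight.T
        update = (x @ self.lora_a.T) @ self.lora_b.T
        return base + self.scaling * update

    def trainable_params(self) -> int:
        return self.lora_a.size + self.lora_b.size


w = np.random.default_rng(1).normal(size=(1536, 1536))  # illustrative layer size
layer = LoRALinear(w)
x = np.random.default_rng(2).normal(size=(4, 1536))
print(np.allclose(layer(x), x @ w.T))   # True: B = 0, so output matches the base layer
print(layer.trainable_params(), w.size)  # 49152 2359296
```

Training only `A` and `B` here touches about 2% of the layer's parameters, which is what makes LoRA cheap enough for single-GPU fine-tuning.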
*All training artifacts and LoRA weights are included in this repository.*

---
## ⚠️ Limitations
* **Complex Prompts:** Because a `0.6B` text encoder replaces the original `4.7B` T5, the model may struggle with highly complex, multi-subject prompts or precise text rendering compared to the base SD3.
* **Prompt Dependency:** The model relies heavily on negative prompts to steer away from artifacts.
---
## ☕ Support the Project
This model was developed as part of the independent research at **EngineerG Lab**. Training custom projectors and fine-tuning requires significant GPU resources.
If you find this model useful and want to support our future developments, consider buying us a coffee! Every donation helps rent GPUs for the next breakthrough. ❤️
<br>
<a href="https://donatello.to/IceL1ghtning" target="_blank">
<img src="https://img.shields.io/badge/Support_Me_on-Donatello-FF5722?style=for-the-badge&logo=buy-me-a-coffee&logoColor=white" alt="Donate with Donatello"/>
</a>

---
## 📊 Performance Benchmark (NVIDIA T4 16GB)
We ran a head-to-head comparison between **DreamCoil-Mini** and the **Original SD3-Medium** on a standard NVIDIA T4 GPU (16GB VRAM) in Kaggle environments.

| Metric | DreamCoil-Mini 🌀 | Original SD3-Medium | Difference |
| :--- | :--- | :--- | :--- |
| **Generation Time** | **35.11 s** | 118.92 s | **~3.4x Faster** |
| **Peak VRAM** | **11.53 GB** | 13.66 GB* | **-2.13 GB** |
| **Load Time** | **38.05 s** | 68.84 s | **~1.8x Faster** |
| **Prompt Alignment (CLIP Score)** | 27.37 | **28.81** | ~5% lower |

*\*Original SD3 requires CPU offloading to run on a 16GB T4, which significantly slows down generation.*
### 📈 Analysis:
* **Speed King:** DreamCoil-Mini generates images about **3.4x faster** than the original model on this mid-range GPU (35.11 s vs. 118.92 s), because it fits in VRAM and avoids the slow CPU-to-GPU transfers that offloading requires.
* **Efficient Semantics:** By replacing the 4.7B T5-XXL with a 0.6B Qwen encoder, we retained about **95% of the original CLIP score** (27.37 vs. 28.81), a common proxy for prompt following, while drastically reducing the model's footprint.
* **Accessibility:** This model makes SD3-level generation viable on older or mid-range GPUs with 12 to 16 GB of VRAM, without the slowdown caused by offloading.
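The headline ratios can be recomputed directly from the benchmark table. Note that a speedup of S times corresponds to (S - 1) × 100 percent faster, and the CLIP figure is simply DreamCoil's score divided by the original's. The snippet below only re-derives numbers already in the table; the dictionary keys are arbitrary labels.

```python
# Figures copied from the benchmark table above (NVIDIA T4, 16 GB).
dreamcoil = {"gen_s": 35.11, "vram_gb": 11.53, "load_s": 38.05, "clip": 27.37}
sd3_medium = {"gen_s": 118.92, "vram_gb": 13.66, "load_s": 68.84, "clip": 28.81}

speedup = sd3_medium["gen_s"] / dreamcoil["gen_s"]                  # times as fast
percent_faster = (speedup - 1) * 100                                # "N% faster"
time_saved = (1 - dreamcoil["gen_s"] / sd3_medium["gen_s"]) * 100   # % less wall time
load_speedup = sd3_medium["load_s"] / dreamcoil["load_s"]
vram_saved = sd3_medium["vram_gb"] - dreamcoil["vram_gb"]
clip_retained = dreamcoil["clip"] / sd3_medium["clip"] * 100

print(f"generation: {speedup:.1f}x ({percent_faster:.0f}% faster, {time_saved:.0f}% less time)")
print(f"load: {load_speedup:.1f}x, VRAM saved: {vram_saved:.2f} GB, CLIP retained: {clip_retained:.0f}%")
# → generation: 3.4x (239% faster, 70% less time)
# → load: 1.8x, VRAM saved: 2.13 GB, CLIP retained: 95%
```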