	---
	license: apache-2.0
	base_model:
	- stabilityai/stable-diffusion-3-medium
	tags:
	- Custom-Pipeline
	---

	# 🌀 DreamCoil-Diffusion-Mini

	Developed as part of the research at EngineerG Lab. 🔬

	DreamCoil-Diffusion-Mini (`muverqqw/DreamCoil-Diffusion-Mini`) is a highly optimized, lightweight modification of Stable Diffusion 3 Medium.

	We completely removed the heavy T5-XXL text encoder and replaced it with the compact `Qwen3-Embedding-0.6B`. This dramatically reduces VRAM usage, RAM requirements, and model loading times, while maintaining a strong level of prompt understanding.
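As a rough illustration of where the savings come from, the sketch below estimates weight memory for the two text encoders. It assumes half-precision weights (2 bytes per parameter) and counts weights only, not activations; the 4.7B and 0.6B parameter counts are the ones quoted in this README. Measured peak VRAM depends on the whole pipeline and on offloading, so these figures are an estimate rather than a benchmark.

```python
# Back-of-envelope weight memory, assuming 2 bytes per parameter (fp16/bf16).
BYTES_PER_PARAM = 2  # assumption: half-precision weights


def weight_memory_gb(params_billions: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9


t5_gb = weight_memory_gb(4.7)    # T5-XXL encoder, parameter count as quoted in this README
qwen_gb = weight_memory_gb(0.6)  # Qwen3-Embedding-0.6B

print(f"T5-XXL encoder:        ~{t5_gb:.1f} GB")
print(f"Qwen3-Embedding-0.6B:  ~{qwen_gb:.1f} GB")
print(f"Difference:            ~{t5_gb - qwen_gb:.1f} GB")
```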

	This alignment is made possible by a custom-trained neural network, the DreamCoil Projector: an MLP that maps Qwen's 1024-dimensional hidden states into the 4096-dimensional text-embedding space that SD3's transformer expects from T5. The pipeline also includes a built-in Safe VAE Decode patch to prevent the "black square" (NaN) generation errors common in SD3.
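To make the projector idea concrete, here is a minimal PyTorch sketch of an MLP that maps 1024-dimensional token states to 4096 dimensions. Only the input and output sizes come from this README; the class name `ProjectorSketch`, the hidden width, the GELU activation, and the two-layer layout are assumptions for illustration and may differ from the actual `DreamCoilProjector` in `pipeline.py`.

```python
import torch
from torch import nn


class ProjectorSketch(nn.Module):
    """Hypothetical stand-in for the DreamCoil Projector (layer layout is assumed)."""

    def __init__(self, in_dim: int = 1024, out_dim: int = 4096, hidden_dim: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, 1024) -> (batch, seq_len, 4096)
        return self.net(hidden_states)


projector = ProjectorSketch()
qwen_hidden = torch.randn(1, 77, 1024)  # dummy token states, not real Qwen output
projected = projector(qwen_hidden)
print(projected.shape)
```

The linear layers act on the last dimension only, so the sequence length passes through unchanged and each token is projected independently.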

	### 🌟 Key Features
	* No T5 Required: Fast loading and low VRAM footprint.
	* Powered by Qwen: Uses `Qwen3-Embedding-0.6B` as the primary semantic engine.
	* Custom Projector: Specifically trained to bridge the Qwen language model and the SD3 transformer.
	* NaN-Safe VAE: The custom pipeline automatically handles VAE NaN outputs, ensuring stable generation.
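The README does not spell out how the Safe VAE Decode patch works, so the following is only a sketch of one plausible approach: check the decoded tensor for non-finite values and replace them before the image is post-processed. The function name `sanitize_decoded` and the replacement values are assumptions; the real patch may instead decode in float32 or use a different fallback.

```python
import torch


def sanitize_decoded(image: torch.Tensor) -> torch.Tensor:
    """Replace NaN/Inf values in a decoded image tensor (hypothetical helper)."""
    if torch.isfinite(image).all():
        return image  # nothing to fix
    # Map NaN to 0 and +/-Inf to the edges of the expected [-1, 1] range
    cleaned = torch.nan_to_num(image, nan=0.0, posinf=1.0, neginf=-1.0)
    return cleaned.clamp(-1.0, 1.0)


decoded = torch.tensor([[0.5, float("nan")], [float("inf"), -0.25]])
fixed = sanitize_decoded(decoded)
print(fixed)
```

Sanitizing only hides the symptom. If NaNs come from half-precision overflow inside the VAE, running the decode step in float32 is a common alternative.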

	### ⚠️ IMPORTANT: Always Use Negative Prompts!
	Because the `0.6B` language model is significantly smaller than the original `4.7B` T5 encoder, it might occasionally miss fine details or hallucinate. Using a negative prompt is highly recommended to strictly guide the model and achieve the best visual results.

	## 🚀 Quick Start (Usage)

	Because we fundamentally changed the architecture (replacing T5 with Qwen), the standard `diffusers` loading mechanism might throw key mismatch errors.

	To solve this, we provide a custom loading script. This script automatically downloads our custom pipeline logic and uses a helper function (`load_dreamcoil_model`) to correctly initialize the Qwen text encoder and the MLP projector.

	---

	### Run this script:

	```python
	import os
	import shutil
	import sys
	from huggingface_hub import hf_hub_download

	# --- 1. Settings ---
	REPO_ID = "muverqqw/DreamCoil-Diffusion-Mini"
	FILENAME = "pipeline.py"
	LOCAL_FILENAME = "dreamcoil_pipeline.py"

	# --- 2. Download Custom Architecture ---
	print(f"📦 Downloading DreamCoil architecture from {REPO_ID}...")
	cached_file = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

	# Copy and rename to avoid conflicts with system modules
	shutil.copy(cached_file, LOCAL_FILENAME)
	sys.path.append(os.getcwd())

	# Import the custom loader
	try:
	    from dreamcoil_pipeline import load_dreamcoil_model
	    print("✅ Architecture imported successfully.")
	except ImportError as e:
	    print(f"❌ Import error: {e}")
	    # Retry once if a stale copy of the module is already cached
	    if 'dreamcoil_pipeline' in sys.modules:
	        import importlib
	        importlib.reload(sys.modules['dreamcoil_pipeline'])
	        from dreamcoil_pipeline import load_dreamcoil_model
	    else:
	        # Nothing to reload, so surface the original error
	        raise

	# --- 3. Load the Model ---
	print("🚀 Loading weights (this might take a minute)...")
	pipe = load_dreamcoil_model(model_id=REPO_ID, device="cuda")

	# --- 4. Generation ---
	prompt = (
	    "A high-quality, realistic photography shot of a young woman with long blonde hair, seen from behind. "
	    "She is wearing a light, semi-transparent white summer dress. She stands on a sandy beach, "
	    "looking at the beautiful turquoise ocean waves with white sea foam. Bright sunny day, "
	    "natural lighting, cinematic composition, 8k resolution, highly detailed skin and fabric textures."
	)

	# A strong negative prompt is highly recommended for this mini-encoder!
	negative_prompt = (
	    "deformed, distorted, disfigured, poorly drawn, bad anatomy, wrong anatomy, "
	    "extra limb, missing limb, floating limbs, mutated, ugly, blurry, text, watermark"
	)

	print("🎨 Generating image...")
	image = pipe(
	    prompt=prompt,
	    negative_prompt=negative_prompt,
	    num_inference_steps=28,
	    guidance_scale=7.0,
	).images[0]

	# --- 5. Save/Display ---
	image.save("dreamcoil_output.png")
	print("✅ Image saved as dreamcoil_output.png")
	```
	---

	## 🛠 Training Details

	DreamCoil-Diffusion-Mini was built in two stages:

	* Projector Alignment: First, we trained the custom `DreamCoilProjector` (MLP) to map the 1024-dimensional hidden states of `Qwen3-Embedding-0.6B` into the 4096-dimensional text-embedding space expected by the SD3 Medium Transformer. During this stage, the base model weights were frozen.
	* LoRA Fine-Tuning: Once the text encoder was aligned, we performed LoRA fine-tuning directly on the model to adapt its visual generation to the semantic representations produced by the Qwen encoder.

	All training artifacts and LoRA weights are included in this repository.
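The first stage described above (train the projector while the base model stays frozen) can be sketched in PyTorch as follows. This is a toy illustration, not the actual training script: the tiny linear layers stand in for the real projector and the SD3 transformer, the tensors are random, and the MSE objective is an assumption because the README does not state the loss used.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Stand-ins with shrunken sizes (the real mapping would be 1024 -> 4096)
projector = nn.Linear(16, 64)   # trainable projector stand-in
base_model = nn.Linear(64, 8)   # frozen stand-in for the SD3 transformer

base_model.requires_grad_(False)  # freeze the base weights
optimizer = torch.optim.AdamW(projector.parameters(), lr=1e-3)

base_before = base_model.weight.clone()
proj_before = projector.weight.detach().clone()

text_features = torch.randn(4, 16)  # dummy encoder outputs
target = torch.randn(4, 8)          # dummy training target

# Assumed objective: gradients flow through the frozen model into the projector
loss = F.mse_loss(base_model(projector(text_features)), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print("base unchanged:", torch.equal(base_model.weight, base_before))
print("projector updated:", not torch.equal(projector.weight, proj_before))
```

Only the projector's parameters are handed to the optimizer, so the frozen weights cannot change even though gradients pass through them.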

	---

	## ⚠️ Limitations

	* Complex Prompts: Because a `0.6B` text encoder replaces the original `4.7B` T5, the model may struggle with highly complex, multi-subject prompts or precise text rendering compared to the base SD3.
	* Prompt Dependency: The model relies heavily on negative prompts to steer away from artifacts.

	---

	## ☕ Support the Project

	This model was developed as part of the independent research at EngineerG Lab. Training custom projectors and fine-tuning require significant GPU resources.

	If you find this model useful and want to support our future developments, consider buying us a coffee! Every donation helps rent GPUs for the next breakthrough. ❤️

	<br>

	<a href="https://donatello.to/IceL1ghtning" target="_blank">
	<img src="https://img.shields.io/badge/Support_Me_on-Donatello-FF5722?style=for-the-badge&logo=buy-me-a-coffee&logoColor=white" alt="Donate with Donatello"/>
	</a>

	---
	## 📊 Performance Benchmark (NVIDIA T4 16GB)

	We conducted a head-to-head comparison between DreamCoil-Mini and the Original SD3-Medium on a standard NVIDIA T4 GPU (16GB VRAM) using Kaggle environments.

	| Metric | DreamCoil-Mini 🌀 | Original SD3-Medium | Improvement |
	| :--- | :--- | :--- | :--- |
	| Generation Time | 35.11 s | 118.92 s | ~3.4x Faster |
	| Peak VRAM | 11.53 GB | 13.66 GB\* | -2.13 GB |
	| Load Time | 38.05 s | 68.84 s | ~1.8x Faster |
	| Prompt Alignment (CLIP Score) | 27.37 | 28.81 | ~5% lower |

	*\*Original SD3 requires CPU offloading to run on a 16GB T4, which significantly slows down generation.*
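The derived figures in the last column can be recomputed from the raw measurements. The short script below does that arithmetic; the numbers are copied from the table and the variable names are only illustrative.

```python
# Raw measurements copied from the benchmark table
dreamcoil = {"gen_s": 35.11, "vram_gb": 11.53, "load_s": 38.05, "clip": 27.37}
sd3 = {"gen_s": 118.92, "vram_gb": 13.66, "load_s": 68.84, "clip": 28.81}

gen_speedup = sd3["gen_s"] / dreamcoil["gen_s"]            # ratio of generation times
load_speedup = sd3["load_s"] / dreamcoil["load_s"]         # ratio of load times
vram_saved_gb = sd3["vram_gb"] - dreamcoil["vram_gb"]      # absolute VRAM difference
clip_drop_pct = (sd3["clip"] - dreamcoil["clip"]) / sd3["clip"] * 100
time_saved_pct = (1 - dreamcoil["gen_s"] / sd3["gen_s"]) * 100

print(f"Generation speedup: {gen_speedup:.2f}x")
print(f"Load speedup:       {load_speedup:.2f}x")
print(f"VRAM saved:         {vram_saved_gb:.2f} GB")
print(f"CLIP score drop:    {clip_drop_pct:.1f}%")
print(f"Generation time reduced by {time_saved_pct:.1f}%")
```

A 3.4x speedup means generation takes roughly 30% of the original time, which is not the same as being "340% faster".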

	### 📈 Analysis:
	* Speed King: DreamCoil-Mini generates images about 3.4x faster than the original model on this mid-range hardware (35.11 s vs. 118.92 s) because it fits in VRAM without offloading and so avoids slow CPU-to-GPU data transfers.
	* Efficient Semantics: By replacing the 4.7B T5-XXL with a 0.6B Qwen encoder, we retained about 95% of the baseline CLIP score (27.37 vs. 28.81) while substantially reducing the model's footprint.
	* Accessibility: This model makes SD3-level generation viable for users with older or mid-range GPUs (12GB - 16GB VRAM) without the painful slowness of offloading.