π Z-Engineer V2.5 (4B)
The "Z-Engineer" is back β longer, deeper, and smarter.
This is Z-Engineer V2.5, a specialized 4B parameter model fine-tuned on the Qwen 3 architecture. It serves as a dedicated Creative Director for your image generation workflow, capable of extrapolating complex, cohesive visual narratives from minimal seed concepts. It doesn't just describe a scene; it engineers the light, lens, and atmosphere necessary to render it.
π§ What is this?
Z-Engineer V2.5 is a merged LoRA fine-tuned version of high-performance text encoder from Tongyi-MAI/Z-Image-Turbo. It has been trained to specifically understand the nuances of AI Image Generation (Z-Image-Turbo, Flux2 Klein). It excels at:
- Expanding Concepts: Turn "dog on a bike" into a cinematic narrative.
- Technical Precision: It understands lenses (35mm vs 85mm), lighting (rembrandt, volumetric), and film stocks.
- Stylistic Consistency: It avoids the robotic "AI feel" and writes with a distinct, creative voice.
π Key Use Cases
- β¨ Prompt Enhancement: A lightweight, low-VRAM solution to create, edit, and enrich simple image ideas into detailed narratives.
- π Z-Image Turbo Encoder: Fully backwards compatible as a drop-in CLIP text encoder for Z-Image Turbo workflows, producing varied and unique results from the same seed.
- π‘οΈ Local & Private: Runs entirely on your machine. No API fees, no data logging, no censorship.
- β‘ Hybrid Power: Use it to expand a prompt, then use the model itself as the encoder for the generation stage.
π Key Improvements
- Base Model Upgrade: Switched from standard Qwen3 Instruct to the native text encoder from Z-Image-Turbo for perfect alignment.
- All-Layer Training: Unlike typical lightweight LoRAs, I trained adapters on all 36 layers of the model, ensuring deep behavioral alignment.
- Massive Iteration Count: Trained for 10,000 iterations to fully saturate the weights with the dataset concepts.
π CLIP Model Comparison
Z-Engineer V2.5 can be used as a drop-in CLIP text encoder for Z-Image-Turbo workflows. Here's how it compares to previous versions and the base model:
| Model | Result |
|---|---|
| Z-Engineer V2.5 | β Clean, natural output with excellent detail and coherence. |
| Z-Engineer V2 | β Good quality, but V2.5 shows improved texture and lighting. |
| Z-Engineer V1 | β Broken: Produces severe visual artifacts and distortions. |
| Base Qwen3 4B | β οΈ Functional but generic; lacks the specialized prompt understanding. |
Visual Comparison
Note: V1 exhibits catastrophic artifacts (bottom-left in each grid) due to training instabilities. V2.5 (top-left) consistently produces the cleanest, most natural results.
π ComfyUI Integration (Recommended)
I have released a custom node for seamless integration with ComfyUI!
- Features: Optimized for local OpenAI API compatible backends (LM Studio, Ollama, etc.).
- Get it here: ComfyUI-Z-Engineer
π» Training Facts
I believe in open science. Here is exactly how this was built:
- Hardware: Trained locally on a Mac with 48GB Unified Memory (Apple Silicon).
- Framework: MLX (Apple's native machine learning framework).
- Dataset: Generated locally using Qwen3 VL 30B A3B Instruct
- Size: ~34,678 high-quality examples.
- Content: A curated mix of "Prompt Enhancement" pairs, teaching the model how to take a seed idea and "engineer" it into a final prompt.
- Hyperparameters:
- Iterations: 10,000
- Batch Size: 4
- LoRA Layers: 36 (All Linear Layers)
- Learning Rate: 1e-5
π¦ GGUF & Quantization
I provide a full suite of GGUF quantizations for use with llama.cpp, Ollama, and LM Studio.
| Quantization | Size | Use Case |
|---|---|---|
| Q4_K_S | 2.2 GB | π» Max Compression |
| Q4_K_M | 2.3 GB | β‘οΈ Fast / Mobile / Edge |
| Q5_K_M | 2.7 GB | βοΈ Recommended Balance |
| Q6_K | 3.1 GB | π High Quality |
| Q8_0 | 4.0 GB | π¬ Near-Lossless |
| F16 | 7.5 GB | π§ͺ Reference / Conversion |
β οΈ Disclaimer
This model generates text for image prompts. While I have filtered the dataset, users should use their best judgment. I am not responsible for the content you generate.
- Downloads last month
- 2,428
Model tree for BennyDaBall/Qwen3-4b-Z-Image-Engineer-V2.5
Base model
Tongyi-MAI/Z-Image-Turbo
