---
license: apache-2.0
language:
- en
base_model:
- Tongyi-MAI/Z-Image-Turbo
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation
- prompt-engineering
- image-generation
- z-image
- z-image-turbo
- qwen3
- text-encoder
- comfyui
- lm-studio
- conversational
---
# Z-Image-Engineer V6 (4B)
## Model Metadata
| Key | Value |
|---|---|
| **License** | Apache-2.0 |
| **Language** | English (`en`) |
| **Base Model** | `Tongyi-MAI/Z-Image-Turbo` |
| **Library** | `transformers` |
| **Pipeline Tag** | `text-generation` |
| **Format** | HF Safetensors |
---
The **Z-Engineer** returns, fully rebuilt around the **SMART DoRA** training system for Z-Image Turbo.
Yes, we jump from V4 to V6. Unlike the usual guy math, this one actually brought the extra two inches.
**Z-Image-Engineer V6** is a fine-tune of the 4B Qwen text encoder from `Tongyi-MAI/Z-Image-Turbo`, built for two roles: a local prompt-enhancement model for LM Studio, and a merged HF text encoder for Z-Image workflows.

---
## What is Z-Image-Engineer V6?
V6 transforms minimal seed prompts into rich, highly structured visual narratives. It adds explicit scene composition, lighting direction, material texture, and depth separation while stripping out empty prompt sludge like *"8k, masterpiece, trending on ArtStation."*
It can also be used directly as a Z-Image text encoder. This repo contains the merged HF safetensors. The GGUF quantized release lives in the companion repo: [Z-Image-Engineer-V6-GGUF](https://huggingface.co/BennyDaBall/Z-Image-Engineer-V6-GGUF).
### Key Use Cases
- **Prompt Enhancement:** Upgrade simple concepts into descriptive, high-fidelity visual prompts locally.
- **Text Encoder Swap:** Replace the stock Z-Image Qwen text encoder to generate different conditioning from the same seed.
- **Hybrid Mode:** Use V6 to rewrite your prompt, then use V6 again to encode it. It writes the scene and drives the image model.
- **Private Local Workflow:** Built for LM Studio, ComfyUI, and `llama.cpp`. No API logs, no external telemetry.
---
## Under the Hood: SMART DoRA
V4 pioneered SMART training. V6 adapts that system into a **Weight-Decomposed Low-Rank Adaptation (DoRA)** framework.
DoRA provides surgical adapter updates by decoupling directional and magnitude adjustments. SMART adds auxiliary pressure so the model does not collapse into repetitive prompt loops or superficial sentence patterns.
| Regularizer | What it Does | Why it Matters |
|---|---|---|
| **Entropic** | Broadens output probability diversity. | Reduces repetitive loops and generic vocabulary. |
| **Holographic** | Enforces structured, depth-wise feature logic. | Improves foreground/background hierarchy. |
| **Topological** | Stabilizes coherent latent trajectories. | Keeps prompts flowing naturally instead of stalling out. |
| **Manifold** | Regulates overall weight distributions. | Keeps model behavior stable under high-pressure refinement. |
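
For reference, the direction/magnitude decoupling described in this section follows the standard DoRA reparameterization. The notation below is the general DoRA form, not the SMART-specific training code, which this card does not include:

$$
W' = m \cdot \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}
$$

Here $W_0$ is the frozen base weight, $B$ and $A$ are the low-rank factors that adjust direction, $m$ is a learned magnitude vector, and $\lVert \cdot \rVert_c$ is the column-wise norm.
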
### The Refinement Pipeline
V6 was not a simple one-and-done training run. The final model is a blended composite built in five stages:
1. **Base Pass:** Master-corpus SMART DoRA training on the native Z-Image Turbo text encoder.
2. **Retention Pass:** Preservation pressure for numbers, color accuracy, text signage, named objects, actions, and spatial tracking.
3. **SceneClean SFT32:** Supervised refinement to restore the cinematic V4/base-V6 voice.
4. **AntiRepeat Binary24:** Binary anti-repeat refinement to reduce loops, abrupt fragments, and bad endings.
5. **Final Blend:** A 25% style-restoration / 75% anti-repeat DoRA adapter blend, balancing vivid descriptions with tighter syntax.
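
The 25% / 75% blend in the last step can be pictured as a weighted sum of two adapters' weight deltas. The card does not say how the blend was actually computed, so this is a minimal sketch that assumes plain linear interpolation; `blend_deltas` and the toy matrices are hypothetical. It blends the effective deltas rather than the low-rank factors, because a weighted sum of `B @ A` products is generally not the product of the weighted factors.

```python
from typing import Dict, List

Matrix = List[List[float]]


def blend_deltas(style: Dict[str, Matrix], anti_repeat: Dict[str, Matrix],
                 style_weight: float = 0.25) -> Dict[str, Matrix]:
    """Linearly interpolate two sets of per-module weight deltas."""
    if style.keys() != anti_repeat.keys():
        raise ValueError("adapters must cover the same modules")
    w_s, w_a = style_weight, 1.0 - style_weight
    return {
        name: [
            [w_s * s + w_a * a for s, a in zip(row_s, row_a)]
            for row_s, row_a in zip(style[name], anti_repeat[name])
        ]
        for name in style
    }


# Toy 2x2 deltas for one module; real adapters hold one tensor per layer.
style_delta = {"q_proj": [[4.0, 0.0], [0.0, 4.0]]}
anti_repeat_delta = {"q_proj": [[0.0, 8.0], [8.0, 0.0]]}
blended = blend_deltas(style_delta, anti_repeat_delta)
print(blended["q_proj"])  # [[1.0, 6.0], [6.0, 1.0]]
```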
---
## Quick Start
### LM Studio: Prompt Enhancement
Use this merged HF release directly where supported, or download a GGUF quant from [Z-Image-Engineer-V6-GGUF](https://huggingface.co/BennyDaBall/Z-Image-Engineer-V6-GGUF) for LM Studio. No complex system prompt is required.
```text
Enhance this image prompt for Z-Image Turbo: a unicorn
```
The comparison examples were generated from direct LM Studio user requests like this, with no separate system prompt. `V6_SYSTEM_PROMPT.md` is included only as an optional preset for people who want a stricter prompt-only chat setup.
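
If you would rather script the same request than type it into the chat window, LM Studio can expose an OpenAI-compatible local server. The sketch below assumes that server is running at its default address and that `MODEL_ID` matches whatever name LM Studio shows for the loaded quant; both values are assumptions, not something this repo defines.

```python
import json
import urllib.request

# Assumed default address of LM Studio's local server.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"
MODEL_ID = "z-image-engineer-v6"  # hypothetical identifier, check LM Studio


def build_request(seed_prompt: str) -> dict:
    """Build a chat payload with a single user turn and no system prompt."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": f"Enhance this image prompt for Z-Image Turbo: {seed_prompt}",
            }
        ],
    }


def enhance(seed_prompt: str, url: str = LMSTUDIO_URL) -> str:
    """POST the payload to the local server and return the rewritten prompt."""
    body = json.dumps(build_request(seed_prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"].strip()
```

With the server running, `print(enhance("a unicorn"))` should return the rewritten prompt as plain text.
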
### ComfyUI: Direct Encoder Swap
1. Download a GGUF quant from [Z-Image-Engineer-V6-GGUF](https://huggingface.co/BennyDaBall/Z-Image-Engineer-V6-GGUF).
2. Place the GGUF file into `ComfyUI/models/text_encoders/`.
3. Add a `CLIPLoaderGGUF` node.
4. Set model type to `lumina2`.
5. Use it where the stock Z-Image Qwen text encoder would normally go.
Optional workflow repo:
- [ComfyUI-Z-Engineer](https://github.com/BennyDaBall930/ComfyUI-Z-Engineer)
The raw GGUF works without the ComfyUI-Z-Engineer node.
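
Steps 1 and 2 can also be scripted. This is a rough sketch using `huggingface_hub`, assuming the quant file sits at the root of the GGUF repo under the name listed under Verified Image Settings; `fetch_encoder` and `text_encoder_dir` are hypothetical helper names.

```python
from pathlib import Path

REPO_ID = "BennyDaBall/Z-Image-Engineer-V6-GGUF"
FILENAME = "Z-Image-Engineer-V6-Q8_0.gguf"  # assumed exact filename in the repo


def text_encoder_dir(comfy_root: str) -> Path:
    """Return the ComfyUI folder that holds text encoder files."""
    return Path(comfy_root) / "models" / "text_encoders"


def fetch_encoder(comfy_root: str, filename: str = FILENAME) -> str:
    """Download one GGUF quant straight into ComfyUI's text encoder folder."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    target = text_encoder_dir(comfy_root)
    target.mkdir(parents=True, exist_ok=True)
    return hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=target)
```

Calling `fetch_encoder("ComfyUI")` from the folder that contains your ComfyUI checkout should leave the file where `CLIPLoaderGGUF` looks for it.
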
### Verified Image Settings
```text
UNET: z_image_turbo_bf16.safetensors
VAE: ae.safetensors
Text Encoder: Z-Image-Engineer-V6-Q8_0.gguf from the GGUF repo
Resolution: 1024x1024
Steps: 8
CFG: 1.0
Sampler: res_multistep
Scheduler: simple
Shift: 3.0
```
---
## Training Specifics
| Parameter | Specification |
|---|---|
| **Base Text Encoder** | `Tongyi-MAI/Z-Image-Turbo/text_encoder` |
| **Tokenizer** | `Tongyi-MAI/Z-Image-Turbo/tokenizer` |
| **Method** | SMART DoRA / PEFT Adapter Training |
| **Rank / Alpha / Dropout** | 64 / 64 / 0.03 |
| **Target Modules** | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `down_proj`, `up_proj` |
| **Refinement Stack** | Supervised Style SFT + Binary Anti-Repeat |
| **Final Packaging** | Merged HF safetensors |
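
For anyone trying a comparable setup, the rank, alpha, dropout, and target modules above map onto a PEFT `LoraConfig` with DoRA enabled. This is only a sketch of the listed hyperparameters: the SMART regularizers are not part of PEFT, and every setting not shown in the table is left at its PEFT default, which is an assumption rather than the author's configuration.

```python
# Hyperparameters copied from the table above.
DORA_KWARGS = {
    "r": 64,
    "lora_alpha": 64,
    "lora_dropout": 0.03,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "down_proj", "up_proj",
    ],
    "use_dora": True,  # switches PEFT's LoRA layers to DoRA
}

# With rank == alpha, the standard LoRA scaling factor alpha / r is 1.0.
SCALING = DORA_KWARGS["lora_alpha"] / DORA_KWARGS["r"]


def build_config():
    """Create the PEFT config (requires `pip install peft`)."""
    from peft import LoraConfig

    return LoraConfig(**DORA_KWARGS)
```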
---
## GGUF Quantization Ladder
The quantized release is separate on purpose:
[BennyDaBall/Z-Image-Engineer-V6-GGUF](https://huggingface.co/BennyDaBall/Z-Image-Engineer-V6-GGUF)
That repo contains the full GGUF ladder: F16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, and MXFP4.
---
## Verification & Proof
The bundled comparison image is:
```text
evidence/gallery_z_image_engineer_v6_simple_ab_with_rewrites_CONTACT.png
```
It compares simple seed prompts across four isolated control paths:
1. Stock Encoder + Raw Prompt
2. V6 Encoder + Raw Prompt
3. Stock Encoder + V6 LM Studio Rewrite
4. V6 Encoder + V6 LM Studio Rewrite
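
The four paths are the full 2x2 cross of encoder choice and prompt source, so each factor can be judged on its own. A small sketch that enumerates them in the same order (the variable names are illustrative only):

```python
from itertools import product

encoders = ["Stock Encoder", "V6 Encoder"]
prompts = ["Raw Prompt", "V6 LM Studio Rewrite"]

# Prompts vary in the outer loop so the order matches the list above.
paths = [f"{enc} + {prm}" for prm, enc in product(prompts, encoders)]
for i, path in enumerate(paths, start=1):
    print(f"{i}. {path}")
```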
---
## Disclaimer & Acknowledgements
This model is a prompt engineer and text encoder. Diffusion is still diffusion; structural expansion improves compositional adherence, but it does not mathematically guarantee a perfect seed every single time. Use creative judgment locally.
- **Tongyi-MAI** for the Z-Image Turbo ecosystem.
- **Qwen** for the adaptable text encoder backbone.
- The open-source maintainers behind **LM Studio**, **ComfyUI**, **llama.cpp**, **PEFT**, and **Transformers**.
- My local power utility provider, for sustaining the research grid.
**Built & trained locally with care by BennyDaBall.**