| license: apache-2.0 | |
| language: | |
| - en | |
| base_model: | |
| - Tongyi-MAI/Z-Image-Turbo | |
| library_name: transformers | |
| pipeline_tag: text-generation | |
| tags: | |
| - text-generation | |
| - prompt-engineering | |
| - image-generation | |
| - z-image | |
| - z-image-turbo | |
| - qwen3 | |
| - text-encoder | |
| - comfyui | |
| - lm-studio | |
| - conversational | |
| # Z-Image-Engineer V6 (4B) | |
| ## Model Metadata | |
| | Key | Value | | |
| |---|---| | |
| | **License** | Apache-2.0 | | |
| | **Language** | English (`en`) | | |
| | **Base Model** | `Tongyi-MAI/Z-Image-Turbo` | | |
| | **Library** | `transformers` | | |
| | **Pipeline Tag** | `text-generation` | | |
| | **Format** | HF Safetensors | | |
| --- | |
| The **Z-Engineer** returns, fully rebuilt around the **SMART DoRA** training system for Z-Image Turbo. | |
| Yes, we jump from V4 to V6. Unlike the usual guy math, this one actually brought the extra two inches. | |
| **Z-Image-Engineer V6** is a fine-tune of the 4B Qwen text encoder from `Tongyi-MAI/Z-Image-Turbo`, optimized for two roles: a local prompt-enhancement model for LM Studio, and a merged HF text encoder for Z-Image workflows. | |
|  | |
| --- | |
| ## What is Z-Image-Engineer V6? | |
| V6 transforms minimal seed prompts into rich, highly structured visual narratives. It adds explicit scene composition, lighting direction, material texture, and depth separation while stripping out empty prompt sludge like *"8k, masterpiece, trending on ArtStation."* | |
| It can also be used directly as a Z-Image text encoder. This repo contains the merged HF safetensors. The GGUF quantized release lives in the companion repo: [Z-Image-Engineer-V6-GGUF](https://huggingface.co/BennyDaBall/Z-Image-Engineer-V6-GGUF). | |
| ### Key Use Cases | |
| - **Prompt Enhancement:** Upgrade simple concepts into descriptive, high-fidelity visual prompts locally. | |
| - **Text Encoder Swap:** Replace the stock Z-Image Qwen text encoder to generate different conditioning from the same seed. | |
| - **Hybrid Mode:** Use V6 to rewrite your prompt, then use V6 again to encode it. It writes the scene and drives the image model. | |
| - **Private Local Workflow:** Built for LM Studio, ComfyUI, and `llama.cpp`. No API logs, no external telemetry. | |
| --- | |
| ## Under the Hood: SMART DoRA | |
| V4 pioneered SMART training. V6 adapts that system into a **Weight-Decomposed Low-Rank Adaptation (DoRA)** framework. | |
| DoRA provides surgical adapter updates by decoupling directional and magnitude adjustments. SMART adds auxiliary pressure so the model does not collapse into repetitive prompt loops or superficial sentence patterns. | |
| | Regularizer | What it Does | Why it Matters | | |
| |---|---|---| | |
| | **Entropic** | Broadens output probability diversity. | Reduces repetitive loops and generic vocabulary. | | |
| | **Holographic** | Enforces structured, depth-wise feature logic. | Improves foreground/background hierarchy. | | |
| | **Topological** | Stabilizes coherent latent trajectories. | Keeps prompts flowing naturally instead of stalling out. | | |
| | **Manifold** | Regulates overall weight distributions. | Keeps model behavior stable under high-pressure refinement. | | |
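As a rough illustration of the "directional and magnitude" split described above, the sketch below recomposes a weight matrix the way the DoRA formulation does: the frozen weight plus a low-rank update gives a direction, each column is normalized, and a learned per-column magnitude rescales it. This is a minimal pure-Python sketch, not the SMART DoRA training code; the function names `matmul` and `dora_weight` and the toy matrices are assumptions made for the example, and the SMART regularizers are not modeled.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dora_weight(w0, delta, magnitude):
    """Recompose W' = magnitude * (w0 + delta) / column_norm(w0 + delta).

    w0 and delta are same-shaped matrices (lists of rows);
    magnitude holds one learned scalar per column.
    """
    rows, cols = len(w0), len(w0[0])
    # Direction before normalization: frozen weight plus low-rank update.
    v = [[w0[i][j] + delta[i][j] for j in range(cols)] for i in range(rows)]
    # L2 norm of each column of the direction matrix.
    norms = [math.sqrt(sum(v[i][j] ** 2 for i in range(rows))) for j in range(cols)]
    # Rescale every unit-norm column by its magnitude.
    return [[magnitude[j] * v[i][j] / norms[j] for j in range(cols)]
            for i in range(rows)]


# Toy example (hypothetical values): a rank-1 update B @ A on a 2x2 weight.
w0 = [[3.0, 0.0], [4.0, 1.0]]
B = [[1.0], [0.0]]
A = [[0.0, 1.0]]
delta = matmul(B, A)                      # [[0.0, 1.0], [0.0, 0.0]]
magnitude = [10.0, math.sqrt(2.0)]
merged = dora_weight(w0, delta, magnitude)
```

Because the magnitude vector is learned separately, the update can rotate a column without changing its scale, or rescale it without rotating it.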
| ### The Refinement Pipeline | |
| V6 was not a simple one-and-done training run. The final model is a blended composite, built in five steps: | |
| 1. **Base Pass:** Master-corpus SMART DoRA training on the native Z-Image Turbo text encoder. | |
| 2. **Retention Pass:** Preservation pressure for numbers, color accuracy, text signage, named objects, actions, and spatial tracking. | |
| 3. **SceneClean SFT32:** Supervised refinement to restore the cinematic V4/base-V6 voice. | |
| 4. **AntiRepeat Binary24:** Binary anti-repeat refinement to reduce loops, abrupt fragments, and bad endings. | |
| 5. **Final Blend:** A 25% style-restoration / 75% anti-repeat DoRA adapter blend, balancing vivid descriptions with tighter syntax. | |
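The 25% / 75% split in the final step can be pictured as a weighted average of two adapters. The sketch below assumes a plain linear blend over matching tensors, which the card does not confirm; `blend_adapters`, the flat-list tensor representation, and the sample values are all hypothetical.

```python
def blend_adapters(style, anti_repeat, style_weight=0.25):
    """Linearly blend two adapters stored as {tensor_name: flat list of floats}.

    style_weight is applied to the style-restoration adapter; the remainder
    (1 - style_weight) goes to the anti-repeat adapter.
    """
    if style.keys() != anti_repeat.keys():
        raise ValueError("adapters must contain the same tensor names")
    blended = {}
    for name in style:
        s, a = style[name], anti_repeat[name]
        if len(s) != len(a):
            raise ValueError(f"shape mismatch for {name}")
        blended[name] = [style_weight * x + (1.0 - style_weight) * y
                         for x, y in zip(s, a)]
    return blended


# Hypothetical two-element "tensors" for a single target module.
style_adapter = {"q_proj": [1.0, 0.0]}
anti_repeat_adapter = {"q_proj": [0.0, 4.0]}
final = blend_adapters(style_adapter, anti_repeat_adapter)
```

With the default weight, each blended value sits three quarters of the way toward the anti-repeat adapter.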
| --- | |
| ## Quick Start | |
| ### LM Studio: Prompt Enhancement | |
| Use this merged HF release directly where supported, or download a GGUF quant from [Z-Image-Engineer-V6-GGUF](https://huggingface.co/BennyDaBall/Z-Image-Engineer-V6-GGUF) for LM Studio. No complex system prompt is required. | |
| ```text | |
| Enhance this image prompt for Z-Image Turbo: a unicorn | |
| ``` | |
| The comparison examples were generated from direct LM Studio user requests like this, with no separate system prompt. `V6_SYSTEM_PROMPT.md` is included only as an optional preset for people who want a stricter prompt-only chat setup. | |
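The same request can be scripted against LM Studio's local OpenAI-compatible server. The sketch below is a minimal example, assuming the server is running at its default address `http://localhost:1234`; the model identifier `z-image-engineer-v6` is a placeholder, so substitute whatever identifier LM Studio shows for the loaded quant. No system message is sent, matching the setup described above.

```python
import json
import urllib.request

# Default LM Studio server address; change it if you configured another port.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"
# Placeholder identifier (assumption); use the name LM Studio lists for the model.
MODEL_ID = "z-image-engineer-v6"


def build_payload(seed_prompt, model=MODEL_ID):
    """Build a chat request with a single user message and no system prompt."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": f"Enhance this image prompt for Z-Image Turbo: {seed_prompt}",
            }
        ],
    }


def enhance(seed_prompt, url=LMSTUDIO_URL):
    """POST the request to the local server and return the rewritten prompt."""
    data = json.dumps(build_payload(seed_prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(enhance("a unicorn"))` should print the expanded prompt.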
| ### ComfyUI: Direct Encoder Swap | |
| 1. Download a GGUF quant from [Z-Image-Engineer-V6-GGUF](https://huggingface.co/BennyDaBall/Z-Image-Engineer-V6-GGUF). | |
| 2. Place the GGUF file into `ComfyUI/models/text_encoders/`. | |
| 3. Add a `CLIPLoaderGGUF` node. | |
| 4. Set model type to `lumina2`. | |
| 5. Use it where the stock Z-Image Qwen text encoder would normally go. | |
| Optional workflow repo: | |
| - [ComfyUI-Z-Engineer](https://github.com/BennyDaBall930/ComfyUI-Z-Engineer) | |
| The raw GGUF works without the ComfyUI-Z-Engineer node. | |
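If the loader does not list the model, the first thing to check is step 2. The helper below is a small sketch that checks whether a GGUF file sits in the expected folder; the function names and the example root path are assumptions for illustration only.

```python
from pathlib import Path


def text_encoder_path(comfy_root, gguf_name):
    """Return the expected location of a text-encoder GGUF inside a ComfyUI install."""
    return Path(comfy_root) / "models" / "text_encoders" / gguf_name


def is_installed(comfy_root, gguf_name):
    """True if the GGUF file exists where step 2 above says to place it."""
    return text_encoder_path(comfy_root, gguf_name).is_file()
```

For example, `is_installed("ComfyUI", "Z-Image-Engineer-V6-Q8_0.gguf")` returns `True` once the file is in place, assuming the script runs from the directory that contains the `ComfyUI` folder.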
| ### Verified Image Settings | |
| ```text | |
| UNET: z_image_turbo_bf16.safetensors | |
| VAE: ae.safetensors | |
| Text Encoder: Z-Image-Engineer-V6-Q8_0.gguf from the GGUF repo | |
| Resolution: 1024x1024 | |
| Steps: 8 | |
| CFG: 1.0 | |
| Sampler: res_multistep | |
| Scheduler: simple | |
| Shift: 3.0 | |
| ``` | |
| --- | |
| ## Training Specifics | |
| | Parameter | Specification | | |
| |---|---| | |
| | **Base Text Encoder** | `Tongyi-MAI/Z-Image-Turbo/text_encoder` | | |
| | **Tokenizer** | `Tongyi-MAI/Z-Image-Turbo/tokenizer` | | |
| | **Method** | SMART DoRA / PEFT Adapter Training | | |
| | **Rank / Alpha / Dropout** | 64 / 64 / 0.03 | | |
| | **Target Modules** | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `down_proj`, `up_proj` | | |
| | **Refinement Stack** | Supervised Style SFT + Binary Anti-Repeat | | |
| | **Final Packaging** | Merged HF safetensors | | |
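For readers who want to try similar hyperparameters, the table maps onto a PEFT `LoraConfig` with `use_dora=True`. This is a sketch of the adapter configuration only: the `task_type` value is an assumption, the SMART regularizers and refinement passes are not part of PEFT and are not shown, and `build_dora_config` is a hypothetical helper that needs `peft` installed.

```python
# Hyperparameters copied from the table above.
DORA_CONFIG_KWARGS = {
    "r": 64,
    "lora_alpha": 64,
    "lora_dropout": 0.03,
    "use_dora": True,  # enables weight-decomposed (DoRA) adapters in PEFT
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "down_proj", "up_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

# With standard LoRA scaling (alpha / r), rank 64 and alpha 64 give a factor of 1.0.
scaling = DORA_CONFIG_KWARGS["lora_alpha"] / DORA_CONFIG_KWARGS["r"]


def build_dora_config():
    """Create the PEFT config; the import is deferred so the constants load without peft."""
    from peft import LoraConfig
    return LoraConfig(**DORA_CONFIG_KWARGS)
```

Targeting all seven projection modules means both the attention and the MLP layers of each block receive an adapter.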
| --- | |
| ## GGUF Quantization Ladder | |
| The quantized release is separate on purpose: | |
| [BennyDaBall/Z-Image-Engineer-V6-GGUF](https://huggingface.co/BennyDaBall/Z-Image-Engineer-V6-GGUF) | |
| That repo contains the full GGUF ladder: F16, Q8_0, Q6_K, Q5_K_M, Q4_K_M, Q3_K_M, and MXFP4. | |
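A single quant can be fetched with `huggingface_hub`. In the sketch below, the filename pattern is inferred from `Z-Image-Engineer-V6-Q8_0.gguf` in the verified settings; whether the other quants follow the same naming is an assumption, so check the file list in the GGUF repo. `gguf_filename` and `download_quant` are hypothetical helper names.

```python
GGUF_REPO = "BennyDaBall/Z-Image-Engineer-V6-GGUF"
QUANTS = ("F16", "Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "Q3_K_M", "MXFP4")


def gguf_filename(quant):
    """Build the expected filename for a quant level (naming pattern is assumed)."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quant {quant!r}; expected one of {QUANTS}")
    return f"Z-Image-Engineer-V6-{quant}.gguf"


def download_quant(quant, local_dir=None):
    """Download one GGUF file from the companion repo; requires huggingface_hub."""
    from huggingface_hub import hf_hub_download
    return hf_hub_download(
        repo_id=GGUF_REPO,
        filename=gguf_filename(quant),
        local_dir=local_dir,
    )
```

For ComfyUI, `download_quant("Q8_0", local_dir="ComfyUI/models/text_encoders")` would place the file where the loader expects it, assuming that relative path matches your install.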
| --- | |
| ## Verification & Proof | |
| The bundled comparison image is: | |
| ```text | |
| evidence/gallery_z_image_engineer_v6_simple_ab_with_rewrites_CONTACT.png | |
| ``` | |
| It compares foundational prompts across four isolated control paths: | |
| 1. Stock Encoder + Raw Prompt | |
| 2. V6 Encoder + Raw Prompt | |
| 3. Stock Encoder + V6 LM Studio Rewrite | |
| 4. V6 Encoder + V6 LM Studio Rewrite | |
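The four paths are the full 2x2 grid of encoder choice and prompt source, so each image differs from its neighbors by exactly one factor. A small sketch that enumerates the grid in the same order as the list above (the names `ENCODERS`, `PROMPTS`, and `control_paths` are illustrative only):

```python
from itertools import product

ENCODERS = ("Stock Encoder", "V6 Encoder")
PROMPTS = ("Raw Prompt", "V6 LM Studio Rewrite")


def control_paths():
    """List every encoder/prompt combination, with the prompt source varying slowest."""
    return [f"{encoder} + {prompt}" for prompt, encoder in product(PROMPTS, ENCODERS)]


paths = control_paths()
```

Comparing paths 1 and 2 isolates the encoder swap, while comparing paths 1 and 3 isolates the prompt rewrite.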
| --- | |
| ## Disclaimer & Acknowledgements | |
| This model is a prompt engineer and text encoder. Diffusion is still diffusion; structural expansion improves compositional adherence, but it does not guarantee a perfect image from every seed. Use creative judgment locally. | |
| - **Tongyi-MAI** for the Z-Image Turbo ecosystem. | |
| - **Qwen** for the adaptable text encoder backbone. | |
| - The open-source maintainers behind **LM Studio**, **ComfyUI**, **llama.cpp**, **PEFT**, and **Transformers**. | |
| - My local power utility provider, for sustaining the research grid. | |
| **Built & trained locally with care by BennyDaBall.** | |