# Z-Engineer V4 (4B)
The Z-Engineer returns, now with a PhD in "not being mid."
This is Z-Engineer V4, the culmination of extensive research into what makes an AI prompt engineer actually good at its job. Built on the Qwen 3 architecture and trained with a novel SMART Training methodology, this 4B parameter model doesn't just describe scenes; it understands the craft of visual storytelling down to the lens flare.
## What is this?
Z-Engineer V4 is a fully fine-tuned (not LoRA, we went all in) version of the text encoder from Tongyi-MAI/Z-Image-Turbo. It's been specifically trained to understand the nuances of AI Image Generation workflows.
It excels at:
- Expanding Concepts: Turn "sad robot in rain" into a cinematic fever dream with chromatic aberration, shallow depth of field, and a melancholic color grade that would make Blade Runner jealous.
- Technical Precision: It knows the difference between an 85mm portrait lens and a 24mm wide, and will use them appropriately. Lighting? Rembrandt, split, volumetric fog? It's got opinions.
- Stylistic Consistency: It writes with a creative voice, not that robotic "hyperrealistic, 8k, trending on artstation" energy.
## Key Use Cases
- Prompt Enhancement: A low-VRAM powerhouse for turning your braindead 3 AM ideas into detailed visual narratives.
- Z-Image Turbo Encoder: Fully backwards compatible as a drop-in CLIP text encoder for Z-Image Turbo workflows, producing varied and unique results from the same seed.
- Local & Private: Runs entirely on your machine. No API fees, no data logging, no corporate overlords judging your prompts.
- Hybrid Power: Use it to expand a prompt, then use the model itself as the encoder for generation. It's turtles all the way down.
## What's New in V4: SMART Training
This version introduces SMART Training (Smart Mode with Adaptive Regularization Topologer), a custom training methodology that goes beyond standard cross-entropy optimization.
The secret sauce? Four auxiliary regularizers that operate on hidden states, logits, and weight matrices:
| Regularizer | What It Does | Why It Matters |
|---|---|---|
| Entropic | Prevents mode collapse, encourages diversity | No more repetitive "cinematic, 8k, masterpiece" loops |
| Holographic | Enforces depth-wise information compression | Clean feature hierarchy from surface to abstract |
| Topological | Encourages coherent latent trajectories | Prompts flow logically instead of word salad |
| Manifold | Stabilizes weight distributions | Rock-solid training dynamics |
The result? A model that generalizes better, outputs more varied responses, and doesn't collapse into repetitive patterns even after 55,000 training examples.
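The SMART implementation itself is not published, but the idea behind the entropic regularizer can be sketched as an auxiliary loss term on the output logits. Everything below (function names, the `0.01` weight, the softmax-entropy formulation) is a hypothetical illustration, not the actual training code:

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def entropic_penalty(logits, weight=0.01):
    """Auxiliary loss term that is lower when the token distribution is diverse.

    Rewarding high entropy discourages collapsing onto a few high-probability
    tokens (the endless "cinematic, 8k, masterpiece" loop).
    """
    p = softmax(logits)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)  # per-position Shannon entropy
    return -weight * entropy.mean()  # negative sign: more entropy => lower loss

# A flat (maximally diverse) distribution earns a larger bonus than a peaked one:
flat = entropic_penalty(np.zeros((1, 8)))                          # uniform over 8 tokens
peaked = entropic_penalty(np.array([[10.0, 0, 0, 0, 0, 0, 0, 0]]))  # near-collapsed
```

Added to the main cross-entropy loss, a term like this nudges training away from mode collapse without changing what the model is asked to predict.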
## Key Improvements Over V2.5
- Full Fine-Tune: V2.5 was a merged LoRA. V4 is a full parameter fine-tune; every single weight has been updated.
- Bigger Dataset: Trained on 55,000 examples (vs. 34,678 for V2.5), roughly 60% more data.
- SMART Regularization: A novel training methodology that actively prevents the failure modes that plagued earlier versions.
- Longer Training: 7,500+ optimizer steps with extensive validation checkpointing.
- Loss Reduction: 55% decrease in validation loss (2.80 → 1.27) compared to baseline.
## ComfyUI Integration (Recommended)
I have a custom node for seamless integration with ComfyUI:
- Features: Optimized for local OpenAI API compatible backends (LM Studio, Ollama, etc.)
- Get it here: ComfyUI-Z-Engineer
## Recommended System Prompt
For best results, use this system prompt:
```
Interpret the user seed as production intent, then build a definitive 200-250 word single-paragraph image prompt that preserves every explicit constraint while intelligently expanding missing details. First infer the core subject, action, setting, and emotional tone; treat these as non-negotiable anchors. Then enhance with precise visual staging (explicit foreground, midground, background), clear visual hierarchy and eye path, physically plausible lighting (source, direction, softness, color temperature), and optical strategy (if lens/aperture are provided, preserve exactly; if absent, choose fitting lens and aperture and imply their depth-of-field effect). Integrate organic, manufactured, and environmental textures with realistic material behavior, add motion/atmospheric cues only when they support the scene, and apply a coherent color grade consistent with mood and environment. Keep the prose vivid but controlled: no contradictions, no overstuffing, no generic filler. Do not mention camera body brands. Output one polished paragraph only, no bullets, no line breaks, no meta commentary.
```
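Outside ComfyUI, the prompt can be wired into any OpenAI-compatible backend (LM Studio, Ollama) as the system message of a chat request. A minimal sketch, assuming the model name matches the Ollama tag below; no request is actually sent, the code only assembles the JSON body for `POST /v1/chat/completions`:

```python
import json

# Truncated here for brevity; paste the full recommended system prompt above.
SYSTEM_PROMPT = (
    "Interpret the user seed as production intent, then build a definitive "
    "200-250 word single-paragraph image prompt ..."
)

def build_request(seed_prompt, model="BennyDaBall/Qwen3-4b-Z-Image-Engineer-V4"):
    """Assemble the chat-completions body: system prompt + user seed."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": seed_prompt},
        ],
        "temperature": 0.7,  # assumption: moderate sampling for varied expansions
    }

body = build_request("sad robot in rain")
payload = json.dumps(body)  # ready to POST to your local backend
```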
## Training Facts
I believe in open science. Here's exactly how this was built:
Hardware:
- Trained locally on an AMD Strix Halo system (Ryzen AI Max+ 395, 128GB Unified RAM)
- AMD Radeon 8060S Graphics (ROCm/HIP)
Dataset:
- Size: 55,000 high-quality examples
- 25,000 Vision-Grounded Samples: Real professional photographs transcribed into the training format using Qwen3-VL-30B-A3B, teaching the model what genuinely good cinematography looks like
- 30,000 Synthetic Samples: Generated prompt enhancement pairs for diverse concept coverage
- Content: Curated mix teaching the model to extrapolate seed concepts into cinematic prompts grounded in real photographic technique
Training Configuration:
| Parameter | Value |
|---|---|
| Method | Full Fine-Tune (not LoRA) |
| Base Model | Qwen3-4b-Z-Image-Turbo-AbliteratedV1 |
| Optimizer Steps | 7,500+ |
| Batch Size | 2 × 8 accumulation = 16 effective |
| Learning Rate | 1e-5 (cosine decay with 5% warmup) |
| Precision | BFloat16 |
| Sequence Length | 640 tokens |
| Total Training Time | ~90 hours |
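The stated schedule (1e-5 peak, cosine decay, 5% warmup) can be sketched as a simple step-to-learning-rate function. The formula below is a standard linear-warmup-plus-cosine-decay curve, assumed rather than taken from the actual training script:

```python
import math

def lr_at(step, total_steps=7500, peak=1e-5, warmup_frac=0.05):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup = max(1, int(total_steps * warmup_frac))  # 5% of 7,500 = 375 steps
    if step < warmup:
        return peak * step / warmup            # ramp linearly from 0 to peak
    progress = (step - warmup) / (total_steps - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))  # decay peak -> 0
```

The curve starts at zero, hits the 1e-5 peak at step 375, and decays smoothly to zero by the final step.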
## GGUF & Quantization
I provide a full suite of GGUF quantizations for use with llama.cpp, Ollama, and LM Studio:
| Quantization | Size | Notes |
|---|---|---|
| F16 | 8.0 GB | Full precision, maximum quality |
| Q8_0 | 4.3 GB | Near-lossless, recommended for most users |
| Q6_K | 3.3 GB | Great balance of quality and size |
| Q5_K_M | 2.9 GB | Good quality, smaller footprint |
| Q5_K_S | 2.8 GB | Slightly smaller Q5 variant |
| Q4_K_M | 2.5 GB | Solid 4-bit, good for VRAM-limited setups |
| Q4_K_S | 2.4 GB | Smaller 4-bit variant |
| Q3_K_L | 2.2 GB | Lower quality 3-bit, for the desperate |
| Q3_K_M | 2.1 GB | Medium 3-bit |
| Q2_K | 1.7 GB | Emergency-only tier. But it exists! |
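The sizes above follow roughly from bits-per-weight times parameter count. A back-of-envelope estimator, assuming ~4e9 parameters and approximate bpw figures (Q8_0 stores block scales, so it averages about 8.5 bits per weight; k-quants vary similarly), ignoring metadata and embedding overhead:

```python
def gguf_size_gb(n_params=4e9, bits_per_weight=16.0):
    """Rough GGUF file size in decimal gigabytes: params * bpw / 8 bits per byte."""
    return n_params * bits_per_weight / 8 / 1e9

f16 = gguf_size_gb(bits_per_weight=16.0)  # matches the 8.0 GB F16 row
q8 = gguf_size_gb(bits_per_weight=8.5)    # close to the 4.3 GB Q8_0 row
```

Handy for predicting whether a given quant will fit in your VRAM before downloading it.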
## Quick Start
With Ollama:
```
ollama run BennyDaBall/Qwen3-4b-Z-Image-Engineer-V4
```
With LM Studio:
1. Download the GGUF of your choice
2. Load it in LM Studio
3. Use the ComfyUI node or chat directly
## Disclaimer
This model generates text for image prompts. While I have filtered the dataset to the best of my ability, users should exercise their own judgment. I am not responsible for the content you generate.
Also, if you use this to generate prompts for images that get you in trouble, that's a you problem. The model is just vibing.
## Acknowledgements
- Qwen Team for the excellent base architecture
- Tongyi-MAI for Z-Image-Turbo
- The open source AI community for making this kind of work possible
- My electricity bill, which now classifies me as a small industrial facility
Built with ❤️ and way too much GPU time by BennyDaBall