Instructions for using BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with libraries, inference servers, and local apps.

Libraries

How to use BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with Transformers:

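```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4")
# AutoModelForCausalLM (not AutoModel) keeps the language-modeling head needed for generation
model = AutoModelForCausalLM.from_pretrained("BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4", dtype="auto")
```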

llama-cpp-python

How to use BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with llama-cpp-python:

```
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4",
    filename="LFM2.5-1.2B-Z-Image-Engineer-V4-F16.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
```

Local Apps

llama.cpp

How to use BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
```

Use Docker

```
docker model run hf.co/BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
```

LM Studio
Jan

vLLM

How to use BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```


SGLang

How to use BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Ollama
How to use BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with Ollama:
```
ollama run hf.co/BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
```

Unsloth Studio

How to use BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 to start chatting
```

Using Hugging Face Spaces for Unsloth

```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 to start chatting
```

Pi

How to use BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with Pi:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
```

Configure the model in Pi

```
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M"
        }
      ]
    }
  }
}
```
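If `~/.pi/agent/models.json` already defines other providers, pasting the snippet over it would discard them. Below is a minimal Python sketch that merges the `llama-cpp` provider shown above into an existing config instead. The helper names `merge_provider` and `update_pi_config` are hypothetical, and the sketch assumes the file is plain JSON with a top-level `providers` object, as in the snippet.

```python
import copy
import json
from pathlib import Path

# Provider entry copied from the models.json snippet above.
PROVIDER_NAME = "llama-cpp"
PROVIDER_ENTRY = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M"}],
}


def merge_provider(config: dict) -> dict:
    """Return a copy of `config` with the llama-cpp provider added or replaced."""
    merged = copy.deepcopy(config)
    merged.setdefault("providers", {})[PROVIDER_NAME] = copy.deepcopy(PROVIDER_ENTRY)
    return merged


def update_pi_config(config_path: Path) -> dict:
    """Read models.json if it exists, merge the provider in, and write it back."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    merged = merge_provider(config)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
    return merged
```

Calling `update_pi_config(Path.home() / ".pi" / "agent" / "models.json")` creates the file if it is missing and otherwise replaces only the `llama-cpp` entry, leaving other providers and top-level keys untouched.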

Run Pi

```
# Start Pi in your project directory:
pi
```

Hermes Agent

How to use BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with Hermes Agent:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
```

Configure Hermes

```
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
```

Run Hermes

```
hermes
```

Docker Model Runner
How to use BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with Docker Model Runner:
```
docker model run hf.co/BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
```

Lemonade

How to use BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4 with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M
```

Run and chat with the model

```
lemonade run user.LFM2.5-1.2B-Z-Image-Engineer-V4-Q4_K_M
```

List all available models

```
lemonade list
```

🚀 LFM2.5-1.2B-Z-Image-Engineer-V4

The Z-Engineer goes liquid—smaller, faster, and ready to drink.

This is Z-Engineer V4 built on **Liquid Foundation Model 2.5 (LFM2.5)**, a 1.2B-parameter model that punches well above its weight class. It is ideal for batch workflows where you need prompt engineering at warp speed.

🧠 What is this?

LFM2.5-1.2B-Z-Image-Engineer-V4 is a fully fine-tuned version of LiquidAI/LFM2.5-1.2B-Base, trained specifically to understand the nuances of AI image-generation workflows.

It excels at:

Expanding Concepts: Turn "neon samurai" into a full cinematic sequence with lighting, lens choices, and atmosphere.
Technical Precision: Understands camera terminology, lighting setups, and film aesthetics.
Blazing Speed: At 1.2B parameters, it's ~3x faster than the Qwen3-4B version while retaining most of its output quality.

🔑 Key Use Cases

⚡ High-Throughput Workflows: When you need to expand hundreds or thousands of prompts, LFM2.5's speed shines.
💾 Low VRAM Deployments: Runs comfortably on minimal hardware—perfect for embedded or edge use cases.
🛡️ Local & Private: Runs entirely on your machine. No API fees, no data logging.
🔌 ComfyUI Ready: Works with the same ComfyUI-Z-Engineer node as the Qwen3 version.
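For high-throughput workflows, here is a minimal sketch of expanding many seeds concurrently while keeping the results in input order. `expand_batch` is a hypothetical helper, and `expand` is any function that turns one seed into one prompt, for example a call to a local OpenAI-compatible server such as `llama-server`. Threads suit this case because each call mostly waits on HTTP; whether concurrency actually speeds things up depends on the server having multiple slots (for llama-server, its `--parallel` option).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def expand_batch(
    seeds: Iterable[str],
    expand: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Run `expand` on every seed with a thread pool; output order matches input order."""
    seed_list = list(seeds)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in submission order, even if calls finish out of order.
        return list(pool.map(expand, seed_list))
```

Keeping order matters when the expanded prompts are written back next to their seeds, for example in a CSV that feeds a ComfyUI batch.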

🧬 SMART Training: Adapted for LFM2.5's Hybrid Architecture

This version uses SMART Training (Smart Mode with Adaptive Regularization Topologer)—the same methodology used for Qwen3-4B-Z-Engineer-V4, but adapted for LFM2.5's unique hybrid architecture.

LFM2.5's Challenge: Unlike traditional transformers, LFM2.5 uses a hybrid architecture that mixes grouped-query attention layers with gated short-convolution (liquid) layers. The standard SMART regularizers needed significant adaptation:

| Adaptation | What Changed | Why |
|---|---|---|
| Attention-Only Filtering | Regularizers only process attention-layer outputs, skipping the convolution (liquid) layers | Hidden states from the convolution layers have different statistical properties |
| Layer Pooling | Last 4 attention layers are mean-pooled for topology regularization | Provides a stable representation despite sparser attention placement |
| Reduced Regularizer Weights | Entropic: 0.003, Holographic: 0.01, Topology: 0.02/0.02 | LFM2.5's smaller capacity needs gentler regularization |
| Superfluid-Inspired Damping | "SmartGate" auto-reduces the aux-loss contribution on gradient instability | Prevents training collapse when hybrid layers produce non-finite gradients |
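The attention-only filtering and layer pooling rows can be illustrated with a small pure-Python sketch. This is not the author's training code: the layer-type labels (`"attention"`, `"liquid"`), the per-layer summary vectors, and the function name `pool_attention_layers` are hypothetical stand-ins, and a real implementation would operate on tensors rather than lists.

```python
from typing import List, Sequence

Vector = List[float]


def pool_attention_layers(
    layer_types: Sequence[str],
    layer_outputs: Sequence[Vector],
    last_k: int = 4,
) -> Vector:
    """Mean-pool the outputs of the last `last_k` attention layers.

    Non-attention (liquid) layers are skipped entirely, mirroring the
    attention-only filtering row of the table. If fewer than `last_k`
    attention layers exist, all of them are pooled.
    """
    if len(layer_types) != len(layer_outputs):
        raise ValueError("layer_types and layer_outputs must have the same length")
    attention = [out for kind, out in zip(layer_types, layer_outputs) if kind == "attention"]
    if not attention:
        raise ValueError("no attention layers found")
    selected = attention[-last_k:]
    dim = len(selected[0])
    # Element-wise mean across the selected layers.
    return [sum(vec[i] for vec in selected) / len(selected) for i in range(dim)]
```

Pooling several layers rather than taking only the last one gives the topology regularizer a representation that changes less abruptly from step to step, which is the stability argument made in the table.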

The result? Stable training on a fundamentally different architecture while still benefiting from diversity, coherence, and depth regularization.
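As a hypothetical sketch, the reduced regularizer weights and a SmartGate-style damping factor might combine into the training loss as follows. The weights come from the adaptation table; reading "Topology: 0.02/0.02" as two separate topology terms, and the rule of halving the gate on a non-finite gradient norm and then recovering slowly, are assumptions rather than the author's published SmartGate logic.

```python
import math
from dataclasses import dataclass

# Weights from the table; splitting "Topology: 0.02/0.02" into two terms is an assumption.
AUX_WEIGHTS = {"entropic": 0.003, "holographic": 0.01, "topology_a": 0.02, "topology_b": 0.02}


@dataclass
class SmartGateSketch:
    """Hypothetical gate that damps auxiliary losses after unstable steps."""

    scale: float = 1.0
    decay: float = 0.5  # multiply the scale by this after a non-finite gradient norm
    recovery: float = 0.05  # add this back after each stable step, capped at 1.0
    min_scale: float = 0.0

    def update(self, grad_norm: float) -> float:
        if not math.isfinite(grad_norm):
            self.scale = max(self.min_scale, self.scale * self.decay)
        else:
            self.scale = min(1.0, self.scale + self.recovery)
        return self.scale


def total_loss(lm_loss: float, aux_losses: dict, gate_scale: float) -> float:
    """Language-modeling loss plus gated, weighted auxiliary regularizers."""
    aux = sum(AUX_WEIGHTS[name] * value for name, value in aux_losses.items())
    return lm_loss + gate_scale * aux
```

Because the gate only scales the auxiliary terms, the language-modeling objective keeps training even while the regularizers are damped.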

📉 Why Choose LFM2.5 Over Qwen3-4B?

| Aspect | LFM2.5-1.2B | Qwen3-4B |
|---|---|---|
| Parameters | 1.2B | 4B |
| Speed | ~3x faster | Baseline |
| VRAM | ~1-2 GB (Q4) | ~2.5 GB (Q4) |
| Quality | Good for most use cases | Highest quality |
| Best For | Batch processing, edge deployment, speed-critical workflows | Maximum quality, complex scenes |

Choose LFM2.5 when: You're processing large batches, running on limited hardware, or speed matters more than marginal quality gains.

Choose Qwen3-4B when: You want the absolute best quality and can afford the extra compute.

🔌 ComfyUI Integration

Works with the same custom node as the Qwen3 version:

Get it here: ComfyUI-Z-Engineer

📝 Recommended System Prompt

For best results, use this system prompt:

Interpret the user seed as production intent, then build a definitive 200-250 word single-paragraph image prompt that preserves every explicit constraint while intelligently expanding missing details. First infer the core subject, action, setting, and emotional tone; treat these as non-negotiable anchors. Then enhance with precise visual staging (explicit foreground, midground, background), clear visual hierarchy and eye path, physically plausible lighting (source, direction, softness, color temperature), and optical strategy (if lens/aperture are provided, preserve exactly; if absent, choose fitting lens and aperture and imply their depth-of-field effect). Integrate organic, manufactured, and environmental textures with realistic material behavior, add motion/atmospheric cues only when they support the scene, and apply a coherent color grade consistent with mood and environment. Keep the prose vivid but controlled: no contradictions, no overstuffing, no generic filler. Do not mention camera body brands. Output one polished paragraph only, no bullets, no line breaks, no meta commentary.
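Here is a minimal sketch of sending a seed with this system prompt to a local OpenAI-compatible endpoint, such as the `llama-server` instance from the llama.cpp instructions (assumed to listen on its default port 8080), using only the Python standard library. It also includes a hypothetical `check_output` helper that flags outputs breaking the prompt's own format rules (one paragraph, 200 to 250 words, no bullets). `SYSTEM_PROMPT` is a placeholder: paste the full prompt above into it before use.

```python
import json
import urllib.request

# Placeholder: paste the recommended system prompt here.
SYSTEM_PROMPT = "<recommended system prompt>"
# Assumption: llama-server started as shown earlier, on its default port.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "BennyDaBall/LFM2.5-1.2B-Z-Image-Engineer-V4:Q4_K_M"


def build_chat_request(seed: str, system_prompt: str = SYSTEM_PROMPT) -> dict:
    """OpenAI-style chat payload: system prompt first, then the user's seed."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": seed},
        ],
    }


def expand_seed(seed: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST the payload to /chat/completions and return the generated text."""
    body = json.dumps(build_chat_request(seed)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"].strip()


def check_output(text: str, min_words: int = 200, max_words: int = 250) -> list:
    """Return the ways an output breaks the system prompt's format rules."""
    problems = []
    if "\n" in text.strip():
        problems.append("contains line breaks")
    words = len(text.split())
    if not min_words <= words <= max_words:
        problems.append(f"word count {words} outside {min_words}-{max_words}")
    if text.lstrip().startswith(("-", "*", "•")):
        problems.append("starts with a bullet")
    return problems
```

For example, `text = expand_seed("neon samurai")` followed by `check_output(text)` returns an empty list when the paragraph meets the length and layout rules; a non-empty list is a signal to retry or regenerate.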

💻 Training Facts

I believe in open science. Here's exactly how this was built:

Hardware:

Trained locally on an AMD Strix Halo system (Ryzen AI Max+ 395, 128GB Unified RAM)
AMD Radeon 8060S Graphics (ROCm/HIP)

Dataset:

Size: 55,000 high-quality examples (the same dataset as the Qwen3-4B version)
25,000 Vision-Grounded Samples: Real professional photographs captioned using Qwen3-VL-30B-A3B
30,000 Synthetic Samples: Generated prompt-enhancement pairs

Training Configuration:

| Parameter | Value |
|---|---|
| Method | Full Fine-Tune (not LoRA) |
| Base Model | LiquidAI/LFM2.5-1.2B-Base |
| Optimizer Steps | 3,500 |
| Batch Size | 8 × 3 accumulation = 24 effective |
| Learning Rate | 5e-6 (cosine decay with 5% warmup) |
| Precision | BFloat16 |
| Sequence Length | 640 tokens |
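The training configuration implies a few derived numbers: an effective batch of 24 sequences, 175 warmup steps (5% of 3,500), and 3,500 × 24 = 84,000 training sequences, about 1.53 passes over the 55,000 examples if each example fills one sequence without packing. The sketch below computes them along with one common warmup-plus-cosine schedule; the linear warmup shape and the decay to zero at the final step are assumptions, since the card only says "cosine decay with 5% warmup".

```python
import math

# Values from the training configuration table.
OPTIMIZER_STEPS = 3_500
PER_STEP_BATCH = 8
GRAD_ACCUM = 3
PEAK_LR = 5e-6
WARMUP_FRACTION = 0.05
DATASET_SIZE = 55_000

EFFECTIVE_BATCH = PER_STEP_BATCH * GRAD_ACCUM  # 24
WARMUP_STEPS = round(OPTIMIZER_STEPS * WARMUP_FRACTION)  # 175
EPOCHS = OPTIMIZER_STEPS * EFFECTIVE_BATCH / DATASET_SIZE  # about 1.53


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay (assumed to reach 0 at the last step)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (OPTIMIZER_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
```

With these numbers the learning rate peaks at 5e-6 exactly at step 175 and falls to zero at step 3,500.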

📦 GGUF & Quantization

I provide a full suite of GGUF quantizations for use with llama.cpp, Ollama, and LM Studio:

| Quantization | Size | Notes |
|---|---|---|
| F16 | 2.2 GB | Full precision, maximum quality |
| Q8_0 | 1.2 GB | Near-lossless, recommended |
| Q6_K | 918 MB | Great balance |
| Q5_K_M | 804 MB | Good quality |
| Q5_K_S | 787 MB | Slightly smaller |
| Q4_K_M | 697 MB | Solid 4-bit |
| Q4_K_S | 668 MB | Smaller 4-bit |
| Q3_K_L | 606 MB | Lower quality |
| Q3_K_M | 573 MB | Medium 3-bit |
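To choose a file for a given memory budget, here is a small sketch that walks the quantization table from largest to smallest and returns the first entry that fits. The sizes are the file sizes listed above (1 GB taken as 1,000 MB). `pick_quant` is a hypothetical helper, and `overhead_mb` is an allowance you must estimate yourself for the KV cache and runtime buffers at your context length, since real memory use is higher than the file size.

```python
from typing import Optional

# (name, file size in MB) from the quantization table, ordered largest to smallest.
QUANTS = [
    ("F16", 2200),
    ("Q8_0", 1200),
    ("Q6_K", 918),
    ("Q5_K_M", 804),
    ("Q5_K_S", 787),
    ("Q4_K_M", 697),
    ("Q4_K_S", 668),
    ("Q3_K_L", 606),
    ("Q3_K_M", 573),
]


def pick_quant(budget_mb: float, overhead_mb: float = 0.0) -> Optional[str]:
    """Largest quantization whose file size plus `overhead_mb` fits in `budget_mb`."""
    for name, size_mb in QUANTS:
        if size_mb + overhead_mb <= budget_mb:
            return name
    return None  # nothing fits, even the smallest 3-bit file
```

For example, a 1,000 MB budget with a 300 MB allowance selects Q4_K_M (697 + 300 = 997 MB), while the same budget with no allowance would select Q6_K.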

🎯 Quick Start

With LM Studio:

Download the GGUF of your choice
Load it in LM Studio
Use the ComfyUI node or chat directly

⚠️ Disclaimer

This model generates text for image prompts. While I have filtered the dataset to the best of my ability, users should exercise their own judgment. I am not responsible for the content you generate.

🙏 Acknowledgements

LiquidAI for the excellent LFM2.5 architecture
Qwen Team for the VL model used in dataset creation
The open source AI community for making this kind of work possible

Built with ❤️ and liquid courage by BennyDaBall
