Zen5 Max

Top tier of the Zen5 family: the full-Pro base with asymmetric quantization. Routed experts use IQ2_XXS for the up/gate projections and Q2_K for the down projection; shared experts, attention projections, routing logits, and the LM head are left at higher precision.
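To get a feel for what the asymmetric scheme costs in bits per weight, the arithmetic can be sketched as follows. The block sizes (66 bytes per 256 weights for IQ2_XXS, 84 bytes per 256 for Q2_K, 34 bytes per 32 for Q8_0) are the ggml/llama.cpp layouts and are assumed to carry over unchanged to this GGUF. The equal split of weights between up, gate, and down is also an assumption, and the helper names are illustrative only.

```shell
#!/bin/sh
# Bits per weight for a block format: bytes_per_block * 8 / weights_per_block.
bpw() {
    awk -v bytes="$1" -v weights="$2" 'BEGIN { printf "%.4f\n", bytes * 8 / weights }'
}

# Block layouts assumed from ggml/llama.cpp, not taken from this model card.
IQ2_XXS=$(bpw 66 256)
Q2_K=$(bpw 84 256)
Q8_0=$(bpw 34 32)

# Average over one routed expert, assuming up, gate, and down
# each hold the same number of weights (two IQ2_XXS tensors, one Q2_K).
routed_avg() {
    awk -v a="$IQ2_XXS" -v b="$Q2_K" 'BEGIN { printf "%.4f\n", (2 * a + b) / 3 }'
}

echo "IQ2_XXS: $IQ2_XXS bpw, Q2_K: $Q2_K bpw, Q8_0: $Q8_0 bpw"
echo "routed experts (2x IQ2_XXS + 1x Q2_K): $(routed_avg) bpw"
```

Under these assumptions the routed experts average about 2.25 bits per weight, while the tensors kept at Q8_0 cost 8.5 bits per weight.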

Use when you have 512 GB+ unified memory (Mac Studio M3 Ultra 512 GB) or an 8x H100 / H200 pool and want the deepest reasoning quality in the family. For 128 GB hardware, use zenlm/zen-5-pro-gguf instead.
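A rough pre-flight check for the hardware guidance above might look like the sketch below. The 432 GB figure is the main GGUF size from the Files section; the 32 GB headroom for KV cache and the OS is an assumed placeholder, not a number from this card, and the function names are illustrative. It only inspects system (unified) memory, so it does not cover a multi-GPU pool.

```shell
#!/bin/sh
# Does a model of MODEL_GB fit in TOTAL_GB of memory, leaving HEADROOM_GB spare?
fits_in_memory() {
    model_gb=$1
    total_gb=$2
    headroom_gb=${3:-32}   # assumption: allowance for KV cache, OS, other processes
    if [ $((model_gb + headroom_gb)) -le "$total_gb" ]; then
        echo yes
    else
        echo no
    fi
}

# Total physical memory in whole GiB (macOS via sysctl, Linux via /proc/meminfo).
total_memory_gb() {
    case "$(uname -s)" in
        Darwin) echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 )) ;;
        Linux)  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo 2>/dev/null ;;
        *)      echo 0 ;;
    esac
}

total=$(total_memory_gb)
total=${total:-0}          # fall back to 0 if detection produced nothing
echo "this machine (${total} GiB): $(fits_in_memory 432 "$total")"
echo "128 GB machine:              $(fits_in_memory 432 128)"
```

A "no" on a 128 GB machine is the case where zenlm/zen-5-pro-gguf is the better fit.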

Part of the canonical Zen5 ladder:

SKU             Hardware fit                  Repo
zen5-flash      anything                      zen-5-flash-gguf
zen5-mini       32 GB                         zen-5-mini-gguf
zen5 (default)  24 GB+ VRAM                   zen-5-gguf
zen5-pro        128 GB single-machine         zen-5-pro-gguf
zen5-max        512 GB Mac Studio / 8x H100   ← you are here

Files

File pattern                                         Size     Quant
main GGUF (*-IQ2XXS-w2Q2K-*-Instruct-imatrix.gguf)   432 GB   routed IQ2_XXS + Q2_K, shared Q8_0, attn Q8_0, imatrix-tuned
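Since the main GGUF is 432 GB, it can be worth confirming free disk space before starting a download. The sketch below uses POSIX `df -Pk`; the function names are illustrative, and whether the listed size is in decimal or binary gigabytes is not stated on this card, so the comparison is approximate.

```shell
#!/bin/sh
# Free space, in whole GB (10^9 bytes), on the filesystem holding directory $1.
free_gb() {
    # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 * 1024 / 1000000000 }'
}

# Succeeds (exit status 0) when at least $2 GB are free under directory $1.
has_room_for() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

REQUIRED_GB=432   # size of the main GGUF as listed above
if has_room_for . "$REQUIRED_GB"; then
    echo "enough space for the download"
else
    echo "need ${REQUIRED_GB} GB free, have $(free_gb .) GB" >&2
fi
```

Run it from the directory that will hold the download, or pass a different path to `has_room_for`.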

Run

Hosted via the Hanzo gateway (api.hanzo.ai) as zen5-max.

Local with the zen5-engine:

# Build the engine
git clone https://github.com/zenlm/zen5-engine
cd zen5-engine && make                  # macOS Metal
                                        # or: make cuda-generic for multi-H100

# Download the weights and link the main GGUF to a short name
hf download zenlm/zen-5-max-gguf --local-dir gguf
ln -sf "$(ls gguf/*-Instruct-imatrix.gguf | head -1)" zen5max.gguf

# One-shot prompt
./zen5 -m zen5max.gguf -p "Explain MoE inference."

# Server
./zen5-server -m zen5max.gguf --ctx 1000000 --kv-disk-dir /tmp/zen5-kv --kv-disk-space-mb 16384
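After a download of this size, a quick sanity check on the linked file can catch a broken symlink or a truncated transfer early. GGUF files begin with the four ASCII bytes `GGUF`, so a minimal check (the function name `is_gguf` is illustrative, and this does not validate the rest of the file) could be:

```shell
#!/bin/sh
# Succeeds when the file at $1 starts with the 4-byte GGUF magic.
is_gguf() {
    # dd reads exactly the first 4 bytes; errors (e.g. missing file) are silenced.
    [ "$(dd if="$1" bs=4 count=1 2>/dev/null)" = "GGUF" ]
}

if is_gguf zen5max.gguf; then
    echo "zen5max.gguf looks like a GGUF file"
else
    echo "zen5max.gguf is missing or not a GGUF file" >&2
fi
```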

Acknowledgements

Built on deepseek-ai/DeepSeek-V4-Pro. The asymmetric routed-MoE quantization scheme, GGUF layout, imatrix calibration, and inference engine all come from Salvatore Sanfilippo's antirez/ds4 project. MIT-licensed; both antirez/ds4 and ggml-org/llama.cpp copyrights are preserved in the zen5-engine LICENSE file.
