# Qwen3-Coder-Next 80B – APEX I-Quality GGUF
First APEX I-Quality quantization of Qwen3-Coder-Next 80B, calibrated on a code corpus.
This is an APEX I-Quality quantization of Qwen/Qwen3-Coder-Next, an 80B-parameter Mixture-of-Experts model with only ~3B active parameters per token, designed specifically for coding agents and local development.
## What Makes This Different

- APEX I-Quality profile – the highest quality tier in the APEX quantization framework, using per-tensor type optimization for MoE architectures
- Code-calibrated imatrix – importance matrix generated from 50,575 code samples (not Wikipedia). The imatrix tells the quantizer which weights matter most for code generation, syntax, tool calling, and agent workloads
- Production tested – this exact model runs in production powering PicoClaw coding agents on AMD Ryzen AI Max+ 395 hardware
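To make the imatrix bullet concrete, here is a toy sketch of importance-weighted quantization. This is NOT llama.cpp's actual imatrix algorithm, just an illustration of the idea: when picking a quantization scale, rounding error on heavily-used weights is penalized more than error on rarely-used ones.

```python
# Toy illustration of importance-weighted quantization (NOT llama.cpp's
# actual imatrix math): weight each parameter's rounding error by how
# strongly its input activations fire on the calibration data.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=256)                  # one quantization group
importance = rng.uniform(0.1, 10.0, size=256)   # stand-in for imatrix stats

def quantize(w, scale, bits=4):
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def weighted_error(w, imp, scale):
    return float(np.sum(imp * (w - quantize(w, scale)) ** 2))

# Grid-search the scale twice: once treating all weights equally,
# once weighting by importance.
scales = np.linspace(0.01, 1.0, 200)
plain_scale = min(scales, key=lambda s: weighted_error(weights, np.ones_like(weights), s))
imat_scale = min(scales, key=lambda s: weighted_error(weights, importance, s))

# By construction, the importance-aware scale has lower error on the
# weights the calibration corpus (here: code) actually exercises.
print(weighted_error(weights, importance, imat_scale)
      <= weighted_error(weights, importance, plain_scale))
```

Calibrating on code rather than Wikipedia shifts which weights get that protection, which is the whole point of the included imatrix.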
## Files

| File | Size | Description |
|---|---|---|
| `Qwen3-Coder-Next-APEX-I-Quality.gguf` | 54.1 GB | APEX I-Quality quantized model (5.43 BPW) |
| `imatrix-coder-next.dat` | 457 MB | Code-calibrated importance matrix – use this for your own quantizations |
## Model Details

| Property | Value |
|---|---|
| Architecture | qwen3next (hybrid attention + SSM with MoE) |
| Total Parameters | 79.67B |
| Active Parameters | ~3B per token |
| Expert Count | 512 experts, 10 active per token |
| Context Length | 262,144 tokens (native) |
| Original Type | BF16 (148.5 GB) |
| Quantized Size | 54.1 GB (5.43 BPW) |
| Quantization | APEX I-Quality (Q6_K/Q5_K/IQ4_XS experts, Q8_0 shared, Q6_K attention) |
## Performance
Tested on AMD Ryzen AI Max+ 395 (128GB unified memory, ROCm/Vulkan):
| Metric | Value |
|---|---|
| Output Speed | ~50-60 t/s |
| Prompt Processing | Fast (MoE architecture) |
| Memory Usage | ~54 GB model + KV cache |
| Parallel Sessions | 4 (with --parallel 4) |
The 3B active parameter design means this 80B model runs at speeds comparable to β or faster than β much smaller dense models. On our hardware, it outperforms the 30B variant in both speed and quality.
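The arithmetic behind that claim is simple (a rough sketch only: real decode speed also depends on memory bandwidth, the attention/SSM layers, and shared experts, not just routed-expert parameter counts):

```python
# Rough per-token compute comparison. ASSUMPTION: decode cost scales with
# the parameters touched per token; this ignores bandwidth, attention/SSM
# layers, and shared experts, so treat it as intuition, not a benchmark.
total_params = 79.67e9   # from the model card
active_params = 3e9      # ~10 of 512 experts routed per token

fraction_active = active_params / total_params
print(f"~{fraction_active:.1%} of weights touched per decoded token")
```

Only about 4% of the weights participate in any given token, which is why the model decodes like a small dense model while retaining 80B-scale capacity.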
## How to Run

### llama.cpp (recommended)

```bash
# Download the model
huggingface-cli download stacksnathan/Qwen3-Coder-Next-80B-APEX-I-Quality-GGUF \
  Qwen3-Coder-Next-APEX-I-Quality.gguf \
  --local-dir ./models/

# Run with llama-server
./llama-server \
  -m ./models/Qwen3-Coder-Next-APEX-I-Quality.gguf \
  --host 0.0.0.0 --port 8080 \
  --ctx-size 32768 --parallel 4 \
  -ngl 99 --no-mmap
```
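Once the server is up, llama-server exposes an OpenAI-compatible chat endpoint. A minimal client sketch (the `model` value is a placeholder; the host/port match the command above):

```python
# Minimal OpenAI-compatible request to llama-server (sketch; the "model"
# field is a placeholder string, llama-server serves whatever it loaded).
import json
import urllib.request

payload = {
    "model": "qwen3-coder-next",  # placeholder name
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "max_tokens": 256,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Uncomment once llama-server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

With `--parallel 4`, up to four such requests are served concurrently.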
### Ollama

Create a Modelfile:

```
FROM ./Qwen3-Coder-Next-APEX-I-Quality.gguf
PARAMETER num_ctx 32768
```

Then:

```bash
ollama create coder-next -f Modelfile
ollama run coder-next
```
## Hardware Requirements
| Setup | RAM/VRAM | Notes |
|---|---|---|
| AMD Ryzen AI Max+ 395 | 128 GB unified | Recommended. Full GPU offload, fast inference |
| Apple M4 Max/Ultra | 128 GB+ unified | Should work well with Metal |
| Dual GPU (48GB each) | 96 GB+ VRAM | Split across GPUs |
| CPU + RAM | 64 GB+ RAM | Slower, but works with mmap |
Minimum ~58 GB free memory for model + KV cache at 32K context.
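The weight footprint can be sanity-checked from the parameter count and bits-per-weight quoted above (decimal GB; actual runtime usage adds KV cache, which varies with context length and the hybrid attention/SSM cache layout):

```python
# Sanity-check the model-card numbers: 79.67B params at 5.43 bits/weight.
params = 79.67e9
bpw = 5.43
model_gb = params * bpw / 8 / 1e9  # bits -> bytes -> decimal GB
print(f"model weights: ~{model_gb:.1f} GB")  # ~54.1 GB, matching the file size
```

The ~58 GB minimum above is this 54.1 GB plus headroom for KV cache at 32K context.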
## Using the Imatrix

The included `imatrix-coder-next.dat` was generated from 50K+ code samples using `llama-imatrix`. You can reuse it for your own quantizations of Qwen3-Coder-Next:
```bash
# Download just the imatrix
huggingface-cli download stacksnathan/Qwen3-Coder-Next-80B-APEX-I-Quality-GGUF \
  imatrix-coder-next.dat \
  --local-dir ./

# Use it with llama-quantize for custom quants
./llama-quantize \
  --imatrix ./imatrix-coder-next.dat \
  Qwen3-Coder-Next-BF16.gguf \
  output.gguf Q4_K_M
```
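To budget disk space before quantizing, you can estimate output sizes from approximate average bits-per-weight. The BPW figures below are assumptions (commonly cited llama.cpp averages; real quants mix types per tensor, so actual files will differ somewhat):

```python
# Back-of-the-envelope output sizes for custom quants of this model.
# ASSUMPTION: the bits-per-weight averages below are rough, commonly
# cited figures, not exact per-tensor numbers.
params = 79.67e9
approx_bpw = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5}

for name, bpw in approx_bpw.items():
    print(f"{name}: ~{params * bpw / 8 / 1e9:.1f} GB")
```

A Q4_K_M quant of this model would land somewhere near 48 GB, for example, versus the 54.1 GB APEX I-Quality file shipped here.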
## About

Quantized by STACKS! Container Hosting – a cloud platform built on owned hardware. This model powers our PicoClaw AI coding agents, offering unlimited inference at flat-rate pricing.
We believe in giving back to the open source community. This quantization and the code-calibrated imatrix are provided freely under the same Apache 2.0 license as the original model.