Qwen3-Coder-Next 80B – APEX I-Quality GGUF

First APEX I-Quality quantization of Qwen3-Coder-Next 80B, calibrated on a code corpus.

This is an APEX I-Quality quantization of Qwen/Qwen3-Coder-Next, an 80B-parameter Mixture-of-Experts model with only 3B active parameters per token, designed specifically for coding agents and local development.

What Makes This Different

  • APEX I-Quality profile β€” the highest quality tier in the APEX quantization framework, using per-tensor type optimization for MoE architectures
  • Code-calibrated imatrix β€” importance matrix generated from 50,575 code samples (not Wikipedia). The imatrix tells the quantizer which weights matter most for code generation, syntax, tool calling, and agent workloads
  • Production tested β€” this exact model runs in production powering PicoClaw coding agents on AMD Ryzen AI Max+ 395 hardware

Files

| File | Size | Description |
|------|------|-------------|
| Qwen3-Coder-Next-APEX-I-Quality.gguf | 54.1 GB | APEX I-Quality quantized model (5.43 BPW) |
| imatrix-coder-next.dat | 457 MB | Code-calibrated importance matrix; use this for your own quantizations |

Model Details

| Property | Value |
|----------|-------|
| Architecture | qwen3next (hybrid attention + SSM with MoE) |
| Total Parameters | 79.67B |
| Active Parameters | ~3B per token (10 of 512 experts) |
| Expert Count | 512 experts, 10 active per token |
| Context Length | 262,144 tokens (native) |
| Original Type | BF16 (148.5 GB) |
| Quantized Size | 54.1 GB (5.43 BPW) |
| Quantization | APEX I-Quality (Q6_K/Q5_K/IQ4_XS experts, Q8_0 shared, Q6_K attention) |
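As a sanity check, the 5.43 BPW figure follows directly from the sizes above, assuming decimal gigabytes (1 GB = 1e9 bytes), the usual convention for model file sizes:

```bash
# 54.1 GB quantized file, 79.67B total parameters -> bits per weight
awk 'BEGIN { printf "%.2f\n", 54.1e9 * 8 / 79.67e9 }'
# prints 5.43
```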

Performance

Tested on AMD Ryzen AI Max+ 395 (128 GB unified memory, ROCm/Vulkan):

| Metric | Value |
|--------|-------|
| Output Speed | ~50-60 t/s |
| Prompt Processing | Fast (MoE architecture) |
| Memory Usage | ~54 GB model + KV cache |
| Parallel Sessions | 4 (with `--parallel 4`) |

The 3B-active-parameter design means this 80B model runs at speeds comparable to, or faster than, much smaller dense models. On our hardware, it outperforms the 30B variant in both speed and quality.
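To put the throughput in concrete terms, a 1,000-token completion at the measured 50-60 t/s lands in roughly the 17-20 second range:

```bash
# Wall-clock time for a 1,000-token completion at 60 t/s and 50 t/s
awk 'BEGIN { printf "%.0f-%.0f seconds\n", 1000/60, 1000/50 }'
# prints 17-20 seconds
```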

How to Run

llama.cpp (recommended)

```bash
# Download the model
huggingface-cli download stacksnathan/Qwen3-Coder-Next-80B-APEX-I-Quality-GGUF \
  Qwen3-Coder-Next-APEX-I-Quality.gguf \
  --local-dir ./models/

# Run with llama-server
./llama-server \
  -m ./models/Qwen3-Coder-Next-APEX-I-Quality.gguf \
  --host 0.0.0.0 --port 8080 \
  --ctx-size 32768 --parallel 4 \
  -ngl 99 --no-mmap
```
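Once llama-server is up, it serves an OpenAI-compatible API on the configured port. A minimal request (the prompt here is just an example) looks like:

```bash
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Write a function that reverses a string."}
        ],
        "max_tokens": 256
      }'
```

Any OpenAI-compatible client or SDK can be pointed at the same base URL.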

Ollama

Create a Modelfile:

```
FROM ./Qwen3-Coder-Next-APEX-I-Quality.gguf
PARAMETER num_ctx 32768
```

Then:

```bash
ollama create coder-next -f Modelfile
ollama run coder-next
```
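Ollama also exposes a local HTTP API (port 11434 by default), so the model can be queried programmatically; the prompt below is illustrative:

```bash
curl -s http://localhost:11434/api/generate -d '{
  "model": "coder-next",
  "prompt": "Explain what a Mixture-of-Experts model is.",
  "stream": false
}'
```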

Hardware Requirements

| Setup | RAM/VRAM | Notes |
|-------|----------|-------|
| AMD Ryzen AI Max+ 395 | 128 GB unified | Recommended. Full GPU offload, fast inference |
| Apple M4 Max/Ultra | 128 GB+ unified | Should work well with Metal |
| Dual GPU (48 GB each) | 96 GB+ VRAM | Split across GPUs |
| CPU + RAM | 64 GB+ RAM | Slower, but works with mmap |

Minimum ~58 GB free memory for model + KV cache at 32K context.

Using the Imatrix

The included imatrix-coder-next.dat was generated from 50K+ code samples using llama-imatrix. You can use it for your own quantizations of Qwen3-Coder-Next:

```bash
# Download just the imatrix
huggingface-cli download stacksnathan/Qwen3-Coder-Next-80B-APEX-I-Quality-GGUF \
  imatrix-coder-next.dat \
  --local-dir ./

# Use it with llama-quantize for custom quants
./llama-quantize \
  --imatrix ./imatrix-coder-next.dat \
  Qwen3-Coder-Next-BF16.gguf \
  output.gguf Q4_K_M
```
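If you would rather build an imatrix from your own calibration corpus instead of reusing the included one, the sketch below shows a typical llama-imatrix invocation; the calibration file name is a placeholder for your own code corpus:

```bash
# Generate an importance matrix from a plain-text calibration file
# (code-calibration.txt is a placeholder, not an included file)
./llama-imatrix \
  -m Qwen3-Coder-Next-BF16.gguf \
  -f code-calibration.txt \
  -o imatrix-custom.dat \
  -ngl 99
```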

About

Quantized by STACKS! Container Hosting, a cloud platform built on owned hardware. This model powers our PicoClaw AI coding agents, offering unlimited inference at flat-rate pricing.

We believe in giving back to the open source community. This quantization and the code-calibrated imatrix are provided freely under the same Apache 2.0 license as the original model.

Acknowledgments

  • Qwen Team for the incredible Coder-Next model
  • Mudler for the APEX quantization framework
  • eaddario for the code calibration dataset
  • The llama.cpp community for making local inference possible