Gemma4 MTPLX Optimized Quality

This is an MTPLX pair bundle for Gemma 4 31B speculative decoding on Apple Silicon.

It is not a single vanilla Transformers model directory. The repository contains two MLX-format artifacts:

  • target/ - Gemma 4 31B IT target, MLX Q8 affine group-size 64
  • assistant/ - official Gemma 4 31B assistant drafter, MLX Q8 affine group-size 64

Use this pair when target precision and high acceptance are the priority.

Source

  • Target source: google/gemma-4-31B-it
  • Target revision: 145dc2508c480a64b47242f160d286cff94a2343
  • Assistant source: google/gemma-4-31B-it-assistant
  • Assistant revision: cffbbd2cea41ea56a0fa5b0487e0d445121fd204

Both artifacts were converted locally to MLX format.

Quantization

Target:

bits: 8
group_size: 64
mode: affine

Assistant:

bits: 8
group_size: 64
mode: affine
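
For readers unfamiliar with the settings above, here is a minimal pure-Python sketch of what 8-bit affine quantization with a group size of 64 generally means: each group of 64 weights gets its own scale and bias, and every weight is stored as an integer code in 0..255. The function names are hypothetical and the min/max scheme is an assumption for illustration; the actual MLX kernels, rounding rules, and packed storage layout may differ.

```python
def quantize_group(weights, bits=8):
    """Affine-quantize one group of floats (illustrative min/max scheme)."""
    levels = (1 << bits) - 1              # 255 distinct steps for 8 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Clamp so float round-off can never push a code outside 0..levels.
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo               # lo plays the role of the bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats: w ~= code * scale + bias."""
    return [c * scale + bias for c in codes]


def quantize_row(row, bits=8, group_size=64):
    """Split a row into fixed-size groups and quantize each independently."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]
```

Under this scheme the reconstruction error of any weight is at most half of its group's scale, which is why smaller groups and more bits both reduce error at the cost of extra storage.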

MTPLX Usage

After downloading this repository, point MTPLX at the two subdirectories:

mtplx bench gemma-mtp \
  --target-model ./target \
  --assistant-model ./assistant \
  --prompt-suite mtplx/benchmarks/prompts/flappy.jsonl \
  --max-tokens 1000 \
  --draft-block-sizes 6 \
  --allow-unverified-gemma
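
Before running the benchmark, it can help to confirm that the download really has the two-directory layout described earlier. The helper below is a small sketch using only the Python standard library; `check_pair_layout` is a hypothetical name and is not part of MTPLX. It only checks that `target/` and `assistant/` exist and does not validate their contents.

```python
from pathlib import Path


def check_pair_layout(root):
    """Return (target_dir, assistant_dir) or raise if either is missing."""
    root = Path(root)
    missing = [name for name in ("target", "assistant")
               if not (root / name).is_dir()]
    if missing:
        raise FileNotFoundError(
            f"{root} is missing expected subdirectories: {', '.join(missing)}"
        )
    return root / "target", root / "assistant"
```

Calling `check_pair_layout(".")` from the repository root returns the two paths to pass as `--target-model` and `--assistant-model`.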

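The Gemma 4 assistant is a separate drafter model: it proposes a block of draft tokens, and the target model verifies them. MTPLX uses exact speculative sampling with target verification and residual correction, so a rejected draft token is replaced by a sample from the residual distribution and the output follows the same distribution as sampling from the target alone.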
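
The verification and residual-correction step can be sketched for a single token position as follows. This is the standard textbook acceptance rule for speculative sampling, written in plain Python for illustration; it is not taken from the MTPLX source, and the function names and the list-based probability vectors are assumptions.

```python
import random


def verify_draft_token(p_target, q_draft, token, rng):
    """Accept or replace one drafted token.

    p_target and q_draft are probability lists over the same vocabulary;
    token was sampled from q_draft, so q_draft[token] > 0.
    Returns (accepted, output_token).
    """
    accept_prob = min(1.0, p_target[token] / q_draft[token])
    if rng.random() < accept_prob:
        return True, token
    # Residual correction: resample from max(0, p - q), renormalized.
    # A rejection implies p != q, so the residual has positive mass.
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    replacement = rng.choices(range(len(residual)), weights=residual)[0]
    return False, replacement


def speculative_step(p_target, q_draft, rng):
    """Draft one token from q_draft, then verify it against p_target."""
    token = rng.choices(range(len(q_draft)), weights=q_draft)[0]
    return verify_draft_token(p_target, q_draft, token, rng)
```

With this rule the emitted token is distributed according to `p_target` regardless of how good the drafter is; a better drafter only raises the acceptance rate, and with it the speedup.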

Local Benchmark

Prompt: single-file HTML5 Canvas Flappy Bird game, capped at 1000 generated tokens.

Sampler:

temperature: 1.0
top_p: 0.95
top_k: 64
seed: 0

Best observed block size:

block_size: 6
acceptance: 833 / 835 = 99.76%
speedup_vs_ar: 2.49x

Observed MTPLX throughput samples:

34.22 tok/s
32.88 tok/s
33.12 tok/s
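
The figures above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the acceptance percentage and summarizes the three throughput samples; it assumes that 835 is the number of proposed draft tokens and 833 the number accepted, which is an interpretation of the reported ratio rather than something stated in the benchmark file.

```python
from statistics import mean

# Assumed meaning: accepted draft tokens / proposed draft tokens.
accepted, proposed = 833, 835
acceptance_pct = 100.0 * accepted / proposed

# Throughput samples copied from the list above, in tokens per second.
samples_tok_s = [34.22, 32.88, 33.12]
mean_tok_s = mean(samples_tok_s)
spread_tok_s = max(samples_tok_s) - min(samples_tok_s)

print(f"acceptance: {acceptance_pct:.2f}%")        # acceptance: 99.76%
print(f"mean throughput: {mean_tok_s:.2f} tok/s")  # mean throughput: 33.41 tok/s
print(f"spread: {spread_tok_s:.2f} tok/s")         # spread: 1.34 tok/s
```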

The bundled benchmark JSON file is in benchmarks/.

Notes

This release is optimized for target precision and high acceptance. It is not the fastest pair in absolute tokens per second (TPS); for speed, use Youssofal/Gemma4-MTPLX-Optimized-Speed.

Gemma 4 is released by Google under the Gemma 4 license terms.
