Gemma4 MTPLX Optimized Speed

This is an MTPLX pair bundle for Gemma 4 31B speculative decoding on Apple Silicon.

It is not a single vanilla Transformers model directory. The repository contains two MLX-format artifacts:

  • target/ - Gemma 4 31B IT target, MLX Q4 affine group-size 64
  • assistant/ - official Gemma 4 31B assistant drafter, MLX Q6 affine group-size 64

Use this pair when absolute throughput is the priority.

Source

  • Target source: google/gemma-4-31B-it
  • Target revision: 145dc2508c480a64b47242f160d286cff94a2343
  • Assistant source: google/gemma-4-31B-it-assistant
  • Assistant revision: cffbbd2cea41ea56a0fa5b0487e0d445121fd204

Both artifacts were converted locally to MLX format.
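For reference, a conversion along these lines can be reproduced with mlx-lm's convert entry point. The flags below are an assumption about a recent mlx-lm release and may differ by version; the quantization settings match the values listed under Quantization.

```shell
# Sketch of the local conversion, assuming a recent mlx-lm release
# (flag names may vary by version).

# Target: Q4 affine, group size 64
mlx_lm.convert \
  --hf-path google/gemma-4-31B-it \
  --mlx-path ./target \
  -q --q-bits 4 --q-group-size 64

# Assistant drafter: Q6 affine, group size 64
mlx_lm.convert \
  --hf-path google/gemma-4-31B-it-assistant \
  --mlx-path ./assistant \
  -q --q-bits 6 --q-group-size 64
```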

Quantization

Target:

bits: 4
group_size: 64
mode: affine

Assistant:

bits: 6
group_size: 64
mode: affine
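Affine group quantization stores one scale and one zero point per group of 64 consecutive weights. A minimal numpy sketch of the idea follows; this is a generic min/max affine scheme for illustration, not MLX's actual packing or kernels:

```python
import numpy as np

def affine_quantize(w, bits=4, group_size=64):
    """Quantize a flat weight vector in groups of `group_size` with an
    affine (scale + zero-point) scheme: q = round((w - min) / scale)."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    levels = 2**bits - 1
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard all-constant groups
    q = np.clip(np.round((w - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min

def affine_dequantize(q, scale, w_min):
    """Reconstruct approximate weights from codes, scales, zero points."""
    return q * scale + w_min

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)
q, scale, zero = affine_quantize(w, bits=4, group_size=64)
w_hat = affine_dequantize(q, scale, zero).reshape(-1)
# Reconstruction error is bounded by half a quantization step per group.
print(float(np.abs(w - w_hat).max()))
```

At 4 bits the codes take 16 levels per group; the 6-bit assistant uses 64 levels, which is why it tracks the full-precision drafter distribution more closely.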

MTPLX Usage

After downloading this repository, point MTPLX at the two subdirectories:

mtplx bench gemma-mtp \
  --target-model ./target \
  --assistant-model ./assistant \
  --prompt-suite mtplx/benchmarks/prompts/flappy.jsonl \
  --max-tokens 1000 \
  --draft-block-sizes 6 \
  --allow-unverified-gemma

The Gemma 4 assistant is a separate drafter model. MTPLX uses exact speculative sampling with target verification and residual correction.
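The verification rule can be sketched as the standard exact speculative sampling test: accept a drafted token with probability min(1, p/q), otherwise resample from the renormalized residual max(0, p − q). The numpy code below is a generic illustration of that rule under these assumptions, not MTPLX's implementation:

```python
import numpy as np

def speculative_step(p_target, p_draft, draft_token, rng):
    """One verification step of exact speculative sampling.

    Accept the drafted token with probability min(1, p/q); on rejection,
    resample from the residual distribution max(0, p - q), renormalized.
    The emitted token is distributed exactly according to p_target."""
    p = p_target[draft_token]
    q = p_draft[draft_token]
    if rng.random() < min(1.0, p / q):
        return draft_token, True
    residual = np.maximum(p_target - p_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual), False

rng = np.random.default_rng(0)
p_target = np.array([0.5, 0.3, 0.2])   # toy 3-token vocabulary
p_draft = np.array([0.25, 0.5, 0.25])
draft = rng.choice(3, p=p_draft)        # drafter proposes a token
token, accepted = speculative_step(p_target, p_draft, draft, rng)
```

Because rejected tokens are replaced by a sample from the residual, the output distribution matches the target exactly; the drafter only changes speed, never quality.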

Local Benchmark

Prompt: single-file HTML5 Canvas Flappy Bird game, capped at 1000 generated tokens.

Sampler:

temperature: 1.0
top_p: 0.95
top_k: 64
seed: 0

Best observed block size:

block_size: 6
acceptance: 830 / 846 = 98.11%
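
With block size 6, one verification pass can emit at most 7 tokens (6 accepted drafts plus one token from the target's own distribution). Treating the 98.11% aggregate acceptance as an i.i.d. per-token rate gives a rough back-of-the-envelope estimate of tokens per target pass; real acceptance is position-dependent, so this is only a guide:

```python
# Back-of-the-envelope estimate, assuming each drafted token is accepted
# independently with the aggregate rate measured above (an approximation).
p = 830 / 846          # measured aggregate acceptance rate (~0.9811)
gamma = 6              # draft block size
# Expected tokens emitted per target verification pass (at most gamma + 1).
expected = (1 - p ** (gamma + 1)) / (1 - p)
print(f"{expected:.2f} tokens per target pass")
```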

Observed MTPLX throughput samples:

43.56 tok/s
44.46 tok/s
44.07 tok/s

The bundled benchmark JSON files are in benchmarks/.

Notes

This release is optimized for MTPLX speed experiments. For a higher-precision target, use Youssofal/Gemma4-MTPLX-Optimized-Quality.

Gemma 4 is released by Google under the Gemma 4 license terms.
