ToPo-ToPo/gemma-4-31b-it-mlx-8bit

8-bit MLX conversion of google/gemma-4-31b-it for Apple Silicon, for use with mlx-vlm.

Provenance (self-converted from official weights)

Source: google/gemma-4-31b-it (license: gemma)
Tool: mlx-vlm 0.6.3 — mlx_vlm.convert --hf-path google/gemma-4-31b-it --mlx-path . -q --q-bits 8 --q-group-size 64
Effective: 8.643 bits/weight
Validation: in an agentic CAD+FEM pipeline, this conversion reproduced geometrically exact CAD output (the resulting volumes match those from the reference mlx-community conversion).
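The 8.643 bits/weight figure above can be checked with a little arithmetic. This is a sketch under stated assumptions, not the converter's own accounting. MLX affine quantization packs each `--q-bits` weight into uint32 words and keeps one scale and one bias per group of `--q-group-size` weights. Here the scales, biases, and any tensors left unquantized are assumed to be bf16, which fits the BF16/U32 tensor types listed for this repo. The nominal parameter count of 31e9 is also an assumption, taken from the model name. Under these assumptions the quantized layers cost 8.5 bits/weight, so roughly 2% of the weights would have to stay in bf16 to reach 8.643. This card does not say which tensors those are.

```python
# Back-of-the-envelope check of "Effective: 8.643 bits/weight".
Q_BITS = 8            # --q-bits 8
GROUP_SIZE = 64       # --q-group-size 64
SCALE_BITS = 16       # assumption: per-group scale and bias stored in bf16
UNQUANT_BITS = 16     # assumption: tensors left unquantized are bf16
REPORTED_BPW = 8.643  # figure reported by the conversion


def quantized_bpw(q_bits: int, group_size: int, scale_bits: int) -> float:
    """Bits per weight of an affine-quantized tensor: the packed weight
    plus one scale and one bias shared by each group."""
    return q_bits + 2 * scale_bits / group_size


def unquantized_fraction(reported: float, q_bpw: float, full_bits: float) -> float:
    """Solve reported = (1 - f) * q_bpw + f * full_bits for f."""
    return (reported - q_bpw) / (full_bits - q_bpw)


def stored_elements(n_params: float, f: float, q_bits: int, group_size: int) -> float:
    """Tensor elements actually stored on disk: quantized weights packed
    32 // q_bits per uint32, 2 scale/bias values per group, and
    unquantized weights stored one per element."""
    n_q = (1 - f) * n_params
    return n_q / (32 // q_bits) + 2 * n_q / group_size + f * n_params


if __name__ == "__main__":
    q = quantized_bpw(Q_BITS, GROUP_SIZE, SCALE_BITS)
    f = unquantized_fraction(REPORTED_BPW, q, UNQUANT_BITS)
    n = 31e9  # assumption: nominal parameter count from the model name
    print(f"quantized layers: {q} bits/weight")
    print(f"implied unquantized share: {f:.2%}")
    print(f"stored elements: {stored_elements(n, f, Q_BITS, GROUP_SIZE) / 1e9:.1f}B")
```

Running it prints 8.5 bits/weight for the quantized layers, an implied unquantized share of about 1.91%, and about 9.1B stored elements. That last number is consistent with the Hub's "9B params" figure, provided the Hub counts stored tensor elements rather than logical weights.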

Usage

from mlx_vlm import load, generate
# load() downloads the weights from the Hugging Face Hub on first use
model, processor = load("ToPo-ToPo/gemma-4-31b-it-mlx-8bit")

License

This is a derivative of Google Gemma. Use is governed by the Gemma Terms of Use and the Gemma Prohibited Use Policy. Weights were converted/quantized to MLX format (modification notice per the Gemma Terms).

⚡ Faster generation with MTP (speculative decoding, lossless)

Recommended drafter: google/gemma-4-31b-it-assistant, Google's official MTP drafter for this model. It loads directly in mlx-vlm with no conversion needed and gives up to ~3x faster generation (≈1.4–1.5x measured on short prompts). Output is identical to non-MTP decoding.

# requires:  pip install "mlx-vlm>=0.6.3"
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("ToPo-ToPo/gemma-4-31b-it-mlx-8bit")
draft_model, _   = load("google/gemma-4-31b-it-assistant")
config = load_config("ToPo-ToPo/gemma-4-31b-it-mlx-8bit")

prompt = apply_chat_template(processor, config, "Hello!", num_images=0)
out = generate(model, processor, prompt,
               draft_model=draft_model, draft_kind="mtp", max_tokens=256)

CLI (draft_kind auto-detected): mlx_vlm.generate --model ToPo-ToPo/gemma-4-31b-it-mlx-8bit --draft-model google/gemma-4-31b-it-assistant
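The "up to ~3x" and "≈1.4–1.5x measured" figures do not contradict each other. The speedup from speculative decoding depends on how often the target model accepts drafted tokens. The sketch below uses the standard estimate from Leviathan et al. (2023). It assumes that each drafted token is accepted independently with probability `alpha`, that `k` tokens are drafted per step, and that one drafter forward pass costs `draft_cost` of a target pass. All three are illustrative assumptions. This card does not state mlx-vlm's actual MTP draft length or acceptance rate.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target forward pass, assuming each of the
    k drafted tokens is accepted independently with probability alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus one bonus token
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Wall-clock speedup over plain decoding; draft_cost is the drafter's
    forward-pass time as a fraction of the target model's."""
    return expected_tokens_per_step(alpha, k) / (k * draft_cost + 1.0)


if __name__ == "__main__":
    # Illustrative parameter values only, not measurements of this model.
    for alpha in (0.5, 0.7, 0.8):
        speedup = estimated_speedup(alpha, k=4, draft_cost=0.05)
        print(f"alpha={alpha}: {speedup:.2f}x")
```

With these made-up parameters, the estimate ranges from about 1.6x to 2.8x as `alpha` rises. Lower acceptance rates therefore lead directly to more modest measured speedups.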

Notes

draft_kind="mtp" is required in the Python API (the CLI auto-detects it).
Use this model's own drafter above — drafters are size-specific and not interchangeable across Gemma 4 variants.
Needs mlx-vlm >= 0.6.3. MTP is lossless — if output differs from non-MTP, your versions are mismatched.
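MTP can be lossless because of how verification works under greedy decoding. The target model re-checks every drafted token and keeps only those that match its own argmax, so the drafter affects speed but not the tokens produced. The toy sketch below demonstrates this property. The integer "models" `toy_target` and `toy_draft` are hypothetical stand-ins. mlx-vlm's implementation differs: it verifies drafts in one batched forward pass and uses a KV cache. With temperature sampling, speculative decoding preserves the output distribution rather than giving token-for-token identical text.

```python
from typing import Callable, List

# A "model" here is a greedy next-token function: prefix -> argmax token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain greedy decoding: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target(seq))
    return seq[len(prompt):]


def speculative_greedy_decode(target: NextToken, draft: NextToken,
                              prompt: List[int], n: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the drafter proposes k tokens and the
    target keeps the longest prefix that matches its own argmax choices."""
    seq = list(prompt)
    end = len(prompt) + n
    while len(seq) < end:
        # 1. The drafter proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each position (real implementations do this
        #    in a single batched forward pass).
        for i in range(k):
            expected = target(seq + proposal[:i])
            if expected != proposal[i]:
                # Keep the matching prefix, then the target's own token.
                seq.extend(proposal[:i])
                seq.append(expected)
                break
        else:
            # Every drafted token matched: keep them all plus one bonus token.
            seq.extend(proposal)
            seq.append(target(seq))
    return seq[len(prompt):end]


# Hypothetical toy models for the demo (unrelated to Gemma).
def toy_target(seq: List[int]) -> int:
    return (sum(seq) * 31 + len(seq)) % 50


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except at every third position.
    guess = toy_target(seq)
    return guess if len(seq) % 3 else (guess + 1) % 50


if __name__ == "__main__":
    prompt = [1, 2, 3]
    plain = greedy_decode(toy_target, prompt, 20)
    fast = speculative_greedy_decode(toy_target, toy_draft, prompt, 20)
    print(plain == fast)
```

In this idealized setting, swapping in a worse drafter changes only how many tokens are accepted per step, never the output. That is why a mismatch against non-MTP output points to a setup problem rather than to MTP itself.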


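Model details (as listed on the Hugging Face Hub)

Format: Safetensors, MLX, 8-bit

Model size: 9B params (as counted by the Hub over the stored tensors; 8-bit weights are packed four per U32 value, so this figure is well below the 31B logical parameters)

Tensor types: BF16, U32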

Collection including this model: Gemma4 ("Gemma 4 series, self-converted to MLX", 10 items)