gemma-4-31b-it-abliterated-4bit-mlx

A 4-bit MLX quantization of null-space/gemma-4-31b-it-abliterated, tuned for fast on-device inference on Apple Silicon.

  • Base model: null-space/gemma-4-31b-it-abliterated (BF16, ~62 GB)
  • Quantization: 4-bit affine, group size 64
  • Format: MLX safetensors
  • Footprint: ~16 GB on disk, runs comfortably on a 32 GB Mac and flies on 64 GB+
  • Throughput: ~15 tok/s on M4 Max (measured with mlx-lm 0.21+)
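A quant with these settings can be reproduced with mlx-lm's converter. The invocation below is a sketch: the output path is illustrative, and the flags assume a recent mlx-lm (0.21+, matching the version noted above).

```shell
# Quantize the BF16 base model to 4-bit affine with group size 64.
# --mlx-path is an illustrative local output directory.
mlx_lm.convert \
  --hf-path null-space/gemma-4-31b-it-abliterated \
  --mlx-path gemma-4-31b-it-abliterated-4bit-mlx \
  -q --q-bits 4 --q-group-size 64
```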

Why this exists

The default model in nicedreamzapp/claude-code-local used to point at an mlx-community/... repo that never actually existed. This repository is the real, working 4-bit quant and serves as a drop-in replacement.

Usage

from mlx_lm import load, generate

model, tokenizer = load("divinetribe/gemma-4-31b-it-abliterated-4bit-mlx")
print(generate(model, tokenizer, prompt="Hello", max_tokens=200))

Or via the launcher / setup script in claude-code-local:

MLX_MODEL=divinetribe/gemma-4-31b-it-abliterated-4bit-mlx \
  bash scripts/start-mlx-server.sh
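Once the server is up, it should answer OpenAI-compatible chat requests (mlx_lm.server exposes /v1/chat/completions). The smoke test below is a sketch; the host and port assume the mlx_lm.server default of localhost:8080, which the launcher script may override.

```shell
# Hypothetical smoke test against the local server;
# host/port assume mlx_lm.server defaults.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "divinetribe/gemma-4-31b-it-abliterated-4bit-mlx",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
      }'
```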

Abliteration

Refusal behavior was removed via refusal-direction projection, per Arditi et al. (2024). Use responsibly: you are now the moderator.
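The core idea can be sketched in a few lines: hidden states are projected onto the orthogonal complement of a learned "refusal direction", so the model can no longer represent that feature. The vectors below are random stand-ins, not real model activations.

```python
# Sketch of refusal-direction ablation (Arditi et al., 2024).
# r_hat is a stand-in for the extracted refusal direction;
# h is a stand-in for a residual-stream activation.
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

r = rng.normal(size=d_model)
r_hat = r / np.linalg.norm(r)          # unit refusal direction

h = rng.normal(size=d_model)           # hidden-state vector

# Ablate: remove the component of h along r_hat.
h_ablated = h - np.dot(h, r_hat) * r_hat

# The ablated activation is orthogonal to the refusal direction.
assert abs(np.dot(h_ablated, r_hat)) < 1e-12
```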
