gemma-4-31b-it-abliterated-4bit-mlx

A 4-bit MLX quantization of null-space/gemma-4-31b-it-abliterated, tuned for fast on-device inference on Apple Silicon.

  • Base model: null-space/gemma-4-31b-it-abliterated (BF16, ~62 GB)
  • Quantization: 4-bit affine, group size 64
  • Format: MLX safetensors
  • Footprint: ~16 GB on disk, runs comfortably on a 32 GB Mac and flies on 64 GB+
  • Throughput: ~15 tok/s on M4 Max (measured with mlx-lm 0.21+)
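A quant with these settings can be reproduced with mlx-lm's converter. The invocation below is a sketch: the output path is illustrative, and the flags assume a recent mlx-lm (0.21+, matching the version noted above).

```shell
# Quantize the BF16 base model to 4-bit affine with group size 64.
# --mlx-path is an illustrative local output directory.
mlx_lm.convert \
  --hf-path null-space/gemma-4-31b-it-abliterated \
  --mlx-path gemma-4-31b-it-abliterated-4bit-mlx \
  -q --q-bits 4 --q-group-size 64
```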

Why this exists

The default model in nicedreamzapp/claude-code-local used to point at an mlx-community/... repo that never actually existed. This repository is the real, working 4-bit quant and serves as a drop-in replacement.

Usage

from mlx_lm import load, generate

model, tokenizer = load("divinetribe/gemma-4-31b-it-abliterated-4bit-mlx")
print(generate(model, tokenizer, prompt="Hello", max_tokens=200))

Or via the launcher / setup script in claude-code-local:

MLX_MODEL=divinetribe/gemma-4-31b-it-abliterated-4bit-mlx \
  bash scripts/start-mlx-server.sh
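Once the server is up, it should answer OpenAI-compatible chat requests (mlx_lm.server exposes /v1/chat/completions). The smoke test below is a sketch; the host and port assume the mlx_lm.server default of localhost:8080, which the launcher script may override.

```shell
# Hypothetical smoke test against the local server;
# host/port assume mlx_lm.server defaults.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "divinetribe/gemma-4-31b-it-abliterated-4bit-mlx",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
      }'
```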

Abliteration

Refusal behavior was removed via refusal-direction projection, per Arditi et al. (2024). Use responsibly: you are now the moderator.
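The core idea can be sketched in a few lines: hidden states are projected onto the orthogonal complement of a learned "refusal direction", so the model can no longer represent that feature. The vectors below are random stand-ins, not real model activations.

```python
# Sketch of refusal-direction ablation (Arditi et al., 2024).
# r_hat is a stand-in for the extracted refusal direction;
# h is a stand-in for a residual-stream activation.
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

r = rng.normal(size=d_model)
r_hat = r / np.linalg.norm(r)          # unit refusal direction

h = rng.normal(size=d_model)           # hidden-state vector

# Ablate: remove the component of h along r_hat.
h_ablated = h - np.dot(h, r_hat) * r_hat

# The ablated activation is orthogonal to the refusal direction.
assert abs(np.dot(h_ablated, r_hat)) < 1e-12
```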
