mach-kernel/gemma-3-12b-it-antislop-4b-mlx

MLX-optimized quantized version of sam-paech/gemma-3-12b-it-antislop for Apple Silicon.

Quantization Details

Setting	Value
Method	Mixed-precision (4-bit + 6-bit)
Predicate	`mixed_4_6`
Group Size	32
Avg Bits/Weight	~4.9
mlx-lm Version	0.24.1

Why mixed_4_6? This strategy keeps quantization-sensitive layers at 6-bit precision and quantizes the remaining layers to 4 bits, which gives better accuracy than uniform 4-bit quantization for a small increase in size.
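
To make the idea of a mixed-precision predicate concrete, here is a minimal sketch of a rule that assigns a bit width to each weight based on its path. The function name `choose_bits`, the choice of which layers count as "sensitive", and the example paths are all assumptions made for illustration. This is not a copy of the predicate that mlx-lm ships.

```python
import re


def choose_bits(path, num_layers, low_bits=4, high_bits=6, group_size=32):
    """Return quantization settings for the weight at `path`.

    Hypothetical rule: value/down projections in selected layers and the
    output head get `high_bits`; everything else gets `low_bits`.
    """
    high = {"bits": high_bits, "group_size": group_size}
    low = {"bits": low_bits, "group_size": group_size}

    match = re.search(r"layers\.(\d+)\.", path)
    if match:
        index = int(match.group(1))
        # Assumption: the first and last eighth of the layers, plus every
        # third layer in between, are treated as sensitive.
        sensitive_layer = (
            index < num_layers // 8
            or index >= 7 * num_layers // 8
            or (index - num_layers // 8) % 3 == 2
        )
        if sensitive_layer and ("v_proj" in path or "down_proj" in path):
            return high

    # Assumption: the output head is always kept at the higher precision.
    if "lm_head" in path:
        return high
    return low


# Illustrative paths; num_layers=48 is an assumed value for the example.
for p in ("model.layers.0.self_attn.v_proj",
          "model.layers.0.self_attn.q_proj",
          "model.layers.10.mlp.down_proj",
          "lm_head"):
    print(p, choose_bits(p, num_layers=48))
```

Because only a subset of projections is promoted to 6 bits, the average bits per weight stays close to the 4-bit baseline.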

Why group-size 32? A smaller group size (32, versus the mlx-lm default of 64) means each quantization scale covers fewer weights, which reduces quality loss at the cost of a slight memory overhead for the extra per-group scales and biases.
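
The effect of group size can be illustrated with a small pure-Python experiment. The sketch below assumes a simplified min-max affine scheme (one scale and one offset per group) applied to synthetic Gaussian weights. It is meant to show the trend only and does not reproduce MLX's actual quantization kernels or this model's weights.

```python
import random


def quantize_roundtrip(weights, group_size, bits=4):
    """Quantize then dequantize `weights` group by group (min-max affine)."""
    levels = 2 ** bits - 1
    restored = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # One scale per group; fall back to 1.0 if the group is constant.
        scale = (hi - lo) / levels or 1.0
        for w in group:
            q = round((w - lo) / scale)      # integer code in [0, levels]
            restored.append(lo + q * scale)  # dequantized value
    return restored


def mean_abs_error(weights, group_size, bits=4):
    """Average absolute difference between original and restored weights."""
    restored = quantize_roundtrip(weights, group_size, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]  # synthetic data
errors = {g: mean_abs_error(weights, g) for g in (32, 64, 128)}
for g, err in errors.items():
    print(f"group_size={g:>3}  mean abs error={err:.4f}")
```

Smaller groups span a narrower range of values, so the same number of quantization levels is spread over a smaller interval and the rounding error shrinks.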

Conversion Command

mlx_lm.convert \
    --hf-path sam-paech/gemma-3-12b-it-antislop \
    --mlx-path ./output \
    -q \
    --quant-predicate mixed_4_6 \
    --q-group-size 32

Usage

pip install mlx-lm

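from mlx_lm import load, generate

# Download (if needed) and load the quantized weights and tokenizer.
model, tokenizer = load("mach-kernel/gemma-3-12b-it-antislop-4b-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat format when a template is available.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# verbose=True streams the generated text as it is produced.
response = generate(model, tokenizer, prompt=prompt, verbose=True)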

About the Base Model

This is a quantized version of gemma-3-12b-it-antislop, a Gemma 3 12B instruct model fine-tuned to reduce repetitive and clichéd language patterns ("slop").
