mach-kernel/gemma-3-12b-it-antislop-4b-mlx
MLX-optimized quantized version of sam-paech/gemma-3-12b-it-antislop for Apple Silicon.
Quantization Details
| Setting | Value |
|---|---|
| Method | Mixed-precision (4-bit + 6-bit) |
| Predicate | mixed_4_6 |
| Group Size | 32 |
| Avg Bits/Weight | ~4.9 |
| mlx-lm Version | 0.24.1 |
Why mixed_4_6? This quantization strategy keeps sensitive layers at 6-bit precision while using 4-bit for less critical layers, providing better accuracy than uniform 4-bit quantization with minimal size increase.
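To make the idea of a mixed-precision predicate concrete, here is a minimal sketch of a rule that assigns a bit width per layer. It is a hypothetical illustration, not the actual `mixed_4_6` recipe shipped with mlx-lm: the function name `choose_bits`, the choice of `v_proj`, `down_proj`, and `lm_head` as "sensitive" layers, the one-eighth edge window, and the layer count of 48 are all assumptions made for the example.

```python
import re

def choose_bits(path, num_layers, low_bits=4, high_bits=6):
    """Hypothetical rule: return the bit width to use for the layer at `path`."""
    # Assumption: the output head is always treated as sensitive.
    if path.endswith("lm_head"):
        return high_bits
    match = re.search(r"layers\.(\d+)\.", path)
    if match is None:
        return low_bits
    index = int(match.group(1))
    # Assumption: blocks in the first and last eighth of the stack are
    # the ones that get extra precision.
    edge = max(1, num_layers // 8)
    near_edge = index < edge or index >= num_layers - edge
    # Assumption: only these projections are treated as sensitive.
    is_sensitive = path.endswith(("v_proj", "down_proj"))
    return high_bits if near_edge and is_sensitive else low_bits

# Example layer paths, using an assumed depth of 48 transformer blocks.
for name in (
    "model.layers.0.self_attn.v_proj",
    "model.layers.24.mlp.down_proj",
    "model.layers.47.mlp.down_proj",
    "lm_head",
):
    print(name, choose_bits(name, num_layers=48))
```

Under a rule like this, most weights stay at 4-bit, so the average bits per weight lands only slightly above the uniform 4-bit figure.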
Why group-size 32? A smaller group size (32 instead of the mlx-lm default of 64) quantizes at finer granularity, which reduces quality loss at the cost of a slight memory overhead.
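The effect of group size can be sketched with a toy round trip through group-wise min/max quantization. This is a simplified stand-in written in plain Python, not MLX's own kernel, and the helper names and the random Gaussian "weights" are assumptions for illustration. Each group gets its own scale and offset, so smaller groups track the local range of the weights more closely.

```python
import random

def quantize_dequantize(weights, group_size, bits=4):
    """Quantize each group to 2**bits levels, then map back to floats."""
    levels = 2**bits - 1
    restored = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # Fall back to 1.0 when a group is constant to avoid dividing by zero.
        scale = (hi - lo) / levels or 1.0
        restored.extend(round((w - lo) / scale) * scale + lo for w in group)
    return restored

def mean_abs_error(group_size, bits=4, n=1 << 14, seed=0):
    """Mean absolute round-trip error on synthetic Gaussian weights."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n)]
    restored = quantize_dequantize(weights, group_size, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / n

errors = {g: mean_abs_error(g) for g in (32, 64, 128)}
for g, err in errors.items():
    print(f"group size {g:>3}: mean abs error {err:.4f}")
```

On this synthetic data the error shrinks as the group size shrinks. The trade-off is that every group carries its own scale and offset, so halving the group size doubles that per-group metadata.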
Conversion Command
mlx_lm.convert \
--hf-path sam-paech/gemma-3-12b-it-antislop \
--mlx-path ./output \
-q \
--quant-predicate mixed_4_6 \
--q-group-size 32
Usage
pip install mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("mach-kernel/gemma-3-12b-it-antislop-4b-mlx")
prompt = "hello"
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )
response = generate(model, tokenizer, prompt=prompt, verbose=True)
About the Base Model
This is a quantized version of gemma-3-12b-it-antislop, a Gemma 3 12B instruct model fine-tuned to reduce repetitive or clichéd language patterns ("slop").