Gemma3-Callous-Calla-4B — MLX builds (Apple Silicon)

This repo hosts MLX-converted variants of Daizee/Gemma3-Callous-Calla-4B for fast, local inference on Apple Silicon (M-series).
The tokenizer and config files are at the repo root; the MLX weight folders are under mlx/.

Note on vocab padding: For MLX compatibility, the tokenizer vocabulary and the embedding matrix were padded to the next multiple of 64 tokens.
This build has 262,208 tokens, 64 of which are added placeholder tokens named <pad_ex_*>.
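
The padding arithmetic can be checked with a short Python sketch. It rests on two assumptions not stated in the note: the pre-padding size is inferred as 262,208 minus 64, and since that value is already a multiple of 64, "next multiple" is read as strictly greater. The numbered placeholder names are also hypothetical; the note gives only the pattern <pad_ex_*>.

```python
# Sketch of the vocab-padding arithmetic described in the note above.

def next_multiple(n: int, multiple: int = 64) -> int:
    """Smallest multiple of `multiple` that is strictly greater than n.

    Assumption: "next multiple" is read as strictly greater, because the
    implied pre-padding size is already divisible by 64.
    """
    return (n // multiple + 1) * multiple

PADDED_VOCAB = 262_208   # stated in the note
ADDED_TOKENS = 64        # stated in the note
base_vocab = PADDED_VOCAB - ADDED_TOKENS  # inferred pre-padding size

# Hypothetical numbering; only the <pad_ex_*> pattern comes from the note.
placeholders = [f"<pad_ex_{i}>" for i in range(ADDED_TOKENS)]

assert PADDED_VOCAB % 64 == 0
assert next_multiple(base_vocab) == PADDED_VOCAB
print(base_vocab, len(placeholders), placeholders[0])
```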

Variants

Path	Bits	Group Size	Notes
`mlx/g128/`	int4	128	Smallest & fastest
`mlx/g64/`	int4	64	Slightly larger, often steadier
`mlx/int8/`	int8	-	Closest to fp16 quality (slower)
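
To see why the group size changes the file size, here is a rough Python estimate. It assumes MLX-style affine quantization, where each group of weights also stores one 16-bit scale and one 16-bit bias (32 extra bits per group). The 4e9 parameter count is a hypothetical round figure, and layers left unquantized are ignored, so treat the sizes as approximate.

```python
# Rough storage cost per weight for group-wise quantization.
# Assumption: each group carries a 16-bit scale and a 16-bit bias.

def effective_bits(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Average bits stored per weight, including per-group overhead."""
    return bits + overhead_bits / group_size

def approx_gib(n_params: float, bits: int, group_size: int) -> float:
    """Approximate size in GiB if every parameter were quantized this way."""
    return n_params * effective_bits(bits, group_size) / 8 / 2**30

N_PARAMS = 4e9  # hypothetical round number for a "4B" model

for name, bits, group in [("mlx/g128", 4, 128), ("mlx/g64", 4, 64)]:
    print(f"{name}: {effective_bits(bits, group):.2f} bits/weight, "
          f"~{approx_gib(N_PARAMS, bits, group):.2f} GiB")
```

Under these assumptions g128 costs 4.25 bits per weight and g64 costs 4.5, which matches the table's ordering: larger groups mean fewer scales and biases to store, at the price of coarser quantization.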

Quickstart (MLX-LM)

Run from Hugging Face (no cloning needed)

python -m mlx_lm.generate \
  --model hf://Daizee/Gemma3-Callous-Calla-4B-mlx/mlx/g64 \
  --prompt "Summarize the Bill of Rights for 7th graders in 4 bullet points." \
  --max-tokens 180 --temp 0.3 --top-p 0.92
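
The same generation can be sketched through the Python API. This version downloads one variant folder with huggingface_hub and loads it from the local path. It assumes a recent mlx-lm release (one that exposes `make_sampler` and accepts a `sampler` argument in `generate`) and that the variant folder is loadable on its own, meaning it contains the config and tokenizer files alongside the weights. If those files exist only at the repo root, they would have to be copied into the folder first. The helper names `variant_patterns` and `run` are illustrative.

```python
from pathlib import Path

REPO_ID = "Daizee/Gemma3-Callous-Calla-4B-mlx"  # repo named in the command above
VARIANT = "mlx/g64"                             # one of the folders in the Variants table

def variant_patterns(variant: str) -> list[str]:
    """Glob patterns that restrict the download to a single variant folder."""
    return [f"{variant.strip('/')}/*"]

def run(prompt: str, max_tokens: int = 180, temp: float = 0.3, top_p: float = 0.92) -> str:
    # Imported lazily so the helper above works without MLX installed.
    from huggingface_hub import snapshot_download
    from mlx_lm import load, generate
    from mlx_lm.sample_utils import make_sampler

    # Fetch only the chosen variant, then load it from the local snapshot.
    local_root = snapshot_download(repo_id=REPO_ID,
                                   allow_patterns=variant_patterns(VARIANT))
    model, tokenizer = load(str(Path(local_root) / VARIANT))
    sampler = make_sampler(temp=temp, top_p=top_p)
    return generate(model, tokenizer, prompt=prompt,
                    max_tokens=max_tokens, sampler=sampler)

# Example call (requires Apple Silicon, mlx-lm, and network access):
# print(run("Summarize the Bill of Rights for 7th graders in 4 bullet points."))
```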
