Commit History

initial release: deploy code + split scripts
0e41b61
verified

waltgrace committed on

docs: initial README
7400275
verified

waltgrace committed on

Add Gemma 4-26B-A4B support: 4.15 tok/s on M4 Mac Mini
3f56a7b

Nico Claude Opus 4.6 (1M context) committed on

Add Gemma 4 MLX model class + preprocess
cc1d5e2
verified

waltgrace committed on

Add Gemma 4 MLX model class + preprocess
4a30158
verified

waltgrace committed on

Add Gemma 4 MLX model class + preprocess
337dcd8
verified

waltgrace committed on

Add Gemma 4 MLX model class + preprocess
e2bb666
verified

waltgrace committed on

Add Gemma 4 MLX model class + preprocess
7409761
verified

waltgrace committed on

Add Coder model + multi-model fixes
46f93d5
verified

waltgrace committed on

Add Coder model + multi-model fixes
3a81687
verified

waltgrace committed on

Add Coder model + multi-model fixes
62e0ebd
verified

waltgrace committed on

Add Coder model + multi-model fixes
7878baa
verified

waltgrace committed on

Add Coder model + multi-model fixes
bded42e
verified

waltgrace committed on

Add chat command + shared generate module
3e350f6
verified

waltgrace committed on

Add chat command + shared generate module
8f6ad09
verified

waltgrace committed on

Add routing bias comparison table: bias=1.0 universal sweet spot
33d06cc
verified

waltgrace committed on

Update 30B: 4.29 tok/s with bias=1.0
b6eb862
verified

waltgrace committed on

Add 30B support + bias sweep results
517be8a
verified

waltgrace committed on

Add 30B support + bias sweep results
8829376
verified

waltgrace committed on

Add 30B support + bias sweep results
22d3dc7
verified

waltgrace committed on

Add Ollama-compatible serve command
f7efda4
verified

waltgrace committed on

Add Ollama-compatible serve command
55fe21b
verified

waltgrace committed on

Add download command: mlx-sniper download qwen3.5-35b
e1498f9
verified

waltgrace committed on

Add download command: mlx-sniper download qwen3.5-35b
d694ca1
verified

waltgrace committed on

Add supported models table and hardware requirements
6d64d7d
verified

waltgrace committed on

Update src/mlx_expert_sniper/preprocess.py
05209e2
verified

waltgrace committed on

Update src/mlx_expert_sniper/expert_io.py
bc2c2b1
verified

waltgrace committed on

Update src/mlx_expert_sniper/engine.py
7e0ed9b
verified

waltgrace committed on

Update src/mlx_expert_sniper/coactivation.py
482f969
verified

waltgrace committed on

Update src/mlx_expert_sniper/calibrate.py
9110344
verified

waltgrace committed on

Update src/mlx_expert_sniper/cli.py
1a7f86e
verified

waltgrace committed on

Update src/mlx_expert_sniper/__init__.py
0535634
verified

waltgrace committed on

Update setup.py
272d3b8
verified

waltgrace committed on

Update setup.cfg
4986dbb
verified

waltgrace committed on

Update pyproject.toml
1e8d685
verified

waltgrace committed on

Correct to 5.37 tok/s, remove REAP from claims
a3be831
verified

waltgrace committed on

35B: 5.18 tok/s with REAP masking + routing bias=1.0
d0d885e
verified

waltgrace committed on

Update verified results: 35B at 2.42 tok/s, 30B at 3.34 tok/s
a22ed79
verified

waltgrace committed on

Update benchmark numbers: verified 3.3 tok/s across varied prompts
2968a91
verified

waltgrace committed on

Update README.md
214fea8
verified

waltgrace committed on

Add YAML metadata to fix repo card warning
7d037a5
verified

waltgrace committed on

v0.3.0: Add Qwen3-Coder-30B + qwen3_5_moe support, thinking filter, 3 models verified
21d029f
verified

waltgrace committed on

v0.2.0: Add Qwen3.5-35B-A3B support (5.78 tok/s, 19.5 GB on 16 GB RAM)
d14a3c2
verified

waltgrace committed on

v0.1.0: MoE expert sniping for MLX — run models larger than your RAM
2d8a9bb
verified

waltgrace committed on

initial commit
d4ead01
verified

waltgrace committed on