waltgrace/mlx-expert-sniper

Commit History

initial release: deploy code + split scripts

0e41b61
verified

waltgrace committed on Apr 8

docs: initial README

7400275
verified

waltgrace committed on Apr 8

Add Gemma 4-26B-A4B support: 4.15 tok/s on M4 Mac Mini

3f56a7b

Nico Claude Opus 4.6 (1M context) committed on Apr 6

Add Gemma 4 MLX model class + preprocess

cc1d5e2
verified

waltgrace committed on Apr 2

Add Gemma 4 MLX model class + preprocess

4a30158
verified

waltgrace committed on Apr 2

Add Gemma 4 MLX model class + preprocess

337dcd8
verified

waltgrace committed on Apr 2

Add Gemma 4 MLX model class + preprocess

e2bb666
verified

waltgrace committed on Apr 2

Add Gemma 4 MLX model class + preprocess

7409761
verified

waltgrace committed on Apr 2

Add Coder model + multi-model fixes

46f93d5
verified

waltgrace committed on Apr 2

Add Coder model + multi-model fixes

3a81687
verified

waltgrace committed on Apr 2

Add Coder model + multi-model fixes

62e0ebd
verified

waltgrace committed on Apr 2

Add Coder model + multi-model fixes

7878baa
verified

waltgrace committed on Apr 2

Add Coder model + multi-model fixes

bded42e
verified

waltgrace committed on Apr 2

Add chat command + shared generate module

3e350f6
verified

waltgrace committed on Apr 2

Add chat command + shared generate module

8f6ad09
verified

waltgrace committed on Apr 2

Add routing bias comparison table: bias=1.0 universal sweet spot

33d06cc
verified

waltgrace committed on Apr 2

Update 30B: 4.29 tok/s with bias=1.0

b6eb862
verified

waltgrace committed on Apr 2

Add 30B support + bias sweep results

517be8a
verified

waltgrace committed on Apr 2

Add 30B support + bias sweep results

8829376
verified

waltgrace committed on Apr 2

Add 30B support + bias sweep results

22d3dc7
verified

waltgrace committed on Apr 2

Add Ollama-compatible serve command

f7efda4
verified

waltgrace committed on Apr 2

Add Ollama-compatible serve command

55fe21b
verified

waltgrace committed on Apr 2

Add download command: mlx-sniper download qwen3.5-35b

e1498f9
verified

waltgrace committed on Apr 2

Add download command: mlx-sniper download qwen3.5-35b

d694ca1
verified

waltgrace committed on Apr 2

Add supported models table and hardware requirements

6d64d7d
verified

waltgrace committed on Apr 1

Update src/mlx_expert_sniper/preprocess.py

05209e2
verified

waltgrace committed on Apr 1

Update src/mlx_expert_sniper/expert_io.py

bc2c2b1
verified

waltgrace committed on Apr 1

Update src/mlx_expert_sniper/engine.py

7e0ed9b
verified

waltgrace committed on Apr 1

Update src/mlx_expert_sniper/coactivation.py

482f969
verified

waltgrace committed on Apr 1

Update src/mlx_expert_sniper/calibrate.py

9110344
verified

waltgrace committed on Apr 1

Update src/mlx_expert_sniper/cli.py

1a7f86e
verified

waltgrace committed on Apr 1

Update src/mlx_expert_sniper/__init__.py

0535634
verified

waltgrace committed on Apr 1

Update setup.py

272d3b8
verified

waltgrace committed on Apr 1

Update setup.cfg

4986dbb
verified

waltgrace committed on Apr 1

Update pyproject.toml

1e8d685
verified

waltgrace committed on Apr 1

Correct to 5.37 tok/s, remove REAP from claims

a3be831
verified

waltgrace committed on Apr 1

35B: 5.18 tok/s with REAP masking + routing bias=1.0

d0d885e
verified

waltgrace committed on Apr 1

Update verified results: 35B at 2.42 tok/s, 30B at 3.34 tok/s

a22ed79
verified

waltgrace committed on Apr 1

Update benchmark numbers: verified 3.3 tok/s across varied prompts

2968a91
verified

waltgrace committed on Apr 1

Update README.md

214fea8
verified

waltgrace committed on Apr 1

Add YAML metadata to fix repo card warning

7d037a5
verified

waltgrace committed on Mar 31

v0.3.0: Add Qwen3-Coder-30B + qwen3_5_moe support, thinking filter, 3 models verified

21d029f
verified

waltgrace committed on Mar 31

v0.2.0: Add Qwen3.5-35B-A3B support (5.78 tok/s, 19.5 GB on 16 GB RAM)

d14a3c2
verified

waltgrace committed on Mar 31

v0.1.0: MoE expert sniping for MLX — run models larger than your RAM

2d8a9bb
verified

waltgrace committed on Mar 30

initial commit

d4ead01
verified

waltgrace committed on Mar 30