waltgrace/mlx-expert-sniper

Image-Text-to-Text

Mixture of Experts

vision-language

falcon-perception

417 kB

2 contributors

History: 45 commits

initial release: deploy code + split scripts

0e41b61 verified about 2 months ago

Name | Size | Last commit message | Updated
mac_tensor | - | initial release: deploy code + split scripts | about 2 months ago
src | - | Add Gemma 4-26B-A4B support: 4.15 tok/s on M4 Mac Mini | about 2 months ago
.gitattributes | 1.52 kB | initial commit | about 2 months ago
.gitignore | 51 Bytes | v0.1.0: MoE expert sniping for MLX — run models larger than your RAM | about 2 months ago
README.md | 7.3 kB | docs: initial README | about 2 months ago
models_gemma4.py | 22.1 kB | initial release: deploy code + split scripts | about 2 months ago
pyproject.toml | 573 Bytes | initial release: deploy code + split scripts | about 2 months ago
setup.cfg | 360 Bytes | Update setup.cfg | about 2 months ago
setup.py | 398 Bytes | initial release: deploy code + split scripts | about 2 months ago
split_gemma4.py | 7.48 kB | initial release: deploy code + split scripts | about 2 months ago
split_qwen.py | 7.74 kB | initial release: deploy code + split scripts | about 2 months ago
stream_preprocess.py | 8.17 kB | v0.1.0: MoE expert sniping for MLX — run models larger than your RAM | about 2 months ago
stream_preprocess_35b.py | 8.65 kB | v0.2.0: Add Qwen3.5-35B-A3B support (5.78 tok/s, 19.5 GB on 16 GB RAM) | about 2 months ago
stream_preprocess_coder.py | 6.87 kB | v0.3.0: Add Qwen3-Coder-30B + qwen3_5_moe support, thinking filter, 3 models verified | about 2 months ago