waltgrace/mlx-expert-sniper
Tags: Image-Text-to-Text · MLX · English · apple-silicon · mixture-of-experts · vision-language · gemma · falcon-perception · inference
License: apache-2.0
Files and versions (branch: main, 417 kB)
2 contributors (waltgrace, mac_tensor) · History: 45 commits
Latest commit: 0e41b61 (verified) by waltgrace, "initial release: deploy code + split scripts", 2 days ago
| File | Size | Last commit message | Last updated |
|---|---|---|---|
| src/ | | Add Gemma 4-26B-A4B support: 4.15 tok/s on M4 Mac Mini | 4 days ago |
| .gitattributes | 1.52 kB | initial commit | 11 days ago |
| .gitignore | 51 Bytes | v0.1.0: MoE expert sniping for MLX – run models larger than your RAM | 11 days ago |
| README.md | 7.3 kB | docs: initial README | 2 days ago |
| models_gemma4.py | 22.1 kB | initial release: deploy code + split scripts | 2 days ago |
| pyproject.toml | 573 Bytes | initial release: deploy code + split scripts | 2 days ago |
| setup.cfg | 360 Bytes | Update setup.cfg | 9 days ago |
| setup.py | 398 Bytes | initial release: deploy code + split scripts | 2 days ago |
| split_gemma4.py | 7.48 kB | initial release: deploy code + split scripts | 2 days ago |
| split_qwen.py | 7.74 kB | initial release: deploy code + split scripts | 2 days ago |
| stream_preprocess.py | 8.17 kB | v0.1.0: MoE expert sniping for MLX – run models larger than your RAM | 11 days ago |
| stream_preprocess_35b.py | 8.65 kB | v0.2.0: Add Qwen3.5-35B-A3B support (5.78 tok/s, 19.5 GB on 16 GB RAM) | 10 days ago |
| stream_preprocess_coder.py | 6.87 kB | v0.3.0: Add Qwen3-Coder-30B + qwen3_5_moe support, thinking filter, 3 models verified | 10 days ago |