waltgrace/mlx-expert-sniper

Tags: Image-Text-to-Text · MLX · English · apple-silicon · mixture-of-experts · vision-language · gemma · falcon-perception · inference
417 kB
2 contributors · History: 45 commits
Latest commit: waltgrace · initial release: deploy code + split scripts · 0e41b61 (verified) · 2 days ago
  • mac_tensor · initial release: deploy code + split scripts · 2 days ago
  • src · Add Gemma 4-26B-A4B support: 4.15 tok/s on M4 Mac Mini · 4 days ago
  • .gitattributes · 1.52 kB · initial commit · 11 days ago
  • .gitignore · 51 Bytes · v0.1.0: MoE expert sniping for MLX — run models larger than your RAM · 11 days ago
  • README.md · 7.3 kB · docs: initial README · 2 days ago
  • models_gemma4.py · 22.1 kB · initial release: deploy code + split scripts · 2 days ago
  • pyproject.toml · 573 Bytes · initial release: deploy code + split scripts · 2 days ago
  • setup.cfg · 360 Bytes · Update setup.cfg · 9 days ago
  • setup.py · 398 Bytes · initial release: deploy code + split scripts · 2 days ago
  • split_gemma4.py · 7.48 kB · initial release: deploy code + split scripts · 2 days ago
  • split_qwen.py · 7.74 kB · initial release: deploy code + split scripts · 2 days ago
  • stream_preprocess.py · 8.17 kB · v0.1.0: MoE expert sniping for MLX — run models larger than your RAM · 11 days ago
  • stream_preprocess_35b.py · 8.65 kB · v0.2.0: Add Qwen3.5-35B-A3B support (5.78 tok/s, 19.5 GB on 16 GB RAM) · 10 days ago
  • stream_preprocess_coder.py · 6.87 kB · v0.3.0: Add Qwen3-Coder-30B + qwen3_5_moe support, thinking filter, 3 models verified · 10 days ago