waltgrace/mlx-expert-sniper

Image-Text-to-Text
MLX
English
apple-silicon
Mixture of Experts
mixture-of-experts
vision-language
gemma
falcon-perception
inference

    How to use waltgrace/mlx-expert-sniper with MLX:

    # Make sure mlx-vlm is installed
    # pip install --upgrade mlx-vlm
    
    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config
    
    # Load the model
    model, processor = load("waltgrace/mlx-expert-sniper")
    config = load_config("waltgrace/mlx-expert-sniper")
    
    # Prepare input
    image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
    prompt = "Describe this image."
    
    # Apply chat template
    formatted_prompt = apply_chat_template(
        processor, config, prompt, num_images=1
    )
    
    # Generate output
    output = generate(model, processor, formatted_prompt, image)
    print(output)
mlx-expert-sniper
417 kB
  • 2 contributors
History: 45 commits
waltgrace
initial release: deploy code + split scripts
0e41b61 verified about 2 months ago
  • mac_tensor
    initial release: deploy code + split scripts about 2 months ago
  • src
    Add Gemma 4-26B-A4B support: 4.15 tok/s on M4 Mac Mini about 2 months ago
  • .gitattributes
    1.52 kB
    initial commit about 2 months ago
  • .gitignore
    51 Bytes
    v0.1.0: MoE expert sniping for MLX - run models larger than your RAM about 2 months ago
  • README.md
    7.3 kB
    docs: initial README about 2 months ago
  • models_gemma4.py
    22.1 kB
    initial release: deploy code + split scripts about 2 months ago
  • pyproject.toml
    573 Bytes
    initial release: deploy code + split scripts about 2 months ago
  • setup.cfg
    360 Bytes
    Update setup.cfg about 2 months ago
  • setup.py
    398 Bytes
    initial release: deploy code + split scripts about 2 months ago
  • split_gemma4.py
    7.48 kB
    initial release: deploy code + split scripts about 2 months ago
  • split_qwen.py
    7.74 kB
    initial release: deploy code + split scripts about 2 months ago
  • stream_preprocess.py
    8.17 kB
    v0.1.0: MoE expert sniping for MLX - run models larger than your RAM about 2 months ago
  • stream_preprocess_35b.py
    8.65 kB
    v0.2.0: Add Qwen3.5-35B-A3B support (5.78 tok/s, 19.5 GB on 16 GB RAM) about 2 months ago
  • stream_preprocess_coder.py
    6.87 kB
    v0.3.0: Add Qwen3-Coder-30B + qwen3_5_moe support, thinking filter, 3 models verified about 2 months ago