Spaces:

mlx-community
/

README

Running

App Files Files Community

Looking for fair Apple Silicon testers for AX Engine, especially Gemma 4 / Qwen on MLX

#35

by ak959 - opened 18 days ago

Discussion

ak959

18 days ago

•

edited 18 days ago

Hi MLX community,

I’m working on an open-source project called AX Engine:

https://github.com/defai-digital/ax-engine

it's prefill and direct decode rate already faster than mlx-lm
it is much stable than any python based
it is opensource in apache 2.0 -- welcome to fork

======

I don’t want to make this sound like an advertisement. What I’m looking for is a real, fair test from people who already understand MLX and Apple Silicon inference.

We have tested AX Engine internally, but internal tests are not enough. I would like to know how it behaves on different Mac hardware, different memory sizes, and different real-world local LLM setups.

The main things I want to verify are:

Does the install process actually work smoothly?
Does the OpenAI-compatible local server behave correctly?
Are the Gemma 4 / Qwen MLX runtime paths stable?
Are the benchmark results reproducible outside our own environment?
Are there cases where mlx-lm, llama.cpp, LM Studio, or other local runtimes are clearly better?
Are there bugs, bad assumptions, or confusing parts in the documentation?

I’m especially interested in testing with:

Gemma 4 12B / larger Gemma models
Qwen 3.x MLX models
Apple Silicon machines with 32GB, 64GB, 128GB, or more unified memory
Users who already run local models regularly and can compare against their current setup

If you are willing to test, I would really appreciate honest feedback, including negative results.

Useful information would be:

Mac model and memory size
macOS version
Model tested
Install method
Tokens/sec or TTFT if available
Any crashes, incorrect behavior, or confusing UX
Comparison with your normal MLX / local inference setup

My goal is not to claim AX Engine is better than existing tools. I want to understand where it works, where it does not work, and what needs to be improved before we promote it more widely.

Any fair test, criticism, or reproducible benchmark result would be very helpful.

ak959

18 days ago

•

edited 18 days ago

Qwen3.6-27B-4bit direct raw decode (no mtp or n-gram)

ak959

18 days ago

•

edited 18 days ago

Qwen3.6-35B-A3B-4bit direct raw decode (no mtp or n-gram)

ak959

18 days ago

gemma-4-26b-a4b-it-4bit direct raw decode (no mtp or n-gram)

ak959

18 days ago

gemma-4-31b-it-4bit direct raw decode (no mtp or n-gram)

usermma

18 days ago

it would be great if it supports https://huggingface.co/ThingAI/Quark-270m-Instruct

usermma

18 days ago

try to code a static html page for it and host it in huggingface space, explain why people should switch into it.. explain everything in detailed aspect...

ak959

17 days ago

it would be great if it supports https://huggingface.co/ThingAI/Quark-270m-Instruct

ok. we will try to support asap

ak959

17 days ago

try to code a static html page for it and host it in huggingface space, explain why people should switch into it.. explain everything in detailed aspect...

one word: faster

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment