Spaces:
Running
Looking for fair Apple Silicon testers for AX Engine, especially Gemma 4 / Qwen on MLX
Hi MLX community,
I’m working on an open-source project called AX Engine:
https://github.com/defai-digital/ax-engine
- it's prefill and direct decode rate already faster than mlx-lm
- it is much stable than any python based
- it is opensource in apache 2.0 -- welcome to fork
======
I don’t want to make this sound like an advertisement. What I’m looking for is a real, fair test from people who already understand MLX and Apple Silicon inference.
We have tested AX Engine internally, but internal tests are not enough. I would like to know how it behaves on different Mac hardware, different memory sizes, and different real-world local LLM setups.
The main things I want to verify are:
- Does the install process actually work smoothly?
- Does the OpenAI-compatible local server behave correctly?
- Are the Gemma 4 / Qwen MLX runtime paths stable?
- Are the benchmark results reproducible outside our own environment?
- Are there cases where
mlx-lm, llama.cpp, LM Studio, or other local runtimes are clearly better? - Are there bugs, bad assumptions, or confusing parts in the documentation?
I’m especially interested in testing with:
- Gemma 4 12B / larger Gemma models
- Qwen 3.x MLX models
- Apple Silicon machines with 32GB, 64GB, 128GB, or more unified memory
- Users who already run local models regularly and can compare against their current setup
If you are willing to test, I would really appreciate honest feedback, including negative results.
Useful information would be:
- Mac model and memory size
- macOS version
- Model tested
- Install method
- Tokens/sec or TTFT if available
- Any crashes, incorrect behavior, or confusing UX
- Comparison with your normal MLX / local inference setup
My goal is not to claim AX Engine is better than existing tools. I want to understand where it works, where it does not work, and what needs to be improved before we promote it more widely.
Any fair test, criticism, or reproducible benchmark result would be very helpful.
try to code a static html page for it and host it in huggingface space, explain why people should switch into it.. explain everything in detailed aspect...
it would be great if it supports https://huggingface.co/ThingAI/Quark-270m-Instruct
ok. we will try to support asap
try to code a static html page for it and host it in huggingface space, explain why people should switch into it.. explain everything in detailed aspect...
one word: faster



