aufklarer 
posted an update 28 days ago
After running extensive benchmarks across ASR, TTS, and VAD on Apple Silicon, we found some results that weren't documented anywhere.

The most counterintuitive: INT8 runs 3.3x faster than INT4 on the Neural Engine. A 332 MB CoreML model allocates 1,677 MB at runtime. And the right architecture uses both MLX and CoreML simultaneously — not one or the other.

MLX talks to the GPU — programmable, fast for large transformer inference. CoreML talks to the Neural Engine — fixed-function silicon, 135x real-time for small feedforward models like VAD, near-zero power draw.
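To put the quoted figures in more concrete terms, here is a rough back-of-the-envelope sketch. It assumes "135x real-time" means 135 seconds of audio processed per second of compute, and it takes the 332 MB and 1,677 MB values at face value. The function names are illustrative and are not part of speech-swift.

```python
def runtime_overhead(on_disk_mb: float, runtime_mb: float) -> float:
    """Ratio of runtime allocation to on-disk model size."""
    return runtime_mb / on_disk_mb


def processing_time_ms(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock milliseconds to process audio at a given real-time factor.

    Assumption: a factor of N means N seconds of audio per second of compute.
    """
    return audio_seconds / realtime_factor * 1000.0


# Figures quoted in the post: 332 MB on disk, 1,677 MB allocated at runtime.
overhead = runtime_overhead(332, 1677)

# Figure quoted in the post: 135x real-time for a small VAD model.
vad_ms = processing_time_ms(1.0, 135)

print(f"runtime / on-disk memory: {overhead:.2f}x")
print(f"1 s of audio at 135x real-time: {vad_ms:.1f} ms")
```

Under these assumptions the CoreML model needs roughly five times its on-disk size in memory, and one second of audio passes through VAD in well under 10 ms.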

All benchmarks are from speech-swift, our open-source Swift library for on-device speech AI: ASR, TTS, VAD, diarization, speech-to-speech — everything running locally on Apple Silicon with no API, no cloud, no data leaving the device.

Models on HF: aufklarer/Qwen3-ASR-0.6B-MLX-4bit · aufklarer/parakeet-tdt-0.6b-coreml-int8 · aufklarer/PersonaPlex-7B-MLX-4bit

Full article: https://blog.ivan.digital
Library: https://github.com/soniqo/speech-swift