Mega-ASR: MLX (bf16)

MLX port of Mega-ASR for Apple Silicon, for use with mlx-audio.

Mega-ASR is a robustness layer over Qwen3-ASR-1.7B. A tiny audio-quality router classifies each utterance as clean or degraded and switches a dense LoRA adapter into or out of the base weights at inference time: degraded audio runs the LoRA (robust) path, while clean audio runs the unmodified base path. This yields large WER reductions on noisy and far-field speech while leaving clean-speech accuracy essentially unchanged.
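A minimal sketch of the per-utterance switch for a single linear layer, assuming the standard LoRA parameterization ΔW = (alpha / r) · B · A. The names `lora_delta` and `effective_weight`, the toy shapes, and the scaling are hypothetical illustrations, not Mega-ASR's or mlx-audio's code; the router's decision is reduced to a boolean `degraded`.

```python
import numpy as np

def lora_delta(A: np.ndarray, B: np.ndarray, alpha: float, rank: int) -> np.ndarray:
    """Dense LoRA update (standard parameterization): (alpha / rank) * B @ A."""
    return (alpha / rank) * (B @ A)

def effective_weight(W: np.ndarray, delta: np.ndarray, degraded: bool) -> np.ndarray:
    """Hypothetical per-utterance switch: base + delta if degraded, else the base itself."""
    return W + delta if degraded else W

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 16, 4
W = rng.standard_normal((d_out, d_in)).astype(np.float32)        # base weight (dense)
A = (rng.standard_normal((rank, d_in)) * 0.1).astype(np.float32)  # LoRA down-projection
B = (rng.standard_normal((d_out, rank)) * 0.1).astype(np.float32) # LoRA up-projection
delta = lora_delta(A, B, alpha=8.0, rank=rank)                    # fp32 delta

x = rng.standard_normal((3, d_in)).astype(np.float32)  # 3 toy feature frames

y_clean = x @ effective_weight(W, delta, degraded=False).T   # router says "clean"
y_robust = x @ effective_weight(W, delta, degraded=True).T   # router says "degraded"

# The clean path is exactly the base layer; the robust path differs by x @ delta.T.
print(np.array_equal(y_clean, x @ W.T))                         # True
print(np.allclose(y_robust - y_clean, x @ delta.T, atol=1e-5))  # True
```

The sketch forms W + ΔW on the fly instead of adding and later subtracting ΔW in place, so the clean path returns the base weights bit-for-bit; whether the actual port does the same is not stated in this card.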

The base weights are stored as dense bf16 on purpose: Mega-ASR adds fp32 LoRA deltas to the base at inference, so the base cannot be quantized without losing the runtime router/LoRA switching.
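To see why a quantized base does not fit this design, the toy below snaps a weight matrix to a symmetric per-tensor 4-bit grid and then tries to store a small fp32 LoRA-sized delta on that same grid. This is a simplified illustration, not MLX's actual group-wise quantization; the dense base is modeled as float32 because NumPy has no bfloat16, and `quantize_int4` and the delta magnitude are assumptions chosen for illustration.

```python
import numpy as np

def quantize_int4(W: np.ndarray, scale: float) -> np.ndarray:
    """Toy symmetric per-tensor 4-bit quantization, returned in dequantized form."""
    q = np.clip(np.round(W / scale), -8, 7)
    return q * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)
scale = float(np.abs(W).max()) / 7.0          # one step of the 4-bit grid
W_q = quantize_int4(W, scale)

# Small fp32 update, far below half a quantization step.
delta = (rng.standard_normal(W.shape) * 1e-3).astype(np.float32)

# Dense base: the delta survives up to float32 rounding.
kept_dense = np.allclose((W + delta) - W, delta, atol=1e-6)

# Quantized base: storing base + delta back on the 4-bit grid rounds the delta away.
W_q_merged = quantize_int4(W_q + delta, scale)
surviving = np.mean(W_q_merged != W_q)        # fraction of entries that changed

print(kept_dense)   # True
print(surviving)    # 0.0
```

Every entry rounds back to its original grid point, so the adapter has no effect. A dense base lets the fp32 delta be added directly at inference, as described above, instead of being re-rounded onto a coarse integer grid.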

Usage

from mlx_audio.stt import load

model = load("mlx-community/Mega-ASR-MLX-bf16")
result = model.generate("audio.wav", language="en")
print(result.text)

CLI:

python -m mlx_audio.stt.generate --model mlx-community/Mega-ASR-MLX-bf16 --audio audio.wav

The router picks the path for each utterance automatically; no flags are needed.

Validation

This port reproduces the paper's published robustness gains. Word error rate (WER, %) on the real NOIZEUS corpus (8 noise types × 4 SNR levels × 30 utterances, run on Apple Silicon):

WER (%)   Base (Qwen3-ASR)   Mega-ASR (robust)   Paper: base   Paper: robust
0 dB      23.35              20.61               23.97         19.80
5 dB      8.47               6.51                -             -
10 dB     3.31               2.17                3.41          2.79
15 dB     2.12               0.83                -             -
Overall   9.31               7.53                9.45          7.52

(-: paper figure not listed)
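The card does not state how transcripts were normalized or how per-utterance scores were pooled. As a reference point, here is a minimal sketch of standard WER, (S + D + I) / N, computed via word-level Levenshtein distance and pooled over all utterances. The function names and the two sentence pairs are hypothetical examples, not NOIZEUS outputs from this model.

```python
def word_edits(ref: str, hyp: str) -> int:
    """Minimum substitutions + deletions + insertions turning ref words into hyp words."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distances from the empty reference prefix
    for i, rw in enumerate(r, start=1):
        cur = [i] + [0] * len(h)
        for j, hw in enumerate(h, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (rw != hw))  # substitution or match
        prev = cur
    return prev[-1]

def corpus_wer(pairs) -> float:
    """WER (%) pooled over utterances: total edits / total reference words."""
    edits = sum(word_edits(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return 100.0 * edits / words

pairs = [
    ("the birch canoe slid on the smooth planks", "the birch canoe slid on smooth planks"),
    ("glue the sheet to the dark blue background", "glue the sheet to the dark blue back ground"),
]
print(round(corpus_wer(pairs), 2))  # 18.75
```

The first pair has one deletion and the second one substitution plus one insertion, giving 3 errors over 16 reference words. Pooling errors over words weights long utterances more than averaging per-utterance WER would.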

The overall robust WER of 7.53 matches the paper's 7.52: a relative reduction of about 19% over the Qwen3-ASR baseline ((9.31 - 7.53) / 9.31), close to the paper's roughly 20% ((9.45 - 7.52) / 9.45). On clean read speech (FLEURS), the model matches plain Qwen3-ASR, as intended.
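The relative reductions follow directly from the validation table. A short check using only the WER values listed there (the dictionary layout and the `relative_reduction` helper are illustrative):

```python
# WER (%) from the validation table: (base, robust) per SNR and overall.
measured = {
    "0 dB": (23.35, 20.61),
    "5 dB": (8.47, 6.51),
    "10 dB": (3.31, 2.17),
    "15 dB": (2.12, 0.83),
    "overall": (9.31, 7.53),
}
paper_overall = (9.45, 7.52)

def relative_reduction(base: float, robust: float) -> float:
    """Relative WER reduction in percent: (base - robust) / base * 100."""
    return 100.0 * (base - robust) / base

for snr, (base, robust) in measured.items():
    print(f"{snr:>8}: {relative_reduction(base, robust):5.1f}%")
print(f"{'paper':>8}: {relative_reduction(*paper_overall):5.1f}%")
```

Per SNR, the relative reduction rises from about 12% at 0 dB to about 61% at 15 dB, while the absolute gap is largest at 0 dB (2.74 points).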

License & attribution

Apache-2.0. Built on zhifeixie/Mega-ASR (adapter + router) and Qwen/Qwen3-ASR-1.7B (base).

Format: Safetensors (MLX). Model size: 2B params. Tensor type: BF16.