Mega-ASR — MLX (bf16)

MLX port of Mega-ASR for Apple Silicon, for use with mlx-audio.

Mega-ASR is a robustness layer over Qwen3-ASR-1.7B. A tiny audio-quality router classifies each utterance as clean or degraded and switches a dense LoRA adapter into or out of the base weights at inference: degraded audio runs the LoRA (robust) path, and clean audio runs the unmodified base path. This recovers large WER gains on noisy and far-field speech while leaving clean-speech accuracy essentially unchanged.

The base weights are stored as dense bf16 on purpose: Mega-ASR adds fp32 LoRA deltas to the base at inference, so the base cannot be quantized without losing the runtime router/LoRA switching.
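
The switching described above can be pictured with a small, dependency-free sketch. It is not Mega-ASR's actual code. The low-rank factors `lora_a` and `lora_b`, the `scale` value, the `route` helper and its 0.5 threshold are all assumptions made for illustration. The only point it shows is that the robust path adds a delta to a dense base weight, while the clean path returns the base untouched.

```python
# Illustrative sketch only: names, shapes, and the threshold are hypothetical,
# not taken from the Mega-ASR or mlx-audio code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def effective_weight(base, lora_a, lora_b, scale, use_lora):
    """Return base + scale * (B @ A) on the LoRA path, else the base unchanged."""
    if not use_lora:
        return base
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]

def route(degraded_score, threshold=0.5):
    """Hypothetical router decision: True means 'degraded', so use the LoRA path."""
    return degraded_score >= threshold

base = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 dense base weight
lora_a = [[1.0, 2.0]]            # rank-1 factor A, shape 1x2
lora_b = [[0.5], [0.0]]          # rank-1 factor B, shape 2x1

clean = effective_weight(base, lora_a, lora_b, scale=2.0, use_lora=route(0.1))
noisy = effective_weight(base, lora_a, lora_b, scale=2.0, use_lora=route(0.9))

print(clean)  # base path: weights unchanged
print(noisy)  # robust path: base plus the scaled delta
```

The delta is added element-wise to the dense weight. A quantized base would have to be dequantized before such an addition, which is one way to read the decision to ship bf16 weights.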

Usage

from mlx_audio.stt import load

model = load("mlx-community/Mega-ASR-MLX-bf16")
result = model.generate("audio.wav", language="en")
print(result.text)

CLI:

python -m mlx_audio.stt.generate --model mlx-community/Mega-ASR-MLX-bf16 --audio audio.wav

The router decides per-utterance automatically; no flags needed.

Validation

Reproduces the paper's published robustness gains. Word Error Rate (WER, %) on the real NOIZEUS corpus (8 noise types × 4 SNR levels × 30 utterances), measured on Apple Silicon:

SNR	base (Qwen3-ASR)	Mega-ASR (robust)	paper base	paper robust
0 dB	23.35	20.61	23.97	19.80
5 dB	8.47	6.51	—	—
10 dB	3.31	2.17	3.41	2.79
15 dB	2.12	0.83	—	—
overall	9.31	7.53	9.45	7.52

Overall robust WER is 7.53 versus the paper's 7.52. Relative to the Qwen3-ASR baseline measured here (9.31), that is a reduction of about 19%, in line with the paper's roughly 20% (9.45 to 7.52). On clean read speech (FLEURS) the model matches plain Qwen3-ASR, as intended.
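
For readers who want to check the arithmetic, here is a minimal sketch of how WER and the relative reduction can be computed. The `wer` function is the standard word-level edit distance divided by reference length. It is not the evaluation script behind the table, and it applies no text normalization. The sketch also assumes the "overall" row is the unweighted mean of the four SNR rows, which is consistent with the numbers above but is not stated in the source.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

def relative_reduction(baseline, improved):
    """Fractional improvement of `improved` over `baseline`."""
    return (baseline - improved) / baseline

# Per-SNR WER (%) copied from the table above.
base_by_snr = {0: 23.35, 5: 8.47, 10: 3.31, 15: 2.12}
robust_by_snr = {0: 20.61, 5: 6.51, 10: 2.17, 15: 0.83}

# Assumption: "overall" is the unweighted mean across SNR levels.
base_overall = sum(base_by_snr.values()) / len(base_by_snr)
robust_overall = sum(robust_by_snr.values()) / len(robust_by_snr)
reduction = relative_reduction(base_overall, robust_overall)

print(f"base overall:   {base_overall:.2f}")
print(f"robust overall: {robust_overall:.2f}")
print(f"relative reduction: {reduction:.1%}")
```

As a quick sanity check, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words.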

License & attribution

Apache-2.0. Built on zhifeixie/Mega-ASR (adapter + router) and Qwen/Qwen3-ASR-1.7B (base).
