---
license: apache-2.0
library_name: mlx
tags:
- mlx
- speech-to-text
- asr
- robust-asr
- qwen3-asr
base_model:
- zhifeixie/Mega-ASR
- Qwen/Qwen3-ASR-1.7B
language:
- en
- zh
pipeline_tag: automatic-speech-recognition
---

# Mega-ASR-8bit

8-bit quantized **robust-merged** variant of [Mega-ASR](https://github.com/xzf-thu/Mega-ASR), in MLX format, for [mlx-audio](https://github.com/Blaizzy/mlx-audio).

> **No router: always-on robust.** The Mega-ASR robustness LoRA is **merged** into the Qwen3-ASR-1.7B base and then quantized, so the per-utterance clean/degraded **router is not present** (you cannot add fp32 LoRA deltas to quantized weights). This model always runs the robust path.
>
> For the **full dynamic Mega-ASR** (clean speech on the base path, noisy speech on the LoRA path), use [`mlx-community/Mega-ASR-bf16`](https://huggingface.co/mlx-community/Mega-ASR-bf16).
>
> Use this 8-bit variant for **noisy-only / memory-constrained** deployments: ~2.5 GB and ~4× faster than the dynamic model (no per-clip LoRA toggling). For the smallest lossless option, prefer [`mlx-community/Mega-ASR-6bit`](https://huggingface.co/mlx-community/Mega-ASR-6bit).

## Use with mlx-audio

```bash
pip install mlx-audio
```

```python
from mlx_audio.stt import load

model = load("mlx-community/Mega-ASR-8bit")
result = model.generate("audio.wav", language="en")
print(result.text)
```

## Quality

8-bit is effectively **lossless** versus bf16 on noisy speech. WER on a NOIZEUS subset (merged-robust path):

| Precision | Overall WER (%) | Size |
|---|---:|---:|
| bf16 | 7.95 | 4.08 GB |
| 6-bit | 7.89 | 2.04 GB |
| **8-bit (this model)** | **8.06** | 2.47 GB |

(4-bit degrades to 10.78 WER and is not published.)

## License & attribution

Apache-2.0. Built on [zhifeixie/Mega-ASR](https://huggingface.co/zhifeixie/Mega-ASR) (adapter + router) and [Qwen/Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B) (base).
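
## Comparing precisions

To make the "effectively lossless" claim in the Quality section concrete, the sketch below restates the table's numbers relative to bf16. The WER and size values are copied from the table; the WER figures are assumed to be percentages, and the helper name `relative_to_baseline` is hypothetical, introduced only for this illustration. The 4-bit result is left out because no size is given for it.

```python
# WER (assumed %) and size (GB), copied from the Quality table.
results = {
    "bf16": (7.95, 4.08),
    "6-bit": (7.89, 2.04),
    "8-bit": (8.06, 2.47),
}


def relative_to_baseline(results, baseline="bf16"):
    """Express each row's WER and size relative to the baseline row.

    Hypothetical helper for illustration; not part of mlx-audio.
    """
    base_wer, base_size = results[baseline]
    summary = {}
    for name, (wer, size) in results.items():
        summary[name] = {
            # Absolute WER difference in percentage points.
            "wer_delta": round(wer - base_wer, 2),
            # WER difference as a percentage of the baseline WER.
            "wer_rel_pct": round(100 * (wer - base_wer) / base_wer, 1),
            # Model size as a fraction of the baseline size.
            "size_ratio": round(size / base_size, 2),
        }
    return summary


if __name__ == "__main__":
    for name, row in relative_to_baseline(results).items():
        print(name, row)
```

With these numbers, 8-bit sits 0.11 WER points (about 1.4% relative) above bf16 at roughly 0.61× the size, while 6-bit is 0.06 points below bf16 at half the size. Differences this small on a subset are plausibly within measurement noise, which is consistent with the table's reading that both quantized variants are effectively lossless.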