Mega-ASR-MLX-int8

An int8 affine-quantized MLX build of Mega-ASR, derived from mlx-community/Mega-ASR-MLX-bf16. Produced for the witness native loader (mlx-mega-asr), which loads the packed int8 weights directly. There is no runtime quantization, so the smaller weights are also the smaller download (~3.8 GB → ~2.2 GB).

Mega-ASR is a robustness layer over Qwen3-ASR-1.7B: a tiny audio-quality router classifies each utterance as clean or degraded and switches a dense LoRA adapter in/out of the base weights at inference.

What is and isn't quantized

The 344 linear projections of the audio encoder + text decoder (q/k/v/o_proj, fc1/fc2, mlp.{gate,up,down}_proj, conv_out, proj1/proj2, and the tied embed_tokens) are affine-quantized to int8 with group_size 64. Each <name>.weight is a packed uint32 tensor plus <name>.scales / .biases, and config.json carries a quantization block.
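To make the storage layout concrete, here is a minimal pure-Python sketch of group-wise affine quantization for one group of 64 weights. It is an illustration under stated assumptions, not the mlx-mega-asr loader or MLX's kernel: the function names are hypothetical, the byte order inside each uint32 word (lowest byte first) is assumed, and the bits-per-weight figure assumes 16-bit scales and biases.

```python
# Illustrative sketch only; names and packing order are assumptions.
BITS = 8
GROUP_SIZE = 64


def quantize_group(weights):
    """Affine-quantize one group of floats to unsigned 8-bit codes."""
    lo, hi = min(weights), max(weights)
    # Fall back to 1.0 for a constant group (hi == lo) to avoid dividing by zero.
    scale = (hi - lo) / (2**BITS - 1) or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo  # lo plays the role of the per-group bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights: w ~= scale * code + bias."""
    return [scale * q + bias for q in codes]


def pack_uint32(codes):
    """Pack four 8-bit codes per 32-bit word, lowest byte first (assumed order)."""
    words = []
    for i in range(0, len(codes), 4):
        word = 0
        for j, q in enumerate(codes[i:i + 4]):
            word |= q << (8 * j)
        words.append(word)
    return words


def unpack_uint32(words):
    """Inverse of pack_uint32."""
    return [(w >> (8 * j)) & 0xFF for w in words for j in range(4)]


# A synthetic group of 64 weights spanning -0.32 .. 0.31.
group = [((i * 37) % 64 - 32) / 100.0 for i in range(GROUP_SIZE)]
codes, scale, bias = quantize_group(group)
packed = pack_uint32(codes)  # 64 codes -> 16 uint32 words
restored = dequantize_group(unpack_uint32(packed), scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))

# Storage cost per weight, assuming one 16-bit scale and one 16-bit bias per group.
bits_per_weight = BITS + 2 * 16 / GROUP_SIZE
```

Under those assumptions each quantized weight costs 8.5 bits instead of 16, and the round-trip error per weight is bounded by half the group's scale. The tensors that stay dense are why the whole-file saving is somewhat smaller than that per-layer ratio.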

Everything that must stay precise stays dense bf16: the conv2d subsampling frontend, all layer norms / biases, the per-head q/k norms, and, critically, the router and LoRA adapter in extras/. The runtime applies the fp32 LoRA deltas on top of the dequantized base, so the per-utterance router/LoRA robustness switching is fully preserved. int8 is the deliberate default on Apple Silicon: batch-1 decode is memory-bandwidth-bound, so int8 is faster and ~1.8× smaller while staying WER-neutral.
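The switching described above can be sketched as a linear layer whose LoRA delta is added only when the router flags an utterance as degraded. This is a toy pure-Python illustration, not the actual runtime: the function names, the 0.5 threshold, the rank-1 shapes, and the choice to add the delta to the layer output (rather than merging it into the weight matrix, which is mathematically equivalent) are all assumptions.

```python
# Illustrative sketch only; names, threshold and shapes are hypothetical.

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def linear_with_optional_lora(x, base_weight, lora_a, lora_b, lora_scale, use_lora):
    """Compute W x, plus lora_scale * B (A x) when use_lora is True."""
    y = matvec(base_weight, x)  # base_weight stands in for the dequantized int8 weight
    if use_lora:
        delta = matvec(lora_b, matvec(lora_a, x))
        y = [yi + lora_scale * di for yi, di in zip(y, delta)]
    return y


def route(degraded_probability, threshold=0.5):
    """Hypothetical router decision: True means 'degraded', so the adapter is on."""
    return degraded_probability >= threshold


base_weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 1.0]]      # rank-1 down projection, shape (1, 2)
lora_b = [[0.5], [-0.5]]   # rank-1 up projection, shape (2, 1)
x = [2.0, 4.0]

clean = linear_with_optional_lora(x, base_weight, lora_a, lora_b, 1.0, route(0.1))
degraded = linear_with_optional_lora(x, base_weight, lora_a, lora_b, 1.0, route(0.9))
```

Because the adapter is kept separate from the quantized base, the clean path sees exactly the base output and only the degraded path pays for the extra low-rank product.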

Validation

WER parity vs the bf16 reference is gated by mlx-mega-asr/examples/int8_prepack_parity.rs (LibriSpeech test-clean): the pre-packed int8 path must match runtime int8 transcript-for-transcript and stay within ~0.3% absolute WER of dense bf16. Measured (20 files, Apple Silicon): bf16 1.81% / int8 1.59% WER, 0/20 transcripts differ from runtime int8, RTF 0.043 → 0.030.
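For readers unfamiliar with this kind of gate, the two conditions can be sketched in a few lines of Python. This is not the Rust example named above: the function names are hypothetical, the transcripts are toy data, and treating the tolerance as an absolute gap in WER percentage points (in either direction) is an assumption.

```python
# Illustrative sketch only; not a port of int8_prepack_parity.rs.

def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Corpus WER in percent: total word errors over total reference words."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return 100.0 * errors / words


def parity_ok(refs, bf16_hyps, runtime_int8_hyps, prepacked_int8_hyps,
              tolerance_pct=0.3):
    """Pass only if pre-packed matches runtime int8 exactly and WER stays near bf16."""
    differing = sum(a != b for a, b in zip(runtime_int8_hyps, prepacked_int8_hyps))
    gap = abs(corpus_wer(refs, prepacked_int8_hyps) - corpus_wer(refs, bf16_hyps))
    return differing == 0 and gap <= tolerance_pct


refs = ["the cat sat on the mat", "hello world"]
bf16 = ["the cat sat on a mat", "hello world"]      # one substitution in 8 words
int8 = ["the cat sat on a mat", "hello world"]      # same transcripts as bf16

passes = parity_ok(refs, bf16, int8, int8)
fails = parity_ok(refs, bf16, ["the cat sat on the mat", "hello world"], int8)
```

The first call passes because the transcripts are identical and the WER gap is zero; the second fails because one runtime transcript differs from the pre-packed one.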

The repo ships vocab.json + merges.txt (the Qwen2 BPE tokenizer is built from them at load; there is no tokenizer.json).
