MERaLiON-3-10B-MLX-4bit

A 4-bit MLX port of MERaLiON/MERaLiON-3-10B-preview for Apple Silicon: 4-bit group quantization (group_size=64), with no KV-cache compression applied.

MERaLiON-3 is a multimodal audio-language model from I2R, A*STAR (Singapore), built on a Gemma-2 decoder backbone. It targets speech-to-text, speech translation, and audio understanding across English, Mandarin, Malay, Tamil, Indonesian, and other Southeast Asian languages, with particular strength on Singlish and code-switched speech.

This is the raw MLX 4-bit lane: weight quantization only, no KV-cache compression. If you want KV-cache compression stacked on top, see the RotorQuant-MLX-4bit or TurboQuant-MLX-4bit variants.

What's in this repo

| Component | Format | Approximate size |
|---|---|---|
| Decoder (Gemma-2 backbone) | 4-bit MLX (group_size=64, affine) | ~5.0 GB |
| Encoder (Whisper-large-v3) | float16 (unquantized) | ~1.2 GB |
| Speech-text adaptor | float16 (unquantized) | ~0.4 GB |
| Tokenizer | Unchanged from upstream | n/a |

Approximate total size: ~6.6 GB (about 43% smaller than the ~11.6 GB 8-bit variant).

Quantized directly from the original full-precision upstream weights (not re-quantized from 8-bit). The encoder and adaptor are kept at float16: their parameter counts are small, and audio feature extraction is sensitive to aggressive quantization.
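For intuition, 4-bit affine group quantization with group_size=64 maps each group of 64 weights to integers in [0, 15] using a per-group scale and zero-point. The following is a pure-Python sketch of the idea, not the actual MLX kernels:

```python
import random


def quantize_group(weights, bits=4):
    """Affine-quantize one group of float weights to `bits`-bit integers.

    Returns (q, scale, zero) such that w ~= q * scale + zero.
    Illustrative only; MLX's real kernels pack these into fused buffers.
    """
    qmax = (1 << bits) - 1             # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0    # guard against constant groups
    zero = lo
    q = [round((w - zero) / scale) for w in weights]
    return q, scale, zero


def dequantize_group(q, scale, zero):
    return [v * scale + zero for v in q]


# Example: quantize one group of 64 weights and measure reconstruction error
random.seed(0)
group = [random.uniform(-0.1, 0.1) for _ in range(64)]
q, scale, zero = quantize_group(group)
recon = dequantize_group(q, scale, zero)
max_err = max(abs(a - b) for a, b in zip(group, recon))
print(f"max reconstruction error: {max_err:.5f}")  # bounded by scale / 2
```

Smaller groups (e.g. 32) lower the per-group error at the cost of more scale/zero-point metadata; group_size=64 is a common middle ground.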

Quickstart

Full multimodal (audio in, text out)

```shell
pip install mlx-meralion
```

```python
from mlx_meralion import load_model, transcribe

model = load_model("majentik/MERaLiON-3-10B-MLX-4bit")

# ASR
text = transcribe(model, "audio.wav")
print(text)

# Translation
text_zh = transcribe(model, "audio.wav", task="translate_zh")
print(text_zh)
```

Decoder-only (text generation)

The 4-bit Gemma-2 decoder can be loaded standalone with mlx-lm:

```python
from mlx_lm import load, generate

model, tokenizer = load("majentik/MERaLiON-3-10B-MLX-4bit/decoder")
out = generate(model, tokenizer, prompt="Hello", max_tokens=128)
print(out)
```

Size comparison

| Precision | Approximate size | Variant |
|---|---|---|
| FP16 (upstream) | ~20 GB | MERaLiON/MERaLiON-3-10B-preview |
| MLX 8-bit | ~11.6 GB | majentik/MERaLiON-3-10B-MLX |
| MLX 4-bit | ~6.6 GB | this repo |
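A back-of-envelope check on the decoder size: packed 4-bit weights plus per-group metadata land near the reported ~5.0 GB. The figures below are assumptions for illustration (roughly 9B decoder parameters, with the rest of the 10B in the encoder and adaptor, and a float16 scale plus float16 zero-point per 64-weight group), not numbers from the repo:

```python
# Rough size estimate for a 4-bit group-quantized decoder.
params = 9e9                 # assumed decoder parameter count
group_size = 64
weight_bytes = 4 / 8         # 4-bit packed weights = 0.5 bytes each
metadata_bytes = 4 / group_size  # 2 x float16 (scale + zero) per group

total_gb = params * (weight_bytes + metadata_bytes) / 1e9
print(f"estimated decoder size: {total_gb:.1f} GB")
```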

What's not in this repo

  • Training data. Quantization-only release.
  • Benchmarks. Expect mild WER drift on 4-bit vs. 8-bit upstream; formal measurement is pending.
  • Safety alignment changes. Inherited from upstream.

Known limitations

  • Inherits upstream's out-of-scope warning: not intended for tool-calling, math, or coding tasks.
  • At 4-bit, some multilingual tail languages (Tamil, Thai, Vietnamese) may see larger WER regressions than English/Mandarin. If quality matters more than memory, prefer the 8-bit variant.
  • Audio >30s is chunked automatically by mlx-meralion; boundary artifacts are possible.
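The >30 s chunking noted above can be approximated with overlapping windows, roughly as follows. The window and overlap values here are illustrative, not the actual mlx-meralion defaults:

```python
def chunk_audio(num_samples, sample_rate=16000, window_s=30.0, overlap_s=1.0):
    """Split long audio into overlapping windows of (start, end) sample offsets.

    A small overlap between adjacent chunks reduces boundary artifacts when
    the per-chunk transcripts are stitched back together.
    """
    window = int(window_s * sample_rate)
    step = int((window_s - overlap_s) * sample_rate)
    chunks = []
    start = 0
    while start < num_samples:
        chunks.append((start, min(start + window, num_samples)))
        if start + window >= num_samples:
            break
        start += step
    return chunks


# 75 seconds of 16 kHz audio -> three overlapping ~30 s chunks
print(chunk_audio(75 * 16000))
```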

Hardware requirements

  • Apple Silicon (M1/M2/M3/M4), macOS
  • ~8 GB unified memory recommended (runs comfortably on a base M1/M2 MacBook Air)

License

Released under the MERaLiON Public Licence v3, inherited from the upstream model. See the license PDF.
