# MERaLiON-2-3B-TurboQuant-MLX-2bit

MLX 2-bit TurboQuant quantization of `aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B` for Apple Silicon inference.

TurboQuant applies mixed-precision quantization: critical attention layers are preserved at higher precision while less sensitive feed-forward layers are quantized aggressively, trading minimal quality loss for speed and memory savings.

## Model Specifications

| Property | Value |
|----------|-------|
| Base Model | `aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B` |
| Parameters | ~3B |
| Architecture | Whisper-large-v3 encoder + Gemma-2-2B-IT decoder |
| Quantization | TurboQuant 2-bit (MLX) |
| Disk Size | ~1 GB |
| Peak RAM | ~1.5 GB |
| License | Apache 2.0 |
| Task | Automatic Speech Recognition / Speech-to-Text |

## Quickstart

### Installation

```shell
pip install mlx-lm mlx-whisper
```

### Inference

```python
from mlx_lm import load, generate
from mlx_lm.cache import TurboQuantCache

model, tokenizer = load("majentik/MERaLiON-2-3B-TurboQuant-MLX-2bit")

# Create the TurboQuant-optimized KV cache
cache = TurboQuantCache(model)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Transcribe the following audio."}],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    cache=cache,
)
print(response)
```

## Quantization Details

TurboQuant is a mixed-precision quantization strategy that:

- Retains attention projection layers at higher precision
- Quantizes MLP/feed-forward layers more aggressively, where precision loss is more tolerable
- Optimizes the KV-cache memory layout for faster autoregressive decoding on Apple Silicon
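The per-layer policy above can be sketched as a simple bit-width assignment by module path. The layer-name patterns and bit choices below are illustrative assumptions, not the actual TurboQuant configuration:

```python
# Illustrative sketch: choose a quantization bit-width per layer by name.
# The name patterns and bit values are assumptions for demonstration only.

ATTN_PATTERNS = ("q_proj", "k_proj", "v_proj", "o_proj")
MLP_PATTERNS = ("gate_proj", "up_proj", "down_proj")

def bits_for_layer(path: str, attn_bits: int = 4, mlp_bits: int = 2) -> int:
    """Return the bit-width to quantize a weight at, given its module path."""
    if any(p in path for p in ATTN_PATTERNS):
        return attn_bits  # sensitive: keep attention projections at higher precision
    if any(p in path for p in MLP_PATTERNS):
        return mlp_bits   # tolerant: quantize feed-forward layers aggressively
    return 8              # embeddings, norms, output head: near-full precision

print(bits_for_layer("model.layers.0.self_attn.q_proj"))  # 4
print(bits_for_layer("model.layers.0.mlp.down_proj"))     # 2
```

A predicate of this shape is how mixed-precision schemes typically decide, layer by layer, how much precision loss is acceptable.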

This 2-bit variant offers the smallest footprint for the 3B model, enabling speech recognition on severely memory-constrained Apple Silicon devices. Expect some quality degradation compared to the 4-bit and 8-bit variants.
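The ~1 GB disk figure follows from the parameter count. A back-of-the-envelope estimate, assuming a group size of 64 with one fp16 scale and one fp16 bias per group (modeled on common MLX quantization defaults; the actual overhead may differ):

```python
# Rough weight-storage estimate for a quantized model.
# Group size and per-group fp16 scale+bias (32 bits/group) are assumptions.

def quantized_gb(n_params: float, bits: int, group_size: int = 64) -> float:
    payload_bits = n_params * bits                      # packed weights
    overhead_bits = (n_params / group_size) * 32        # fp16 scale + fp16 bias per group
    return (payload_bits + overhead_bits) / 8 / 1e9     # bits -> bytes -> GB

print(f"{quantized_gb(3e9, 2):.2f} GB")  # 0.94 GB, consistent with the ~1 GB figure
```

The peak-RAM figure is higher than the disk size because activations and the KV cache are allocated on top of the weights at inference time.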

## Supported Languages

MERaLiON-2 supports speech recognition in Southeast Asian languages, including English, Mandarin Chinese, Malay, Tamil, and Indonesian.

## Memory Estimates

| Device | Feasibility |
|--------|-------------|
| MacBook Air M1 (8 GB) | Comfortable |
| iPad Pro M1/M2 | Comfortable |
| iPad Air M1 | Feasible |
| iPhone 15 Pro (8 GB) | Feasible |
