# MERaLiON-2-3B-RotorQuant-MLX-4bit

MLX 4-bit RotorQuant quantization of aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B for Apple Silicon inference.

RotorQuant applies rotation-based quantization that decorrelates weight matrices before quantization, distributing outlier magnitudes more evenly across channels for improved accuracy at low bit-widths.

## Model Specifications

| Property | Value |
|---|---|
| Base model | aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B |
| Parameters | ~3B |
| Architecture | Whisper-large-v3 encoder + Gemma-2-2B-IT decoder |
| Quantization | RotorQuant 4-bit (MLX) |
| Disk size | ~1.5 GB |
| Peak RAM | ~2.5 GB |
| License | Apache 2.0 |
| Task | Automatic speech recognition / speech-to-text |
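The ~1.5 GB disk figure follows directly from the bit-width. A back-of-the-envelope sketch (the helper and the 5% overhead factor for quantization scales and metadata are illustrative assumptions, not measured values):

```python
def quantized_size_gb(n_params: float, bits: int, overhead: float = 0.05) -> float:
    """Rough disk-size estimate: n_params weights at `bits` bits each,
    plus a fractional overhead for quantization scales/zero-points."""
    raw_bytes = n_params * bits / 8
    return raw_bytes * (1 + overhead) / 1e9

# ~3B parameters at 4 bits comes out to roughly 1.5 GB,
# matching the disk-size row in the table above
print(f"{quantized_size_gb(3e9, 4):.2f} GB")
```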

## Quickstart

### Installation

```bash
pip install mlx-lm mlx-whisper
```

### Inference

```python
from mlx_lm import load, generate
from mlx_lm.cache import IsoQuantCache

model, tokenizer = load("majentik/MERaLiON-2-3B-RotorQuant-MLX-4bit")

# RotorQuant models pair with IsoQuantCache for KV-cache quantization
cache = IsoQuantCache(model)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Transcribe the following audio."}],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    cache=cache,
)
print(response)
```

## Quantization Details

RotorQuant is a rotation-based quantization strategy that:

- Applies learned rotation matrices to decorrelate weight channels before quantization
- Reduces the impact of outlier weights that typically degrade quantized model quality
- Provides more uniform weight distributions, leading to better accuracy retention
- Pairs with IsoQuantCache for consistent KV-cache quantization during inference

This 4-bit variant provides a strong balance between quality and memory usage for the 3B model. The rotation-based approach is particularly effective at 4-bit, where outlier sensitivity is more pronounced.
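RotorQuant's rotations are learned per layer; as a minimal illustration of the general principle only, this NumPy sketch applies a *random* orthogonal rotation before uniform 4-bit quantization and inverts it after dequantization. Because the rotation is orthogonal, it preserves the reconstruction error measured in the original weight space, while spreading outlier energy across channels so a smaller quantization scale can be used:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_4bit(w):
    """Uniform symmetric 4-bit quantization: integers in [-8, 7]."""
    scale = np.abs(w).max() / 7
    q = np.clip(np.round(w / scale), -8, 7)
    return q, scale

def dequantize(q, scale):
    return q * scale

# Weight matrix with one large outlier channel
w = rng.normal(size=(64, 64))
w[:, 0] *= 20  # the outlier column dominates the quantization scale

# Plain quantization: the outlier inflates the scale, hurting small weights
q, s = quantize_4bit(w)
err_plain = np.abs(w - dequantize(q, s)).mean()

# Rotate first: an orthogonal matrix spreads the outlier energy
# across all columns before quantization
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
wr = w @ Q
qr_, sr = quantize_4bit(wr)
w_hat = dequantize(qr_, sr) @ Q.T  # rotate back after dequantization
err_rot = np.abs(w - w_hat).mean()

print(f"mean abs error, plain:   {err_plain:.4f}")
print(f"mean abs error, rotated: {err_rot:.4f}")  # typically much lower
```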

## Supported Languages

MERaLiON-2 supports speech recognition in Southeast Asian languages including English, Mandarin Chinese, Malay, Tamil, and Indonesian.

## Memory Estimates

| Device | Feasibility |
|---|---|
| MacBook Air M1 (8 GB) | Comfortable |
| MacBook Pro M1/M2 (16 GB) | Ideal |
| iPad Pro M1/M2 | Comfortable |
| iPhone 15 Pro (8 GB) | Feasible |
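The figures above are dominated by the weights; the KV cache adds a context-length-dependent term on top. A rough sketch, assuming the Gemma-2-2B decoder's published geometry (26 layers, 4 grouped KV heads of dimension 256 — verify against the model config) and that the cache is also held at 4 bits:

```python
def kv_bytes_per_token(layers=26, kv_heads=4, head_dim=256, bits=4):
    """Per-token KV-cache footprint: keys + values across all layers."""
    return 2 * layers * kv_heads * head_dim * bits / 8

per_token = kv_bytes_per_token()
# A long context adds this on top of the ~1.5 GB of weights
print(f"{per_token * 4096 / 1e6:.0f} MB for a 4096-token cache")
```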
