# MERaLiON-2-10B-RotorQuant-MLX-4bit

MLX 4-bit RotorQuant quantization of aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-10B for Apple Silicon inference.

RotorQuant applies rotation-based quantization that decorrelates weight matrices before quantization, distributing outlier magnitudes more evenly across channels for improved accuracy at low bit-widths.
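As a rough illustration of why rotation helps, here is a minimal NumPy sketch (not the actual RotorQuant algorithm, which uses learned rather than random rotations): quantizing a weight matrix with an outlier channel directly, versus rotating it with an orthogonal matrix first, quantizing, and rotating back.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_4bit(w):
    """Symmetric per-row 4-bit quantize/dequantize round trip."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7  # int4 range [-8, 7]
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

# Weight matrix with one outlier column, as is typical in LLM layers.
w = rng.normal(size=(64, 64))
w[:, 3] *= 20

# Random orthogonal rotation via QR decomposition.
q_mat, _ = np.linalg.qr(rng.normal(size=(64, 64)))

err_plain = np.abs(quantize_4bit(w) - w).mean()

w_rot = w @ q_mat                        # rotate: spreads outlier energy
w_hat = quantize_4bit(w_rot) @ q_mat.T   # quantize, then rotate back
err_rot = np.abs(w_hat - w).mean()

print(err_plain, err_rot)
```

After rotation, the outlier's energy is spread across all channels, so each row's quantization scale shrinks and the reconstruction error drops.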

## Model Specifications

| Property | Value |
|---|---|
| Base Model | aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-10B |
| Parameters | ~10B |
| Architecture | Whisper encoder + Gemma-2-9B-IT decoder |
| Quantization | RotorQuant 4-bit (MLX) |
| Disk Size | ~5 GB |
| Peak RAM | ~6 GB |
| License | Apache 2.0 |
| Task | Automatic Speech Recognition / Speech-to-Text |
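The disk figure follows from simple arithmetic: 10B weights at 4 bits each come to roughly 5 GB, before per-group scale/zero-point overhead.

```python
params = 10e9                    # ~10B parameters
disk_gb = params * 4 / 8 / 1e9   # 4 bits = 0.5 bytes per weight
print(f"~{disk_gb:.0f} GB")      # runtime adds roughly 1 GB for
                                 # activations and the KV cache
```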

## Quickstart

### Installation

```bash
pip install mlx-lm mlx-whisper
```

### Inference

```python
from mlx_lm import load, generate
from mlx_lm.cache import IsoQuantCache

model, tokenizer = load("majentik/MERaLiON-2-10B-RotorQuant-MLX-4bit")

# Create IsoQuantCache for RotorQuant models
cache = IsoQuantCache(model)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Transcribe the following audio."}],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    cache=cache,
)
print(response)
```

## Quantization Details

RotorQuant is a rotation-based quantization strategy that:

- Applies learned rotation matrices to decorrelate weight channels before quantization
- Reduces the impact of outlier weights that typically degrade quantized model quality
- Provides more uniform weight distributions, leading to better accuracy retention
- Pairs with IsoQuantCache for consistent KV-cache quantization during inference

This 4-bit variant provides a strong balance between quality and memory usage. The rotation-based approach is particularly effective at 4-bit, where outlier sensitivity is more pronounced.
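To see that outlier sensitivity concretely, consider a toy 4-bit symmetric quantizer: a single large weight stretches the scale so far that every normal-sized weight rounds to zero.

```python
import numpy as np

w = np.array([0.3, -0.8, 0.5, 100.0])          # one outlier weight
scale = np.abs(w).max() / 7                    # int4 symmetric range [-8, 7]
q = np.clip(np.round(w / scale), -8, 7) * scale
print(q)  # small weights all collapse to zero; only the outlier survives
```

Spreading that outlier's magnitude across many channels keeps the scale small for every channel, which is exactly what the rotation step buys at 4-bit.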

## Supported Languages

MERaLiON-2 supports speech recognition in Southeast Asian languages including English, Mandarin Chinese, Malay, Tamil, and Indonesian.

## Memory Estimates

| Device | Feasibility |
|---|---|
| MacBook Air M1 (8 GB) | Feasible with limited headroom |
| MacBook Pro M1/M2 (16 GB) | Comfortable |
| MacBook Pro M2/M3 (32 GB) | Recommended |
| Mac Studio M2 Ultra (64 GB+) | Ideal for production |
