MERaLiON-2-10B-RotorQuant-MLX-2bit
MLX 2-bit RotorQuant quantization of aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-10B for Apple Silicon inference.
RotorQuant rotates weight matrices to decorrelate them before quantization. This spreads outlier magnitudes more evenly across channels and improves accuracy at low bit-widths.
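The toy sketch below makes the outlier-spreading idea concrete. It uses a fixed, normalized Walsh-Hadamard transform as a stand-in rotation. This is an assumption: RotorQuant's own learned rotations are not shown. The sketch compares the peak-to-RMS ratio of a weight row before and after rotation, where a lower ratio means the magnitude is spread more evenly across channels.

```python
import math


def hadamard_rotate(x):
    """Orthonormal Walsh-Hadamard transform (stand-in for a learned rotation).

    The length of x must be a power of two. The transform preserves the L2 norm.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                # Butterfly step: the right-hand side is evaluated before assignment.
                y[j], y[j + h] = y[j] + y[j + h], y[j] - y[j + h]
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def peak_to_rms(x):
    """Ratio of the largest magnitude to the root-mean-square value."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return max(abs(v) for v in x) / rms


# Toy weight row: small values in every channel plus one large outlier channel.
row = [0.01 * ((i % 7) - 3) for i in range(64)]
row[5] = 4.0

rotated = hadamard_rotate(row)
print(f"peak/RMS before rotation: {peak_to_rms(row):.2f}")
print(f"peak/RMS after rotation:  {peak_to_rms(rotated):.2f}")
```

The rotation preserves the row's total energy. It redistributes the outlier's magnitude across all 64 channels, so a quantizer no longer has to stretch its range around a single extreme value.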
Model Specifications
| Property | Value |
|---|---|
| Base Model | aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-10B |
| Parameters | ~10B |
| Architecture | Whisper encoder + Gemma-2-9B-IT decoder |
| Quantization | RotorQuant 2-bit (MLX) |
| Disk Size | ~3 GB |
| Peak RAM | ~4 GB |
| License | Apache 2.0 |
| Task | Automatic Speech Recognition / Speech-to-Text |
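Simple bits-per-weight arithmetic is consistent with the ~3 GB disk size. The sketch below assumes MLX-style affine group quantization with a group size of 64 (MLX's default) and a 16-bit scale and bias per group, which adds 0.5 bits of overhead per weight. The exact settings used for this checkpoint are not stated here. The sketch also assumes that all ~10B parameters are quantized.

```python
def quantized_size_gb(num_params, bits, group_size=64, scale_bits=16, bias_bits=16):
    """Rough on-disk size, in decimal GB, of an affine group-quantized model.

    Every group of `group_size` weights carries one scale and one bias, which adds
    (scale_bits + bias_bits) / group_size bits of overhead per weight.
    """
    bits_per_weight = bits + (scale_bits + bias_bits) / group_size
    return num_params * bits_per_weight / 8 / 1e9


# ~10B parameters, all assumed quantized. Layers kept at higher precision
# (e.g. norms or embeddings) and file metadata are ignored here.
for bits in (2, 4, 8):
    print(f"{bits}-bit: ~{quantized_size_gb(10e9, bits):.2f} GB")
```

Under these assumptions, the 2-bit case works out to 10e9 × 2.5 / 8 bytes, or about 3.1 GB, which matches the table. The same formula gives roughly 5.6 GB at 4 bits and 10.6 GB at 8 bits.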
Quickstart
Installation
```bash
pip install mlx-lm mlx-whisper
```
Inference
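```python
from mlx_lm import load, generate
# IsoQuantCache is assumed to come from a RotorQuant-enabled mlx-lm build;
# upstream mlx-lm keeps its KV-cache classes in mlx_lm.models.cache.
from mlx_lm.cache import IsoQuantCache

model, tokenizer = load("majentik/MERaLiON-2-10B-RotorQuant-MLX-2bit")

# Create an IsoQuantCache to hold the quantized KV cache for RotorQuant models
cache = IsoQuantCache(model)

# Build a chat-formatted prompt string (tokenize=False returns text, not token IDs).
# This exercises only the text decoder: no audio is passed to the model here.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Transcribe the following audio."}],
    tokenize=False,
    add_generation_prompt=True,
)

# Upstream mlx-lm's generate() forwards a KV cache through the `prompt_cache` keyword.
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    prompt_cache=cache,
)
print(response)
```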
Quantization Details
RotorQuant is a rotation-based quantization strategy that:
- Applies learned rotation matrices to decorrelate weight channels before quantization
- Reduces the impact of outlier weights that typically degrade quantized model quality
- Provides more uniform weight distributions, leading to better accuracy retention
- Pairs with IsoQuantCache for consistent KV-cache quantization during inference
This 2-bit variant has the smallest memory footprint of the RotorQuant MLX variants. Rotation matters most at 2 bits, where outlier sensitivity causes the largest quality loss under naive quantization.
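A toy experiment can illustrate the effect at 2 bits. It quantizes a weight row that contains one outlier channel to 2 bits twice, once directly and once after rotating it, and then compares the reconstruction errors. A fixed normalized Walsh-Hadamard transform stands in for RotorQuant's learned rotations. The per-row min/max quantizer is also an assumption chosen for clarity, not the scheme this checkpoint uses.

```python
import math
import random


def hadamard_rotate(x):
    """Orthonormal Walsh-Hadamard transform (length must be a power of two).

    The normalized Hadamard matrix is symmetric and orthogonal, so applying
    this function twice returns the original vector.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                y[j], y[j + h] = y[j] + y[j + h], y[j] - y[j + h]
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in y]


def quant_dequant(x, bits=2):
    """Min/max affine quantization to 2**bits levels, followed by dequantization."""
    lo, hi = min(x), max(x)
    step = (hi - lo) / (2 ** bits - 1) or 1.0  # fall back to 1.0 if all values are equal
    return [lo + round((v - lo) / step) * step for v in x]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


rng = random.Random(0)
row = [rng.gauss(0.0, 1.0) for _ in range(64)]
row[5] = 20.0  # a single outlier channel

# Naive 2-bit: the outlier stretches the range, so most weights collapse onto one or two levels.
err_naive = mse(row, quant_dequant(row))

# Rotate, quantize in the rotated space, then rotate back.
err_rotated = mse(row, hadamard_rotate(quant_dequant(hadamard_rotate(row))))

print(f"2-bit MSE, naive:   {err_naive:.3f}")
print(f"2-bit MSE, rotated: {err_rotated:.3f}")
```

Because the rotation is orthonormal, the error measured after rotating back equals the quantization error in the rotated space. All of the gain comes from the narrower, more evenly populated value range that the quantizer sees after rotation.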
Supported Languages
MERaLiON-2 supports speech recognition in languages spoken across Southeast Asia, including English, Mandarin Chinese, Malay, Tamil, and Indonesian.
Memory Estimates
| Device | Feasibility |
|---|---|
| MacBook Air M1 (8 GB) | Feasible |
| MacBook Pro M1/M2 (16 GB) | Comfortable |
| MacBook Pro M2/M3 (32 GB) | Ideal |
| Mac Studio M2 Ultra (64 GB+) | Overkill for this variant |
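Peak RAM depends on context length as well as on the weights, because the KV cache grows with every token. The sketch below estimates the cache size under several assumptions:

- The decoder follows the public Gemma-2-9B configuration (42 layers, 8 key/value heads, head dimension 256).
- The Whisper encoder, activations, sliding-window effects, and per-group quantization overhead are ignored.
- The KV bit-widths are illustrative. The precision that IsoQuantCache actually uses is not specified here.

```python
def kv_cache_bytes(num_tokens, kv_bits=16, num_layers=42, num_kv_heads=8, head_dim=256):
    """Approximate KV-cache size for a decoder with grouped-query attention.

    Each layer stores one key and one value vector per KV head per token.
    Defaults are assumed from the public Gemma-2-9B configuration.
    Quantization scales/biases are ignored.
    """
    values_per_token = 2 * num_layers * num_kv_heads * head_dim  # keys + values
    return num_tokens * values_per_token * kv_bits / 8


for bits in (16, 4, 2):
    mb = kv_cache_bytes(1024, kv_bits=bits) / 1e6
    print(f"{bits:>2}-bit KV cache, 1024 tokens: ~{mb:.0f} MB")
```

Under these assumptions, a 1,024-token context needs roughly 350 MB of cache at 16 bits and about 44 MB at 2 bits. Both figures are small next to the ~3 GB of weights.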
See Also
- majentik/MERaLiON-2-10B-TurboQuant-MLX-2bit -- TurboQuant 2-bit variant
- majentik/MERaLiON-2-10B-RotorQuant-MLX-4bit -- RotorQuant 4-bit (higher quality)
- majentik/MERaLiON-2-10B-RotorQuant-MLX-8bit -- RotorQuant 8-bit (highest quality)
- aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-10B -- Original base model