---
base_model: aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B
library_name: mlx
tags:
- meralion
- speech
- audio
- speech-to-text
- mlx
- apple-silicon
- quantized
- 2bit
- rotorquant
- gemma-2
- whisper
- sea-lion
license: apache-2.0
pipeline_tag: automatic-speech-recognition
language:
- en
- zh
- ms
- ta
- id
---

# MERaLiON-2-3B-RotorQuant-MLX-2bit

**MLX 2-bit RotorQuant quantization** of [aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B](https://huggingface.co/aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B) for Apple Silicon inference.

RotorQuant rotates weight matrices to decorrelate channels before quantizing, distributing outlier magnitudes more evenly and improving accuracy at low bit-widths.

## Model Specifications

| Property | Value |
|---|---|
| Base Model | aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B |
| Parameters | ~3B |
| Architecture | Whisper-large-v3 encoder + Gemma-2-2B-IT decoder |
| Quantization | RotorQuant 2-bit (MLX) |
| Disk Size | ~1 GB |
| Peak RAM | ~1.5 GB |
| License | Apache 2.0 |
| Task | Automatic Speech Recognition / Speech-to-Text |

## Quickstart

### Installation

```bash
pip install mlx-lm mlx-whisper
```

### Inference

```python
from mlx_lm import load, generate
from mlx_lm.cache import IsoQuantCache

model, tokenizer = load("majentik/MERaLiON-2-3B-RotorQuant-MLX-2bit")

# Create IsoQuantCache for RotorQuant models
cache = IsoQuantCache(model)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Transcribe the following audio."}],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    cache=cache,
)
print(response)
```

## Quantization Details

RotorQuant is a rotation-based quantization strategy that:

- Applies learned rotation matrices to decorrelate weight channels before quantization
- Reduces the impact of outlier weights that typically degrade quantized model quality
- Provides more
  uniform weight distributions, leading to better accuracy retention
- Pairs with IsoQuantCache for consistent KV-cache quantization during inference

This 2-bit variant offers the smallest possible footprint for the 3B model. RotorQuant's rotation-based approach is especially valuable at 2-bit, where outlier sensitivity causes the most quality degradation in naive quantization schemes. This makes it the preferred 2-bit option when accuracy matters.

## Supported Languages

MERaLiON-2 supports speech recognition in Southeast Asian languages including English, Mandarin Chinese, Malay, Tamil, and Indonesian.

## Memory Estimates

| Device | Feasibility |
|---|---|
| MacBook Air M1 (8 GB) | Comfortable |
| iPad Pro M1/M2 | Comfortable |
| iPad Air M1 | Feasible |
| iPhone 15 Pro (8 GB) | Feasible |

## See Also

- [majentik/MERaLiON-2-3B-TurboQuant-MLX-2bit](https://huggingface.co/majentik/MERaLiON-2-3B-TurboQuant-MLX-2bit) -- TurboQuant 2-bit variant
- [majentik/MERaLiON-2-3B-RotorQuant-MLX-4bit](https://huggingface.co/majentik/MERaLiON-2-3B-RotorQuant-MLX-4bit) -- RotorQuant 4-bit (higher quality)
- [majentik/MERaLiON-2-3B-RotorQuant-MLX-8bit](https://huggingface.co/majentik/MERaLiON-2-3B-RotorQuant-MLX-8bit) -- RotorQuant 8-bit (highest quality)
- [majentik/MERaLiON-2-10B-RotorQuant-MLX-2bit](https://huggingface.co/majentik/MERaLiON-2-10B-RotorQuant-MLX-2bit) -- 10B RotorQuant 2-bit (larger model)
- [aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B](https://huggingface.co/aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B) -- Original base model
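## Why Rotation Helps at 2-bit

The intuition behind the rotation step described under Quantization Details can be shown with a toy NumPy sketch. This is illustrative only and is not the RotorQuant implementation: it uses a random orthogonal rotation (RotorQuant uses learned rotations) and a simple symmetric round-to-nearest 2-bit quantizer, both assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_2bit(w):
    """Symmetric round-to-nearest 2-bit quantizer.

    Levels are step * {-1.5, -0.5, 0.5, 1.5}, with step chosen so the
    largest-magnitude weight is representable. A single big outlier
    therefore inflates the step for the entire tensor.
    """
    step = np.abs(w).max() / 1.5
    codes = np.clip(np.floor(w / step) + 0.5, -1.5, 1.5)
    return codes * step

# A Gaussian weight matrix with one large outlier.
W = rng.standard_normal((64, 64))
W[0, 0] = 100.0

# Naive quantization: the outlier stretches the step, pushing the
# near-zero bulk of the weights far from their nearest level.
err_plain = np.linalg.norm(W - quantize_2bit(W))

# Rotate with a random orthogonal matrix, quantize in the rotated
# basis, rotate back. The rotation smears the outlier's magnitude
# across many channels, shrinking max|w| and hence the step.
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
err_rotated = np.linalg.norm(W - quantize_2bit(W @ Q) @ Q.T)

print(f"quantization error, plain:   {err_plain:.0f}")
print(f"quantization error, rotated: {err_rotated:.0f}")
```

With the outlier present, the rotated round-trip reconstructs the matrix with markedly lower Frobenius-norm error than quantizing directly, which is the effect RotorQuant exploits at 2-bit.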