---
base_model: aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B
library_name: mlx
tags:
- meralion
- speech
- audio
- speech-to-text
- mlx
- apple-silicon
- quantized
- 2bit
- rotorquant
- gemma-2
- whisper
- sea-lion
license: apache-2.0
pipeline_tag: automatic-speech-recognition
language:
- en
- zh
- ms
- ta
- id
---

# MERaLiON-2-3B-RotorQuant-MLX-2bit
|
|
**MLX 2-bit RotorQuant quantization** of [aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B](https://huggingface.co/aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B) for Apple Silicon inference.
|
|
RotorQuant rotates weight matrices to decorrelate channels before quantizing, distributing outlier magnitudes more evenly and improving accuracy at low bit-widths.
|
|
## Model Specifications
|
|
| Property | Value |
|---|---|
| Base Model | aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B |
| Parameters | ~3B |
| Architecture | Whisper-large-v3 encoder + Gemma-2-2B-IT decoder |
| Quantization | RotorQuant 2-bit (MLX) |
| Disk Size | ~1 GB |
| Peak RAM | ~1.5 GB |
| License | Apache 2.0 |
| Task | Automatic Speech Recognition / Speech-to-Text |
|
|
## Quickstart
|
|
### Installation
|
|
```bash
pip install mlx-lm mlx-whisper
```
|
|
### Inference
|
|
```python
from mlx_lm import load, generate
from mlx_lm.cache import IsoQuantCache

model, tokenizer = load("majentik/MERaLiON-2-3B-RotorQuant-MLX-2bit")

# Create IsoQuantCache for RotorQuant models
cache = IsoQuantCache(model)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Transcribe the following audio."}],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    cache=cache,
)
print(response)
```
|
|
## Quantization Details
|
|
RotorQuant is a rotation-based quantization strategy that:
- Applies learned rotation matrices to decorrelate weight channels before quantization
- Reduces the impact of outlier weights that typically degrade quantized model quality
- Provides more uniform weight distributions, leading to better accuracy retention
- Pairs with IsoQuantCache for consistent KV-cache quantization during inference
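The benefit of rotating before quantizing can be illustrated with a toy NumPy sketch. This is not the actual RotorQuant implementation: the per-tensor 2-bit quantizer, the matrix sizes, and the use of a *random* (rather than learned) orthogonal rotation are all illustrative assumptions. Quantizing a weight matrix with a few outlier channels directly wastes most of the 4-level range on the outliers; rotating first spreads their energy across all channels.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_2bit(w):
    # Symmetric 2-bit quantizer with a single per-tensor scale:
    # four levels at {-1.5, -0.5, 0.5, 1.5} * scale.
    scale = np.abs(w).max() / 1.5
    levels = np.clip(np.round(w / scale - 0.5), -2, 1) + 0.5
    return levels * scale

# Weight matrix with a few large outlier channels, as seen in real LLM layers.
w = rng.normal(size=(256, 256))
w[:, :4] *= 50.0

# Random orthogonal rotation (QR of a Gaussian matrix). Rotating before
# quantization decorrelates channels and spreads outlier energy evenly.
q, _ = np.linalg.qr(rng.normal(size=(256, 256)))

naive_err = np.linalg.norm(w - quantize_2bit(w))
rotor_err = np.linalg.norm(w - quantize_2bit(w @ q) @ q.T)

print(f"2-bit error, no rotation:   {naive_err:.0f}")
print(f"2-bit error, with rotation: {rotor_err:.0f}")
```

Because the rotation is orthogonal, it can be undone exactly after dequantization, so the only cost is the rotation itself; the reconstruction error with rotation is far lower than without.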
|
|
This 2-bit variant has the smallest footprint of the 3B quantizations in this series. RotorQuant's rotation-based approach is especially valuable at 2-bit, where outlier sensitivity causes the most quality degradation in naive quantization schemes, making this the preferred 2-bit option when accuracy matters.
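A back-of-envelope check makes the ~1 GB disk figure plausible. The group size of 32 and fp16 per-group scales are assumptions for illustration; the actual packing (and any layers kept at higher precision) is not documented here.

```python
params = 3.0e9                   # ~3B parameters
weight_bytes = params * 2 / 8    # 2 bits per weight
scale_bytes = (params / 32) * 2  # assumed fp16 scale per 32-weight group
total_gb = (weight_bytes + scale_bytes) / 1e9
print(f"~{total_gb:.2f} GB before embeddings and metadata")
```

This lands just under the ~1 GB listed in the specifications table once metadata and higher-precision tensors are added.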
|
|
## Supported Languages
|
|
MERaLiON-2 supports speech recognition in Southeast Asian languages including English, Mandarin Chinese, Malay, Tamil, and Indonesian.
|
|
## Memory Estimates
|
|
| Device | Feasibility |
|---|---|
| MacBook Air M1 (8 GB) | Comfortable |
| iPad Pro M1/M2 | Comfortable |
| iPad Air M1 | Feasible |
| iPhone 15 Pro (8 GB) | Feasible |
|
|
## See Also
|
|
- [majentik/MERaLiON-2-3B-TurboQuant-MLX-2bit](https://huggingface.co/majentik/MERaLiON-2-3B-TurboQuant-MLX-2bit) -- TurboQuant 2-bit variant
- [majentik/MERaLiON-2-3B-RotorQuant-MLX-4bit](https://huggingface.co/majentik/MERaLiON-2-3B-RotorQuant-MLX-4bit) -- RotorQuant 4-bit (higher quality)
- [majentik/MERaLiON-2-3B-RotorQuant-MLX-8bit](https://huggingface.co/majentik/MERaLiON-2-3B-RotorQuant-MLX-8bit) -- RotorQuant 8-bit (highest quality)
- [majentik/MERaLiON-2-10B-RotorQuant-MLX-2bit](https://huggingface.co/majentik/MERaLiON-2-10B-RotorQuant-MLX-2bit) -- 10B RotorQuant 2-bit (larger model)
- [aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B](https://huggingface.co/aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B) -- Original base model
|
|