---
base_model: aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B
library_name: mlx
tags:
- meralion
- speech
- audio
- speech-to-text
- mlx
- apple-silicon
- quantized
- 2bit
- rotorquant
- gemma-2
- whisper
- sea-lion
license: apache-2.0
pipeline_tag: automatic-speech-recognition
language:
- en
- zh
- ms
- ta
- id
---

# MERaLiON-2-3B-RotorQuant-MLX-2bit

**MLX 2-bit RotorQuant quantization** of [aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B](https://huggingface.co/aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B) for Apple Silicon inference.

RotorQuant rotates weight matrices to decorrelate channels before quantizing, distributing outlier magnitudes more evenly and improving accuracy at low bit-widths.

## Model Specifications

| Property | Value |
|---|---|
| Base Model | aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B |
| Parameters | ~3B |
| Architecture | Whisper-large-v3 encoder + Gemma-2-2B-IT decoder |
| Quantization | RotorQuant 2-bit (MLX) |
| Disk Size | ~1 GB |
| Peak RAM | ~1.5 GB |
| License | Apache 2.0 |
| Task | Automatic Speech Recognition / Speech-to-Text |

## Quickstart

### Installation

```bash
pip install mlx-lm mlx-whisper
```

### Inference

```python
from mlx_lm import load, generate
from mlx_lm.cache import IsoQuantCache

model, tokenizer = load("majentik/MERaLiON-2-3B-RotorQuant-MLX-2bit")

# Create IsoQuantCache for RotorQuant models
cache = IsoQuantCache(model)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Transcribe the following audio."}],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    cache=cache,
)
print(response)
```

## Quantization Details

RotorQuant is a rotation-based quantization strategy that:

- Applies learned rotation matrices to decorrelate weight channels before quantization
- Reduces the impact of outlier weights that typically degrade quantized model quality
- Provides more
  uniform weight distributions, leading to better accuracy retention
- Pairs with IsoQuantCache for consistent KV-cache quantization during inference

This 2-bit variant offers the smallest possible footprint for the 3B model. RotorQuant's rotation-based approach is especially valuable at 2-bit, where outlier sensitivity causes the most quality degradation in naive quantization schemes. This makes it the preferred 2-bit option when accuracy matters.

## Supported Languages

MERaLiON-2 supports speech recognition in Southeast Asian languages including English, Mandarin Chinese, Malay, Tamil, and Indonesian.

## Memory Estimates

| Device | Feasibility |
|---|---|
| MacBook Air M1 (8 GB) | Comfortable |
| iPad Pro M1/M2 | Comfortable |
| iPad Air M1 | Feasible |
| iPhone 15 Pro (8 GB) | Feasible |

## See Also

- [majentik/MERaLiON-2-3B-TurboQuant-MLX-2bit](https://huggingface.co/majentik/MERaLiON-2-3B-TurboQuant-MLX-2bit) -- TurboQuant 2-bit variant
- [majentik/MERaLiON-2-3B-RotorQuant-MLX-4bit](https://huggingface.co/majentik/MERaLiON-2-3B-RotorQuant-MLX-4bit) -- RotorQuant 4-bit (higher quality)
- [majentik/MERaLiON-2-3B-RotorQuant-MLX-8bit](https://huggingface.co/majentik/MERaLiON-2-3B-RotorQuant-MLX-8bit) -- RotorQuant 8-bit (highest quality)
- [majentik/MERaLiON-2-10B-RotorQuant-MLX-2bit](https://huggingface.co/majentik/MERaLiON-2-10B-RotorQuant-MLX-2bit) -- 10B RotorQuant 2-bit (larger model)
- [aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B](https://huggingface.co/aisingapore/MERaLiON-AudioLLM-Whisper-SEA-LION-V3-3B) -- Original base model
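## Why Rotation Helps at 2-bit

The intuition behind the rotation step described under Quantization Details can be shown with a toy NumPy sketch. This is illustrative only and is not the RotorQuant implementation: it uses a random orthogonal rotation (RotorQuant uses learned rotations) and a simple symmetric round-to-nearest 2-bit quantizer, both assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_2bit(w):
    """Symmetric round-to-nearest 2-bit quantizer.

    Levels are step * {-1.5, -0.5, 0.5, 1.5}, with step chosen so the
    largest-magnitude weight is representable. A single big outlier
    therefore inflates the step for the entire tensor.
    """
    step = np.abs(w).max() / 1.5
    codes = np.clip(np.floor(w / step) + 0.5, -1.5, 1.5)
    return codes * step

# A Gaussian weight matrix with one large outlier.
W = rng.standard_normal((64, 64))
W[0, 0] = 100.0

# Naive quantization: the outlier stretches the step, pushing the
# near-zero bulk of the weights far from their nearest level.
err_plain = np.linalg.norm(W - quantize_2bit(W))

# Rotate with a random orthogonal matrix, quantize in the rotated
# basis, rotate back. The rotation smears the outlier's magnitude
# across many channels, shrinking max|w| and hence the step.
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
err_rotated = np.linalg.norm(W - quantize_2bit(W @ Q) @ Q.T)

print(f"quantization error, plain:   {err_plain:.0f}")
print(f"quantization error, rotated: {err_rotated:.0f}")
```

With the outlier present, the rotated round-trip reconstructs the matrix with markedly lower Frobenius-norm error than quantizing directly, which is the effect RotorQuant exploits at 2-bit.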