MERaLiON-3-10B-RotorQuant-MLX-4bit

4-bit weight-quantized MLX version of MERaLiON/MERaLiON-3-10B-preview with RotorQuant KV-cache quantization. Optimized for Apple Silicon inference via the MLX framework.

MERaLiON-3-10B is a multimodal audio-language model built on a Gemma-2 decoder backbone, designed for speech-to-text and audio understanding tasks.

Approximate model size: 5 GB

Model Specifications

| Property | Value |
|---|---|
| Base Model | MERaLiON/MERaLiON-3-10B-preview |
| Parameters | ~10 billion |
| Architecture | Multimodal audio-language (Gemma-2 decoder backbone) |
| Modality | Audio + text input, text output |
| License | See base model |
| Weight Quantization | 4-bit (~5 GB) |
| KV-Cache Quantization | RotorQuant |
| Framework | MLX (Apple Silicon) |

Quickstart

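from mlx_lm import load, generate

# Download (on first use) and load the quantized weights and tokenizer.
model, tokenizer = load("majentik/MERaLiON-3-10B-RotorQuant-MLX-4bit")

# Text-only call: no audio is passed here, so the prompt is illustrative.
prompt = "Transcribe the following audio:"
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)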

What is RotorQuant?

RotorQuant is a rotation-based KV-cache quantization method that applies learned Clifford algebra rotations to the keys and values before quantizing them. Results reported for RotorQuant, relative to the TurboQuant baseline:

  • 5.3x faster prefill
  • 28% higher decode throughput
  • Perplexity of 6.91 vs 7.07 for TurboQuant (lower is better)

Combined with MLX 4-bit weight quantization, this compresses both the model weights and the KV cache, which lowers memory use and is intended to improve throughput on audio processing workloads.
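The general idea of rotating a vector before quantizing it can be sketched as follows. This is an illustration only: it uses a fixed random orthogonal matrix and simple per-vector absmax rounding, not RotorQuant's learned Clifford rotors or its actual quantizer, and every function name here is hypothetical.

```python
import numpy as np

def random_rotation(dim, seed=0):
    """Random orthogonal matrix (a stand-in for a learned rotation)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q  # q.T @ q is the identity up to rounding

def quantize_absmax(x, bits=4):
    """Symmetric per-vector quantization to signed integer codes."""
    levels = 2 ** (bits - 1) - 1              # 7 for 4-bit
    scale = float(np.abs(x).max()) / levels or 1.0  # avoid a zero scale
    codes = np.clip(np.round(x / scale), -levels, levels).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    return codes.astype(np.float64) * scale

def rotate_quantize_roundtrip(x, rotation, bits=4):
    """Rotate, quantize, dequantize, then undo the rotation."""
    codes, scale = quantize_absmax(rotation @ x, bits)
    return rotation.T @ dequantize(codes, scale), scale

if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal(64)
    x_hat, scale = rotate_quantize_roundtrip(x, random_rotation(64))
    print("reconstruction error:", np.linalg.norm(x - x_hat))
```

Because the rotation is orthogonal, it can be undone exactly after dequantization, and the reconstruction error equals the rounding error made in the rotated space.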

KV-Cache Quantization Comparison

| Method | Prefill Speed | Decode Speed | Memory Savings | Reference |
|---|---|---|---|---|
| TurboQuant | Baseline | Baseline | High | arXiv: 2504.19874 |
| RotorQuant | 5.3x faster | 28% faster | High | GitHub |

Memory Estimates (MERaLiON-3-10B)

| Precision | Approximate Size | MLX Variant |
|---|---|---|
| FP16 (original) | ~20 GB | -- |
| 8-bit quantized | ~10 GB | RotorQuant-MLX-8bit |
| 4-bit quantized | ~5 GB | This model |
| 2-bit quantized | ~3 GB | RotorQuant-MLX-2bit |
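These sizes follow from parameter count times bits per weight. A minimal sketch of that arithmetic, assuming exactly 10 billion parameters and decimal gigabytes (1 GB = 1e9 bytes); the helper `weight_size_gb` and its `overhead_bits` parameter are illustrative names, not part of MLX:

```python
def weight_size_gb(n_params, bits_per_weight, overhead_bits=0.0):
    """Raw weight storage in decimal gigabytes.

    overhead_bits is an assumed per-weight allowance for quantization
    metadata such as per-group scales; 0.0 means raw weights only.
    """
    return n_params * (bits_per_weight + overhead_bits) / 8 / 1e9

N_PARAMS = 10e9  # "~10 billion" from the specifications above

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{label:>5}: {weight_size_gb(N_PARAMS, bits):.1f} GB")
# prints:
#  FP16: 20.0 GB
# 8-bit: 10.0 GB
# 4-bit: 5.0 GB
# 2-bit: 2.5 GB
```

Quantized checkpoints also store per-group metadata, so real files are somewhat larger than the raw figure. For example, an assumed `overhead_bits=0.5` turns the raw 2.5 GB for 2-bit weights into about 3.1 GB, in line with the ~3 GB listed for the 2-bit variant.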

Hardware Requirements

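The model weights require approximately 5 GB of unified memory; the KV cache and the operating system need additional headroom. Recommended hardware:

  • Any Apple Silicon Mac (M1 or later) with at least 8 GB of unified memory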
