MERaLiON-3-10B-TurboQuant-MLX-2bit

2-bit weight-quantized MLX version of MERaLiON/MERaLiON-3-10B-preview with TurboQuant KV-cache quantization. Optimized for Apple Silicon inference via the MLX framework.

MERaLiON-3-10B is a multimodal audio-language model built on a Gemma-2 decoder backbone, designed for speech-to-text and audio understanding tasks.

Approximate model size: ~3 GB

Model Specifications

| Property | Value |
|---|---|
| Base Model | MERaLiON/MERaLiON-3-10B-preview |
| Parameters | ~10 billion |
| Architecture | Multimodal audio-language (Gemma-2 decoder backbone) |
| Modality | Audio + text input, text output |
| License | See base model |
| Weight Quantization | 2-bit (~3 GB) |
| KV-Cache Quantization | TurboQuant |
| Framework | MLX (Apple Silicon) |

Quickstart

```python
from mlx_lm import load, generate

# Load the quantized weights and tokenizer from the Hub
model, tokenizer = load("majentik/MERaLiON-3-10B-TurboQuant-MLX-2bit")

prompt = "Transcribe the following audio:"
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```

What is TurboQuant?

TurboQuant (arXiv:2504.19874) is a KV-cache quantization technique that compresses the key-value cache built up during autoregressive generation. Combined with 2-bit weight quantization in MLX, this gives aggressive dual compression: both the static weights and the sequence-length-dependent KV cache shrink, which is what makes the model practical on memory-constrained devices.

Note: 2-bit weight quantization is aggressive and may result in some quality degradation compared to 4-bit or 8-bit variants. Recommended for memory-constrained environments where fitting the model is the priority.
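To make the trade-off concrete, here is a minimal NumPy sketch of group-wise affine 2-bit quantization, the general scheme MLX-style weight quantization follows (the group size and rounding details here are illustrative assumptions, not the exact MLX or TurboQuant implementation). Each group of weights is mapped to one of only four integer levels plus a per-group scale and offset, which is why some quality loss is expected:

```python
import numpy as np

def quantize_2bit(w, group_size=64):
    """Group-wise affine 2-bit quantization: each group of weights is
    stored as integers in {0,1,2,3} plus a float scale and offset."""
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 3.0  # 2 bits -> 4 levels -> 3 steps
    q = np.clip(np.round((w - lo) / scale), 0, 3).astype(np.uint8)
    return q, scale, lo

def dequantize_2bit(q, scale, lo, shape):
    return (q * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 64)).astype(np.float32)
q, scale, lo = quantize_2bit(w)
w_hat = dequantize_2bit(q, scale, lo, w.shape)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

The reconstruction error per weight is bounded by half the group scale, so wide-ranging groups lose the most precision; 4-bit and 8-bit variants shrink that step size by 4x and 64x respectively.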

KV-Cache Quantization Comparison

| Method | Prefill Speed | Decode Speed | Memory Savings | Reference |
|---|---|---|---|---|
| TurboQuant | Baseline | Baseline | High | arXiv:2504.19874 |
| RotorQuant | 5.3x faster | 28% faster | High | GitHub |
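Why KV-cache quantization matters can be seen from the cache's size formula: it grows linearly with sequence length. The sketch below uses illustrative layer counts and head dimensions in the ballpark of a Gemma-2-class decoder (these config values are assumptions, not the confirmed MERaLiON-3-10B configuration):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bits):
    # Keys and values each store n_layers * n_kv_heads * head_dim
    # numbers per token, hence the factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits / 8

# Illustrative config: 42 layers, 8 KV heads, head_dim 256, 8k context
fp16 = kv_cache_bytes(42, 8, 256, 8192, 16)
q4 = kv_cache_bytes(42, 8, 256, 8192, 4)
print(f"fp16 KV cache: {fp16 / 2**30:.2f} GiB")
print(f"4-bit KV cache: {q4 / 2**30:.2f} GiB")
```

At long contexts the unquantized cache can rival the quantized weights themselves in size, which is why compressing both is needed on 8 GB machines.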

Memory Estimates (MERaLiON-3-10B)

| Precision | Approximate Size | MLX Variant |
|---|---|---|
| FP16 (original) | ~20 GB | -- |
| 8-bit quantized | ~10 GB | TurboQuant-MLX-8bit |
| 4-bit quantized | ~5 GB | TurboQuant-MLX-4bit |
| 2-bit quantized | ~3 GB | This model |
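These estimates follow directly from parameter count times bits per weight. A quick back-of-the-envelope check (the 15-20% overhead mentioned in the comment is an assumption covering quantization scales and any layers kept at higher precision):

```python
def raw_weight_gb(n_params, bits):
    """Raw weight storage in GB: parameters * bits per weight / 8."""
    return n_params * bits / 8 / 1e9

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: ~{raw_weight_gb(10e9, bits):.1f} GB")

# The raw 2-bit figure is ~2.5 GB; per-group scales/offsets and any
# unquantized layers (an assumed ~15-20% overhead) account for the
# gap up to the ~3 GB listed in the table.
```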

Hardware Requirements

This model requires approximately 3 GB of unified memory. Recommended hardware:

  • Any Apple Silicon Mac (M1 or later) with 8 GB+ unified memory
