majentik committed (verified)
Commit 088389d · Parent(s): 1b5c4d7

Add MLX quantized model

Files changed (5):
  1. .gitattributes +1 -0
  2. README.md +77 -0
  3. config.json +52 -0
  4. model.safetensors +3 -0
  5. tekken.json +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tekken.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
---
base_model: mistralai/Voxtral-Mini-4B-Realtime-2602
library_name: mlx
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- voxtral
- audio
- speech
- speech-recognition
- realtime
- streaming
- asr
- mlx
- rotorquant
- quantization
- 2-bit
---

# Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-2bit

2-bit MLX weight-quantized build of [`mistralai/Voxtral-Mini-4B-Realtime-2602`](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602) with a RotorQuant KV-cache. An ultra-compact real-time ASR model for memory-constrained Apple Silicon, aimed at the best available 2-bit stability on streaming audio.

## Overview

- **Base:** `mistralai/Voxtral-Mini-4B-Realtime-2602` — 4B-parameter real-time ASR model
- **Weight precision:** 2-bit (group-wise)
- **KV-cache profile:** RotorQuant
- **Approx. on-disk size:** ~1.4 GB
- **Runtime:** MLX on Apple Silicon

## Quickstart

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-2bit")

# audio_stream() and emit() are placeholders: supply your own chunked
# audio source and transcript sink.
for chunk in audio_stream():
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": [{"type": "audio", "path": chunk}]}],
        add_generation_prompt=True,
    )
    emit(generate(model, tokenizer, prompt=prompt, max_tokens=32))
```

## Model specs

| Field | Value |
|---|---|
| Parameters | 4B |
| Weight bits | 2 |
| Group size | 64 |
| Cache profile | RotorQuant |
| Size on disk | ~1.4 GB |
| Target hardware | Apple Silicon (M1/M2/M3/M4) |
| License | Apache 2.0 |

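The weight format above is standard group-wise affine quantization: each run of `group_size` consecutive weights (64 here, per `config.json`) shares its own scale and offset, so the four 2-bit codes only have to cover that group's local range. A minimal numpy sketch of the idea, for illustration only (this is not the MLX kernel):

```python
import numpy as np

def quantize_groupwise(w, bits=2, group_size=64):
    """Affine group-wise quantization: each group of `group_size`
    consecutive weights gets its own scale and offset, so the 2-bit
    codes (0..3) only need to span that group's local range."""
    levels = 2 ** bits - 1                       # 3 distinct steps for 2-bit
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.clip(np.round((groups - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale, lo

def dequantize_groupwise(q, scale, lo, shape):
    """Reconstruct approximate weights from codes, scales, and offsets."""
    return (q * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, scale, lo = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, lo, w.shape)
```

Per-group scales are why 2-bit remains usable at all: the rounding error of any element is bounded by half of its own group's scale rather than by the global dynamic range.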
## RotorQuant vs TurboQuant

| | RotorQuant | TurboQuant |
|---|---|---|
| Strategy | Rotational online re-basis | Per-head static calibration |
| Memory reduction | ~4x on KV-cache | ~3.5x on KV-cache |
| Best for | Noisy/multi-speaker streams | Predictable domains, lowest p50 latency |

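RotorQuant's "rotational online re-basis" is not specified here, but the family it names is well known: multiply the cached vectors by an orthogonal matrix before low-bit quantization so outlier energy is spread across dimensions, then rotate back after dequantization. A toy numpy sketch of that general effect (the rotation `R` and the per-tensor 2-bit quantizer below are illustrative assumptions, not the shipped kernels):

```python
import numpy as np

def quant2bit(x):
    """Per-tensor affine 2-bit quantize/dequantize (4 levels, min-max)."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 3.0
    q = np.clip(np.round((x - lo) / scale), 0, 3)
    return q * scale + lo

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
x[0] = 20.0                          # a single outlier stretches the 2-bit range

# Random orthogonal rotation (QR decomposition of a Gaussian matrix)
R, _ = np.linalg.qr(rng.standard_normal((256, 256)))

err_plain = np.linalg.norm(quant2bit(x) - x)
# Rotate, quantize in the rotated basis, rotate back
err_rot = np.linalg.norm(R.T @ quant2bit(R @ x) - x)
```

After rotation the outlier's energy is smeared over all 256 coordinates, so the quantizer's range shrinks and the reconstruction error drops, which is the intuition behind preferring a rotational profile on noisy streams.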
## See also

- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-4bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-4bit)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-8bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-8bit)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-TurboQuant-MLX-2bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-TurboQuant-MLX-2bit)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant) — KV-cache-only bundle
- [`mistralai/Voxtral-Mini-4B-Realtime-2602`](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602) — upstream base model
config.json ADDED
{
  "model_type": "voxtral_realtime",
  "decoder": {
    "dim": 3072,
    "n_layers": 26,
    "head_dim": 128,
    "hidden_dim": 9216,
    "n_heads": 32,
    "n_kv_heads": 8,
    "vocab_size": 131072,
    "norm_eps": 1e-05,
    "rope_theta": 1000000.0,
    "sliding_window": 8192,
    "tied_embeddings": true,
    "ada_rms_norm_t_cond": true,
    "ada_rms_norm_t_cond_dim": 32
  },
  "encoder_args": {
    "audio_encoding_args": {
      "sampling_rate": 16000,
      "frame_rate": 12.5,
      "num_mel_bins": 128,
      "hop_length": 160,
      "window_size": 400,
      "chunk_length_s": null,
      "global_log_mel_max": 1.5,
      "transcription_format": "streaming"
    },
    "dim": 1280,
    "n_layers": 32,
    "head_dim": 64,
    "hidden_dim": 5120,
    "n_heads": 32,
    "vocab_size": 131072,
    "n_kv_heads": 32,
    "use_biases": true,
    "use_cache": false,
    "rope_theta": 1000000.0,
    "causal": true,
    "norm_eps": 1e-05,
    "pos_embed": "rope",
    "max_source_positions": null,
    "ffn_type": "swiglu",
    "norm_type": "rms_norm",
    "sliding_window": 750,
    "downsample_factor": 4
  },
  "quantization_config": {
    "bits": 2,
    "group_size": 64
  }
}
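As a sanity check, the config's dimensions roughly reproduce the advertised 4B parameter count and the safetensors size. The sketch below counts only the large projection matrices (norms, biases, and the adaRMS conditioning are negligible), and the per-group overhead (an fp16 scale and offset per 64-weight group, about 0.5 extra bits per weight) is an assumption, not a documented MLX figure:

```python
def decoder_params(dim=3072, n_layers=26, head_dim=128, n_heads=32,
                   n_kv_heads=8, hidden_dim=9216, vocab_size=131072):
    """Count the decoder's big matmul weights from config.json."""
    q = dim * n_heads * head_dim
    kv = 2 * dim * n_kv_heads * head_dim   # GQA: 8 KV heads
    o = n_heads * head_dim * dim
    mlp = 3 * dim * hidden_dim             # SwiGLU: gate, up, down
    embed = vocab_size * dim               # tied with the output head
    return n_layers * (q + kv + o + mlp) + embed

def encoder_params(dim=1280, n_layers=32, n_heads=32, head_dim=64,
                   hidden_dim=5120):
    """Count the audio encoder's big matmul weights from config.json."""
    attn = 4 * dim * n_heads * head_dim    # q, k, v, o (n_kv_heads == n_heads)
    mlp = 3 * dim * hidden_dim             # SwiGLU
    return n_layers * (attn + mlp)

total = decoder_params() + encoder_params()
# 2-bit weights plus an assumed ~0.5 bits/weight of group-wise scale/offset
approx_bytes = total * (2 + 0.5) / 8
print(f"{total / 1e9:.2f}B params, ~{approx_bytes / 1e9:.2f} GB at 2-bit")
```

Under these assumptions the estimate lands close to the 1,395,968,235-byte `model.safetensors` recorded in this commit.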
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:aa1c4777e2d71e4db1b3542ad16494f443779f0a79a00219d0fd6d6bdb691c0b
size 1395968235
tekken.json ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:8434af1d39eba99f0ef46cf1450bf1a63fa941a26933a1ef5dbbf4adf0d00e44
size 14910348