README: list Q4/Q5/Q6/Q8 and cite MLX community source
README.md (CHANGED)

@@ -5,40 +5,40 @@ tags:

Before:

- voxtral
- quantized
- mlx
library_name: mlx
---

# Voxtral 3B — Quantized (MLX)

Public quantized weights

## Variants

## Quickstart (MLX)

```python
from mlx_lm import load, generate

# Load quantized weights (Q4 or Q8 folders are included in the repo)
model, tokenizer = load("NeoRoth/voxtral-3b-quantized")

prompt = "Hello!"
print(generate(model, tokenizer, prompt, max_tokens=64))
```

##

## License

## Issues

If you notice any mismatch (missing files, wrong checksum), please open an issue.

After:

- voxtral
- quantized
- mlx
- voxtral-mini-3b-2507
library_name: mlx
---

# Voxtral Mini 3B — 2507 — Quantized (MLX)

Public quantized weights, based on the MLX bf16 conversion `mlx-community/Voxtral-Mini-3B-2507-bf16`.

Upstream model: [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507).

## Variants (quantization profiles)

- Q4: folder `mlx-q4/`
- Q5: folder `mlx-q5/`
- Q6: folder `mlx-q6/`
- Q8: folder `mlx-q8/`

Published variants appear as subfolders at the top of this repo when available.
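
To pick one variant explicitly, one option is to download only that subfolder and point `mlx_lm.load` at the local path. A minimal sketch, assuming each variant folder ships as a complete MLX model directory (weights plus config/tokenizer files):

```python
# Minimal sketch: fetch only the Q4 variant and load it from its local folder.
# Assumes the subfolder is a self-contained MLX model directory;
# adjust the pattern and path for Q5/Q6/Q8.
from huggingface_hub import snapshot_download
from mlx_lm import load

local_dir = snapshot_download(
    "NeoRoth/voxtral-3b-quantized",
    allow_patterns=["mlx-q4/*"],  # download just the Q4 subfolder
)
model, tokenizer = load(f"{local_dir}/mlx-q4")
```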

## Quantization notes

- Only the inference weights are quantized (to Q4/Q5/Q6/Q8 as above).
- Embeddings are NOT quantized, to preserve shape compatibility. Any reported "bits per weight" may therefore exceed the nominal target; this is informational, not an error.
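
As a back-of-the-envelope illustration of that last point, with made-up parameter counts (the real split for this model may differ):

```python
# Hypothetical numbers only: show why keeping embeddings in bf16 (16-bit)
# pushes the effective bits-per-weight above the nominal Q4 target.
quantized_params = 3.0e9   # weights stored at 4 bits (hypothetical count)
embedding_params = 0.2e9   # embeddings kept at 16 bits (hypothetical count)

total_bits = quantized_params * 4 + embedding_params * 16
print(total_bits / (quantized_params + embedding_params))  # -> 4.75 bits/weight
```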

## Quickstart (MLX)

```python
from mlx_lm import load, generate

# Download (if needed) and load the quantized weights from the Hub
model, tokenizer = load("NeoRoth/voxtral-3b-quantized")
print(generate(model, tokenizer, "Hello!", max_tokens=64))
```

## Integrity (SHA256)

- Q4 `model-00001-of-00001.safetensors`:
  - `eec98aef078b3db2c226943d38558d814b10ec387dc5359d333eeed4be5298d2`
- Q8 `model-00001-of-00001.safetensors`:
  - `37999e4a9dda52a0aedb593636be6c12e69dd8b8457f15ce48134f88b1ccebd3`
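
To verify a downloaded shard against these digests, a small script along these lines works; the local path below is an assumption, so point it at wherever the file was saved:

```python
# Recompute a file's SHA256 in chunks and compare it to the published digest.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected_q4 = "eec98aef078b3db2c226943d38558d814b10ec387dc5359d333eeed4be5298d2"
print(sha256_of("mlx-q4/model-00001-of-00001.safetensors") == expected_q4)
```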

## License

- Apache-2.0 (see `LICENSE.txt`).
- Credit: MLX base from `mlx-community/Voxtral-Mini-3B-2507-bf16`; upstream model [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507).