NeoRoth committed · verified
Commit 6b8d347 · 1 Parent(s): 0894dbb

README: list Q4/Q5/Q6/Q8 and cite MLX community source

Files changed (1): README.md +22 -22
README.md CHANGED
@@ -5,40 +5,40 @@ tags:
  - voxtral
  - quantized
  - mlx
+ - voxtral-mini-3b-2507
  library_name: mlx
  ---

- # Voxtral 3B — Quantized (MLX)
+ # Voxtral Mini 3B — 2507 — Quantized (MLX)

- Public quantized weights of the upstream model [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507). This repo contains MLX-ready variants only.
+ Public quantized weights based on MLX bf16 from `mlx-community/Voxtral-Mini-3B-2507-bf16`.
+ Upstream model: [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507).

- ## Variants
- - MLX Q4: `mlx-q4/`
- - MLX Q8: `mlx-q8/`
+ ## Variants (quantization profiles)
+ - Q4: folder `mlx-q4/`
+ - Q5: folder `mlx-q5/`
+ - Q6: folder `mlx-q6/`
+ - Q8: folder `mlx-q8/`

- ## Integrity (SHA256)
- - MLX Q4 `model-00001-of-00001.safetensors`:
-   - `eec98aef078b3db2c226943d38558d814b10ec387dc5359d333eeed4be5298d2`
- - MLX Q8 `model-00001-of-00001.safetensors`:
-   - `37999e4a9dda52a0aedb593636be6c12e69dd8b8457f15ce48134f88b1ccebd3`
+ Published variants appear as subfolders at the top of this repo when available.
+
+ ## Quantization notes
+ - Only inference weights are quantized (Q4/Q5/Q6/Q8 as above).
+ - Embeddings are NOT quantized to preserve shape compatibility. Therefore, any "bits per weight" metric may exceed the nominal target (informational, not an error).

  ## Quickstart (MLX)
  ```python
  from mlx_lm import load, generate
-
- # Load quantized weights (Q4 or Q8 folders are included in the repo)
  model, tokenizer = load("NeoRoth/voxtral-3b-quantized")
-
- prompt = "Hello!"
- print(generate(model, tokenizer, prompt, max_tokens=64))
+ print(generate(model, tokenizer, "Hello!", max_tokens=64))
  ```

- ## Quantization notes
- - Only inference weights are quantized (Q4/Q8 depending on the folder).
- - Embeddings are NOT quantized to preserve shape compatibility. As a result, any "bits per weight" metric may exceed the nominal target. This is informational, not an error.
+ ## Integrity (SHA256)
+ - Q4 `model-00001-of-00001.safetensors`:
+   - `eec98aef078b3db2c226943d38558d814b10ec387dc5359d333eeed4be5298d2`
+ - Q8 `model-00001-of-00001.safetensors`:
+   - `37999e4a9dda52a0aedb593636be6c12e69dd8b8457f15ce48134f88b1ccebd3`

  ## License
- - License: Apache-2.0 (see `LICENSE.txt`). Attribution: upstream model [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507).
-
- ## Issues
- If you notice any mismatch (missing files, wrong checksum), please open an issue.
+ - Apache-2.0 (see `LICENSE.txt`).
+ - Credit: MLX base from `mlx-community/Voxtral-Mini-3B-2507-bf16`; upstream model [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507).
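The SHA256 checksums listed in the Integrity section can be verified locally after downloading the repo. A minimal sketch using only the Python standard library; the relative paths mirror the folder layout the README describes, and are an assumption about where you downloaded the files:

```python
import hashlib
from pathlib import Path

# Published checksums from the README's Integrity section.
EXPECTED = {
    "mlx-q4/model-00001-of-00001.safetensors":
        "eec98aef078b3db2c226943d38558d814b10ec387dc5359d333eeed4be5298d2",
    "mlx-q8/model-00001-of-00001.safetensors":
        "37999e4a9dda52a0aedb593636be6c12e69dd8b8457f15ce48134f88b1ccebd3",
}

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large safetensors files never load whole into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    for rel_path, expected in EXPECTED.items():
        if Path(rel_path).exists():
            status = "OK" if sha256_of(rel_path) == expected else "MISMATCH"
            print(f"{rel_path}: {status}")
```

Any mismatch or missing file is exactly the kind of issue the repo asks to be reported.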
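The "bits per weight" caveat in the quantization notes can be made concrete with a small calculation. A sketch with purely hypothetical parameter counts (the real quantized/embedding split for Voxtral Mini is not stated in this repo), showing why keeping embeddings at bf16 pushes the effective average above the nominal target:

```python
def effective_bits_per_weight(quant_params: float, quant_bits: float,
                              embed_params: float, embed_bits: float = 16.0) -> float:
    """Weighted average of storage bits across quantized and unquantized tensors."""
    total_bits = quant_params * quant_bits + embed_params * embed_bits
    return total_bits / (quant_params + embed_params)

# Hypothetical split: 2.8B weights quantized at Q4, 0.2B embedding weights at bf16.
bpw = effective_bits_per_weight(2.8e9, 4.0, 0.2e9, 16.0)
print(f"effective bits/weight = {bpw:.2f}")  # 4.80, above the nominal 4.0
```

The gap shrinks for Q6/Q8 and grows for Q4, since the bf16 embeddings are a fixed cost; real figures also include quantization scales, which this sketch ignores.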