NeoRoth committed on
Commit abfb85a · verified · 1 parent: e5df03f

Rewrite README in English; add embeddings note

Files changed (1): README.md (+21 −11)
README.md CHANGED
@@ -8,25 +8,35 @@ library_name: mlx
 
 # Voxtral 3B — Quantized (MLX)
 
-Ce dépôt regroupe des variantes quantifiées du modèle Voxtral 3B pour MLX (Apple Silicon).
+Public quantized weights of the Voxtral 3B model for Apple MLX. This repo contains MLX-ready variants only.
 
-## Variantes
-- MLX Q4: dossier `mlx-q4/`
-- MLX Q8: dossier `mlx-q8/`
+## Variants
+- MLX Q4: `mlx-q4/`
+- MLX Q8: `mlx-q8/`
 
-## Intégrité (SHA256)
+## Integrity (SHA256)
 - MLX Q4 `model-00001-of-00001.safetensors`:
   - `eec98aef078b3db2c226943d38558d814b10ec387dc5359d333eeed4be5298d2`
 - MLX Q8 `model-00001-of-00001.safetensors`:
   - `37999e4a9dda52a0aedb593636be6c12e69dd8b8457f15ce48134f88b1ccebd3`
 
-## Utilisation rapide (MLX)
+## Quickstart (MLX)
 ```python
-from mlx_lm import load
-# Exemple: charger les poids quantifiés MLX Q4
+from mlx_lm import load, generate
+
+# Load quantized weights (Q4 or Q8 folders are included in the repo)
 model, tokenizer = load("NeoRoth/voxtral-3b-quantized")
+
+prompt = "Hello!"
+print(generate(model, tokenizer, prompt, max_tokens=64))
 ```
 
-## Notes
-- Ces fichiers sont des quantifications dérivées du modèle Voxtral 3B. Respectez la licence du modèle d’origine.
-- Ouvrez une issue si vous repérez un problème (poids manquants, checksum incorrect, etc.).
+## Quantization notes
+- Only inference weights are quantized (Q4/Q8 depending on the folder).
+- Embeddings are NOT quantized to preserve shape compatibility. As a result, any "bits per weight" metric may exceed the nominal target. This is informational, not an error.
+
+## License
+- See `LICENSE.txt`. Also ensure you comply with the original Voxtral model license.
+
+## Issues
+If you notice any mismatch (missing files, wrong checksum), please open an issue.
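
The Integrity section of the README lists SHA256 digests for both variants. A minimal verification sketch, assuming it is run from the repo root after download; the `sha256_of` helper and the streaming chunk size are illustrative choices, not part of the repo:

```python
import hashlib
from pathlib import Path

# Relative paths and digests taken from the README's Integrity section.
EXPECTED = {
    "mlx-q4/model-00001-of-00001.safetensors":
        "eec98aef078b3db2c226943d38558d814b10ec387dc5359d333eeed4be5298d2",
    "mlx-q8/model-00001-of-00001.safetensors":
        "37999e4a9dda52a0aedb593636be6c12e69dd8b8457f15ce48134f88b1ccebd3",
}

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-GB weight files never sit fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    for rel_path, digest in EXPECTED.items():
        actual = sha256_of(Path(rel_path))
        status = "OK" if actual == digest else "MISMATCH"
        print(f"{status}  {rel_path}")
```

A `MISMATCH` line for either file would be exactly the kind of issue the README asks to be reported.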
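
The quantization note about unquantized embeddings can be made concrete with a little arithmetic. The parameter counts below are hypothetical round numbers for illustration only, not measured from this repo:

```python
def effective_bits_per_weight(quantized_params: float,
                              quantized_bits: int,
                              fp16_params: float) -> float:
    """Average storage cost per weight when some weights stay at 16-bit."""
    total_bits = quantized_params * quantized_bits + fp16_params * 16
    return total_bits / (quantized_params + fp16_params)

# e.g. ~3B body weights at 4 bits, plus ~0.1B embedding weights kept at 16 bits
bpw = effective_bits_per_weight(3.0e9, 4, 0.1e9)
print(round(bpw, 2))  # → 4.39, above the nominal 4-bit target
```

This is why a reported "bits per weight" above 4.0 for the Q4 folder is expected behavior rather than an error.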