Irfanuruchi committed
Commit 4668cb6 · verified · 1 parent: 03823dc

Added README for SmolLM2_135M instruct quantized to 4 bits

Files changed (1): README.md (+40, -0)
README.md CHANGED
@@ -9,5 +9,45 @@ tags:
 - onnx
 - transformers.js
 - mlx
+- apple-silicon
+- quantized
+- smollm2
 base_model: HuggingFaceTB/SmolLM2-135M-Instruct
 ---
+
+# SmolLM2-135M Instruct (MLX, 4-bit)
+
+This is an **MLX** conversion of `HuggingFaceTB/SmolLM2-135M-Instruct` quantized to **4-bit** for fast on-device inference on Apple Silicon.
+
+## Quickstart
+
+Install:
+```bash
+pip install -U mlx-lm
+```
+
+Run:
+```bash
+mlx_lm.generate \
+  --model Irfanuruchi/SmolLM2-135M-Instruct-MLX-4bit \
+  --prompt "Reply with exactly 3 bullet points, 4–8 words each: what can you do offline?" \
+  --max-tokens 80
+```
+
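+You can also run the model from Python. A minimal sketch using the `mlx_lm` API (the prompt is illustrative; `load`, `generate`, and the chat-template step follow the mlx-lm README):
+```python
+from mlx_lm import load, generate
+
+# Download (if needed) and load the 4-bit MLX weights and tokenizer.
+model, tokenizer = load("Irfanuruchi/SmolLM2-135M-Instruct-MLX-4bit")
+
+# SmolLM2-Instruct is a chat model, so wrap the prompt in its chat template.
+messages = [{"role": "user", "content": "What can you do offline?"}]
+prompt = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+
+text = generate(model, tokenizer, prompt=prompt, max_tokens=80)
+print(text)
+```
+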
+## Benchmarks (MacBook Pro, M3 Pro)
+
+- Disk: **76 MB**
+- Peak RAM: **0.106 GB**
+
+> Performance will vary across devices and prompts.
+
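+To sanity-check the memory figure on your own machine (a sketch; not necessarily how the number above was measured), `generate` with `verbose=True` prints token throughput and peak memory:
+```python
+from mlx_lm import load, generate
+
+model, tokenizer = load("Irfanuruchi/SmolLM2-135M-Instruct-MLX-4bit")
+
+# verbose=True makes mlx-lm report prompt/generation speed and peak memory.
+generate(model, tokenizer, prompt="Hello", max_tokens=32, verbose=True)
+```
+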
+## Notes
+
+- Converted/quantized with `mlx_lm.convert`; a sketch of a plausible invocation follows below.
+- This repo contains the MLX weights plus the tokenizer and config files.
+
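+The exact conversion settings are not recorded here; a plausible reconstruction with mlx-lm's Python `convert` API (the output directory name is an assumption):
+```python
+from mlx_lm import convert
+
+convert(
+    "HuggingFaceTB/SmolLM2-135M-Instruct",      # upstream HF repo
+    mlx_path="SmolLM2-135M-Instruct-MLX-4bit",  # local output dir (assumed name)
+    quantize=True,                              # enable weight quantization
+    q_bits=4,                                   # 4-bit weights, matching this repo
+)
+```
+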
+## License & attribution
+
+Upstream model: `HuggingFaceTB/SmolLM2-135M-Instruct` (Apache-2.0).
+Please follow the upstream license and attribution requirements.
+