Irfanuruchi committed
Commit 5568cb6 · verified · 1 parent: 48fbed0

SmolLM2 360M mlx format, quantized to 4 bits

Files changed (1): README.md (+36 −0)
@@ -11,3 +11,39 @@ tags:
- mlx
base_model: HuggingFaceTB/SmolLM2-360M-Instruct
---
# SmolLM2-360M Instruct (MLX, 4-bit)

This is an **MLX** conversion of `HuggingFaceTB/SmolLM2-360M-Instruct` quantized to **4-bit** for fast on-device inference on Apple Silicon.
## Quickstart

Install:

```bash
pip install -U mlx-lm
```
Run:

```bash
mlx_lm.generate \
  --model Irfanuruchi/SmolLM2-360M-Instruct-MLX-4bit \
  --prompt "Reply with exactly 3 bullet points, 4–8 words each: what can you do offline?" \
  --max-tokens 80
```
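The same generation can also be scripted from Python. A minimal sketch using mlx-lm's documented `load`/`generate` helpers (requires Apple Silicon with `mlx-lm` installed; the chat-template step assumes the bundled tokenizer ships a chat template, which SmolLM2-Instruct does upstream):

```python
# Load the 4-bit MLX weights and run a single chat-style generation.
from mlx_lm import load, generate

model, tokenizer = load("Irfanuruchi/SmolLM2-360M-Instruct-MLX-4bit")

messages = [{"role": "user", "content": "What can you do offline?"}]
# apply_chat_template returns token ids suitable to pass as the prompt.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=80)
print(text)
```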
## Benchmarks (MacBook Pro M3 Pro)

- Disk: **198 MB**
- Peak RAM: **0.247 GB**

> Performance will vary across devices and prompts.
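The on-disk size is consistent with a back-of-the-envelope estimate for 4-bit quantization. A sketch of the arithmetic, assuming mlx-lm's default group size of 64 with an fp16 scale and bias per group (the group size is an assumption, not stated in this repo):

```python
# Rough size estimate for a 4-bit quantization of a 360M-parameter model.
PARAMS = 360e6      # parameter count of SmolLM2-360M
BITS = 4            # quantization width
GROUP_SIZE = 64     # assumed mlx-lm default quantization group size

weight_mb = PARAMS * BITS / 8 / 1e6                 # packed 4-bit weights: 180 MB
# Each group of 64 weights also stores an fp16 scale and fp16 bias (4 bytes).
overhead_mb = (PARAMS / GROUP_SIZE) * 4 / 1e6       # ~22.5 MB of metadata
total_mb = weight_mb + overhead_mb

print(f"estimated size: ~{total_mb:.0f} MB")        # ≈ 202 MB
```

That lands within a few percent of the 198 MB observed on disk; the small gap is plausibly due to layers or embeddings that are quantized differently.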
## Notes

- Converted/quantized with `mlx_lm.convert`.
- This repo contains MLX weights and tokenizer/config files.
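For reproducibility, the conversion was presumably along these lines; the exact flags used for this repo are an assumption, so check `mlx_lm.convert --help` for your installed version:

```bash
# Hypothetical re-creation of these weights: downloads the upstream model
# and writes 4-bit MLX weights to ./SmolLM2-360M-Instruct-MLX-4bit.
mlx_lm.convert \
  --hf-path HuggingFaceTB/SmolLM2-360M-Instruct \
  --mlx-path SmolLM2-360M-Instruct-MLX-4bit \
  -q --q-bits 4
```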
## License & attribution

Upstream model: `HuggingFaceTB/SmolLM2-360M-Instruct` (Apache-2.0).
Please follow the upstream license and attribution requirements.