Irfanuruchi committed
Commit 4668cb6 · verified · 1 parent: 03823dc

Added README for SmolLM2_135M instruct quantized to 4 bits

Files changed (1): README.md (+40, -0)
README.md CHANGED
@@ -9,5 +9,45 @@ tags:
 - onnx
 - transformers.js
 - mlx
+- apple-silicon
+- quantized
+- smollm2
 base_model: HuggingFaceTB/SmolLM2-135M-Instruct
 ---
+
+# SmolLM2-135M Instruct (MLX, 4-bit)
+
+This is an **MLX** conversion of `HuggingFaceTB/SmolLM2-135M-Instruct` quantized to **4-bit** for fast on-device inference on Apple Silicon.
+
+## Quickstart
+
+Install:
+```bash
+pip install -U mlx-lm
+```
+
+Run:
+```bash
+mlx_lm.generate \
+  --model Irfanuruchi/SmolLM2-135M-Instruct-MLX-4bit \
+  --prompt "Reply with exactly 3 bullet points, 4–8 words each: what can you do offline?" \
+  --max-tokens 80
+```
+
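+You can also run the model from Python. A minimal sketch using the `mlx_lm` API (the prompt is illustrative; `load`, `generate`, and the chat-template step follow the mlx-lm README):
+```python
+from mlx_lm import load, generate
+
+# Download (if needed) and load the 4-bit MLX weights and tokenizer.
+model, tokenizer = load("Irfanuruchi/SmolLM2-135M-Instruct-MLX-4bit")
+
+# SmolLM2-Instruct is a chat model, so wrap the prompt in its chat template.
+messages = [{"role": "user", "content": "What can you do offline?"}]
+prompt = tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+
+text = generate(model, tokenizer, prompt=prompt, max_tokens=80)
+print(text)
+```
+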
+## Benchmarks (MacBook Pro, M3 Pro)
+
+- Disk: **76 MB**
+- Peak RAM: **0.106 GB**
+
+> Performance will vary across devices and prompts.
+
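+To sanity-check the memory figure on your own machine (a sketch; not necessarily how the number above was measured), `generate` with `verbose=True` prints token throughput and peak memory:
+```python
+from mlx_lm import load, generate
+
+model, tokenizer = load("Irfanuruchi/SmolLM2-135M-Instruct-MLX-4bit")
+
+# verbose=True makes mlx-lm report prompt/generation speed and peak memory.
+generate(model, tokenizer, prompt="Hello", max_tokens=32, verbose=True)
+```
+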
+## Notes
+
+- Converted/quantized with `mlx_lm.convert`; a sketch of a plausible invocation follows below.
+- This repo contains the MLX weights plus the tokenizer and config files.
+
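+The exact conversion settings are not recorded here; a plausible reconstruction with mlx-lm's Python `convert` API (the output directory name is an assumption):
+```python
+from mlx_lm import convert
+
+convert(
+    "HuggingFaceTB/SmolLM2-135M-Instruct",      # upstream HF repo
+    mlx_path="SmolLM2-135M-Instruct-MLX-4bit",  # local output dir (assumed name)
+    quantize=True,                              # enable weight quantization
+    q_bits=4,                                   # 4-bit weights, matching this repo
+)
+```
+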
+## License & attribution
+
+Upstream model: `HuggingFaceTB/SmolLM2-135M-Instruct` (Apache-2.0).
+Please follow the upstream license and attribution requirements.
+