SmolLM2-1.7B Instruct (MLX, 4-bit)

This is an MLX conversion of HuggingFaceTB/SmolLM2-1.7B-Instruct quantized to 4-bit for fast on-device inference on Apple Silicon.

Quickstart

Install:

pip install -U mlx-lm

Run:

mlx_lm.generate \
  --model Irfanuruchi/SmolLM2-1.7B-Instruct-MLX-4bit \
  --prompt "Reply with exactly 3 bullet points, 4–8 words each: what can you do offline?" \
  --max-tokens 80

Benchmarks (MacBook Pro M3 Pro)

Disk: 922 MB
Peak RAM: 1.093 GB

Performance will vary across devices and prompts.

Notes

Converted/quantized with mlx_lm.convert.
This repo contains MLX weights and tokenizer/config files.

License & attribution

Upstream model: HuggingFaceTB/SmolLM2-1.7B-Instruct (Apache-2.0).
Please follow the upstream license and attribution requirements.

Downloads last month: 20

Safetensors

Model size

0.3B params

Tensor type

BF16

U32

MLX

Hardware compatibility

4-bit

Model tree for Irfanuruchi/SmolLM2-1.7B-Instruct-MLX-4bit

Base model

HuggingFaceTB/SmolLM2-1.7B

Quantized

HuggingFaceTB/SmolLM2-1.7B-Instruct

Quantized

(89)

this model