# SmolLM2-1.7B Instruct (MLX, 4-bit)

This is an MLX conversion of HuggingFaceTB/SmolLM2-1.7B-Instruct, quantized to 4 bits for fast on-device inference on Apple Silicon.
## Quickstart

Install:

```bash
pip install -U mlx-lm
```
Run:

```bash
mlx_lm.generate \
  --model Irfanuruchi/SmolLM2-1.7B-Instruct-MLX-4bit \
  --prompt "Reply with exactly 3 bullet points, 4–8 words each: what can you do offline?" \
  --max-tokens 80
```
## Benchmarks (MacBook Pro M3 Pro)

- Disk: 922 MB
- Peak RAM: 1.093 GB
Performance will vary across devices and prompts.
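As a sanity check, the disk number is roughly what 4-bit grouped quantization predicts. A back-of-envelope estimate, assuming ~1.7B parameters and MLX's default quantization settings (group size 64 with an fp16 scale and fp16 bias per group — assumptions, not values read from this repo's config):

```python
# Rough estimate of 4-bit quantized model size; parameters are assumptions.
PARAMS = 1.7e9          # approximate parameter count of SmolLM2-1.7B
BITS_PER_WEIGHT = 4     # quantized weight width
GROUP_SIZE = 64         # MLX default quantization group size
OVERHEAD_BITS = 32      # fp16 scale + fp16 bias stored per group

# Each group of 64 weights carries 32 extra bits -> +0.5 bits per weight.
effective_bits = BITS_PER_WEIGHT + OVERHEAD_BITS / GROUP_SIZE  # 4.5
size_mb = PARAMS * effective_bits / 8 / 1e6
print(f"~{effective_bits} bits/weight -> ~{size_mb:.0f} MB")
```

This lands near the measured 922 MB; the small gap is expected since not every tensor (e.g. norms) is quantized the same way.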
## Notes

- Converted/quantized with `mlx_lm.convert`.
- This repo contains MLX weights and tokenizer/config files.
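For reference, a conversion like this one can be reproduced with a command along these lines (standard `mlx_lm.convert` flags; the output path is illustrative, not the exact invocation used here):

```bash
mlx_lm.convert \
  --hf-path HuggingFaceTB/SmolLM2-1.7B-Instruct \
  -q --q-bits 4 \
  --mlx-path SmolLM2-1.7B-Instruct-MLX-4bit
```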
## License & attribution
Upstream model: HuggingFaceTB/SmolLM2-1.7B-Instruct (Apache-2.0).
Please follow the upstream license and attribution requirements.
## Model tree

- Base model: HuggingFaceTB/SmolLM2-1.7B
- Quantized from: HuggingFaceTB/SmolLM2-1.7B-Instruct