metadata
license: apache-2.0
base_model: HuggingFaceTB/SmolLM-1.7B-Instruct
tags:
- alignment-handbook
- trl
- sft
- mlx
- apple-silicon
- on-device
- tiny-llm
- smollm
- quantized
datasets:
- Magpie-Align/Magpie-Pro-300K-Filtered
- bigcode/self-oss-instruct-sc2-exec-filter-50k
- teknium/OpenHermes-2.5
- HuggingFaceTB/everyday-conversations-llama3.1-2k
library_name: mlx
language:
- en
pipeline_tag: text-generation
SmolLM-1.7B-Instruct (MLX 4-bit)
A 4-bit MLX quantized build of HuggingFaceTB/SmolLM-1.7B-Instruct, optimized for Apple Silicon local inference.
Benchmark Environment
- Device: MacBook Pro (M3 Pro)
- Runtime: MLX
- Quantization: ~4.5 bits per weight
Performance (Measured)
- Disk size: ~922 MB
- Peak memory: ~1.08 GB
- Generation speed: ~110 tokens/sec
Benchmarks were collected on macOS (M3 Pro).
Performance on iPhone / iPad will vary based on hardware and available memory.
Usage
mlx_lm.generate \
--model Irfanuruchi/SmolLM-1.7B-Instruct-MLX-4bit \
--prompt "In 5 sentences, explain the Pomodoro technique and how to start today." \
--max-tokens 140
License
Upstream SmolLM is released under Apache-2.0. Preserve attribution and the original license terms.