SmolLM-360M-Instruct (MLX 3-bit)
A 3-bit MLX-quantized build of HuggingFaceTB/SmolLM-360M-Instruct for ultra-low memory use on Apple Silicon.
Benchmark Environment
- Device: MacBook Pro (M3 Pro)
- Runtime: MLX
- Quantization: 3-bit (≈3.5 effective bits per weight once per-group scales and biases are included)
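Why a "3-bit" model comes out above 3 bits per weight: MLX group quantization stores a scale and a bias alongside each group of quantized values, and that metadata adds a fixed per-group overhead. A minimal sketch of the arithmetic, assuming the common MLX defaults of group size 64 and fp16 scale/bias (both assumptions, not read from this checkpoint):

```python
# Effective bits per weight under group quantization:
# each group of `group_size` 3-bit weights also carries
# one fp16 scale and one fp16 bias (assumed defaults).
bits = 3
group_size = 64                      # assumed MLX default
overhead = 2 * 16 / group_size       # two fp16 values shared by the group
effective = bits + overhead
print(effective)  # 3.5
```

With these defaults the overhead is 32 bits spread over 64 weights, i.e. 0.5 bits each, which matches the ~3.5 bits/weight figure above.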
Tiny Footprint (Measured)
- Disk size: ~155 MB
- Peak memory: ~0.20 GB
- Generation speed: ~458 tokens/sec (short generation)
These numbers were measured on macOS (M3 Pro).
This is an extreme compression build and may reduce output quality vs 4/5-bit.
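The measured disk size is consistent with the quantization level. A quick back-of-envelope check, assuming 360M parameters at ~3.5 effective bits per weight:

```python
# Sanity check: expected on-disk size of the quantized weights.
params = 360e6                  # ~360M parameters
bits_per_weight = 3.5           # effective bits incl. group scales/biases
disk_mb = params * bits_per_weight / 8 / 1e6  # bits -> bytes -> MB
print(round(disk_mb))           # ~158 MB, close to the measured ~155 MB
```

The small gap versus the measured ~155 MB is expected, since not every tensor (e.g. norms) is quantized the same way.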
Usage
mlx_lm.generate \
  --model Irfanuruchi/SmolLM-360M-Instruct-MLX-3bit \
  --prompt "Reply with exactly 3 bullet points, 4-8 words each: what can you do offline?" \
  --max-tokens 80
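mlx-lm also exposes a Python API for the same workflow. A minimal sketch (the model path is this repo; `max_tokens` mirrors the CLI flag above; requires an Apple Silicon machine with the mlx-lm package installed):

```python
from mlx_lm import load, generate

# Download (or reuse the cached) quantized weights and tokenizer.
model, tokenizer = load("Irfanuruchi/SmolLM-360M-Instruct-MLX-3bit")

# Generate a short completion, matching the CLI example.
prompt = "Reply with exactly 3 bullet points, 4-8 words each: what can you do offline?"
text = generate(model, tokenizer, prompt=prompt, max_tokens=80)
print(text)
```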
License
Upstream SmolLM is released under Apache-2.0. Preserve attribution and the original license terms.