SmolLM-360M-Instruct (MLX 3-bit)

A 3-bit MLX-quantized build of HuggingFaceTB/SmolLM-360M-Instruct for ultra-low memory usage on Apple Silicon.

Benchmark Environment

  • Device: MacBook Pro (M3 Pro)
  • Runtime: MLX
  • Quantization: 3-bit weights, ~3.5 effective bits per weight once per-group scales and biases are counted (see the conversion sketch below)
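
For reference, a comparable 3-bit conversion can be reproduced with mlx_lm's convert API. A minimal sketch, assuming the default group size of 64; the exact settings used for this upload are not documented, and mlx_path is illustrative:

from mlx_lm import convert

# Sketch: 3-bit MLX quantization of the upstream model.
# With q_group_size=64 and fp16 scale/bias per group, the effective
# cost is 3 + 2*16/64 = 3.5 bits per weight.
convert(
    "HuggingFaceTB/SmolLM-360M-Instruct",
    mlx_path="SmolLM-360M-Instruct-MLX-3bit",  # illustrative output path
    quantize=True,
    q_bits=3,
    q_group_size=64,
)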

Tiny Footprint (Measured)

  • Disk size: ~155 MB
  • Peak memory: ~0.20 GB
  • Generation speed: ~458 tokens/sec (short generation)

These numbers were measured on macOS with an M3 Pro. This is an extreme-compression build and may reduce output quality compared with the 4-bit and 5-bit variants.
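
A rough sketch of how such numbers can be reproduced with the mlx_lm Python API. The timing and token-counting choices here are mine, not necessarily how the figures above were produced, and mx.get_peak_memory() assumes a recent MLX (older releases expose it as mx.metal.get_peak_memory()):

import time

import mlx.core as mx
from mlx_lm import load, generate

model, tokenizer = load("Irfanuruchi/SmolLM-360M-Instruct-MLX-3bit")

start = time.perf_counter()
text = generate(model, tokenizer, prompt="What can you do offline?", max_tokens=80)
elapsed = time.perf_counter() - start

# Approximate decode speed: tokens in the generated text over wall time
# (elapsed also includes prompt processing, so this slightly understates it).
print(f"~{len(tokenizer.encode(text)) / elapsed:.0f} tokens/sec")
print(f"peak memory: {mx.get_peak_memory() / 1e9:.2f} GB")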

Usage

mlx_lm.generate \
  --model Irfanuruchi/SmolLM-360M-Instruct-MLX-3bit \
  --prompt "Reply with exactly 3 bullet points, 4-8 words each: what can you do offline?" \
  --max-tokens 80
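
The same model can also be driven from Python with mlx_lm's load/generate. The chat-template step below assumes the bundled tokenizer ships SmolLM's instruct template, as the upstream model does:

from mlx_lm import load, generate

model, tokenizer = load("Irfanuruchi/SmolLM-360M-Instruct-MLX-3bit")

messages = [{"role": "user", "content":
             "Reply with exactly 3 bullet points, 4-8 words each: "
             "what can you do offline?"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

# verbose=True streams the output and, in recent mlx_lm versions,
# also reports generation speed.
text = generate(model, tokenizer, prompt=prompt, max_tokens=80, verbose=True)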

License

Upstream SmolLM is released under Apache-2.0. Preserve attribution and the original license terms.
