SmolLM-360M-Instruct (MLX 5-bit)

A 5-bit MLX quantized build of HuggingFaceTB/SmolLM-360M-Instruct targeting a better quality/footprint balance than 3-bit.

Benchmark Environment

Device: MacBook Pro (M3 Pro)
Runtime: MLX
Quantization: 5-bit

Performance (Measured)

Disk size: ~241 MB
Peak memory: ~0.29 GB
Generation speed: ~296 tokens/sec

These numbers were measured on macOS (M3 Pro).
iPhone / iPad performance will vary depending on hardware and memory.

Usage

mlx_lm.generate \
  --model Irfanuruchi/SmolLM-360M-Instruct-MLX-5bit \
  --prompt "Reply with exactly 3 bullet points, 4-8 words each: what can you do offline?" \
  --max-tokens 80

License

Upstream SmolLM is released under Apache-2.0. Preserve attribution and the original license terms.

Downloads last month: 1

Safetensors

Model size

67.9M params

Tensor type

BF16

U32

MLX

Hardware compatibility

5-bit

Model tree for Irfanuruchi/SmolLM-360M-Instruct-MLX-5bit

Base model

HuggingFaceTB/SmolLM-360M

Quantized

HuggingFaceTB/SmolLM-360M-Instruct

Quantized

(25)

this model

Irfanuruchi
/

SmolLM-360M-Instruct-MLX-5bit