---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM-1.7B-Instruct
tags:
  - alignment-handbook
  - trl
  - sft
  - mlx
  - apple-silicon
  - on-device
  - tiny-llm
  - smollm
  - quantized
datasets:
  - Magpie-Align/Magpie-Pro-300K-Filtered
  - bigcode/self-oss-instruct-sc2-exec-filter-50k
  - teknium/OpenHermes-2.5
  - HuggingFaceTB/everyday-conversations-llama3.1-2k
library_name: mlx
language:
  - en
pipeline_tag: text-generation
---

# SmolLM-1.7B-Instruct (MLX 4-bit)

A 4-bit MLX quantized build of [HuggingFaceTB/SmolLM-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-1.7B-Instruct), optimized for local inference on Apple Silicon.
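
The exact settings used to produce this repository are not documented here, but 4-bit MLX builds are typically created with the `mlx_lm` conversion utility. The snippet below is a sketch under that assumption (the output path and quantization arguments are illustrative, and keyword names can vary between `mlx-lm` versions), not the verified recipe for this upload.

```python
# Sketch: producing a 4-bit MLX build from the upstream model with mlx-lm.
# Requires `pip install mlx-lm`; the arguments are illustrative, not the
# exact recipe used for this repository.
from mlx_lm import convert

convert(
    "HuggingFaceTB/SmolLM-1.7B-Instruct",      # upstream model on the Hub
    mlx_path="SmolLM-1.7B-Instruct-MLX-4bit",  # local output directory
    quantize=True,                             # enable weight quantization
    q_bits=4,                                  # 4-bit weights
    q_group_size=64,                           # per-group scales/biases raise the
                                               # effective size to ~4.5 bits/weight
)
```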

## Benchmark Environment

- **Device:** MacBook Pro (M3 Pro)
- **Runtime:** MLX
- **Quantization:** ~4.5 bits per weight

## Performance (Measured)

- **Disk size:** ~922 MB
- **Peak memory:** ~1.08 GB
- **Generation speed:** ~110 tokens/sec

Benchmarks were collected on macOS (M3 Pro).
Performance on iPhone / iPad will vary based on hardware and available memory.
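
To sanity-check the throughput on your own machine, a rough measurement can be made with the `mlx_lm` Python API. This is a minimal sketch, assuming `mlx-lm` is installed; the prompt and the token-counting approach are illustrative.

```python
# Rough local throughput check (illustrative, not the exact benchmark above).
import time
from mlx_lm import load, generate

model, tokenizer = load("Irfanuruchi/SmolLM-1.7B-Instruct-MLX-4bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain the Pomodoro technique in five sentences."}],
    add_generation_prompt=True,
    tokenize=False,
)

start = time.perf_counter()
text = generate(model, tokenizer, prompt=prompt, max_tokens=140)
elapsed = time.perf_counter() - start

# Approximate the generated-token count by re-tokenizing the output text.
# The elapsed time includes prompt processing, so this slightly
# underestimates pure generation speed.
n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens} tokens in {elapsed:.2f}s (~{n_tokens / elapsed:.0f} tokens/sec)")
```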

## Usage

Requires the `mlx-lm` package (`pip install mlx-lm`). From the command line:

```bash
mlx_lm.generate \
  --model Irfanuruchi/SmolLM-1.7B-Instruct-MLX-4bit \
  --prompt "In 5 sentences, explain the Pomodoro technique and how to start today." \
  --max-tokens 140
```
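
The same generation can be run from Python. A minimal sketch, assuming `mlx-lm` is installed and using the tokenizer's chat template for the instruct format:

```python
from mlx_lm import load, generate

model, tokenizer = load("Irfanuruchi/SmolLM-1.7B-Instruct-MLX-4bit")

messages = [
    {"role": "user",
     "content": "In 5 sentences, explain the Pomodoro technique and how to start today."}
]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=140))
```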

## License

The upstream SmolLM model is released under Apache-2.0, and this quantized build keeps the same license. Please preserve attribution and the original license terms when redistributing.