---
library_name: mlx
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
- pt
pipeline_tag: text-generation
tags:
- liquid
- lfm2.5
- edge
- mlx
- quantized
- apple-silicon
- 4bit
base_model: LiquidAI/LFM2.5-350M
---
# LFM2.5-350M-4bit (MLX)
A 4-bit quantized MLX conversion of [LiquidAI/LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M), converted with `mlx-lm==0.31.2` using the standard quantization method.
## Quantization
| Parameter | Value |
|---|---|
| Method | Standard (weight-only RTN) |
| Bits | 4 |
| Group size | 64 |
| Effective bits/weight | 4.502 |
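The effective bits/weight figure can be sanity-checked with a short back-of-envelope calculation. The sketch below assumes the affine scheme stores one fp16 scale and one fp16 bias per group of 64 quantized weights (an assumption about the format, not stated on this card):

```python
# Back-of-envelope check of effective bits per weight for
# group-wise 4-bit affine quantization.
def effective_bits(bits: float = 4, group_size: int = 64,
                   scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Bits stored per weight: the 4-bit codes plus amortized group metadata."""
    return bits + (scale_bits + bias_bits) / group_size

print(effective_bits())  # 4.5
```

This lands at 4.5 bits/weight; the small remainder up to the reported 4.502 presumably comes from tensors kept at higher precision and per-tensor metadata.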
## Quality
Perplexity on `allenai/tulu-3-sft-mixture` (256 samples, `seq_len=512`) via `mlx_lm.perplexity`:
| Model | Perplexity | Δ vs bf16 |
|---|---|---|
| LiquidAI/LFM2.5-350M (bf16) | 118.70 ± 1.69 | — |
| This (4-bit) | 180.60 ± 2.66 | +52% |
**Note:** Sub-1B models are more sensitive to low-bit quantization. If quality matters more than size, consider the official `LiquidAI/LFM2.5-350M-MLX-6bit` / `-8bit` variants, or a future DWQ/AWQ build.
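The Δ column follows directly from the two perplexity means, since perplexity is the exponential of the mean token negative log-likelihood:

```python
import math

ppl_bf16 = 118.70  # bf16 perplexity from the table above
ppl_4bit = 180.60  # 4-bit perplexity from the table above

# Relative degradation, as reported in the Δ column.
delta_pct = (ppl_4bit / ppl_bf16 - 1) * 100
print(f"+{delta_pct:.0f}%")  # +52%

# Equivalently, the gap in mean per-token NLL (nats), since ppl = exp(NLL):
extra_nats = math.log(ppl_4bit) - math.log(ppl_bf16)
```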
## Performance
Benchmarked with `mlx_lm.benchmark -p 512 -g 128` on an Apple M4 Pro (48 GB):
| Metric | Value |
|---|---|
| Prefill | 9,470 tok/s |
| Generation | 676 tok/s |
| Peak memory | 465 MB |
| Size on disk | 195 MB |
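The disk footprint can be roughly cross-checked from the parameter count and the effective bit width. This sketch assumes the nominal 350M parameter count and ignores file-format overhead:

```python
params = 350e6           # nominal parameter count (assumption from the model name)
bits_per_weight = 4.502  # effective bits/weight from the Quantization table

# Expected size of the quantized weights in megabytes.
size_mb = params * bits_per_weight / 8 / 1e6
print(f"{size_mb:.0f} MB")  # ~197 MB, consistent with the 195 MB on disk
```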
## Usage
```python
from mlx_lm import load, generate

model, tokenizer = load("BRlin/LFM2.5-350M-4bit")
response = generate(model, tokenizer, prompt="Hello", max_tokens=100)
print(response)
```
Or via CLI:
```bash
mlx_lm.generate --model BRlin/LFM2.5-350M-4bit --prompt "Hello"
```
## License
Inherits the LFM Open License v1.0 from the base model.