# LFM2.5-350M-4bit (MLX)

4-bit quantized MLX conversion of LiquidAI/LFM2.5-350M.

Converted with `mlx-lm==0.31.2` using the standard quantization method.

## Quantization

| Setting | Value |
|---|---|
| Method | Standard (weight-only RTN) |
| Bits | 4 |
| Group size | 64 |
| Effective bits/weight | 4.502 |
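The effective bits/weight can be sanity-checked from the group-wise overhead: MLX's standard affine quantization stores a scale and a bias for each group of weights, so every 64-weight group of 4-bit values carries extra metadata bits. A minimal sketch, assuming an fp16 scale and fp16 bias per group (the small remainder above 4.5 comes from tensors kept in higher precision):

```python
# Per-group storage for weight-only RTN quantization:
# 64 weights x 4 bits, plus per-group metadata.
group_size = 64
bits = 4
overhead_bits = 16 + 16  # assumption: one fp16 scale + one fp16 bias per group

effective_bits = (group_size * bits + overhead_bits) / group_size
print(effective_bits)  # 4.5 -- the reported 4.502 also counts unquantized tensors
```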

## Quality

Perplexity on allenai/tulu-3-sft-mixture (256 samples, seq_len=512), measured with `mlx_lm.perplexity`:

| Model | Perplexity | Δ vs bf16 |
|---|---|---|
| LiquidAI/LFM2.5-350M (bf16) | 118.70 ± 1.69 | baseline |
| This model (4-bit) | 180.60 ± 2.66 | +52% |

Note: Sub-1B models are more sensitive to low-bit quantization. If quality matters more than size, consider the official LiquidAI/LFM2.5-350M-MLX-6bit / -8bit variants, or a future DWQ/AWQ build.
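The +52% figure above is simply the relative perplexity increase over the bf16 baseline:

```python
ppl_bf16 = 118.70  # bf16 baseline perplexity
ppl_4bit = 180.60  # this 4-bit model

delta = (ppl_4bit - ppl_bf16) / ppl_bf16
print(f"{delta:+.0%}")  # +52%
```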

## Performance

Benchmarked with `mlx_lm.benchmark -p 512 -g 128` on an Apple M4 Pro (48 GB):

| Metric | Value |
|---|---|
| Prefill | 9,470 tok/s |
| Generation | 676 tok/s |
| Peak memory | 465 MB |
| Size on disk | 195 MB |
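The disk footprint is consistent with the effective bits/weight reported in the Quantization section. A rough cross-check, assuming a nominal 350M parameters (the exact count may differ slightly):

```python
params = 350e6          # assumption: nominal count from the model name
effective_bits = 4.502  # effective bits/weight after quantization

size_bytes = params * effective_bits / 8
print(f"{size_bytes / 1e6:.0f} MB")  # ~197 MB, close to the 195 MB on disk
```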

## Usage

```python
from mlx_lm import load, generate

model, tokenizer = load("BRlin/LFM2.5-350M-4bit")
response = generate(model, tokenizer, prompt="Hello", max_tokens=100)
print(response)
```

Or via CLI:

```shell
mlx_lm.generate --model BRlin/LFM2.5-350M-4bit --prompt "Hello"
```

## License

Inherits the LFM Open License v1.0 from the base model.
