LFM2-24B-A2B-mlx-mxfp4

This model is a quantized version of LiquidAI/LFM2-24B-A2B in MLX format, using MXFP4 4-bit quantization with a group size of 32. LFM2-24B-A2B is a hybrid mixture-of-experts language model designed for efficiency.

Conversion Command

The model was converted to MLX format and quantized with mlx_lm using the following command:

python -m mlx_lm.convert \
    --hf-path LiquidAI/LFM2-24B-A2B \
    --mlx-path LFM2-24B-A2B-mlx-mxfp4 \
    -q \
    --q-group-size 32 \
    --q-bits 4 \
    --q-mode mxfp4

Note: Because of LFM2's hybrid MoE architecture and varying layer shapes, the conversion falls back to 8-bit quantization with a group size of 64 for certain feed_forward.gate layers where mxfp4 is not a good fit (this is recorded in config.json). The core linear layers are quantized in mxfp4.
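To give an intuition for what mxfp4 stores, here is a minimal sketch of group quantization in the MXFP4 style: each group of values shares one power-of-two scale, and each element keeps only a signed 4-bit FP4 (E2M1) code. This is an illustration of the format, not MLX's actual quantization kernel.

```python
import math

# FP4 (E2M1) magnitudes used by MXFP4: three exponent steps x {1, 1.5} mantissa
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_mxfp4_group(values):
    """Quantize one group (e.g. 32 floats) MXFP4-style: a shared
    power-of-two scale plus a signed 4-bit FP4 code per element."""
    amax = max(abs(v) for v in values)
    # smallest power-of-two scale that fits amax into FP4's range (max 6.0)
    scale = 2.0 ** math.ceil(math.log2(amax / FP4_GRID[-1])) if amax > 0 else 1.0
    codes = []
    for v in values:
        mag = abs(v) / scale
        # round to the nearest representable FP4 magnitude
        idx = min(range(len(FP4_GRID)), key=lambda i: abs(FP4_GRID[i] - mag))
        codes.append((idx, v < 0.0))
    return scale, codes

def dequantize_group(scale, codes):
    """Reconstruct approximate floats from the shared scale and FP4 codes."""
    return [(-scale if neg else scale) * FP4_GRID[idx] for idx, neg in codes]
```

With a group size of 32 this costs 4 bits per weight plus one shared scale per group, which is why mxfp4 is so much smaller than the bf16 original.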

Usage with MLX

If you haven't already, install mlx-lm and the Hugging Face Hub client:

pip install mlx-lm huggingface_hub

Then you can use it in your Python scripts or via the CLI:

from mlx_lm import load, generate

model, tokenizer = load("emberpadgett/LFM2-24B-A2B-mlx-mxfp4")
response = generate(model, tokenizer, prompt="Hello!", max_tokens=100)
print(response)
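As a back-of-the-envelope check of the memory savings, the effective storage cost of group-quantized weights is the code width plus the shared scale amortized over the group. The figures below ignore the 8-bit fallback layers and any packing overhead, so this is a rough estimate rather than the exact on-disk size:

```python
def effective_bits(bits, group_size, scale_bits=8):
    """Effective bits per weight: code width plus the amortized
    per-group scale (8-bit E8M0 for MXFP4)."""
    return bits + scale_bits / group_size

# mxfp4 with group size 32: 4 + 8/32 = 4.25 bits per weight
params = 24e9  # 24B parameters, fallback layers ignored for this estimate
gib = params * effective_bits(4, 32) / 8 / 2**30
print(f"~{gib:.1f} GiB of weights")  # → ~11.9 GiB of weights
```

Compare that with roughly 44.7 GiB for the same parameters in bf16.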
