LFM2-24B-A2B-mlx-mxfp4

This model is a quantized version of LiquidAI/LFM2-24B-A2B in MLX format, using MXFP4 4-bit quantization with a group size of 32. LFM2-24B-A2B is a hybrid mixture-of-experts language model designed for efficiency.

Conversion Command

The model was converted to MLX format and quantized with mlx_lm using the following command:

python -m mlx_lm.convert \
    --hf-path LiquidAI/LFM2-24B-A2B \
    --mlx-path LFM2-24B-A2B-mlx-mxfp4 \
    -q \
    --q-group-size 32 \
    --q-bits 4 \
    --q-mode mxfp4

Note: Because of LFM2's hybrid MoE architecture and varying layer shapes, the conversion falls back to 8-bit quantization with a group size of 64 for certain feed_forward.gate layers where mxfp4 is not a good fit (this is recorded in config.json). The core linear layers are quantized in mxfp4.
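To give an intuition for what mxfp4 stores, here is a minimal sketch of group quantization in the MXFP4 style: each group of values shares one power-of-two scale, and each element keeps only a signed 4-bit FP4 (E2M1) code. This is an illustration of the format, not MLX's actual quantization kernel.

```python
import math

# FP4 (E2M1) magnitudes used by MXFP4: three exponent steps x {1, 1.5} mantissa
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_mxfp4_group(values):
    """Quantize one group (e.g. 32 floats) MXFP4-style: a shared
    power-of-two scale plus a signed 4-bit FP4 code per element."""
    amax = max(abs(v) for v in values)
    # smallest power-of-two scale that fits amax into FP4's range (max 6.0)
    scale = 2.0 ** math.ceil(math.log2(amax / FP4_GRID[-1])) if amax > 0 else 1.0
    codes = []
    for v in values:
        mag = abs(v) / scale
        # round to the nearest representable FP4 magnitude
        idx = min(range(len(FP4_GRID)), key=lambda i: abs(FP4_GRID[i] - mag))
        codes.append((idx, v < 0.0))
    return scale, codes

def dequantize_group(scale, codes):
    """Reconstruct approximate floats from the shared scale and FP4 codes."""
    return [(-scale if neg else scale) * FP4_GRID[idx] for idx, neg in codes]
```

With a group size of 32 this costs 4 bits per weight plus one shared scale per group, which is why mxfp4 is so much smaller than the bf16 original.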

Usage with MLX

If you haven't already, install mlx-lm and the Hugging Face Hub client:

pip install mlx-lm huggingface_hub

Then you can use it in your Python scripts or via the CLI:

from mlx_lm import load, generate

model, tokenizer = load("emberpadgett/LFM2-24B-A2B-mlx-mxfp4")
response = generate(model, tokenizer, prompt="Hello!", max_tokens=100)
print(response)
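As a back-of-the-envelope check of the memory savings, the effective storage cost of group-quantized weights is the code width plus the shared scale amortized over the group. The figures below ignore the 8-bit fallback layers and any packing overhead, so this is a rough estimate rather than the exact on-disk size:

```python
def effective_bits(bits, group_size, scale_bits=8):
    """Effective bits per weight: code width plus the amortized
    per-group scale (8-bit E8M0 for MXFP4)."""
    return bits + scale_bits / group_size

# mxfp4 with group size 32: 4 + 8/32 = 4.25 bits per weight
params = 24e9  # 24B parameters, fallback layers ignored for this estimate
gib = params * effective_bits(4, 32) / 8 / 2**30
print(f"~{gib:.1f} GiB of weights")  # → ~11.9 GiB of weights
```

Compare that with roughly 44.7 GiB for the same parameters in bf16.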
