LFM2-24B-A2B-mlx-mxfp4
This model is a quantized version of LiquidAI/LFM2-24B-A2B, converted to MLX format with MXFP4 4-bit quantization at group size 32. LFM2-24B-A2B is a hybrid mixture-of-experts language model architecture designed for efficiency.
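For intuition on the format: MXFP4 stores each weight as a 4-bit FP4 (E2M1) code and shares one power-of-two scale across each group of 32 values, so dequantization is a table lookup times the group scale. A minimal sketch following the OCP Microscaling spec (function names are illustrative, not from mlx-lm):

```python
# Illustrative MXFP4 dequantization: 4-bit E2M1 codes share one
# power-of-two scale per group of weights (per the OCP MX format).

# The 8 non-negative E2M1 magnitudes; the high bit of a code is the sign.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_fp4(code: int) -> float:
    """Map a 4-bit E2M1 code (0..15) to its float value."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]

def dequantize_group(codes: list[int], scale_exp: int) -> list[float]:
    """Dequantize one group: each decoded code times the shared 2**scale_exp scale."""
    scale = 2.0 ** scale_exp
    return [decode_fp4(c) * scale for c in codes]
```

With group size 32, a layer's weights are split into runs of 32 codes, each run carrying its own exponent, which is what keeps the quantization error bounded per group.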
Conversion Command
The model was converted to MLX format and quantized with mlx_lm using the following command:
python -m mlx_lm.convert \
--hf-path LiquidAI/LFM2-24B-A2B \
--mlx-path LFM2-24B-A2B-mlx-mxfp4 \
-q \
--q-group-size 32 \
--q-bits 4 \
--q-mode mxfp4
Note: Due to the hybrid MoE architecture and differing layer configurations in LFM2, the config.json records a fallback 8-bit quantization with group size 64 for specific feed_forward.gate layers where mxfp4 isn't optimal, while the core linear layers are in mxfp4.
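To see which layers received the 8-bit fallback, you can look at the "quantization" section of config.json, which holds global defaults plus per-layer override dicts keyed by module path (the exact excerpt below is hypothetical, sketched from how mlx_lm records mixed quantization):

```python
# Hypothetical excerpt of the "quantization" section of config.json:
# global defaults plus per-layer override dicts keyed by module path.
quant_cfg = {
    "group_size": 32,
    "bits": 4,
    "mode": "mxfp4",
    "model.layers.0.feed_forward.gate": {"group_size": 64, "bits": 8},
}

def effective_quant(cfg: dict, layer: str) -> dict:
    """Return the quantization settings that apply to one layer."""
    override = cfg.get(layer)
    if isinstance(override, dict):
        return override
    # No override for this layer: the global defaults apply.
    return {k: v for k, v in cfg.items() if not isinstance(v, dict)}

print(effective_quant(quant_cfg, "model.layers.0.feed_forward.gate"))
print(effective_quant(quant_cfg, "model.layers.0.attention.q_proj"))
```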
Usage with MLX
If you haven't already, install mlx-lm and the huggingface_hub dependency needed to download the model:
pip install mlx-lm huggingface_hub
Then you can use it in your Python scripts or via the CLI:
from mlx_lm import load, generate
model, tokenizer = load("emberpadgett/LFM2-24B-A2B-mlx-mxfp4")
response = generate(model, tokenizer, prompt="Hello!", max_tokens=100)
print(response)
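The same model can also be run directly from the command line via the mlx_lm.generate entry point (flag names as in current mlx-lm releases):

```shell
python -m mlx_lm.generate \
  --model emberpadgett/LFM2-24B-A2B-mlx-mxfp4 \
  --prompt "Hello!" \
  --max-tokens 100
```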