Note: for the ExecuWhisper-specific fine-tuned formatter, see younghan-meta/LFM2.5-350M-ExecuWhisper-Formatter. This repo continues to host the upstream-base LFM2.5 export.

LFM2.5 ExecuTorch MLX

Pre-exported ExecuTorch artifacts for LiquidAI LFM2.5 models with the MLX backend for Apple Silicon.

This repo is an artifact companion for ExecuTorch MLX inference. It ships the .pte files so you can skip export and run directly with an MLX-enabled ExecuTorch Llama runner.

Overview

The pipeline has two stages:

  1. Export: convert the Hugging Face checkpoints into ExecuTorch .pte artifacts with MLX delegation.
  2. Inference: run the artifacts with the shared ExecuTorch Llama C++ runner or pybindings runner.

The artifacts were exported with bf16 model dtype and 4-bit weight-only linear quantization.

Files

File                      Size      Description
lfm2_5_350m_mlx_4w.pte    308 MiB   LFM2.5 350M lowered to ExecuTorch MLX
lfm2_5_1_2b_mlx_4w.pte    849 MiB   LFM2.5 1.2B Instruct lowered to ExecuTorch MLX

Tokenizers are not included. Download tokenizer files from the matching upstream LiquidAI Hugging Face checkpoints.

Performance

Validated on Apple Silicon with the ExecuTorch C++ llama_main runner, temperature=0, and prompt:

<|startoftext|><|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant

Median of 3 fresh runs:

Artifact                  Model load   Prompt eval     Decode          Total           TTFT
lfm2_5_350m_mlx_4w.pte    0.071 s      650.00 tok/s    330.43 tok/s    312.33 tok/s    0.020 s
lfm2_5_1_2b_mlx_4w.pte    0.073 s      481.48 tok/s    147.93 tok/s    136.24 tok/s    0.027 s

These are smoke benchmarks from short-context exports, not a full performance sweep. The 350M artifact was exported with max_seq_length=128, and the 1.2B artifact was exported with max_seq_length=64.
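Because the exports are short-context, it can help to budget generated tokens against the export length. The sketch below assumes that prompt tokens plus newly generated tokens must fit within max_seq_length; this page does not state that rule explicitly, so treat it as an assumption. The helper name remaining_tokens and the 16-token prompt length are hypothetical.

```shell
# Hypothetical helper: tokens left for generation, given an export's
# max_seq_length and the token count of the prompt.
# Assumption: prompt tokens + new tokens must fit within max_seq_length.
remaining_tokens() {
    max_seq_length="$1"
    prompt_tokens="$2"
    left=$((max_seq_length - prompt_tokens))
    # Clamp at zero when the prompt alone exceeds the export length.
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}

# 1.2B export (max_seq_length=64) with a hypothetical 16-token prompt.
remaining_tokens 64 16
```

The actual prompt token count depends on the tokenizer, so measure it rather than relying on the illustrative value here.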

Prerequisites

  • macOS on Apple Silicon.
  • ExecuTorch built from source with EXECUTORCH_BUILD_MLX=ON.
  • Tokenizer files from the matching upstream LiquidAI checkpoints.

Clone and build:

git clone https://github.com/pytorch/executorch ~/executorch
cd ~/executorch

./install_executorch.sh
pip install -e . --no-build-isolation
make lfm_2_5-mlx
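
Before building, the platform prerequisite can be checked with a short script. This is a minimal sketch, not part of the ExecuTorch tooling: is_apple_silicon is a hypothetical helper that compares the values reported by uname against Darwin and arm64.

```shell
# Hypothetical helper: succeed only when the kernel name is Darwin (macOS)
# and the machine architecture is arm64 (Apple Silicon).
is_apple_silicon() {
    [ "$1" = "Darwin" ] && [ "$2" = "arm64" ]
}

if is_apple_silicon "$(uname -s)" "$(uname -m)"; then
    echo "macOS on Apple Silicon detected"
else
    echo "Platform prerequisite not met: expected macOS on Apple Silicon"
fi
```

Note that a shell running under Rosetta reports x86_64 even on Apple Silicon hardware, so run the check from a native arm64 terminal.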

The artifacts were validated against an ExecuTorch branch containing commit:

e4bd2e653e Enable LFM2.5 MLX export and runner build

Download

pip install huggingface_hub

hf download younghan-meta/LFM2.5-ExecuTorch-MLX \
    --local-dir lfm25_mlx

hf download LiquidAI/LFM2.5-350M \
    tokenizer.json tokenizer_config.json \
    --local-dir lfm25_350m_base

hf download LiquidAI/LFM2.5-1.2B-Instruct \
    tokenizer.json tokenizer_config.json \
    --local-dir lfm25_1_2b_base
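
Since the tokenizers come from separate repos, a quick presence check after downloading can catch a missed step. The sketch below assumes the --local-dir names used in the commands above; check_files is a hypothetical helper, not part of the hf CLI.

```shell
# Hypothetical helper: report every expected path that is missing and
# return non-zero if at least one is absent.
check_files() {
    status=0
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            echo "missing: $path" >&2
            status=1
        fi
    done
    return "$status"
}

# Paths follow the --local-dir values used in the download commands.
check_files \
    lfm25_mlx/lfm2_5_350m_mlx_4w.pte \
    lfm25_mlx/lfm2_5_1_2b_mlx_4w.pte \
    lfm25_350m_base/tokenizer.json \
    lfm25_1_2b_base/tokenizer.json \
    || echo "some files are missing; re-run the download commands"
```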

Run

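The runner path below is relative to the ExecuTorch checkout; adjust the model and tokenizer paths to wherever you downloaded the files.

350M: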

cmake-out/examples/models/llama/llama_main \
    --model_path lfm25_mlx/lfm2_5_350m_mlx_4w.pte \
    --tokenizer_path lfm25_350m_base/tokenizer.json \
    --prompt="<|startoftext|><|im_start|>user\nWho are you?<|im_end|>\n<|im_start|>assistant\n" \
    --temperature 0.3 \
    --max_new_tokens 64

1.2B Instruct:

cmake-out/examples/models/llama/llama_main \
    --model_path lfm25_mlx/lfm2_5_1_2b_mlx_4w.pte \
    --tokenizer_path lfm25_1_2b_base/tokenizer.json \
    --prompt="<|startoftext|><|im_start|>user\nWho are you?<|im_end|>\n<|im_start|>assistant\n" \
    --temperature 0.3 \
    --max_new_tokens 48
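
Both commands wrap the user message in the same chat template, so the prompt string can be built by a small function. This is a sketch only: lfm_prompt is a hypothetical helper, and it emits the \n sequences as literal backslash-n characters to match the --prompt strings above. Whether the runner unescapes them is not stated on this page.

```shell
# Hypothetical helper: wrap one user message in the chat template used by
# the --prompt arguments above. "\\n" inside double quotes yields a literal
# backslash followed by n, the same text the commands above pass.
lfm_prompt() {
    printf '%s' "<|startoftext|><|im_start|>user\\n$1<|im_end|>\\n<|im_start|>assistant\\n"
}

lfm_prompt "Who are you?"
```

It can then be passed to the runner as --prompt="$(lfm_prompt 'Who are you?')".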

Re-export

From an ExecuTorch checkout with the LFM2.5 MLX config:

python -m extension.llm.export.export_llm \
    --config examples/models/lfm2/config/lfm2_mlx_4w.yaml \
    +base.model_class="lfm2_5_350m" \
    +base.params="examples/models/lfm2/config/lfm2_5_350m_config.json" \
    +export.output_name="lfm2_5_350m_mlx_4w.pte"

python -m extension.llm.export.export_llm \
    --config examples/models/lfm2/config/lfm2_mlx_4w.yaml \
    +base.model_class="lfm2_5_1_2b" \
    +base.params="examples/models/lfm2/config/lfm2_5_1_2b_config.json" \
    +export.output_name="lfm2_5_1_2b_mlx_4w.pte"

Checksums

7241608ed90239a5cc7464d010b4fa5a62694c07034a194139b3e7dd543ebaef  lfm2_5_350m_mlx_4w.pte
b243eb401c7f7a428a41fa698d174693655fb3209be160c64947a6b823fbc075  lfm2_5_1_2b_mlx_4w.pte
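
To check a download against these values, something like the following may work. It assumes the digests are SHA-256 (their 64-hex-digit length is consistent with that, but this page does not name the algorithm) and that the artifacts sit in lfm25_mlx/ as in the Download section. verify_sha256 is a hypothetical helper.

```shell
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Uses sha256sum when available, otherwise shasum -a 256 (present on macOS).
verify_sha256() {
    file="$1"
    expected="$2"
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$file" | awk '{print $1}')
    else
        actual=$(shasum -a 256 "$file" | awk '{print $1}')
    fi
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Check each artifact that is present; skip the ones not downloaded.
while read -r expected name; do
    if [ -f "lfm25_mlx/$name" ]; then
        verify_sha256 "lfm25_mlx/$name" "$expected"
    else
        echo "skipped (not found): lfm25_mlx/$name"
    fi
done <<'EOF'
7241608ed90239a5cc7464d010b4fa5a62694c07034a194139b3e7dd543ebaef  lfm2_5_350m_mlx_4w.pte
b243eb401c7f7a428a41fa698d174693655fb3209be160c64947a6b823fbc075  lfm2_5_1_2b_mlx_4w.pte
EOF
```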