Note: for the ExecuWhisper-specific fine-tuned formatter, see
younghan-meta/LFM2.5-350M-ExecuWhisper-Formatter. This repo continues to host the upstream-base LFM2.5 export.
LFM2.5 ExecuTorch MLX
Pre-exported ExecuTorch artifacts for LiquidAI LFM2.5 models with the MLX backend for Apple Silicon.
This repo is an artifact companion for ExecuTorch MLX inference. It ships the
.pte files so you can skip export and run directly with an MLX-enabled
ExecuTorch Llama runner.
Overview
The pipeline has two stages:
- Export: convert the Hugging Face checkpoints into ExecuTorch .pte artifacts with MLX delegation.
- Inference: run the artifacts with the shared ExecuTorch Llama C++ runner or pybindings runner.
The artifacts were exported with bf16 model dtype and 4-bit weight-only linear quantization.
Files
| File | Size | What |
|---|---|---|
| lfm2_5_350m_mlx_4w.pte | 308 MiB | LFM2.5 350M lowered to ExecuTorch MLX |
| lfm2_5_1_2b_mlx_4w.pte | 849 MiB | LFM2.5 1.2B Instruct lowered to ExecuTorch MLX |
Tokenizers are not included. Download tokenizer files from the matching upstream LiquidAI Hugging Face checkpoints.
Performance
Validated on Apple Silicon with the ExecuTorch C++ llama_main runner,
temperature=0, and prompt:
<|startoftext|><|im_start|>user
Who are you?<|im_end|>
<|im_start|>assistant
Median of 3 fresh runs:
| Artifact | Model load | Prompt eval | Decode | Total | TTFT |
|---|---|---|---|---|---|
| lfm2_5_350m_mlx_4w.pte | 0.071 s | 650.00 tok/s | 330.43 tok/s | 312.33 tok/s | 0.020 s |
| lfm2_5_1_2b_mlx_4w.pte | 0.073 s | 481.48 tok/s | 147.93 tok/s | 136.24 tok/s | 0.027 s |
These are smoke benchmarks from short-context exports, not a full performance
sweep. The 350M artifact was exported with max_seq_length=128, and the 1.2B
artifact was exported with max_seq_length=64.
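To make the short-context caveat concrete, the sketch below turns the table's median figures into a rough token budget and latency estimate. It assumes that prompt tokens and generated tokens share the exported max_seq_length, and that generation time is approximately TTFT plus the remaining tokens at the median decode rate. Both are simplifying assumptions, the helper names are hypothetical, and the prompt token count in the example is a placeholder that should come from the real tokenizer.

```python
# Figures copied from the benchmark table and export notes above.
ARTIFACTS = {
    "lfm2_5_350m_mlx_4w.pte": {"decode_tok_s": 330.43, "ttft_s": 0.020, "max_seq_length": 128},
    "lfm2_5_1_2b_mlx_4w.pte": {"decode_tok_s": 147.93, "ttft_s": 0.027, "max_seq_length": 64},
}


def max_new_tokens_allowed(artifact: str, prompt_tokens: int) -> int:
    """Tokens left for generation, ASSUMING prompt and output share max_seq_length."""
    return max(0, ARTIFACTS[artifact]["max_seq_length"] - prompt_tokens)


def estimated_generation_seconds(artifact: str, new_tokens: int) -> float:
    """Rough estimate: TTFT covers the first token, the median decode rate covers the rest."""
    if new_tokens <= 0:
        return 0.0
    stats = ARTIFACTS[artifact]
    return stats["ttft_s"] + (new_tokens - 1) / stats["decode_tok_s"]


if __name__ == "__main__":
    prompt_tokens = 12  # hypothetical value; count tokens with the real tokenizer
    for name in ARTIFACTS:
        budget = max_new_tokens_allowed(name, prompt_tokens)
        seconds = estimated_generation_seconds(name, budget)
        print(f"{name}: up to {budget} new tokens, roughly {seconds:.2f} s after load")
```

Because the inputs are smoke-benchmark medians, treat the result as an order-of-magnitude guide rather than a prediction for longer contexts.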
Prerequisites
- macOS on Apple Silicon.
- ExecuTorch built from source with EXECUTORCH_BUILD_MLX=ON.
- Tokenizer files from the matching upstream LiquidAI checkpoints.
git clone https://github.com/pytorch/executorch ~/executorch
cd ~/executorch
./install_executorch.sh
pip install -e . --no-build-isolation
make lfm_2_5-mlx
The artifacts were validated against an ExecuTorch branch containing commit:
e4bd2e653e Enable LFM2.5 MLX export and runner build
Download
pip install huggingface_hub
hf download younghan-meta/LFM2.5-ExecuTorch-MLX \
--local-dir lfm25_mlx
hf download LiquidAI/LFM2.5-350M \
tokenizer.json tokenizer_config.json \
--local-dir lfm25_350m_base
hf download LiquidAI/LFM2.5-1.2B-Instruct \
tokenizer.json tokenizer_config.json \
--local-dir lfm25_1_2b_base
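If you would rather script the downloads from Python, the following is a minimal sketch of the same three steps using `snapshot_download` and `hf_hub_download` from `huggingface_hub`. The function name `download_all` and its injectable `snapshot` and `file_download` parameters are hypothetical conveniences added here so the logic can be dry-run without network access; the repo IDs, file names, and local directories are the ones used in the commands above.

```python
ARTIFACT_REPO = "younghan-meta/LFM2.5-ExecuTorch-MLX"
TOKENIZER_FILES = ("tokenizer.json", "tokenizer_config.json")
# Local directory -> upstream checkpoint that provides the tokenizer.
TOKENIZER_REPOS = {
    "lfm25_350m_base": "LiquidAI/LFM2.5-350M",
    "lfm25_1_2b_base": "LiquidAI/LFM2.5-1.2B-Instruct",
}


def download_all(snapshot=None, file_download=None):
    """Fetch the .pte artifacts and tokenizer files into the directories used above.

    The two parameters are hypothetical hooks for dry runs; by default the real
    huggingface_hub functions are imported and used.
    """
    if snapshot is None or file_download is None:
        from huggingface_hub import hf_hub_download, snapshot_download

        snapshot = snapshot or snapshot_download
        file_download = file_download or hf_hub_download

    # Whole artifact repo, same as `hf download <repo> --local-dir lfm25_mlx`.
    snapshot(repo_id=ARTIFACT_REPO, local_dir="lfm25_mlx")

    # Only the tokenizer files from each upstream checkpoint.
    for local_dir, repo_id in TOKENIZER_REPOS.items():
        for filename in TOKENIZER_FILES:
            file_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
```

Calling `download_all()` with no arguments performs the real downloads into the current working directory.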
Run
350M:
cmake-out/examples/models/llama/llama_main \
--model_path lfm25_mlx/lfm2_5_350m_mlx_4w.pte \
--tokenizer_path lfm25_350m_base/tokenizer.json \
--prompt="<|startoftext|><|im_start|>user\nWho are you?<|im_end|>\n<|im_start|>assistant\n" \
--temperature 0.3 \
--max_new_tokens 64
1.2B Instruct:
cmake-out/examples/models/llama/llama_main \
--model_path lfm25_mlx/lfm2_5_1_2b_mlx_4w.pte \
--tokenizer_path lfm25_1_2b_base/tokenizer.json \
--prompt="<|startoftext|><|im_start|>user\nWho are you?<|im_end|>\n<|im_start|>assistant\n" \
--temperature 0.3 \
--max_new_tokens 48
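Both commands wrap the user message in the same chat template, so a small helper can build the prompt and the argument list for either artifact. This is a sketch, not part of ExecuTorch: `build_prompt` and `build_command` are hypothetical names, and only the runner path and flags shown above are used. By default the helper inserts real newline characters, matching the prompt as displayed in the Performance section. The shell examples instead pass the two-character sequence `\n` inside double quotes; whether the runner unescapes it is not stated here, so the `newline` parameter lets you reproduce either form.

```python
import shlex

RUNNER = "cmake-out/examples/models/llama/llama_main"


def build_prompt(user_message: str, newline: str = "\n") -> str:
    """Wrap a user message in the chat template used by the run commands above."""
    return (
        f"<|startoftext|><|im_start|>user{newline}"
        f"{user_message}<|im_end|>{newline}"
        f"<|im_start|>assistant{newline}"
    )


def build_command(model_path, tokenizer_path, user_message, max_new_tokens, temperature=0.3):
    """Return the llama_main invocation as an argument list (no shell quoting needed)."""
    return [
        RUNNER,
        "--model_path", model_path,
        "--tokenizer_path", tokenizer_path,
        f"--prompt={build_prompt(user_message)}",
        "--temperature", str(temperature),
        "--max_new_tokens", str(max_new_tokens),
    ]


if __name__ == "__main__":
    cmd = build_command(
        "lfm25_mlx/lfm2_5_350m_mlx_4w.pte",
        "lfm25_350m_base/tokenizer.json",
        "Who are you?",
        max_new_tokens=64,
    )
    # Print a shell-quoted form for inspection; nothing is executed here.
    print(shlex.join(cmd))
```

To actually run the model, pass the list to `subprocess.run(cmd, check=True)` from the directory that contains `cmake-out`.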
Re-export
From an ExecuTorch checkout with the LFM2.5 MLX config:
python -m extension.llm.export.export_llm \
--config examples/models/lfm2/config/lfm2_mlx_4w.yaml \
+base.model_class="lfm2_5_350m" \
+base.params="examples/models/lfm2/config/lfm2_5_350m_config.json" \
+export.output_name="lfm2_5_350m_mlx_4w.pte"
python -m extension.llm.export.export_llm \
--config examples/models/lfm2/config/lfm2_mlx_4w.yaml \
+base.model_class="lfm2_5_1_2b" \
+base.params="examples/models/lfm2/config/lfm2_5_1_2b_config.json" \
+export.output_name="lfm2_5_1_2b_mlx_4w.pte"
Checksums
7241608ed90239a5cc7464d010b4fa5a62694c07034a194139b3e7dd543ebaef lfm2_5_350m_mlx_4w.pte
b243eb401c7f7a428a41fa698d174693655fb3209be160c64947a6b823fbc075 lfm2_5_1_2b_mlx_4w.pte
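One way to check a download against these digests is sketched below using only the Python standard library. The helper names `sha256_of` and `verify` are hypothetical, and the directory `lfm25_mlx` is assumed to be the one used in the Download section.

```python
import hashlib
from pathlib import Path

# SHA-256 digests copied from the checksum list above.
EXPECTED_SHA256 = {
    "lfm2_5_350m_mlx_4w.pte": "7241608ed90239a5cc7464d010b4fa5a62694c07034a194139b3e7dd543ebaef",
    "lfm2_5_1_2b_mlx_4w.pte": "b243eb401c7f7a428a41fa698d174693655fb3209be160c64947a6b823fbc075",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so a large .pte is never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory):
    """Return {filename: True, False, or None}; None means the file is missing."""
    results = {}
    for name, expected in EXPECTED_SHA256.items():
        path = Path(directory) / name
        results[name] = (sha256_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify("lfm25_mlx").items():
        print(f"{name}: {labels[ok]}")
```

A mismatch usually indicates an interrupted or corrupted download, in which case re-downloading the file is the simplest fix.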