---
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
pipeline_tag: text-generation
tags:
- liquid
- unsloth
- lfm2.5
- edge
base_model:
- LiquidAI/LFM2.5-1.2B-Instruct
---
> [!NOTE]
> Includes Unsloth **chat template fixes**!
For `llama.cpp`, use `--jinja`
>
Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.
|
| [vLLM](https://github.com/vllm-project/vllm) | High-throughput production deployments with GPU. | Link |
|
| [llama.cpp](https://github.com/ggml-org/llama.cpp) | Cross-platform inference with CPU offloading. | Link |
|
| [MLX](https://github.com/ml-explore/mlx) | Apple's machine learning framework optimized for Apple Silicon. | Link | — |
| [LM Studio](https://lmstudio.ai/) | Desktop application for running LLMs locally. | Link | — |
Here's a quick start example with Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
model_id = "LiquidAI/LFM2.5-1.2B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
model_id,
device_map="auto",
dtype="bfloat16",
# attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
prompt = "What is C. elegans?"
input_ids = tokenizer.apply_chat_template(
[{"role": "user", "content": prompt}],
add_generation_prompt=True,
return_tensors="pt",
tokenize=True,
).to(model.device)
output = model.generate(
input_ids,
do_sample=True,
temperature=0.1,
top_k=50,
top_p=0.1,
repetition_penalty=1.05,
max_new_tokens=512,
streamer=streamer,
)
```
## 🔧 Fine-Tuning
We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.
| Name | Description | Docs | Notebook |
|------|-------------|------|----------|
| SFT ([Unsloth](https://github.com/unslothai/unsloth)) | Supervised Fine-Tuning with LoRA using Unsloth. | Link |
|
| SFT ([TRL](https://github.com/huggingface/trl)) | Supervised Fine-Tuning with LoRA using TRL. | Link |
|
| DPO ([TRL](https://github.com/huggingface/trl)) | Direct Preference Optimization with LoRA using TRL. | Link |
|
## 📊 Performance
### Benchmarks
We compared LFM2.5-1.2B-Instruct with relevant sub-2B models on a diverse suite of benchmarks.
| Model | GPQA | MMLU-Pro | IFEval | IFBench | Multi-IF | AIME25 | BFCLv3 |
|-------|------|----------|--------|---------|----------|--------|--------|
| **LFM2.5-1.2B-Instruct** | 38.89 | 44.35 | 86.23 | 47.33 | 60.98 | 14.00 | 49.12 |
| Qwen3-1.7B | 34.85 | 42.91 | 73.68 | 21.33 | 56.48 | 9.33 | 46.30 |
| Granite 4.0-1B | 24.24 | 33.53 | 79.61 | 21.00 | 43.65 | 3.33 | 52.43 |
| Llama 3.2 1B Instruct | 16.57 | 20.80 | 52.37 | 15.93 | 30.16 | 0.33 | 21.44 |
| Gemma 3 1B IT | 24.24 | 14.04 | 63.25 | 20.47 | 44.31 | 1.00 | 16.64 |
GPQA, MMLU-Pro, IFBench, and AIME25 follow [ArtificialAnalysis's methodology](https://artificialanalysis.ai/methodology/intelligence-benchmarking). For IFEval and Multi-IF, we report the average score across strict and loose prompt and instruction accuracies. For BFCLv3, we report the final weighted average score with a custom Liquid handler to support our tool use template.
### Inference speed
LFM2.5-1.2B-Instruct offers extremely fast inference speed on CPUs with a low memory profile compared to similar-sized models.

In addition, we are partnering with AMD, Qualcomm, and Nexa AI to bring the LFM2.5 family to NPUs. These optimized models are available through our partners, enabling highly efficient on-device inference.
| Device | Inference | Framework | Model | Prefill (tok/s) | Decode (tok/s) | Memory (GB) |
| ---------------------------------------------------- | --------- | ---------------- | -------------------- | --------------- | -------------- | ----------- |
| Qualcomm Snapdragon® X Elite | NPU | NexaML | LFM2.5-1.2B-instruct | 2591 | 63 | 0.9GB |
| Qualcomm Snapdragon® Gen4 (ROG Phone9 Pro) | NPU | NexaML | LFM2.5-1.2B-instruct | 4391 | 82 | 0.9GB |
| Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | LFM2.5-1.2B-instruct | 335 | 70 | 719MB |
| Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | Qwen3-1.7B | 181 | 40 | 1306MB |
These capabilities unlock new deployment scenarios across various devices, including vehicles, mobile devices, laptops, IoT devices, and embedded systems.
## Contact
For enterprise solutions and edge deployment, contact [sales@liquid.ai](mailto:sales@liquid.ai).
## Citation
```bibtex
@article{liquidai2025lfm2,
title={LFM2 Technical Report},
author={Liquid AI},
journal={arXiv preprint arXiv:2511.23404},
year={2025}
}
```