---
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
- pt
pipeline_tag: text-generation
tags:
- liquid
- lfm2.5
- edge
---


# LFM2.5-8B-A1B-Base

LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.

- **On-device personal assistant**: Designed to power real-life applications, chain tool calls, and follow complex instructions on all devices.
- **Compressed performance**: Competitive with much larger dense and MoE models on instruction following and agentic tasks.
- **Unmatched throughput**: Fastest in its size class on both CPU and GPU inference, with day-one support for llama.cpp, MLX, vLLM, and SGLang.

Find more information about LFM2.5-8B-A1B in our [blog post](https://www.liquid.ai/blog/lfm2-5-8b-a1b).

![image](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/F_rR3bNCHLQIx7TKVqe8U.png)

*AA-Omniscience Index (higher is better) rewards correct answers and penalizes hallucinations. Scores range from -100 to 100. See more results on [Artificial Analysis](https://artificialanalysis.ai/evaluations/omniscience).*

## 🗒️ Model Details

| Model | Parameters | Description |
| --- | --- | --- |
| [**LFM2.5-8B-A1B-Base**](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-Base) | 8.3B total / 1.5B active | Pre-trained base model for fine-tuning |
| [LFM2.5-8B-A1B](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B) | 8.3B total / 1.5B active | Reasoning-tuned general-purpose model |

LFM2.5-8B-A1B is a general-purpose text-only model with the following features:

- **Total parameters**: 8.3B
- **Active parameters**: 1.5B
- **Number of layers**: 24 (18 double-gated LIV conv + 6 GQA)
- **Training budget**: 38 trillion tokens
- **Context length**: 131,072
- **Vocabulary size**: 128,000
- **Languages**: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, Spanish
- **Generation parameters**: We recommend the following parameters:
  - `temperature: 0.2`
  - `top_k: 80`
  - `repetition_penalty: 1.05`

| Model | Description |
| --- | --- |
| [**LFM2.5-8B-A1B**](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B) | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers, vLLM, and SGLang. |
| [LFM2.5-8B-A1B-GGUF](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF) | Quantized format for llama.cpp and compatible tools. Optimized for edge inference and local deployment. |
| [LFM2.5-8B-A1B-ONNX](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-ONNX) | ONNX Runtime format for cross-platform deployment. |
| [LFM2.5-8B-A1B-MLX](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-MLX) | MLX format for Apple Silicon. Optimized for fast inference on Mac devices. |

We recommend using LFM2.5-8B-A1B for agentic workflows, tool use, structured outputs, multilingual assistants, and on-device personal-assistant applications. It is not the best fit for heavy programming or knowledge-intensive question answering without retrieval.

## 🏃 Inference

LFM2.5-8B-A1B is supported by many inference frameworks. See the [Inference documentation](https://docs.liquid.ai/lfm/inference/transformers) for the full list.

| Name | Description | Docs | Notebook |
|------|-------------|------|:--------:|
| [Transformers](https://github.com/huggingface/transformers) | Simple inference with direct access to model internals. | Link | |
| [MLX](https://github.com/ml-explore/mlx) | Apple's machine learning framework optimized for Apple Silicon. | Link | — |
| [LM Studio](https://lmstudio.ai/) | Desktop application for running LLMs locally. | Link | — |

Quick start with Transformers (compatible with `transformers>=5.0.0`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "LiquidAI/LFM2.5-8B-A1B-Base"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    # attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.2,
    top_k=80,
    repetition_penalty=1.05,
    max_new_tokens=8192,
    streamer=streamer,
)
```

## 🔧 Fine-Tuning

We recommend fine-tuning LFM2.5 for your specific use case to achieve the best results.

| Name | Description | Docs | Notebook |
|------|-------------|------|----------|
| CPT ([Unsloth](https://github.com/unslothai/unsloth)) | Continued Pre-Training using Unsloth for text completion. | Link | |
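
Continued pre-training on raw text usually involves packing tokenized documents into fixed-length blocks so that every training sequence is fully used. The sketch below illustrates that generic technique in plain Python. It is not taken from the Unsloth notebook, and `pack_for_cpt`, `block_size`, and the toy token IDs are hypothetical names and values chosen for illustration. In practice the IDs would come from the model's tokenizer (for example `tokenizer.eos_token_id`).

```python
from typing import Iterable, List


def pack_for_cpt(
    tokenized_docs: Iterable[List[int]],
    eos_token_id: int,
    block_size: int,
) -> List[List[int]]:
    """Concatenate tokenized documents, separated by EOS, into fixed-size blocks.

    Hypothetical helper for illustration; the trailing partial block is dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    buffer: List[int] = []
    blocks: List[List[int]] = []
    for ids in tokenized_docs:
        # Append the document followed by an EOS marker so document
        # boundaries stay visible inside a packed block.
        buffer.extend(ids)
        buffer.append(eos_token_id)
        # Emit as many full blocks as the buffer currently holds.
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    return blocks


# Toy token IDs (assumed values, not real tokenizer output).
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = pack_for_cpt(docs, eos_token_id=0, block_size=4)
print(blocks)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Dropping the trailing partial block keeps every sequence the same length. Padding it instead is an equally valid choice when the corpus is small.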

## 📬 Contact

- Got questions or want to connect? [Join our Discord community](https://discord.com/invite/liquid-ai).
- If you are interested in custom solutions with edge deployment, please contact [our sales team](https://www.liquid.ai/contact).

## Citation

```bibtex
@article{liquidAI20268BA1B,
  author = {Liquid AI},
  title = {LFM2.5-8B-A1B: Personal Assistant On Your Laptop},
  journal = {Liquid AI Blog},
  year = {2026},
  note = {www.liquid.ai/blog/lfm2-5-8b-a1b},
}
```

```bibtex
@article{liquidai2025lfm2,
  title = {LFM2 Technical Report},
  author = {Liquid AI},
  journal = {arXiv preprint arXiv:2511.23404},
  year = {2025}
}
```