mlabonne committed on
Commit a4ba0a6 · verified · 1 Parent(s): 205939a

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -74,6 +74,7 @@ LFM2.5-1.2B-Instruct is a general-purpose text-only model with the following fea
  | [**LFM2.5-1.2B-Instruct**](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct) | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers and vLLM. |
  | [LFM2.5-1.2B-Instruct-GGUF](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF) | Quantized format for llama.cpp and compatible tools. Optimized for CPU inference and local deployment with reduced memory usage. |
  | [LFM2.5-1.2B-Instruct-ONNX](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-ONNX) | ONNX Runtime format for cross-platform deployment. Enables hardware-accelerated inference across diverse environments (cloud, edge, mobile). |
+ | [LFM2.5-1.2B-Instruct-MLX](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-MLX-8bit) | MLX format for Apple Silicon. Optimized for fast inference on Mac devices using the MLX framework. |

  We recommend using it for agentic tasks, data extraction, and RAG. It is not recommended for knowledge-intensive tasks and programming.
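For context on the first table row: a minimal sketch of inference with the native checkpoint through Transformers, assuming a transformers release that supports this architecture and that the repo ships a chat template (prompt and generation settings are illustrative, not from the commit):

```python
# Minimal sketch: run the native checkpoint with Transformers.
# Assumes a transformers version that supports this architecture
# and that the repo provides a chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-1.2B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# A data-extraction prompt, matching the card's recommended use cases.
messages = [{"role": "user", "content": "Extract every date from: 'Kickoff is 2025-01-09, review on 2025-02-14.'"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The MLX build added in this commit follows the same pattern on Apple Silicon via mlx_lm's load()/generate() helpers.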
@@ -182,7 +183,7 @@ We compared LFM2.5-1.2B-Instruct with relevant sub-2B models on a diverse suite
  | Model | GPQA | MMLU-Pro | IFEval | IFBench | Multi-IF | AIME25 | BFCLv3 |
  |-------|------|----------|--------|---------|----------|--------|--------|
  | **LFM2.5-1.2B-Instruct** | 38.89 | 44.35 | 86.23 | 47.33 | 60.98 | 14.00 | 49.12 |
- | Qwen3-1.7B | 34.85 | 42.91 | 73.68 | 21.33 | 56.48 | 9.33 | 46.30 |
+ | Qwen3-1.7B (instruct) | 34.85 | 42.91 | 73.68 | 21.33 | 56.48 | 9.33 | 46.30 |
  | Granite 4.0-1B | 24.24 | 33.53 | 79.61 | 21.00 | 43.65 | 3.33 | 52.43 |
  | Llama 3.2 1B Instruct | 16.57 | 20.80 | 52.37 | 15.93 | 30.16 | 0.33 | 21.44 |
  | Gemma 3 1B IT | 24.24 | 14.04 | 63.25 | 20.47 | 44.31 | 1.00 | 16.64 |
@@ -199,9 +200,9 @@ In addition, we are partnering with AMD, Qualcomm, and Nexa AI to bring the LFM2

  | Device | Inference | Framework | Model | Prefill (tok/s) | Decode (tok/s) | Memory |
  | ---------------------------------------------------- | --------- | ---------------- | -------------------- | --------------- | -------------- | ----------- |
- | Qualcomm Snapdragon® X Elite | NPU | NexaML | LFM2.5-1.2B-instruct | 2591 | 63 | 0.9GB |
- | Qualcomm Snapdragon® Gen4 (ROG Phone9 Pro) | NPU | NexaML | LFM2.5-1.2B-instruct | 4391 | 82 | 0.9GB |
- | Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | LFM2.5-1.2B-instruct | 335 | 70 | 719MB |
+ | Qualcomm Snapdragon® X Elite | NPU | NexaML | LFM2.5-1.2B-Instruct | 2591 | 63 | 0.9GB |
+ | Qualcomm Snapdragon® Gen4 (ROG Phone9 Pro) | NPU | NexaML | LFM2.5-1.2B-Instruct | 4391 | 82 | 0.9GB |
+ | Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | LFM2.5-1.2B-Instruct | 335 | 70 | 719MB |
  | Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | Qwen3-1.7B | 181 | 40 | 1306MB |

  These capabilities unlock new deployment scenarios across various devices, including vehicles, mobile devices, laptops, IoT devices, and embedded systems.
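The two CPU rows above run the Q4_0 GGUF build under llama.cpp. A minimal sketch of the same setup from Python via the llama-cpp-python bindings; the repo id matches the GGUF card above, but the Q4_0 filename glob and all settings are assumptions:

```python
# Minimal sketch: CPU inference on the Q4_0 GGUF build via llama-cpp-python.
# The filename glob is an assumption; check the GGUF repo for exact file names.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="LiquidAI/LFM2.5-1.2B-Instruct-GGUF",
    filename="*Q4_0.gguf",  # downloads the first file matching the glob
    n_ctx=4096,             # context window for this session
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Name three constraints of embedded deployment."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```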
 