mlabonne committed on
Commit a4ba0a6 · verified · 1 Parent(s): 205939a

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -74,6 +74,7 @@ LFM2.5-1.2B-Instruct is a general-purpose text-only model with the following fea
  | [**LFM2.5-1.2B-Instruct**](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct) | Original model checkpoint in native format. Best for fine-tuning or inference with Transformers and vLLM. |
  | [LFM2.5-1.2B-Instruct-GGUF](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF) | Quantized format for llama.cpp and compatible tools. Optimized for CPU inference and local deployment with reduced memory usage. |
  | [LFM2.5-1.2B-Instruct-ONNX](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-ONNX) | ONNX Runtime format for cross-platform deployment. Enables hardware-accelerated inference across diverse environments (cloud, edge, mobile). |
+ | [LFM2.5-1.2B-Instruct-MLX](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct-MLX-8bit) | MLX format for Apple Silicon. Optimized for fast inference on Mac devices using the MLX framework. |

  We recommend using it for agentic tasks, data extraction, and RAG. It is not recommended for knowledge-intensive tasks and programming.
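For context on the first table row: a minimal sketch of inference with the native checkpoint through Transformers, assuming a transformers release that supports this architecture and that the repo ships a chat template (prompt and generation settings are illustrative, not from the commit):

```python
# Minimal sketch: run the native checkpoint with Transformers.
# Assumes a transformers version that supports this architecture
# and that the repo provides a chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-1.2B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# A data-extraction prompt, matching the card's recommended use cases.
messages = [{"role": "user", "content": "Extract every date from: 'Kickoff is 2025-01-09, review on 2025-02-14.'"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The MLX build added in this commit follows the same pattern on Apple Silicon via mlx_lm's load()/generate() helpers.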
@@ -182,7 +183,7 @@ We compared LFM2.5-1.2B-Instruct with relevant sub-2B models on a diverse suite
  | Model | GPQA | MMLU-Pro | IFEval | IFBench | Multi-IF | AIME25 | BFCLv3 |
  |-------|------|----------|--------|---------|----------|--------|--------|
  | **LFM2.5-1.2B-Instruct** | 38.89 | 44.35 | 86.23 | 47.33 | 60.98 | 14.00 | 49.12 |
- | Qwen3-1.7B | 34.85 | 42.91 | 73.68 | 21.33 | 56.48 | 9.33 | 46.30 |
+ | Qwen3-1.7B (instruct) | 34.85 | 42.91 | 73.68 | 21.33 | 56.48 | 9.33 | 46.30 |
  | Granite 4.0-1B | 24.24 | 33.53 | 79.61 | 21.00 | 43.65 | 3.33 | 52.43 |
  | Llama 3.2 1B Instruct | 16.57 | 20.80 | 52.37 | 15.93 | 30.16 | 0.33 | 21.44 |
  | Gemma 3 1B IT | 24.24 | 14.04 | 63.25 | 20.47 | 44.31 | 1.00 | 16.64 |
@@ -199,9 +200,9 @@ In addition, we are partnering with AMD, Qualcomm, and Nexa AI to bring the LFM2

  | Device | Inference | Framework | Model | Prefill (tok/s) | Decode (tok/s) | Memory |
  | ---------------------------------------------------- | --------- | ---------------- | -------------------- | --------------- | -------------- | ----------- |
- | Qualcomm Snapdragon® X Elite | NPU | NexaML | LFM2.5-1.2B-instruct | 2591 | 63 | 0.9GB |
- | Qualcomm Snapdragon® Gen4 (ROG Phone9 Pro) | NPU | NexaML | LFM2.5-1.2B-instruct | 4391 | 82 | 0.9GB |
- | Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | LFM2.5-1.2B-instruct | 335 | 70 | 719MB |
+ | Qualcomm Snapdragon® X Elite | NPU | NexaML | LFM2.5-1.2B-Instruct | 2591 | 63 | 0.9GB |
+ | Qualcomm Snapdragon® Gen4 (ROG Phone9 Pro) | NPU | NexaML | LFM2.5-1.2B-Instruct | 4391 | 82 | 0.9GB |
+ | Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | LFM2.5-1.2B-Instruct | 335 | 70 | 719MB |
  | Qualcomm Snapdragon® Gen4 (Samsung Galaxy S25 Ultra) | CPU | llama.cpp (Q4_0) | Qwen3-1.7B | 181 | 40 | 1306MB |

  These capabilities unlock new deployment scenarios across various devices, including vehicles, mobile devices, laptops, IoT devices, and embedded systems.
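The two CPU rows above run the Q4_0 GGUF build under llama.cpp. A minimal sketch of the same setup from Python via the llama-cpp-python bindings; the repo id matches the GGUF card above, but the Q4_0 filename glob and all settings are assumptions:

```python
# Minimal sketch: CPU inference on the Q4_0 GGUF build via llama-cpp-python.
# The filename glob is an assumption; check the GGUF repo for exact file names.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="LiquidAI/LFM2.5-1.2B-Instruct-GGUF",
    filename="*Q4_0.gguf",  # downloads the first file matching the glob
    n_ctx=4096,             # context window for this session
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Name three constraints of embedded deployment."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```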
 