# **LFM2-2.6B-Exp-GGUF**
> LiquidAI/LFM2-2.6B-Exp is a 2.6-billion-parameter experimental language model from the LFM2 series. It features a novel hybrid architecture that combines 10 double-gated short-range convolution blocks with 6 Grouped Query Attention (GQA) blocks for efficiency in edge AI and on-device deployment, achieving 3x faster training and 2x faster CPU decode/prefill than Qwen3, along with strong benchmark results such as 82.41% on GSM8K (math reasoning) and 79.56% on IFEval (instruction following), outperforming larger models such as Llama 3.2-3B-Instruct and Gemma-3-4b-it. Optimized for multilingual use (English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish) with a 32K-token context window, it uses the Lfm2ForCausalLM architecture under the LFM1.0 license, enabling conversational text generation on resource-constrained devices such as smartphones, laptops, and vehicles via Transformers, with low KV-cache requirements. This post-trained checkpoint sets new standards in quality, speed, and memory efficiency for real-world AI applications across CPUs, GPUs, and NPUs.
## LFM2-2.6B-Exp [GGUF]
| File Name | Quant Type | File Size | File Link |
|-----------|------------|-----------|-----------|
| LFM2-2.6B-Exp.BF16.gguf | BF16 | 5.41 GB | [Download](https://huggingface.co/prithivMLmods/LFM2-2.6B-Exp-GGUF/blob/main/LFM2-2.6B-Exp.BF16.gguf) |
| LFM2-2.6B-Exp.F16.gguf | F16 | 5.41 GB | [Download](https://huggingface.co/prithivMLmods/LFM2-2.6B-Exp-GGUF/blob/main/LFM2-2.6B-Exp.F16.gguf) |
| LFM2-2.6B-Exp.F32.gguf | F32 | 10.8 GB | [Download](https://huggingface.co/prithivMLmods/LFM2-2.6B-Exp-GGUF/blob/main/LFM2-2.6B-Exp.F32.gguf) |
| LFM2-2.6B-Exp.Q8_0.gguf | Q8_0 | 2.88 GB | [Download](https://huggingface.co/prithivMLmods/LFM2-2.6B-Exp-GGUF/blob/main/LFM2-2.6B-Exp.Q8_0.gguf) |
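As a minimal sketch of running one of these quants locally, the commands below download the Q8_0 file and start an interactive session with llama.cpp. This assumes a local llama.cpp build and the `huggingface-cli` tool are already installed; binary paths and flags can differ between versions, so adjust them for your setup.

```shell
# Fetch one quant from this repo (Q8_0 shown; any file from the table works)
huggingface-cli download prithivMLmods/LFM2-2.6B-Exp-GGUF \
  LFM2-2.6B-Exp.Q8_0.gguf --local-dir ./models

# Start an interactive chat with llama.cpp (path assumes a local build)
./llama-cli -m ./models/LFM2-2.6B-Exp.Q8_0.gguf -cnv -c 4096
```

Smaller quants such as Q8_0 trade some quality for a lower memory footprint, which matters most on the edge devices this model targets.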
## Quants Usage
(Sorted by size, not necessarily by quality. IQ-quants are often preferable over similar-sized non-IQ quants.)
Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):
