How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/LFM2-2.6B-Exp-GGUF:
# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/LFM2-2.6B-Exp-GGUF:
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/LFM2-2.6B-Exp-GGUF:
# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/LFM2-2.6B-Exp-GGUF:
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf prithivMLmods/LFM2-2.6B-Exp-GGUF:
# Run inference directly in the terminal:
./llama-cli -hf prithivMLmods/LFM2-2.6B-Exp-GGUF:
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf prithivMLmods/LFM2-2.6B-Exp-GGUF:
# Run inference directly in the terminal:
./build/bin/llama-cli -hf prithivMLmods/LFM2-2.6B-Exp-GGUF:
Use Docker
docker model run hf.co/prithivMLmods/LFM2-2.6B-Exp-GGUF:
Quick Links

LFM2-2.6B-Exp-GGUF

LiquidAI/LFM2-2.6B-Exp is a 2.6 billion-parameter experimental language model from the LFM2 series, featuring a novel hybrid architecture that combines 10 double-gated short-range convolution blocks with 6 Grouped Query Attention (GQA) blocks for superior efficiency in edge AI and on-device deployment, achieving 3x faster training, 2x faster CPU decode/prefill than Qwen3, and top performance like 82.41% on GSM8K math reasoning and 79.56% on IFEval instruction following—outperforming larger models such as Llama 3.2-3B-Instruct and Gemma-3-4b-it. Optimized for multilingual support (English, Arabic, Chinese, French, German, Japanese, Korean, Spanish) with a 32K token context window, it uses Lfm2ForCausalLM architecture under LFM1.0 license, enabling conversational text generation on resource-constrained devices like smartphones, laptops, or vehicles via Transformers with low KV cache requirements. This post-trained checkpoint sets new standards in quality, speed, and memory efficiency for real-world AI applications across CPUs, GPUs, and NPUs.

LFM2-2.6B-Exp [GGUF]

File Name Quant Type File Size File Link
LFM2-2.6B-Exp.BF16.gguf BF16 5.41 GB Download
LFM2-2.6B-Exp.F16.gguf F16 5.41 GB Download
LFM2-2.6B-Exp.F32.gguf F32 10.8 GB Download
LFM2-2.6B-Exp.Q8_0.gguf Q8_0 2.88 GB Download

Quants Usage

(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

image.png

Downloads last month
147
GGUF
Model size
3B params
Architecture
lfm2
Hardware compatibility
Log In to add your hardware

8-bit

16-bit

32-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for prithivMLmods/LFM2-2.6B-Exp-GGUF

Unable to build the model tree, the base model loops to the model itself. Learn more.