Liquid AI
Try LFM • Documentation • LEAP

LFM2-24B-A2B-GGUF

LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.

  • Best-in-class efficiency: a 24B-parameter MoE model with only 2B active parameters per token, fitting in 32 GB of RAM for deployment on consumer laptops and desktops (see the sizing sketch after this list).
  • Fast edge inference: 112 tok/s decode on an AMD CPU and 293 tok/s on an H100 GPU, with day-one support for llama.cpp, vLLM, and SGLang.
  • Predictable scaling: Quality improves log-linearly from 350M to 24B total parameters, confirming the LFM2 hybrid architecture scales reliably across nearly two orders of magnitude.
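
As a rough sanity check on the 32 GB figure, here is a back-of-the-envelope sizing sketch in Python. It counts weight bytes only; KV cache and runtime overhead are extra, and real GGUF files deviate slightly from these idealized numbers:

# Approximate GGUF weight size: params * bits_per_weight / 8 bytes.
# Illustrative arithmetic only, not an official sizing tool.
PARAMS = 24e9  # total parameters (MoE total, not the 2B active per token)

for bits in (4, 5, 6, 8, 16):
    gib = PARAMS * bits / 8 / 2**30
    print(f"{bits}-bit: ~{gib:.1f} GiB of weights")

At 4- to 8-bit quantization the weights land between roughly 11 and 23 GiB, consistent with the 32 GB RAM target once context and runtime overhead are added.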

Find more information about LFM2-24B-A2B in our blog post.

How to run LFM2

Example usage with llama.cpp:

llama-cli -hf LiquidAI/LFM2-24B-A2B-GGUF
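
For programmatic use, here is a minimal sketch with the llama-cpp-python bindings. The quantization filename pattern is an assumption; pick a file that actually exists in this repo:

from llama_cpp import Llama

# Download a quantized GGUF from the Hub and load it.
# The glob below assumes a Q4_K_M file is published; adjust as needed.
llm = Llama.from_pretrained(
    repo_id="LiquidAI/LFM2-24B-A2B-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=4096,  # context window; raise it if you have the RAM to spare
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is C. elegans?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])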

GGUF details

  • Model size: 24B total params
  • Architecture: lfm2moe
  • Available quantizations: 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
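
To pull a specific quantization with llama.cpp, the repo:tag syntax can be used. The Q4_K_M tag below is an assumption; substitute whichever quantization file the repo actually publishes:

llama-cli -hf LiquidAI/LFM2-24B-A2B-GGUF:Q4_K_M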
