Instructions to use LiquidAI/LFM2-8B-A1B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LiquidAI/LFM2-8B-A1B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LiquidAI/LFM2-8B-A1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-8B-A1B")
model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2-8B-A1B")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use LiquidAI/LFM2-8B-A1B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LiquidAI/LFM2-8B-A1B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2-8B-A1B",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

Use Docker
```shell
docker model run hf.co/LiquidAI/LFM2-8B-A1B
```
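The same OpenAI-compatible endpoint can also be called from Python. Below is a minimal sketch using only the standard library, assuming the vLLM server above is running on `localhost:8000`; `build_chat_request` is a hypothetical helper name, not part of vLLM:

```python
import json
from urllib import request

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:8000") -> request.Request:
    """Build an OpenAI-compatible chat completion request for a local server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("LiquidAI/LFM2-8B-A1B", "What is the capital of France?")

# Uncomment once the server is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, the official `openai` client library can be pointed at the same URL instead, if you prefer a higher-level interface.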
- SGLang
How to use LiquidAI/LFM2-8B-A1B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LiquidAI/LFM2-8B-A1B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2-8B-A1B",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LiquidAI/LFM2-8B-A1B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2-8B-A1B",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

- Docker Model Runner
How to use LiquidAI/LFM2-8B-A1B with Docker Model Runner:
```shell
docker model run hf.co/LiquidAI/LFM2-8B-A1B
```
Enjoying this one in multi-user chat. + laptop perf
We are enjoying this model in multi-user chat, particularly its relative freedom from the mind-numbingly dominant 'assistant' or 'personal moral counselor' behavioral ruts.
Benchmarks below were run on a ThinkPad T495 with 16 GB RAM and an integrated Vega 8 GPU.
Side processes: no browser, no media playback, only light performance monitoring and terminal-based chat clients:
```shell
$ llama-bench -m LFM2-8B-A1B-US-Q5_K_XL.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null > myresults.txt
```
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 0 | 5 | pp512 | 78.87 ± 0.50 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 0 | 5 | tg128 | 14.11 ± 0.12 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 30 | 5 | pp512 | 109.15 ± 0.85 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 30 | 5 | tg128 | 17.09 ± 0.08 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 99 | 5 | pp512 | 108.31 ± 0.73 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 99 | 5 | tg128 | 17.08 ± 0.04 |
```shell
$ llama-bench -m LFM2-8B-A1B-Q4_K_S.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
```
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 0 | 5 | pp512 | 84.27 ± 0.59 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 0 | 5 | tg128 | 17.75 ± 0.09 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 30 | 5 | pp512 | 111.52 ± 0.91 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 30 | 5 | tg128 | 22.48 ± 0.08 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 99 | 5 | pp512 | 111.47 ± 0.95 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 99 | 5 | tg128 | 22.46 ± 0.08 |
```shell
$ llama-bench -m llama-2-7b.Q4_K_M.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
```
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 0 | 5 | pp512 | 29.30 ± 0.09 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 0 | 5 | tg128 | 4.96 ± 0.06 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 30 | 5 | pp512 | 30.17 ± 0.24 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 30 | 5 | tg128 | 4.80 ± 0.01 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 99 | 5 | pp512 | 30.57 ± 0.03 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 99 | 5 | tg128 | 5.08 ± 0.01 |
```shell
$ llama-bench -m Qwen3-4B-Instruct-2507-Q4_K_M.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
```
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 0 | 5 | pp512 | 50.29 ± 0.28 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 0 | 5 | tg128 | 5.65 ± 0.15 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 30 | 5 | pp512 | 50.94 ± 0.09 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 30 | 5 | tg128 | 6.78 ± 0.03 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 99 | 5 | pp512 | 53.77 ± 0.21 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 99 | 5 | tg128 | 7.55 ± 0.01 |
```shell
$ llama-bench -m granite-4.0-h-tiny-Q4_K_M.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
```
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 0 | 5 | pp512 | 61.95 ± 1.04 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 0 | 5 | tg128 | 8.63 ± 0.04 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 30 | 5 | pp512 | 47.91 ± 0.17 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 30 | 5 | tg128 | 11.26 ± 0.13 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 99 | 5 | pp512 | 90.32 ± 2.17 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 99 | 5 | tg128 | 13.36 ± 0.05 |
build: e1f15b454 (7502)
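To make the comparison concrete, here is a small Python sketch that summarizes the token-generation (tg128) throughput at ngl=99 from the tables above, using llama-2-7b as the baseline. The numbers are copied directly from the benchmark output; the dictionary keys are shorthand labels, not file names:

```python
# tg128 throughput at ngl=99, taken from the llama-bench tables above (tokens/s)
tg128 = {
    "LFM2-8B-A1B Q4_K_S":  22.46,
    "LFM2-8B-A1B Q5_K_XL": 17.08,
    "granite-4.0-h-tiny":  13.36,
    "qwen3-4B Q4_K_M":      7.55,
    "llama-2-7b Q4_K_M":    5.08,
}

baseline = tg128["llama-2-7b Q4_K_M"]

# Print each model's throughput and its speedup over the dense 7B baseline
for name, tps in sorted(tg128.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {tps:6.2f} t/s  ({tps / baseline:.1f}x llama-2-7b)")
```

On this machine the Q4_K_S quant of the MoE model generates roughly 4.4x faster than the dense llama-2-7b Q4_K_M, despite having more total parameters, which is the expected benefit of only ~1B parameters being active per token.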
This MoE model is in another league compared to other models runnable on this laptop.
It's a truly great gift to everyone. Thank you.
Awesome, thanks a lot for your message! We're working on new models for the LFM2.5 generation. I hope you'll like them. :)
I would love to see a scale-up of this model with more mid/late attention layers. I think there's a lot of room for powerful edge sparse MoEs in the 12-24B parameter range.