This Model Has Virtually No Broad Knowledge

by phil111 - opened May 28

Discussion

phil111

May 28

38 trillion tokens is a gigantic corpus, so why does this model have virtually no broad knowledge for its size?

Did nearly all those tokens come from a handful of domains? Were most domains trained too briefly, and/or were their weights subsequently scrambled by training obsessively on a handful of select domains? Why are companies only making math and coding agents rather than general purpose AI models?

The general performance per parameter count of LLMs is regressing substantially outside a handful of grossly overfit domains, primarily coding and math. Even old ~7b models like Mistral have vastly more broad knowledge. For example, this model seems to know a lot from a handful of domains, but can't even get the main characters right from humanity's most watched shows and movies. And lesser known, but still popular, shows and movies are either 100% hallucinations (e.g. Canada's Corner Gas, the most watched Canadian show ever), or the model isn't even aware that they exist (e.g. the UK's Black Books).

A balance of humanity's most popular knowledge is absolutely essential for most tasks, not just general Q&A. For example, the stories written about very popular things descend into nonsense because the included information is completely wrong. Nor do the models pick up on common references, humor, metaphor etc. because they're more filled with holes than Swiss cheese.

Would someone please start making at least somewhat balanced open source AI models rather than just coding and math tools?

bobbytaylor

May 29

Link an MCP web tool, you can easily get the knowledge depth that way. Personally for a small model I want it to know how to code, reason, and search, not give me stored facts and figures.

phil111

May 29

@bobbytaylor It's a common misconception that a lack of broad popular knowledge can be rectified with RAG (web tool). Across nearly every task other than simple Q&A, such as story writing, relying on RAG greatly reduces performance.

For example, filling up the context with random on topic facts before writing a story greatly spikes RAM consumption and slows generation. But more importantly it forces the random unstructured facts into the stories, greatly reducing the quality and organic flow of the stories.

Additionally, it makes very little sense to depend on the internet and server uptime for the core functionality of local AI models. If you're going to depend on the web most people would be far better off using a fast, far more powerful, and very low cost online model like DeepSeek than suffer the huge latency hit of RAG, along with the numerous other disadvantages, such as reduced availability and reliability (e.g. when traveling/camping).

The truth is LLMs hit a ceiling a while back so the industry started trading broad knowledge and abilities for targeted gains in select domains, primarily in coding and math. The "belief" that what was discarded can be fixed with RAG is baseless post hoc rationalization.
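
The RAM point above can be made concrete with a rough back-of-the-envelope sketch. It assumes a plain transformer whose attention layers cache keys and values for every token in context; the layer count, head count, and head size below are hypothetical and are not the LFM2.5 architecture. Models with fewer attention layers keep a proportionally smaller cache.

```python
def kv_cache_bytes(n_tokens, n_attn_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size: one key and one value vector per attention layer per token."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


# Hypothetical config for illustration only (fp16 cache, 2 bytes per value).
config = dict(n_attn_layers=32, n_kv_heads=8, head_dim=128)

plain_prompt = kv_cache_bytes(2_000, **config)    # short story prompt
stuffed_prompt = kv_cache_bytes(32_000, **config)  # same prompt plus retrieved pages

print(f"2k tokens:  {plain_prompt / 2**20:.0f} MiB")
print(f"32k tokens: {stuffed_prompt / 2**20:.0f} MiB")
```

Under these assumptions the cache grows linearly with context length, so filling the context with retrieved text costs memory in direct proportion to how much is pasted in.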

ayylmaonade

May 29

I don't know what you're expecting. It's a 1B active param MoE with a specified focus on agentic tool use. If you're looking for general knowledge, go run Gemma 4 E2B/E4B or the 26B MoE.

Liquid have always made smaller, more specialised models specifically optimised for throughput and intelligence (hence the large token corpus), not general knowledge. And by the sounds of it, if you're complaining about using an MCP tool or RAG "filling up context before writing a story" (and using RAM, which uh, obviously?) then you're clearly someone using this for roleplay or something. This isn't a model for that. Read model cards before you use them, dude.

xscMpOV

May 29

this is like downloading an RP model and then complaining that it cannot write proofs and solve integrals. yeah no shit?

Ashacorporation

May 29

Nah bro, this model is a tool agent, not Wikipedia. 8B MoE with 1B active + thinking = reasoning machine, not trivia bot.

But a solid CPT with 100M tokens of pop culture data? Boom, it knows Corner Gas characters. Just a targeted LoRA and it's golden.

People expect 1B active params to compare to a dense 7B. That's not MoE. It's efficient, not omniscient.

Devs made right call focusing on tool use & math first. Want general knowledge? CPT it yourself. Base architecture is solid af. 💪.

But overall your point about the industry trading general knowledge for math/coding is spot on. That's a legit concern.

Hopefully an Instruct version without thinking mode comes too. Perfect for quick chats. 😁😁😁

phil111

May 29

@ayylmaonade I read the model card and never once role played w/ AI, but I understand why you made these assumptions since they clearly stated their focus was agentic vs knowledge.

However, all newer consumer hardware OS models are trading tons of broad knowledge and abilities for moderate gains in just a handful of select domains, primarily math, coding, and function calling, transforming general purpose AI models into tools that are useless to 99% of the population.

This is understandable at 300m parameters, but this model is 30x larger and trained on 38T tokens so it shouldn't have vastly less broad knowledge than Mistral 7b, Llama 3 8b, Gemma 2 9b etc. And for what? No sane person would ever use such a small and weak model to automate any task. The result will always be too riddled with errors to achieve a net positive effect. Same goes for high precision tasks like coding and math. The test scores may look impressive but the frequency of boneheaded errors is through the roof so you still need to double check even simple math and fix constant coding errors, so anyone who's even moderately competent would never use such a model for math or coding. And anyone who isn't competent will be even worse off because they will be unable to filter out the flood of boneheaded errors.

Anyways, my point is that the most efficient use of parameters is broad balanced training, especially since current AI models are sloppy (statistical next token predictors), so trading general performance and knowledge to train obsessively on a handful of high precision tasks like function calling, coding, and math makes no sense, since in the end the model still lacks the precision that coding, math, and function calling require to achieve a net positive effect.

nixudos

May 29

No luck running agentic tools locally, and it does not seem to work in the playground either.

ClaudioItaly

May 29

It can't run any of my MCP tools. I've tried repeatedly.. Nothing to do.

kulminaator

May 29

> It can't run any of my MCP tools. I've tried repeatedly.. Nothing to do.

I figured out how to get the tools to work over here:

Downloaded the new chat template from the model here into newer-template.jinja (the old template in the unsloth GGUF files is broken for thinking and tool calls), then started the latest llama.cpp Vulkan build with this command line:

./llama-server --host 0.0.0.0 --port 8080 --metrics -hf unsloth/LFM2.5-8B-A1B-GGUF:Q4_K_M -c 92000 -t 6 --alias lfm25 --jinja --chat-template-file newer-template.jinja

Then I installed my pi extension for testing:

pi install npm:pi-http-util

Then I ran the pi coding agent and asked it to fetch the latest news from Liquid like this:

✓ New session started

> Can you check the latest news from https://www.liquid.ai/company/newsroom  , summarize their titles

The user asks: "Can you check the latest news from https://www.liquid.ai/company/newsroom, summarize their titles?" We need to retrieve the page content and extract titles. The appropriate tool is http_fetch to retrieve the webpage. We can use strip=html2md to get readable markdown, then we can parse titles manually. Or we could use in_page_search to find headings. But the request is to summarize their titles, so we need to get the page and then extract titles. Since we cannot parse HTML programmatically beyond searching, we can fetch and then manually read. Let's start by fetching the page. We'll use http_fetch with default method and strip=html2md to get a readable version. Then we can read the output. However, the tool returns the raw content, not parsed. We'll need to see the content. We'll fetch and then perhaps search for headings. There's no built-in parsing, but we can search for typical title tags. However, we can just fetch and then manually summarize? Actually, we can use in_page_search with a search term like "h2" or "title" but that's not reliable. The best approach: fetch the page with strip=html2md, then we can read the markdown and see headings. The user only wants titles; we can provide them after reading. Let's do that. We'll also need to ensure we don't exceed limits. We'll set max_lines to maybe 1000. Let's fetch.

http_fetch
HTTP 200 https://www.liquid.ai/company/newsroom
Content-Type: text/html; charset=utf-8
Raw size: 205.3KB (210124 chars)
Strip: html2md → 33.8KB (34443 chars)
Lines: 989
snip
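
To check tool calling without an agent in the loop, one option is to send a single request to llama-server's OpenAI-compatible `/v1/chat/completions` endpoint and inspect the reply. The sketch below assumes the server from the command above is listening on port 8080 under the alias `lfm25`. The `fetch_url` tool and its schema are hypothetical, made up for illustration, and are not the schema used by pi-http-util.

```python
import json
import urllib.request

# Assumption: llama-server from the command above is reachable here.
BASE_URL = "http://localhost:8080/v1/chat/completions"

# Hypothetical tool definition, for illustration only.
FETCH_TOOL = {
    "type": "function",
    "function": {
        "name": "fetch_url",
        "description": "Fetch a web page and return its text.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}


def build_request(prompt, model="lfm25"):
    """Build an OpenAI-style chat request that offers a single tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [FETCH_TOOL],
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from a chat completion response dict."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # Arguments arrive as a JSON-encoded string in the OpenAI format.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls


def ask(prompt):
    """Send the request to the local server and return any parsed tool calls."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return extract_tool_calls(json.load(resp))
```

With the server running, `ask("Fetch https://www.liquid.ai/company/newsroom and list the titles.")` returns the tool calls the model emitted. An empty list means the model answered in plain text instead of calling the tool, which is the failure reported earlier in this thread.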

kulminaator

May 29

It still fails to call the tool sometimes, so in such cases you need to push it with a more explicit prompt.

                                                                                                                                                                                              
✓ New session started

Write into joke.txt a joke about elephants.

The user wants a joke about elephants written to a file named joke.txt. I need to create that file with the joke content. The appropriate tool is write, which creates the file if it doesn't exist and writes the given content. I will provide a simple, original elephant joke as the content. No other steps are required.

write joke.txt

Why did the elephant bring a ladder to the bar? Because he heard the drinks were on the house!

The joke has been written to joke.txt:

"Why did the elephant bring a ladder to the bar? Because he heard the drinks were on the house!"

Let me know if you'd like to add, modify, or remove anything!

I have changed the file, guess how many times it has the word cat in it now.

To determine how many times the word "cat" appears in joke.txt, I first need to read its current contents. The read tool can retrieve the file's text, which I can then analyze to count the occurrences of "cat".

Please provide the content of joke.txt, and I'll count the instances for you.

Use the bash tool and grep command to count it.

read joke.txt

$ grep -o -i 'cat' joke.txt | wc -l

0

Took 0.0s

The word "cat" (case-insensitive) does not appear in the joke at joke.txt. The count is 0.

Since the file contains:
Why did the elephant bring a ladder to the bar? Because he heard the drinks were on the house!

There are no instances of "cat".
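
One detail about the command the model chose: `grep -o -i 'cat'` counts case-insensitive substring matches, not whole words, so a file containing "category" would also be counted. A minimal Python sketch of the difference, using a made-up sample sentence alongside the joke from the transcript:

```python
import re


def count_substring(text, needle):
    """Count case-insensitive substring matches, like `grep -o -i needle | wc -l`."""
    return len(re.findall(re.escape(needle), text, flags=re.IGNORECASE))


def count_word(text, word):
    """Count case-insensitive whole-word matches, like adding `-w` to grep."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


joke = ("Why did the elephant bring a ladder to the bar? "
        "Because he heard the drinks were on the house!")
sample = "The cat sat in the Catalog category."  # made-up example text

print(count_substring(joke, "cat"))    # 0, matching the transcript
print(count_substring(sample, "cat"))  # 3: cat, Catalog, category
print(count_word(sample, "cat"))       # 1: only the standalone word
```

For the question as asked ("the word cat"), the whole-word count is the closer match.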

ayylmaonade

May 30

edited May 30

> @ayylmaonade I read the model card and never once role played w/ AI, but I understand why you made these assumptions since they clearly stated their focus was agentic vs knowledge.
>
> However, all newer consumer hardware OS models are trading tons of broad knowledge and abilities for moderate gains in just a handful of select domains, primarily math, coding, and function calling, transforming general purpose AI models into tools that are useless to 99% of the population.
>
> This is understandable at 300m parameters, but this model is 30x larger and trained on 38T tokens so it shouldn't have vastly less broad knowledge than Mistral 7b, Llama 3 8b, Gemma 2 9b etc. And for what? No sane person would ever use such a small and weak model to automate any task. The result will always be too riddled with errors to achieve a net positive effect. Same goes for high precision tasks like coding and math. The test scores may look impressive but the frequency of boneheaded errors is through the roof so you still need to double check even simple math and fix constant coding errors, so anyone who's even moderately competent would never use such a model for math or coding. And anyone who isn't competent will be even worse off because they will be unable to filter out the flood of boneheaded errors.
>
> Anyways, my point is that the most efficient use of parameters is broad balanced training, especially since current AI models are sloppy (statistical next token predictors), so trading general performance and knowledge to train obsessively on a handful of high precision tasks like function calling, coding, and math makes no sense, since in the end the model still lacks the precision that coding, math, and function calling require to achieve a net positive effect.

I actually completely agree regarding AI labs focusing too much on coding/almost exclusively STEM related data. GPT-OSS or Phi 3 (perhaps 4? I can't remember now, it was like mid 2025) were the absolute worst for this. GPT-OSS-20B couldn't even answer a basic question about Star Wars accurately, and that's got 4B active params. Anyway, not to go off on a tangent too much - I think you're confused about how training tokens actually translate to params, especially in MoEs. I was having a look at their non-hallucination rate on AA, and it's 63%. That's actually incredibly good for a model this size and even exceeds Qwen 3.5 + 3.6 35B, which afaik were the previous bests at something like ~50%. It does have quite a bit lower accuracy though, and I suspect it has a lot more to do with it only having 1B active params than it does with the amount of training tokens they used.

Their 24B MoE that was released a few months ago is far superior at general knowledge and it's a 3B active if I remember right. I'd bet upping the active params, even slightly, would actually resolve a lot of these issues. There seems to be a sort of "barrier" regarding general knowledge and coherence around the 3-4B range ime. So yeah, I think we're somewhere in the middle on this. I really wish labs would start incorporating more general knowledge (Mistral seems to have kept this up, but they also have only updated their pretraining once in the past 3 years I believe), but I do think it's a little bit silly to be expecting good results out of a model this size and with a specified focus on agentic stuff - as unfortunate or fortunate as that may or may not be.

phil111

May 30

@ayylmaonade I couldn't agree more about GPT-OSS-20b being profoundly ignorant. It scored lower than Llama 3.2 3b on my general knowledge test, and it also performed poorly for its size at most non-STEM tasks, including story writing.

If you ask me OpenAI was trying to deliberately sabotage OS AI models that can run on consumer hardware in order to protect their subscription based business model. That is, they wanted to nudge the community into competing to make AI tools for coding nerds, but which were otherwise little more than hallucination generators for the >95% of the population who aren't coding nerds. This works out because coders aren't going to waste dozens of hours debugging sub-par code by not using SOTA models like GPT or Opus, so in the end no demographic is lost to OS AI models running on consumer hardware.

Anyways, in regards to MOE vs dense LLMs, in theory MOEs should not only have nearly as much total knowledge as dense LLMs, but their long form memorization should be better, and that's what I observed. For example, Gemma 4 and Qwen3.5 MOE can recall more of a popular song's lyrics or an exact quote than their dense counterparts, while the dense versions did a little better at individual fact retrieval (e.g. Who played Monica's father in the TV show Friends?). Another example is Mixtral 8x7b vs Mistral 7b. They were trained on the same corpus, yet the total knowledge of Mixtral 8x7b is far greater than Mistral 7b, and only about a generation lower than the 70b dense leader Llama 3.1 70b. So at least in theory the total knowledge capacity of an MOE is the same as for a dense model with an equal number of total parameters. But in practice imbalanced experts, bad routing (hence less reliable fact retrieval), and so on mean most dense versions still have a little more knowledge, but with inferior long form knowledge like song lyrics and quotes.

BingoBird

May 31

Yes, the field's focus has moved to increased knowledge resolution in high-value domains, such as science and coding.

Broad world knowledge with all the trivia you care about doesn't begin to resolve until something like Qwen235b or similar size.

We may someday achieve runtime-pluggable domain experts, whereby a base language model can be bound to domain-specific expertise in a modular way, reducing the RAM consumed by knowledge in domains the user is not interested in.
