Instructions for using LiquidAI/LFM2-1.2B-RAG with libraries, notebooks, and local apps.

Libraries

How to use LiquidAI/LFM2-1.2B-RAG with Transformers:

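```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LiquidAI/LFM2-1.2B-RAG")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# With chat-style input, generated_text holds the whole conversation;
# the last message is the model's reply
output = pipe(messages, max_new_tokens=40)
print(output[0]["generated_text"][-1]["content"])
```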

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-1.2B-RAG")
model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2-1.2B-RAG")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# do_sample=False selects greedy decoding, as recommended in Model details
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Local Apps

vLLM

How to use LiquidAI/LFM2-1.2B-RAG with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LiquidAI/LFM2-1.2B-RAG"
# Call the server from another terminal using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2-1.2B-RAG",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
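The same request can be sent from Python. The sketch below uses only the standard library and assumes a server started with `vllm serve` is listening on `http://localhost:8000` (the SGLang server exposes the same `/v1/chat/completions` route on port 30000). The helpers `build_chat_request` and `post_chat` are hypothetical names for illustration, and `temperature=0` follows the greedy-decoding recommendation in Model details.

```python
import json
import urllib.request

MODEL_ID = "LiquidAI/LFM2-1.2B-RAG"


def build_chat_request(messages, model=MODEL_ID, temperature=0.0, max_tokens=256):
    """Build an OpenAI-style chat completions payload (hypothetical helper)."""
    return {
        "model": model,
        "messages": messages,
        # temperature=0 requests greedy decoding
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def post_chat(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# payload = build_chat_request([{"role": "user", "content": "What is the capital of France?"}])
# print(post_chat(payload))
```

Keeping payload construction separate from the HTTP call makes it easy to point the same code at the SGLang server by passing `base_url="http://localhost:30000"`.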


SGLang

How to use LiquidAI/LFM2-1.2B-RAG with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LiquidAI/LFM2-1.2B-RAG" \
    --host 0.0.0.0 \
    --port 30000
# Call the server from another terminal using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2-1.2B-RAG",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LiquidAI/LFM2-1.2B-RAG" \
        --host 0.0.0.0 \
        --port 30000
# Call the server from another terminal using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2-1.2B-RAG",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use LiquidAI/LFM2-1.2B-RAG with Docker Model Runner:
```
docker model run hf.co/LiquidAI/LFM2-1.2B-RAG
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.


LFM2-1.2B-RAG

Based on LFM2-1.2B, LFM2-1.2B-RAG is specialized in answering questions grounded in provided context documents, for use in retrieval-augmented generation (RAG) systems.

Use cases:

Chatbots that answer questions about a particular product's documentation.
Customer support that provides grounded answers from an internal knowledge base.
Academic research assistants that hold multi-turn conversations about research papers and course materials.

You can find more information about other task-specific models in this blog post: https://www.liquid.ai/blog/introducing-liquid-nanos-frontier-grade-performance-on-everyday-devices

📄 Model details

Generation parameters: We recommend greedy decoding (temperature=0).
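To make the recommendation concrete: temperature scaling divides the logits before the softmax, and as the temperature approaches 0 the distribution collapses onto the highest-scoring token, which is greedy decoding. The toy function below is a hypothetical helper operating on a plain list of logits, not any library's API; it treats `temperature == 0` as a pure argmax. In `transformers` the equivalent setting is `do_sample=False`, and OpenAI-compatible servers such as vLLM accept `"temperature": 0`.

```python
import math
import random


def pick_next_token(logits, temperature, rng=None):
    """Return the index of the next token from a list of raw logits.

    temperature == 0 -> greedy decoding (argmax), fully deterministic.
    temperature > 0  -> sample from softmax(logits / temperature).
    """
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Greedy: always take the highest-scoring token (first one on ties)
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [1.2, 3.5, 0.3, 3.1]
# Greedy decoding returns the same index on every call
print([pick_next_token(logits, temperature=0) for _ in range(3)])  # prints [1, 1, 1]
```

Determinism is the practical reason for the recommendation in a RAG setting: the same question over the same documents yields the same grounded answer.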

System prompt: The system prompt is optional. You can use it to force the output language, for example with "Always respond in English, regardless of the user's input language." By default, the model responds in the language of the user prompt.
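A minimal sketch of where the optional system prompt goes, assuming the standard role/content message list accepted by `apply_chat_template` and by the OpenAI-compatible servers. `build_messages` is a hypothetical helper; the English-only instruction is the example quoted above.

```python
FORCE_ENGLISH = "Always respond in English, regardless of the user's input language."


def build_messages(user_content, system_prompt=None):
    """Return a chat message list; the system prompt is optional."""
    messages = []
    if system_prompt is not None:
        # Optional: e.g. force the output language
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_content})
    return messages


question = "¿Cuántos jugadores tiene un equipo de fútbol playa?"
# No system prompt: the reply follows the user's language (Spanish here)
default_msgs = build_messages(question)
# With the system prompt, the model is instructed to answer in English
english_msgs = build_messages(question, system_prompt=FORCE_ENGLISH)
```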

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

Training approach: We fine-tuned LFM2-1.2B on a dataset of more than 1M samples, including multi-turn interactions and multi-document samples. The documents are a mix of curated open-source documents and generated synthetic ones.

Chat template: LFM2 uses a ChatML-like chat template as follows:

```
<|startoftext|><|im_start|>user
Use the following context to answer questions:
Beach soccer differs significantly from its grass-rooted counterpart. [...]<|im_end|>
<|im_start|>assistant
Each team in a beach soccer match consists of five players, including a goalkeeper.<|im_end|>
```
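For inspecting prompts outside `transformers`, the template above can be reproduced with plain string formatting. This is a sketch inferred only from the example shown: the rendering of other roles (such as `system`) is an assumption based on the usual ChatML convention, and the tokenizer's `.apply_chat_template()` remains the authoritative implementation. `render_chat` is a hypothetical helper.

```python
BOS = "<|startoftext|>"


def render_chat(messages, add_generation_prompt=False):
    """Render role/content messages in the ChatML-like format shown above."""
    parts = [BOS]
    for message in messages:
        role, content = message["role"], message["content"]
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


conversation = [
    {
        "role": "user",
        "content": "Use the following context to answer questions:\n"
        "Beach soccer differs significantly from its grass-rooted counterpart. [...]",
    },
    {
        "role": "assistant",
        "content": "Each team in a beach soccer match consists of five players, "
        "including a goalkeeper.",
    },
]
print(render_chat(conversation))
```

At inference time you would render only the user turn(s) with `add_generation_prompt=True`, leaving the assistant turn open for the model.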

You can apply it automatically with the tokenizer's .apply_chat_template() method from Hugging Face transformers.

⚠️ The model supports both single-turn and multi-turn conversations.
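For multi-turn use, one common pattern (assumed here, not prescribed by this card) is to supply the documents once in the first user turn, then append each model reply and the next user question to the same message list before calling the model again. `continue_conversation` is a hypothetical helper that does this without mutating the existing history; placing the question after the documents in the first turn is also an assumption.

```python
def continue_conversation(history, assistant_reply, follow_up):
    """Return a new history with the assistant's reply and the next user turn."""
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": follow_up},
    ]


history = [
    {
        "role": "user",
        "content": "Use the following context to answer questions:\n"
        "Beach soccer differs significantly from its grass-rooted counterpart. [...]\n\n"
        "How many players are on a beach soccer team?",
    }
]
history = continue_conversation(
    history,
    "Each team in a beach soccer match consists of five players, including a goalkeeper.",
    "Is a goalkeeper required?",
)
# history now alternates user/assistant/user and can be passed to
# tokenizer.apply_chat_template(history, add_generation_prompt=True, ...)
```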

RAG systems enable AI solutions to include new, up-to-date, and potentially proprietary information in LLM responses that was not present in the training data. When a user asks a question, the retrieval component locates and delivers related documents from a knowledge base, and then the RAG generator model answers the question based on facts from those contextual documents.
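The retrieve-then-generate flow described above can be sketched without a model. The retriever here is a deliberately naive, hypothetical word-overlap scorer standing in for a real component such as BM25 or embedding search, and the small knowledge base is illustrative. The instruction line reuses the wording of the chat-template example; putting the question after the documents is an assumption. The resulting `messages` list is what you would pass to `apply_chat_template` or to an OpenAI-compatible endpoint.

```python
import re

# Illustrative knowledge base (hypothetical content)
KNOWLEDGE_BASE = [
    "Beach soccer is played on sand, and each team fields five players, including a goalkeeper.",
    "Futsal is played indoors on a hard court with five players per side.",
    "Association football is played on a full-size pitch with eleven players per side.",
]


def word_set(text):
    """Lowercase word tokens; a stand-in for a real text analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Return the k documents sharing the most words with the question."""
    query = word_set(question)
    ranked = sorted(documents, key=lambda doc: len(query & word_set(doc)), reverse=True)
    return ranked[:k]


def build_rag_messages(question, documents):
    """Put the retrieved documents and the question into a single user turn."""
    context = "\n\n".join(documents)
    content = f"Use the following context to answer questions:\n{context}\n\n{question}"
    return [{"role": "user", "content": content}]


question = "How many players does a beach soccer team have?"
top_docs = retrieve(question, KNOWLEDGE_BASE, k=1)
messages = build_rag_messages(question, top_docs)
print(messages[0]["content"])
```

Swapping `retrieve` for a real search backend leaves `build_rag_messages` unchanged, which keeps the generator side independent of how documents are found.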

🏃 How to run

Hugging Face: LFM2-1.2B-RAG
llama.cpp: LFM2-1.2B-RAG-GGUF
LEAP: LEAP model library

You can use the following Colab notebooks for easy inference and fine-tuning:

| Notebook | Description |
|---|---|
| Inference | Run the model with Hugging Face's transformers library. |
| SFT (TRL) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using TRL. |
| DPO (TRL) | Preference alignment with Direct Preference Optimization (DPO) using TRL. |
| SFT (Axolotl) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Axolotl. |
| SFT (Unsloth) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Unsloth. |
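The SFT notebooks fine-tune on chat-formatted examples. As a hedged sketch, assuming your own RAG data is a list of (documents, question, answer) triples, the hypothetical helper below converts each triple into the conversational `{"messages": [...]}` record format that TRL's `SFTTrainer` accepts. The user-turn layout mirrors the chat-template example and is an assumption, not the format used to train this model.

```python
import json


def to_sft_record(documents, question, answer):
    """Convert one (documents, question, answer) triple into a chat-format record."""
    context = "\n\n".join(documents)
    user_turn = f"Use the following context to answer questions:\n{context}\n\n{question}"
    return {
        "messages": [
            {"role": "user", "content": user_turn},
            # The target the model learns to produce, grounded in the documents
            {"role": "assistant", "content": answer},
        ]
    }


def write_jsonl(records, path):
    """Write one JSON object per line, readable with datasets.load_dataset("json", ...)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


triples = [
    (
        ["Beach soccer differs significantly from its grass-rooted counterpart. [...]"],
        "How many players are on a beach soccer team?",
        "Each team in a beach soccer match consists of five players, including a goalkeeper.",
    ),
]
records = [to_sft_record(*t) for t in triples]
# write_jsonl(records, "rag_sft.jsonl")  # then load it in the SFT notebook
```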

📬 Contact

Got questions or want to connect? Join our Discord community.
If you are interested in custom solutions with edge deployment, please contact our sales team.

Citation

@article{liquidai2025lfm2,
 title={LFM2 Technical Report},
 author={Liquid AI},
 journal={arXiv preprint arXiv:2511.23404},
 year={2025}
}


Model size: 1B params (Safetensors)
Tensor type: BF16

Base model: LiquidAI/LFM2-1.2B

Collection: 🎯 Liquid Nanos, a library of task-specific models: https://www.liquid.ai/blog/introducing-liquid-nanos-frontier-grade-performance-on-everyday-devices

Paper: LFM2 Technical Report (arXiv:2511.23404, published Nov 28, 2025)