Instructions to use LiquidAI/LFM2-8B-A1B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LiquidAI/LFM2-8B-A1B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LiquidAI/LFM2-8B-A1B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-8B-A1B")
model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2-8B-A1B")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use LiquidAI/LFM2-8B-A1B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LiquidAI/LFM2-8B-A1B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2-8B-A1B",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

Use Docker
```shell
docker model run hf.co/LiquidAI/LFM2-8B-A1B
```
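The same OpenAI-compatible endpoint can also be called from Python. Below is a minimal sketch using only the standard library, assuming the vLLM server above is running on `localhost:8000`; `build_chat_request` is a hypothetical helper name, not part of vLLM:

```python
import json
from urllib import request

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:8000") -> request.Request:
    """Build an OpenAI-compatible chat completion request for a local server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("LiquidAI/LFM2-8B-A1B", "What is the capital of France?")

# Uncomment once the server is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the API is OpenAI-compatible, the official `openai` client library can be pointed at the same URL instead, if you prefer a higher-level interface.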
- SGLang
How to use LiquidAI/LFM2-8B-A1B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LiquidAI/LFM2-8B-A1B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2-8B-A1B",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LiquidAI/LFM2-8B-A1B" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LiquidAI/LFM2-8B-A1B",
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }'
```

- Docker Model Runner
How to use LiquidAI/LFM2-8B-A1B with Docker Model Runner:
```shell
docker model run hf.co/LiquidAI/LFM2-8B-A1B
```
Enjoying this one in multi-user chat. + laptop perf
We are enjoying this model in multi-user chat, particularly its relative freedom from the mind-numbingly dominant 'assistant' or 'personal moral counselor' behavioral ruts.
Benchmarks below were run on a ThinkPad T495 with 16 GB RAM and an integrated Vega 8 GPU.
Side processes: no browser, no media playback, only light performance monitoring and terminal-based chat clients:
```shell
$ llama-bench -m LFM2-8B-A1B-US-Q5_K_XL.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null > myresults.txt
```
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 0 | 5 | pp512 | 78.87 ± 0.50 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 0 | 5 | tg128 | 14.11 ± 0.12 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 30 | 5 | pp512 | 109.15 ± 0.85 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 30 | 5 | tg128 | 17.09 ± 0.08 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 99 | 5 | pp512 | 108.31 ± 0.73 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 99 | 5 | tg128 | 17.08 ± 0.04 |
```shell
$ llama-bench -m LFM2-8B-A1B-Q4_K_S.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
```
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 0 | 5 | pp512 | 84.27 ± 0.59 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 0 | 5 | tg128 | 17.75 ± 0.09 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 30 | 5 | pp512 | 111.52 ± 0.91 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 30 | 5 | tg128 | 22.48 ± 0.08 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 99 | 5 | pp512 | 111.47 ± 0.95 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 99 | 5 | tg128 | 22.46 ± 0.08 |
```shell
$ llama-bench -m llama-2-7b.Q4_K_M.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
```
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 0 | 5 | pp512 | 29.30 ± 0.09 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 0 | 5 | tg128 | 4.96 ± 0.06 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 30 | 5 | pp512 | 30.17 ± 0.24 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 30 | 5 | tg128 | 4.80 ± 0.01 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 99 | 5 | pp512 | 30.57 ± 0.03 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 99 | 5 | tg128 | 5.08 ± 0.01 |
```shell
$ llama-bench -m Qwen3-4B-Instruct-2507-Q4_K_M.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
```
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 0 | 5 | pp512 | 50.29 ± 0.28 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 0 | 5 | tg128 | 5.65 ± 0.15 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 30 | 5 | pp512 | 50.94 ± 0.09 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 30 | 5 | tg128 | 6.78 ± 0.03 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 99 | 5 | pp512 | 53.77 ± 0.21 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 99 | 5 | tg128 | 7.55 ± 0.01 |
```shell
$ llama-bench -m granite-4.0-h-tiny-Q4_K_M.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
```
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 0 | 5 | pp512 | 61.95 ± 1.04 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 0 | 5 | tg128 | 8.63 ± 0.04 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 30 | 5 | pp512 | 47.91 ± 0.17 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 30 | 5 | tg128 | 11.26 ± 0.13 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 99 | 5 | pp512 | 90.32 ± 2.17 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 99 | 5 | tg128 | 13.36 ± 0.05 |
build: e1f15b454 (7502)
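To make the comparison concrete, here is a small Python sketch that summarizes the token-generation (tg128) throughput at ngl=99 from the tables above, using llama-2-7b as the baseline. The numbers are copied directly from the benchmark output; the dictionary keys are shorthand labels, not file names:

```python
# tg128 throughput at ngl=99, taken from the llama-bench tables above (tokens/s)
tg128 = {
    "LFM2-8B-A1B Q4_K_S":  22.46,
    "LFM2-8B-A1B Q5_K_XL": 17.08,
    "granite-4.0-h-tiny":  13.36,
    "qwen3-4B Q4_K_M":      7.55,
    "llama-2-7b Q4_K_M":    5.08,
}

baseline = tg128["llama-2-7b Q4_K_M"]

# Print each model's throughput and its speedup over the dense 7B baseline
for name, tps in sorted(tg128.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} {tps:6.2f} t/s  ({tps / baseline:.1f}x llama-2-7b)")
```

On this machine the Q4_K_S quant of the MoE model generates roughly 4.4x faster than the dense llama-2-7b Q4_K_M, despite having more total parameters, which is the expected benefit of only ~1B parameters being active per token.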
This MoE model is in another league compared to other models runnable on this laptop.
It's a truly great gift to everyone. Thank you.
Awesome, thanks a lot for your message! We're working on new models for the LFM2.5 generation. I hope you'll like them. :)
I would love to see a scale-up of this model with more mid/late attention layers. I think there's a lot of room for powerful edge sparse MoEs in the 12-24B parameter range.