Instructions to use LiquidAI/LFM2-24B-A2B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LiquidAI/LFM2-24B-A2B with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="LiquidAI/LFM2-24B-A2B")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("LiquidAI/LFM2-24B-A2B", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use LiquidAI/LFM2-24B-A2B with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "LiquidAI/LFM2-24B-A2B" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LiquidAI/LFM2-24B-A2B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/LiquidAI/LFM2-24B-A2B
- SGLang
How to use LiquidAI/LFM2-24B-A2B with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "LiquidAI/LFM2-24B-A2B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LiquidAI/LFM2-24B-A2B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "LiquidAI/LFM2-24B-A2B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LiquidAI/LFM2-24B-A2B", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use LiquidAI/LFM2-24B-A2B with Docker Model Runner:
docker model run hf.co/LiquidAI/LFM2-24B-A2B
Function Calling / Tool Use formatting issue: Model hallucinates `<tool_call>` XML tags instead of native JSON
Model: LFM2-24B-A2B-Q6_K.gguf
Environment: Roo Code (VS Code Extension) -> LM Studio (OpenAI Compatible API) -> Lfm2
Architecture context from logs: general.architecture = lfm2moe
Description:
I am evaluating Lfm2-24B-a2b for use in an autonomous coding agent environment (Roo Code). While the model demonstrates excellent reasoning capabilities, it consistently fails to properly utilize the standard OpenAI-compatible function calling mechanism.
When the /v1/chat/completions endpoint is hit with a defined tools array, the model does not trigger a native tool_calls object. Instead, it responds with plain text and hallucinates custom XML-like tags to invoke tools.
Steps to Reproduce & Observed Behavior:
- Send a request to the model with a
toolsarray containing JSON schemas for various tools (e.g.,attempt_completion,list_files). - Instruct the model to use a tool to complete the task.
- Expected: The API returns a
tool_callsarray with a valid JSON object. - Actual: The API returns standard text
contentcontaining hallucinated tags like this:
<tool_call>
{"name": "attempt_completion", "arguments": {"result": βI have displayed the list of top-level folders. The task is complete.β}}
</tool_call>
This causes a continuous error loop with the agent (Roo Code), which strictly expects native tool calls and replies with: [ERROR] You did not use a tool in your previous response! Please retry with a tool use.
LM Studio Log Snippet (Request containing tools):
Received request: POST to /v1/chat/completions with body {
"model": "liquid/lfm2-24b-a2b",
"messages": [ ... ],
"tools": [
{
"type": "function",
"function": {
"name": "attempt_completion",
"description": "After each tool use...",
"parameters": { ... }
}
}
],
"tool_choice": "auto"
}
Questions for the Team:
- Does Lfm2-24B-a2b require a specific Jinja prompt template or special tokens to correctly format JSON function calling?
- If this is a known limitation of the current instruction tuning, are there plans to align the tool-calling format with industry standards (e.g., Llama 3 or Qwen formats) in future updates?
Thank you for your hard work on this impressive model.