Fabliq-8B-Agent 🌊
Fab·liq = Fable + Liquid. A compact, fast agentic terminal coding model fine-tuned from LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch on real Claude Code sessions from the Fable-5-traces dataset. The base LiquidAI LFM2.5-8B-A1B is an 8B MoE (~1B active), so Fabliq inherits the speed and low VRAM of MoE inference plus the agentic distillation.
Fabliq thinks before it acts: it reads the conversation, reasons inside <think>...</think>, then either calls a tool with LFM's native tool-call format or replies with text. Trained on 4,047 real multi-turn terminal trajectories (Bash, Edit, Read, Write, Glob, Grep, WebSearch) — the kind of read → reason → act → verify loop a real coding agent runs.
✨ Why Fabliq?
- 🐠 Tiny footprint, agent-class behavior. LFM2.5-8B-A1B is a Mixture-of-Experts model — only ~1B parameters activate per token. That means fast inference, low VRAM, and the agentic distillation still takes.
- 🛠 Native tool calling. No wrapper needed: Fabliq emits <|tool_call_start|>[ToolName(arg=value)]<|tool_call_end|> per LFM's official format. Plug it into a harness that parses and executes those calls and you have a working terminal agent.
- 🧠 Reasoning-first. Every assistant turn opens with a <think> block: the chain-of-thought from the original Claude traces, preserved verbatim. The model self-explains before each action.
- 🔗 Clean lineage. This is a direct fine-tune of LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch, which is itself a fine-tune of LiquidAI/LFM2.5-8B-A1B. Fabliq adds 3 epochs of Fable-5 agentic distillation on top of the ToolBench foundation.
Sibling models:
- Fabliq-8B-Agent-Reasoning — adds general/deep reasoning (WithinUs + Helio) on top of this base.
🧪 Model details
| Property | Value |
|---|---|
| Architecture | Lfm2MoeForCausalLM (24 layers, 32 experts, 4 experts/token) |
| Parameters | ~8B total / ~1B active (MoE) |
| Context | 8,192 trained · 128K native (rope_theta=5e6) |
| Precision | bfloat16 |
| Fine-tune type | Full-parameter SFT (FSDP full_shard + activation checkpointing) |
| License | Apache 2.0 |
📚 Training data
| Source | Rows | Description |
|---|---|---|
| Glint-Research/Fable-5-traces | 4,047 | Real Claude Code terminal sessions — multi-turn tool-use trajectories |
Preprocessing pipeline (build_fable5_to_lfm_sft.py):
- Parse context → multi-turn USER/ASSISTANT (message) messages
- Strip slash-command metadata blocks (<local-command-caveat>, <command-*>)
- Convert {tool, input} structured output → LFM native tool-call syntax <|tool_call_start|>[ToolName(arg='value')]<|tool_call_end|>
- Wrap chain-of-thought in <think>...</think> for assistant reasoning training
- Drop 618 rows with <3 messages after parsing
- Max sequence length 8,192 tokens (98.6% coverage without truncation)
Full lineage, data composition, and per-dataset metadata: see the Fable Distillation docs.
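The conversion and wrapping steps in the preprocessing list can be illustrated with a short sketch. This is not the contents of build_fable5_to_lfm_sft.py, which is not shown here; the function names `to_lfm_tool_call` and `wrap_think` are hypothetical, and the use of Python `repr` for argument quoting is an assumption chosen to match the `arg='value'` form above.

```python
def to_lfm_tool_call(tool: str, tool_input: dict) -> str:
    """Render a {tool, input} record in the LFM tool-call syntax (hypothetical helper)."""
    # repr() yields single-quoted strings for typical values, matching arg='value'.
    args = ", ".join(f"{name}={value!r}" for name, value in tool_input.items())
    return f"<|tool_call_start|>[{tool}({args})]<|tool_call_end|>"


def wrap_think(reasoning: str) -> str:
    """Wrap chain-of-thought text in <think>...</think> (hypothetical helper)."""
    return f"<think>\n{reasoning.strip()}\n</think>"


def build_assistant_turn(reasoning: str, tool: str, tool_input: dict) -> str:
    """Combine reasoning and a tool call into one assistant message."""
    return wrap_think(reasoning) + "\n" + to_lfm_tool_call(tool, tool_input)


if __name__ == "__main__":
    print(build_assistant_turn("List the files first.", "Bash", {"command": "ls /tmp"}))
```

Values containing single quotes would come out double-quoted under `repr`; whether the real pipeline handles that case the same way is unknown.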
🔧 Training procedure
| Hyperparameter | Value |
|---|---|
| Schedule | 3 epochs, constant LR |
| Max sequence length | 8,192 |
| Per-device batch size | 2 |
| Gradient accumulation | 4 |
| GPUs | 8× H200 (effective batch 64) |
| Learning rate | 5e-7 (AdamW) |
| Precision | bf16 |
| FSDP | full_shard, activation checkpointing, Lfm2MoeDecoderLayer auto-wrap |
| Loss | NLL, chat-template masked (assistant-only effective) |
| Final train_loss | 1.277 |
| Train runtime | 831 seconds (~14 min) |
| Global steps | 192 |
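The batch and step figures in the table agree with each other, which a few lines of arithmetic can confirm. The sketch assumes that the 4,047 rows are the final training set and that a trailing partial batch still counts as one optimizer step; neither assumption is stated in the card.

```python
import math

# Values taken from the tables above.
per_device_batch = 2
grad_accumulation = 4
num_gpus = 8
train_rows = 4047
epochs = 3

# Effective batch = per-device batch x accumulation steps x GPUs.
effective_batch = per_device_batch * grad_accumulation * num_gpus

# Assumption: the last, partially filled batch is still one optimizer step.
steps_per_epoch = math.ceil(train_rows / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)
```

Under these assumptions the result matches the reported effective batch of 64 and 192 global steps.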
💬 System prompt
You are an agentic coding assistant. Read the conversation history and tool results,
think step by step inside <think>...</think>, then either call a tool using
<|tool_call_start|>[ToolName(arg=value)]<|tool_call_end|> or respond with text.
Use available tools (Bash, Edit, Read, Write, Glob, Grep, WebSearch, WebFetch, etc.)
to accomplish the user's task. Be concise but thorough.
🚀 How to use
transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM-OS-Models/Fabliq-8B-Agent"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"
)

SYSTEM = (
    "You are an agentic coding assistant. Read the conversation history and tool results, "
    "think step by step inside <think>...</think>, then either call a tool using "
    "<|tool_call_start|>[ToolName(arg=value)]<|tool_call_end|> or respond with text. "
    "Use available tools (Bash, Edit, Read, Write, Glob, Grep, WebSearch, WebFetch, etc.) "
    "to accomplish the user's task. Be concise but thorough."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "List the Python files in /tmp and report the line count of the largest one."},
]

text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False,
    repetition_penalty=1.05,
)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
Expected output — the model reasons, then emits a tool call:
<think>
I need to find Python files in /tmp. I'll use Bash with ls piped into wc -l.
</think>
<|tool_call_start|>[Bash(command='ls -1 /tmp/*.py 2>/dev/null | xargs wc -l 2>/dev/null | sort -n | tail -1')]<|tool_call_end|>
vLLM
vllm serve LLM-OS-Models/Fabliq-8B-Agent \
--max-model-len 8192 --dtype bfloat16 --gpu-memory-utilization 0.9
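Once the server is up, it can be queried over vLLM's OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library and assumes the default address http://localhost:8000; the helper names `build_chat_request` and `chat` are hypothetical. It also sets `skip_special_tokens` to false, a vLLM-specific request field, on the assumption that the tool-call markers would otherwise be stripped from the returned text.

```python
import json
import urllib.request

MODEL_ID = "LLM-OS-Models/Fabliq-8B-Agent"


def build_chat_request(user_message: str, system_prompt: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 1024,
        "temperature": 0.0,
        # vLLM-specific extra field; assumed necessary to keep tool-call markers.
        "skip_special_tokens": False,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message: str, system_prompt: str) -> str:
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(build_chat_request(user_message, system_prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("List the Python files in /tmp.", SYSTEM)` with the system prompt shown earlier should return the raw assistant text, which a harness can then parse for tool calls.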
🎯 Intended use
- Local coding agent on top of an MoE-efficient backbone (~1B active params — runs comfortably on a single consumer GPU)
- Terminal / file-system agentic loops (read, edit, run, verify)
- Research on distillation from frontier closed models into open-weight MoE backbones
⚠️ Limitations
- Agentic specialization. Focused fine-tune for terminal/coding work. General-knowledge benchmarks may sit slightly below the LFM2.5-8B-A1B base — that's the expected trade-off for a focused agentic distill.
- No safety alignment. Trained on raw tool-use traces; add your own guardrails for production.
- Tool-format lock-in. Emits LFM-native tool-call syntax. A harness that parses <|tool_call_start|>...<|tool_call_end|> and actually executes the call is required for the agentic loop to work.
- Max seq 8,192 at training. Behavior beyond 8K context is unverified for this checkpoint.
- English-centric.
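The parsing half of such a harness might look like the sketch below. It assumes the payload between the markers is a Python-style list of calls with keyword arguments, as in the examples in this card; the function name `parse_tool_calls` is hypothetical, and executing the parsed calls (with sandboxing and guardrails) is left out on purpose.

```python
import ast
import re

# Matches everything between the LFM tool-call markers.
TOOL_CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)


def parse_tool_calls(text: str) -> list[tuple[str, dict]]:
    """Return (tool_name, kwargs) pairs found in model output (hypothetical helper)."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            tree = ast.parse(payload.strip(), mode="eval")
        except SyntaxError:
            continue  # malformed payload: skip rather than crash the loop
        if not isinstance(tree.body, ast.List):
            continue
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                continue
            kwargs = {}
            for kw in node.keywords:
                if kw.arg is None:
                    continue  # ignore **kwargs-style arguments
                try:
                    kwargs[kw.arg] = ast.literal_eval(kw.value)
                except ValueError:
                    # Unquoted value such as arg=value: keep its source text.
                    kwargs[kw.arg] = ast.unparse(kw.value)
            calls.append((node.func.id, kwargs))
    return calls


if __name__ == "__main__":
    sample = "<think>\nCheck /tmp.\n</think>\n<|tool_call_start|>[Bash(command='ls /tmp')]<|tool_call_end|>"
    print(parse_tool_calls(sample))
```

Using `ast` instead of `eval` means the model output is parsed as data and never executed during parsing.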
📜 License
Apache 2.0, inherited from the LiquidAI LFM2.5-8B-A1B base.
🌳 Model tree
This is a fine-tune (not a merge or adapter). Direct parent: LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch.
LiquidAI/LFM2.5-8B-A1B (LiquidAI base)
└─ LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch (ToolBench foundation)
   └─ LLM-OS-Models/Fabliq-8B-Agent ← this model (Fable-5 agentic SFT, 3 epochs)
      └─ LLM-OS-Models/Fabliq-8B-Agent-Reasoning ← sibling (+ WithinUs + Helio reasoning)
🙏 Acknowledgements
- Base model: LiquidAI/LFM2.5-8B-A1B
- Training data: Glint-Research/Fable-5-traces
- Training framework: PyTorch FSDP + 🤗 Transformers + TRL
- Reference inspirations: empero-ai/Qwable-9B-Claude-Fable-5, empero-ai/Qwythos-9B-Claude-Mythos-5-1M, yuxinlu1/gemma-4-12B-agentic-fable5-composer2.5-v2-3.5x-tau2-GGUF