Fabliq-8B-Agent 🌊

Fab·liq = Fable + Liquid. A compact, fast agentic terminal coding model fine-tuned from LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch on real Claude Code sessions from the Fable-5-traces dataset. The base LiquidAI LFM2.5-8B-A1B is an 8B Mixture-of-Experts (MoE) model with ~1B parameters active per token, so Fabliq combines the low per-token compute of MoE inference with the agentic distillation. All ~8B parameters still have to be held in memory, so the weight footprint is that of an 8B model.

Fabliq thinks before it acts: it reads the conversation, reasons inside <think>...</think>, then either calls a tool with LFM's native tool-call format or replies with text. Trained on 4,047 real multi-turn terminal trajectories (Bash, Edit, Read, Write, Glob, Grep, WebSearch) — the kind of read → reason → act → verify loop a real coding agent runs.

✨ Why Fabliq?

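  • 🐠 Tiny footprint, agent-class behavior. LFM2.5-8B-A1B is a Mixture-of-Experts model: only ~1B parameters activate per token. That keeps per-token compute close to that of a ~1B dense model, so inference is fast, and the agentic distillation still takes. All ~8B parameters must still be loaded, so weight memory is that of an 8B model.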
  • 🛠 Native tool calling. No wrapper needed — Fabliq emits <|tool_call_start|>[ToolName(arg=value)]<|tool_call_end|> per LFM's official format. Plug it into a harness that parses and executes those calls and you have a working terminal agent.
  • 🧠 Reasoning-first. Every assistant turn opens with a <think> block — the chain-of-thought from the original Claude traces, preserved verbatim. The model self-explains before each action.
  • 🔗 Clean lineage. This is a direct fine-tune of LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch, which is itself a fine-tune of LiquidAI/LFM2.5-8B-A1B. Fabliq adds 3 epochs of Fable-5 agentic distillation on top of the ToolBench foundation.
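
A harness has to turn the emitted tool-call text back into a structured call. The following is a minimal sketch of one way to do that, assuming the text between the markers is a Python-style list of calls with keyword arguments whose values are literals, as in the format quoted above. The name `parse_tool_calls` is hypothetical and not part of any LFM or Fabliq library.

```python
import ast
import re

# Matches one tool-call span; the markers are copied from the format quoted above.
TOOL_CALL_RE = re.compile(r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL)


def parse_tool_calls(text):
    """Return a list of (tool_name, kwargs) tuples found in model output.

    Assumption: each span holds a Python-style list such as
    [Bash(command='ls /tmp')] with literal keyword-argument values.
    """
    calls = []
    for span in TOOL_CALL_RE.findall(text):
        try:
            tree = ast.parse(span.strip(), mode="eval")
        except SyntaxError:
            continue  # malformed call: skip it rather than guess
        body = tree.body
        nodes = body.elts if isinstance(body, ast.List) else [body]
        for node in nodes:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                continue
            try:
                # literal_eval only accepts literals, so no model-written code runs here.
                kwargs = {
                    kw.arg: ast.literal_eval(kw.value)
                    for kw in node.keywords
                    if kw.arg is not None
                }
            except ValueError:
                continue  # argument value was not a plain literal
            calls.append((node.func.id, kwargs))
    return calls
```

Parsing with `ast` instead of `eval` means the harness only ever reads the call and never executes model-written Python. Positional arguments are ignored in this sketch.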

Sibling models:

🧪 Model details

Architecture     Lfm2MoeForCausalLM (24 layers, 32 experts, 4 experts/token)
Parameters       ~8B total / ~1B active (MoE)
Context          8,192 trained · 128K native (rope_theta=5e6)
Precision        bfloat16
Fine-tune type   Full-parameter SFT (FSDP full_shard + activation checkpointing)
License          Apache 2.0
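
As a rough sizing aid, the weight memory implied by the figures above can be estimated with simple arithmetic. This sketch assumes exactly 8e9 parameters (the card only says "~8B") stored in bfloat16 at 2 bytes each. It leaves out the KV cache, activations and framework overhead, so real usage will be higher.

```python
def weight_memory_gib(total_params, bytes_per_param=2):
    """Approximate memory needed just to hold the weights, in GiB."""
    return total_params * bytes_per_param / 2**30


total_params = 8e9   # assumption: "~8B total" taken as exactly 8e9
active_params = 1e9  # assumption: "~1B active" taken as exactly 1e9

# Every expert must be resident, so memory scales with the total parameter count.
print(round(weight_memory_gib(total_params), 1))  # 14.9

# Per-token compute scales with the active parameters instead.
print(active_params / total_params)  # 0.125
```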

📚 Training data

Source                          Rows    Description
Glint-Research/Fable-5-traces   4,047   Real Claude Code terminal sessions (multi-turn tool-use trajectories)

Preprocessing pipeline (build_fable5_to_lfm_sft.py):

  1. Parse context → multi-turn messages, split into USER and ASSISTANT (message) turns
  2. Strip slash-command metadata blocks (<local-command-caveat>, <command-*>)
  3. Convert {tool, input} structured output → LFM native tool-call syntax <|tool_call_start|>[ToolName(arg='value')]<|tool_call_end|>
  4. Wrap chain-of-thought in <think>...</think> for assistant reasoning training
  5. Drop 618 rows with fewer than 3 messages after parsing
  6. Cap sequences at 8,192 tokens (98.6% of rows fit without truncation)
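
Steps 3 to 5 can be sketched as follows. This is not the code of build_fable5_to_lfm_sft.py, which is not shown here. The function names, the `messages` row field and the layout of the assistant turn are all assumptions made for illustration.

```python
def to_lfm_tool_call(tool, tool_input):
    """Step 3: render a {tool, input} record in LFM-style tool-call syntax."""
    # repr() quotes string values, e.g. command='ls /tmp'
    args = ", ".join(f"{key}={value!r}" for key, value in tool_input.items())
    return f"<|tool_call_start|>[{tool}({args})]<|tool_call_end|>"


def build_assistant_turn(reasoning, tool=None, tool_input=None, text=""):
    """Step 4: wrap reasoning in <think> tags, then append the action or reply."""
    parts = []
    if reasoning:
        parts.append(f"<think>\n{reasoning.strip()}\n</think>")
    if tool is not None:
        parts.append(to_lfm_tool_call(tool, tool_input or {}))
    elif text:
        parts.append(text)
    return "\n\n".join(parts)


def drop_short_rows(rows, min_messages=3):
    """Step 5: keep only rows with at least `min_messages` parsed messages."""
    # Assumption: each row is a dict with a "messages" list.
    return [row for row in rows if len(row["messages"]) >= min_messages]
```

For example, `build_assistant_turn("Check /tmp.", "Bash", {"command": "ls /tmp"})` yields a `<think>` block followed by a single `Bash` tool call.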

Full lineage, data composition, and per-dataset metadata: see the Fable Distillation docs.

🔧 Training procedure

Hyperparameter          Value
Schedule                3 epochs, constant LR
Max sequence length     8,192
Per-device batch size   2
Gradient accumulation   4
GPUs                    8× H200 (effective batch 64)
Learning rate           5e-7 (AdamW)
Precision               bf16
FSDP                    full_shard, activation checkpointing, Lfm2MoeDecoderLayer auto-wrap
Loss                    NLL, chat-template masked (assistant-only effective)
Final train_loss        1.277
Train runtime           831 seconds (~14 min)
Global steps            192
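
The batch and step figures above can be cross-checked with a few lines of arithmetic. The sketch assumes that all 4,047 rows are used for training and that a partial final batch counts as one optimizer step. The actual trainer may round differently.

```python
import math

per_device_batch = 2
grad_accum = 4
num_gpus = 8
rows = 4047
epochs = 3

# Samples consumed per optimizer step across all GPUs.
effective_batch = per_device_batch * grad_accum * num_gpus

# Assumption: the last, partially filled batch still counts as a step.
steps_per_epoch = math.ceil(rows / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 64 64 192
print(round(831 / total_steps, 2))  # 4.33 seconds per step
```

Under these assumptions the result matches the reported 192 global steps, which suggests that 4,047 is the row count after filtering.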

💬 System prompt

You are an agentic coding assistant. Read the conversation history and tool results,
think step by step inside <think>...</think>, then either call a tool using
<|tool_call_start|>[ToolName(arg=value)]<|tool_call_end|> or respond with text.
Use available tools (Bash, Edit, Read, Write, Glob, Grep, WebSearch, WebFetch, etc.)
to accomplish the user's task. Be concise but thorough.

🚀 How to use

transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM-OS-Models/Fabliq-8B-Agent"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"
)

SYSTEM = (
    "You are an agentic coding assistant. Read the conversation history and tool results, "
    "think step by step inside <think>...</think>, then either call a tool using "
    "<|tool_call_start|>[ToolName(arg=value)]<|tool_call_end|> or respond with text. "
    "Use available tools (Bash, Edit, Read, Write, Glob, Grep, WebSearch, WebFetch, etc.) "
    "to accomplish the user's task. Be concise but thorough."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "List the Python files in /tmp and report the line count of the largest one."},
]
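# Render the conversation to a prompt string that ends with the assistant header.
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

# Greedy decoding with a mild repetition penalty.
out = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False,
    repetition_penalty=1.05,
)
# Decode only the newly generated tokens. Special tokens are kept so the
# tool-call markers cannot be stripped from the printed text.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))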

Example output (exact text may vary): the model reasons, then emits a tool call:

<think>
I need to find Python files in /tmp. I'll use Bash with ls piped into wc -l.
</think>

<|tool_call_start|>[Bash(command='ls -1 /tmp/*.py 2>/dev/null | xargs wc -l 2>/dev/null | sort -n | tail -1')]<|tool_call_end|>
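
Because the reasoning and the action arrive in one string, a harness will usually want to separate them before logging or executing anything. Here is a minimal sketch, assuming at most one `<think>...</think>` block per turn as in the output above. The helper name `split_reasoning` is hypothetical.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output):
    """Split one assistant turn into (reasoning, remainder).

    Assumption: the turn contains at most one <think> block. If none is
    found, the reasoning is returned as an empty string.
    """
    match = THINK_RE.search(output)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    remainder = (output[:match.start()] + output[match.end():]).strip()
    return reasoning, remainder
```

The remainder is either a tool call or a plain-text reply, so it can be passed to whatever tool-call parser the harness uses.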

vLLM

vllm serve LLM-OS-Models/Fabliq-8B-Agent \
  --max-model-len 8192 --dtype bfloat16 --gpu-memory-utilization 0.9
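
Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the standard library and assumes the default address http://localhost:8000. The `skip_special_tokens` field is a vLLM-specific request parameter. It is set to `False` on the assumption that the tool-call markers are registered as special tokens and would otherwise be stripped from the response. The helper names are hypothetical.

```python
import json
import urllib.request


def build_payload(user_msg, system_msg, model="LLM-OS-Models/Fabliq-8B-Agent"):
    """Build a chat-completions request body for the served model."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        "max_tokens": 1024,
        "temperature": 0.0,            # greedy decoding
        "skip_special_tokens": False,  # vLLM extension; see the assumption above
    }


def chat(payload, base_url="http://localhost:8000/v1"):
    """POST the payload to the server and return the assistant text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["message"]["content"]


# Usage (requires the server started above to be running):
# reply = chat(build_payload("List the Python files in /tmp.", "You are an agentic coding assistant."))
```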

🎯 Intended use

  • Local coding agent on top of an MoE-efficient backbone (~1B active params — runs comfortably on a single consumer GPU)
  • Terminal / file-system agentic loops (read, edit, run, verify)
  • Research on distillation from frontier closed models into open-weight MoE backbones

⚠️ Limitations

  • Agentic specialization. Focused fine-tune for terminal/coding work. General-knowledge benchmarks may sit slightly below the LFM2.5-8B-A1B base — that's the expected trade-off for a focused agentic distill.
  • No safety alignment. Trained on raw tool-use traces; add your own guardrails for production.
  • Tool-format lock-in. Emits LFM-native tool-call syntax. A harness that parses <|tool_call_start|>...<|tool_call_end|> and actually executes the call is required for the agentic loop to work.
  • Max seq 8,192 at training. Behavior beyond 8K context is unverified for this checkpoint.
  • English-centric.
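
Since the model has no safety alignment and the harness executes whatever it emits, a guardrail in front of the Bash tool is a sensible first step. The following is one possible sketch, not a complete sandbox: the allowlist contents, the rejected characters and the function names are all illustrative assumptions.

```python
import shlex
import subprocess

# Hypothetical allowlist of read-only binaries; adjust for your environment.
ALLOWED_BINARIES = {"ls", "cat", "wc", "grep", "find"}

# Shell metacharacters rejected outright in this sketch.
FORBIDDEN_CHARS = ";|&><`$"


def is_allowed(command):
    """Return True only for a single allowlisted command without shell operators."""
    if any(ch in command for ch in FORBIDDEN_CHARS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes and similar
    return bool(argv) and argv[0] in ALLOWED_BINARIES


def run_bash(command, timeout=10):
    """Run an allowlisted command without a shell and return its combined output."""
    if not is_allowed(command):
        return "REFUSED: command not on allowlist"
    proc = subprocess.run(
        shlex.split(command), capture_output=True, text=True, timeout=timeout
    )
    return proc.stdout + proc.stderr
```

This check is deliberately strict: piped commands like the one in the example output above would be refused. A production harness would more likely run commands inside a container or other sandbox than rely on string checks alone.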

📜 License

Apache 2.0, inherited from the LiquidAI LFM2.5-8B-A1B base.

🌳 Model tree

This is a fine-tune (not a merge or adapter). Direct parent: LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch.

LiquidAI/LFM2.5-8B-A1B                                          (LiquidAI base)
  └─ LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch  (ToolBench foundation)
      └─ LLM-OS-Models/Fabliq-8B-Agent                           ← this model (Fable-5 agentic SFT, 3 epochs)
          └─ LLM-OS-Models/Fabliq-8B-Agent-Reasoning             ← sibling (+ WithinUs + Helio reasoning)

🙏 Acknowledgements
