Libraries

How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LLM-OS-Models/Fabliq-8B-Agent-Reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LLM-OS-Models/Fabliq-8B-Agent-Reasoning")
model = AutoModelForCausalLM.from_pretrained("LLM-OS-Models/Fabliq-8B-Agent-Reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LLM-OS-Models/Fabliq-8B-Agent-Reasoning"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LLM-OS-Models/Fabliq-8B-Agent-Reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/LLM-OS-Models/Fabliq-8B-Agent-Reasoning

SGLang

How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LLM-OS-Models/Fabliq-8B-Agent-Reasoning" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LLM-OS-Models/Fabliq-8B-Agent-Reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LLM-OS-Models/Fabliq-8B-Agent-Reasoning" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LLM-OS-Models/Fabliq-8B-Agent-Reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with Docker Model Runner:
```
docker model run hf.co/LLM-OS-Models/Fabliq-8B-Agent-Reasoning
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Fabliq-8B-Agent-Reasoning 🌊🧠

The reasoning-expanded sibling of Fabliq-8B-Agent. Adds general + deep reasoning on top of the agentic foundation — broadens the model beyond pure terminal tool-use into multi-domain expert Q&A, mathematical reasoning, scientific analysis, and cybersecurity. Two-phase curriculum inspired by Qwythos-9B.

✨ Why Fabliq-Reasoning?

🐠 Same tiny footprint, broader reach. Inherits LFM2.5-8B-A1B's MoE efficiency (~1B active params). Now also handles expert Q&A, math, science — not just terminal work.
🛠 Still agentic. Phase-1 tool-use capability is preserved — the model still reasons in <think> and emits native LFM tool calls when needed.
🧠 Multi-domain reasoning. Trained on WithinUs (6 categories: advanced coding, agentic planning, general QA, math reasoning, scientific analysis, cybersecurity) + Helio (Opus 4.8 deep-reasoning distillation).
🎯 2-phase curriculum. Phase-1 broad agentic distillation → Phase-2 focused reasoning expansion (Qwythos pattern).

🧪 Model details


| Field | Value |
|---|---|
| Architecture | Lfm2MoeForCausalLM (24 layers, 32 experts, 4 experts/token) |
| Parameters | ~8B total / ~1B active (MoE) |
| Context | 8,192 trained · 128K native (`rope_theta=5e6`) |
| Precision | bfloat16 |
| Fine-tune type | Full-parameter SFT, continuation from Fabliq-8B-Agent |
| License | Apache 2.0 |
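
As a rough check on deployment size, the weight memory implied by the parameter counts above can be estimated with simple arithmetic. This is a sketch only: the 8e9 and 1e9 figures are the approximate totals from the table, the helper name `weight_gb` is hypothetical, and the estimate ignores KV cache, activations, and runtime overhead.

```python
def weight_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 8e9   # ~8B total, approximate figure from the model details
ACTIVE_PARAMS = 1e9  # ~1B active per token, approximate figure from the model details

# Storage is driven by the total parameter count, not the active count.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_gb(TOTAL_PARAMS, bits):.0f} GB")
```

In a standard MoE deployment all experts stay resident in memory, so the ~1B active figure mainly reduces per-token compute rather than storage.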

📚 Training data (Phase-2 only)

| Source | Rows | Description |
|---|---|---|
| WithinUs (from `claude_mythos_distilled_25k`) | 135 | 6-category expert Q&A: coding, planning, general QA, math, science, cybersecurity. SHA-256 dedup (25k → 135 unique). |
| Helio (`Fable-5-Distill-Reasoning-462x`) | 146 | Opus 4.8 deep-reasoning traces. Russian-language filter (rows kept only if Cyrillic <30%). |
| Total Phase-2 | 281 | |

Preprocessing:

WithinUs: category-balanced sampling (max 350 per category), SHA-256 dedup, and removal of the templated first sentence ("Drawing from the autonomous...") → build_withinus_lfm_sft.py
Helio: Cyrillic-ratio filter (rows kept only if <30% Cyrillic), reasoning wrapped in <think> tags, corrupted line 192 skipped → build_helio_lfm_sft.py
Combined: the two sets are concatenated by build_phase2_reasoning
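
The two filters named above (SHA-256 dedup and the Cyrillic-ratio cut) can be sketched as follows. This is an illustration, not the contents of `build_withinus_lfm_sft.py` or `build_helio_lfm_sft.py`: the function names are hypothetical, stripping whitespace before hashing is an assumption, and so is measuring the ratio over alphabetic characters in the basic Cyrillic block.

```python
import hashlib


def sha256_dedup(rows: list[str]) -> list[str]:
    """Keep the first occurrence of each distinct text, keyed by SHA-256 digest."""
    seen: set[str] = set()
    unique: list[str] = []
    for text in rows:
        # Assumption: surrounding whitespace is ignored when hashing.
        digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def cyrillic_ratio(text: str) -> float:
    """Fraction of alphabetic characters that fall in U+0400..U+04FF."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04ff")
    return cyrillic / len(letters)


def keep_row(text: str, max_cyrillic: float = 0.30) -> bool:
    """Keep a row only if its Cyrillic ratio is below the threshold."""
    return cyrillic_ratio(text) < max_cyrillic


rows = [
    "What is 2 + 2?",
    "What is 2 + 2?",  # exact duplicate, dropped by the dedup step
    "Привет, мир",     # fully Cyrillic, dropped by the ratio filter
    "Prove that sqrt(2) is irrational.",
]
kept = [r for r in sha256_dedup(rows) if keep_row(r)]
print(kept)
```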

🔧 Training procedure (Phase-2)

| Hyperparameter | Value |
|---|---|
| Base | `LLM-OS-Models/Fabliq-8B-Agent` (Phase-1 final) |
| Schedule | 4 epochs, constant LR |
| Max sequence length | 8,192 |
| Per-device batch size | 2 |
| Gradient accumulation | 4 |
| GPUs | 8× H200 (effective batch 64) |
| Learning rate | 3e-7 (lower than Phase-1: the model is already agentic-tuned, so this avoids forgetting) |
| Precision | bf16 |
| FSDP | `full_shard`, activation checkpointing, `Lfm2MoeDecoderLayer` auto-wrap |
| Final train_loss | ~1.6 |
| Train runtime | ~6 minutes (281 rows × 4 epochs) |
| Global steps | 20 |
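
The effective batch size and global step count follow from the other hyperparameters. The arithmetic below is a minimal sketch that assumes the final partial batch of each epoch is counted as one optimizer step; the variable names are illustrative.

```python
import math

rows = 281            # Phase-2 training rows
epochs = 4
per_device_batch = 2
grad_accum = 4
num_gpus = 8

# Samples consumed per optimizer step across all devices.
effective_batch = per_device_batch * grad_accum * num_gpus

# Assumption: a trailing partial batch still counts as a step.
steps_per_epoch = math.ceil(rows / effective_batch)
global_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, global_steps)  # 64 5 20
```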

💬 System prompts (per data source)

WithinUs (broad reasoning):

You are a knowledgeable assistant. Provide rigorous, well-structured answers
across coding, cybersecurity, mathematics, scientific analysis, agentic planning,
and general expert topics. Be precise and thorough.

Helio (deep reasoning):

You are a deep-reasoning assistant. Think step by step inside <think>...</think>,
then provide a clear, structured answer.
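
To reuse these prompts against an OpenAI-compatible server such as the vLLM one started earlier, the request body can be built as in this sketch. The names `SYSTEM_PROMPTS`, `build_payload`, and `send` are hypothetical, the prompt text is joined onto one line, and the default URL assumes vLLM is serving on port 8000 as in the curl example.

```python
import json
import urllib.request

MODEL_ID = "LLM-OS-Models/Fabliq-8B-Agent-Reasoning"

# System prompts copied from the two training sources.
SYSTEM_PROMPTS = {
    "withinus": (
        "You are a knowledgeable assistant. Provide rigorous, well-structured answers "
        "across coding, cybersecurity, mathematics, scientific analysis, agentic planning, "
        "and general expert topics. Be precise and thorough."
    ),
    "helio": (
        "You are a deep-reasoning assistant. Think step by step inside <think>...</think>, "
        "then provide a clear, structured answer."
    ),
}


def build_payload(style: str, question: str) -> dict:
    """Build a chat-completions request body using the chosen system prompt."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPTS[style]},
            {"role": "user", "content": question},
        ],
    }


def send(payload: dict, url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the payload to a running server and return the parsed JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_payload("helio", "Why is binary search O(log n)?")
print(json.dumps(payload, indent=2))
```

Calling `send(payload)` requires a server that is already running; building the payload does not.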

🚀 How to use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM-OS-Models/Fabliq-8B-Agent-Reasoning"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"
)

SYSTEM = (
    "You are a deep-reasoning assistant. Think step by step inside <think>...</think>, "
    "then provide a clear, structured answer."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Derive the time complexity of merge sort and explain when it beats quicksort."},
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=False,
    repetition_penalty=1.05,
)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
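
Because the model reasons inside <think> tags, it can be convenient to separate the reasoning from the final answer after decoding. The helper below is a minimal sketch, assuming the output contains at most one closed <think>...</think> block; `split_reasoning` is a hypothetical name, not part of the model's tooling.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(decoded: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no closed <think> block exists."""
    match = THINK_RE.search(decoded)
    if match is None:
        return "", decoded.strip()
    reasoning = match.group(1).strip()
    answer = decoded[match.end():].strip()
    return reasoning, answer


# Stand-in for the decoded model output.
sample = (
    "<think>Merge sort splits in halves: T(n) = 2T(n/2) + O(n).</think>\n"
    "Merge sort runs in O(n log n)."
)
reasoning, answer = split_reasoning(sample)
print(answer)
```

With `skip_special_tokens=False`, any end-of-turn special token stays in the answer text, so strip it separately if needed.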

🎯 When to use which Fabliq?

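| Use case | Model |
|---|---|
| Pure terminal / coding agent (read, edit, run, verify) | Fabliq-8B-Agent |
| Multi-domain expert Q&A + reasoning + still agentic | Fabliq-8B-Agent-Reasoning (this model) |
| Local 16GB VRAM deployment with tool-use | Either; both have the same ~8B total parameters (bf16 weights alone are about 16 GB, so a quantized build may be needed) |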

⚠️ Limitations

Phase-2 dataset is small (281 rows). Reasoning expansion is real but bounded: this is a delta on top of Phase-1, not a from-scratch reasoning model.
WithinUs dedup surprise. The source dataset claims 25k rows, but after SHA-256 dedup of templated prompts only 135 unique rows remain. Template duplication in the source data was severe.
Helio Russian filter. The original 462 rows were filtered to 146 after removing traces with 30% or more Cyrillic characters (mostly Russian). Non-English coverage is limited.
No safety alignment. Trained on raw reasoning traces; add your own guardrails for production.
Maximum sequence length was 8,192 tokens during training. Behavior beyond 8K context is unverified.
English-centric.

📜 License

Apache 2.0, inherited from the LiquidAI LFM2.5-8B-A1B base.

🌳 Model tree

This is a fine-tune (continuation SFT). Direct parent: LLM-OS-Models/Fabliq-8B-Agent.

LiquidAI/LFM2.5-8B-A1B                                          (LiquidAI base)
  └─ LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch  (ToolBench foundation)
      └─ LLM-OS-Models/Fabliq-8B-Agent                           (Phase-1: Fable-5 agentic SFT)
          └─ LLM-OS-Models/Fabliq-8B-Agent-Reasoning             ← this model (Phase-2: + WithinUs + Helio)

🙏 Acknowledgements

Base: LiquidAI/LFM2.5-8B-A1B
Phase-1 parent: Fabliq-8B-Agent
Phase-2 data: WithinUs (claude_mythos_distilled_25k), Helio (Fable-5-Distill-Reasoning-462x)
Reference: empero-ai/Qwythos-9B-Claude-Mythos-5-1M — 2-phase curriculum pattern