Instructions for using LLM-OS-Models/Fabliq-8B-Agent-Reasoning with libraries and local inference servers.

Libraries

How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LLM-OS-Models/Fabliq-8B-Agent-Reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LLM-OS-Models/Fabliq-8B-Agent-Reasoning")
model = AutoModelForCausalLM.from_pretrained("LLM-OS-Models/Fabliq-8B-Agent-Reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (slice off the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (this blocks the terminal):
vllm serve "LLM-OS-Models/Fabliq-8B-Agent-Reasoning"
# From a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLM-OS-Models/Fabliq-8B-Agent-Reasoning",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
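The curl call above can also be issued from Python. This is a minimal standard-library sketch (`urllib`). It assumes only the OpenAI-compatible `/v1/chat/completions` route and JSON shape shown in the curl example. The helper names `build_chat_request` and `send_chat_request` are hypothetical. Building the request is kept separate from sending it, so the payload can be inspected offline. The same helpers should work against the SGLang server described below if you point them at port 30000.

```python
import json
import urllib.request

MODEL_ID = "LLM-OS-Models/Fabliq-8B-Agent-Reasoning"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL_ID) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible chat endpoint.

    Hypothetical helper that mirrors the curl example's URL, headers, and body.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(req: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("http://localhost:8000", "What is the capital of France?"))` should return the model's reply as a string.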

Use Docker

```shell
docker model run hf.co/LLM-OS-Models/Fabliq-8B-Agent-Reasoning
```

SGLang

How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (this blocks the terminal):
python3 -m sglang.launch_server \
    --model-path "LLM-OS-Models/Fabliq-8B-Agent-Reasoning" \
    --host 0.0.0.0 \
    --port 30000
# From a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLM-OS-Models/Fabliq-8B-Agent-Reasoning",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LLM-OS-Models/Fabliq-8B-Agent-Reasoning" \
        --host 0.0.0.0 \
        --port 30000
# From a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LLM-OS-Models/Fabliq-8B-Agent-Reasoning",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with Docker Model Runner:
```
docker model run hf.co/LLM-OS-Models/Fabliq-8B-Agent-Reasoning
```


gyung · Upload README.md with huggingface_hub · b9db172 (verified, 11 days ago)


---
library_name: transformers
license: apache-2.0
base_model:
- LLM-OS-Models/Fabliq-8B-Agent
- LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch
- LiquidAI/LFM2.5-8B-A1B
base_model_relation: finetune
tags:
- lfm
- liquid-ai
- moe
- agentic
- tool-use
- terminal
- reasoning
- fable-5
- withinus
- helio
- distillation
- sft
language:
- en
pipeline_tag: text-generation
---

# Fabliq-8B-Agent-Reasoning 🌊🧠

> The reasoning-expanded sibling of [Fabliq-8B-Agent](https://huggingface.co/LLM-OS-Models/Fabliq-8B-Agent). It adds general and deep reasoning on top of the agentic foundation. This broadens the model beyond pure terminal tool use into multi-domain expert Q&A, mathematical reasoning, scientific analysis, and cybersecurity. Training follows a two-phase curriculum inspired by [Qwythos-9B](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M).

## ✨ Why Fabliq-Reasoning?

- 🐠 Same tiny footprint, broader reach. Inherits the MoE efficiency of LFM2.5-8B-A1B (~1B active parameters per token). It now also handles expert Q&A, math, and science, not just terminal work.
- 🛠 Still agentic. Phase-1 tool-use capability is preserved: the model still reasons inside `<think>` and emits native LFM tool calls when needed.
- 🧠 Multi-domain reasoning. Trained on WithinUs (6 categories: advanced coding, agentic planning, general QA, math reasoning, scientific analysis, cybersecurity) plus Helio (Opus 4.8 deep-reasoning distillation).
- 🎯 2-phase curriculum. Phase-1 is broad agentic distillation; Phase-2 is a focused reasoning expansion (the Qwythos pattern).

## 🧪 Model details

| Property | Value |
| --- | --- |
| Architecture | Lfm2MoeForCausalLM (24 layers, 32 experts, 4 experts/token) |
| Parameters | ~8B total / ~1B active per token (MoE) |
| Context | 8,192 tokens at training; 128K native (`rope_theta=5e6`) |
| Precision | bfloat16 |
| Fine-tune type | Full-parameter SFT, continued from Fabliq-8B-Agent |
| License | Apache 2.0 |

## 📚 Training data (Phase-2 only)

| Source | Rows | Description |
| --- | --- | --- |
| WithinUs (from `claude_mythos_distilled_25k`) | 135 | 6-category expert Q&A: advanced coding, agentic planning, general QA, math, science, cybersecurity. SHA-256 dedup (25k → 135 unique). |
| Helio (`Fable-5-Distill-Reasoning-462x`) | 146 | Opus 4.8 deep-reasoning traces. Russian-language traces removed by keeping only rows with a Cyrillic ratio below 30% (462 → 146). |
| Total Phase-2 | 281 | |

Preprocessing:
- WithinUs: category balancing (max 350 rows per category), SHA-256 dedup, and removal of the templated first sentence ("Drawing from the autonomous...") → [`build_withinus_lfm_sft.py`](https://github.com/LLM-OS-Models/Terminal/blob/main/fable_distillation/build_withinus_lfm_sft.py)
- Helio: Cyrillic-ratio filter (<30%), `<think>` wrapping of the reasoning, and skipping of the corrupted line 192 → [`build_helio_lfm_sft.py`](https://github.com/LLM-OS-Models/Terminal/blob/main/fable_distillation/build_helio_lfm_sft.py)
- Combined: [`build_phase2_reasoning`](https://github.com/LLM-OS-Models/Terminal/blob/main/fable_distillation/datasets/) (concatenation of both sets)
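The filtering steps above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of the linked scripts, which remain authoritative. The assumed parts are the row field names (`prompt`, `category`, `text`), the whitespace trimming before hashing, and measuring the Cyrillic ratio over alphabetic characters only.

```python
import hashlib
from collections import defaultdict


def sha256_dedup(rows, key="prompt"):
    """Keep the first row for each SHA-256 digest of the (whitespace-trimmed) rows[key]."""
    seen = set()
    unique = []
    for row in rows:
        digest = hashlib.sha256(row[key].strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(row)
    return unique


def cap_per_category(rows, max_per_cat=350, key="category"):
    """Keep at most `max_per_cat` rows per category, preserving input order."""
    counts = defaultdict(int)
    kept = []
    for row in rows:
        if counts[row[key]] < max_per_cat:
            counts[row[key]] += 1
            kept.append(row)
    return kept


def cyrillic_ratio(text):
    """Fraction of alphabetic characters in the basic Cyrillic block (U+0400 to U+04FF)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04ff")
    return cyrillic / len(letters)


def drop_cyrillic_dominant(rows, threshold=0.30, key="text"):
    """Keep only rows whose Cyrillic ratio is below `threshold`."""
    return [row for row in rows if cyrillic_ratio(row[key]) < threshold]
```

Hashing is what makes the WithinUs collapse visible. Rows produced from the same template hash identically, so 25k nominal rows can reduce to a few hundred or fewer.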

## 🔧 Training procedure (Phase-2)

| Hyperparameter | Value |
| --- | --- |
| Base | `LLM-OS-Models/Fabliq-8B-Agent` (Phase-1 final) |
| Schedule | 4 epochs, constant LR |
| Max sequence length | 8,192 tokens |
| Per-device batch size | 2 |
| Gradient accumulation | 4 |
| GPUs | 8× H200 (effective batch 2 × 4 × 8 = 64) |
| Learning rate | 3e-7 (lower than Phase-1; the model is already agentic-tuned, so a small LR limits forgetting) |
| Precision | bf16 |
| FSDP | `full_shard`, activation checkpointing, `Lfm2MoeDecoderLayer` auto-wrap |
| Final train_loss | ~1.6 |
| Train runtime | ~6 minutes (281 rows × 4 epochs) |
| Global steps | 20 |
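The batch size, row count, and step count in the table are mutually consistent. A quick check: 281 rows is 4 full batches of 64 plus 25 leftover rows. Assuming the leftover partial batch still triggers one optimizer step, each epoch takes 5 steps. The exact count can vary with how a given trainer handles remainder batches and distributed-sampler padding.

```python
import math


def effective_batch(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Sequences consumed per optimizer step across all devices."""
    return per_device * grad_accum * num_gpus


def total_optimizer_steps(num_rows: int, epochs: int, batch: int) -> int:
    """Optimizer steps, assuming each epoch's final partial batch still counts as a step."""
    return math.ceil(num_rows / batch) * epochs


batch = effective_batch(per_device=2, grad_accum=4, num_gpus=8)
steps = total_optimizer_steps(num_rows=281, epochs=4, batch=batch)
print(batch, steps)  # 64 20
```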

## 💬 System prompts (per data source)

WithinUs (broad reasoning):
```
You are a knowledgeable assistant. Provide rigorous, well-structured answers
across coding, cybersecurity, mathematics, scientific analysis, agentic planning,
and general expert topics. Be precise and thorough.
```

Helio (deep reasoning):
```
You are a deep-reasoning assistant. Think step by step inside <think>...</think>,
then provide a clear, structured answer.
```
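Below is one way a training row could pair each source with its system prompt. This is a hypothetical illustration. The `to_chat_record` name, the `question`/`reasoning`/`answer` inputs, and the exact placement of the `<think>` block are assumptions, not taken from the build scripts. It only mirrors the card's description: Helio reasoning is wrapped in `<think>`, while WithinUs answers are used as-is.

```python
WITHINUS_SYSTEM = (
    "You are a knowledgeable assistant. Provide rigorous, well-structured answers "
    "across coding, cybersecurity, mathematics, scientific analysis, agentic planning, "
    "and general expert topics. Be precise and thorough."
)
HELIO_SYSTEM = (
    "You are a deep-reasoning assistant. Think step by step inside <think>...</think>, "
    "then provide a clear, structured answer."
)


def to_chat_record(source, question, answer, reasoning=None):
    """Return a chat-format training row (hypothetical schema)."""
    if source == "helio":
        system = HELIO_SYSTEM
        # Wrap the reasoning trace in <think> tags ahead of the final answer.
        content = f"<think>\n{reasoning}\n</think>\n\n{answer}" if reasoning else answer
    elif source == "withinus":
        system = WITHINUS_SYSTEM
        content = answer
    else:
        raise ValueError(f"unknown source: {source!r}")
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
            {"role": "assistant", "content": content},
        ]
    }
```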

## 🚀 How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM-OS-Models/Fabliq-8B-Agent-Reasoning"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"  # older transformers: torch_dtype=
)

# Helio-style system prompt: asks for step-by-step reasoning inside <think>...</think>
SYSTEM = (
    "You are a deep-reasoning assistant. Think step by step inside <think>...</think>, "
    "then provide a clear, structured answer."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Derive the time complexity of merge sort and explain when it beats quicksort."},
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The chat template already inserts the special tokens, so don't add them twice
inputs = tok(text, return_tensors="pt", add_special_tokens=False).to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=False,  # greedy decoding
    repetition_penalty=1.05,
)
# Decode only the new tokens; keep special tokens so reasoning markers are not stripped
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```
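Because the output is decoded with special tokens kept, the reasoning and the final answer arrive in a single string. Here is a small sketch for separating them. It assumes the model emits literal `<think>` and `</think>` markers, as the system prompt requests. The `split_reasoning` helper is hypothetical. Passing `eos_token=tok.eos_token` removes the end-of-turn marker without hard-coding its value.

```python
import re
from typing import Optional, Tuple

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(decoded: str, eos_token: Optional[str] = None) -> Tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Reasoning is the text inside the first <think>...</think> pair; the answer is
    everything after the closing tag. Without a complete pair, the whole string
    is returned as the answer.
    """
    text = decoded.replace(eos_token, "") if eos_token else decoded
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()
```

Usage: `reasoning, answer = split_reasoning(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False), eos_token=tok.eos_token)`.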

## 🎯 When to use which Fabliq?

| Use case | Model |
| --- | --- |
| Pure terminal / coding agent (read, edit, run, verify) | [Fabliq-8B-Agent](https://huggingface.co/LLM-OS-Models/Fabliq-8B-Agent) |
| Multi-domain expert Q&A + reasoning, still agentic | Fabliq-8B-Agent-Reasoning (this model) |
| Local 16GB VRAM deployment with tool-use | Either. In bf16 the ~8B weights alone take ~15 GiB, so on a 16GB card a quantized (8-bit or 4-bit) build leaves room for the KV cache |
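The VRAM row follows from simple arithmetic: parameters × bytes per parameter. With standard inference and no expert offloading, every expert must be resident in memory. The footprint is therefore set by the ~8B total parameters, not the ~1B active. The sketch below counts weights only. It ignores the KV cache, activations, and the per-block scale overhead of real quantization formats, so treat the numbers as lower bounds.

```python
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GiB (excludes KV cache and activations)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("bf16", "int8", "int4"):
    print(f"{dtype}: {weight_gib(8e9, dtype):.1f} GiB")
```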

## ⚠️ Limitations

- Phase-2 dataset is small (281 rows). The reasoning expansion is real but bounded: this is a delta on top of Phase-1, not a from-scratch reasoning model.
- WithinUs dedup surprise. The source dataset claims 25k rows, but after SHA-256 dedup of the templated prompts only 135 unique rows remain. The source data is heavily templated.
- Helio Russian filter. The original 462 rows dropped to 146 after removing Cyrillic-dominant (Russian) traces, so non-English coverage is limited.
- No safety alignment. Trained on raw reasoning traces; add your own guardrails for production use.
- Max sequence length of 8,192 tokens at training. Behavior beyond 8K tokens of context is unverified.
- English-centric.
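Since behavior past the 8,192-token training length is unverified, a caller may want to keep prompt plus generation inside that window. Below is a minimal sketch. The `clamp_prompt` helper is hypothetical and works on a plain list of token ids. It keeps the most recent tokens, which means it can cut off the system prompt. For chat use, dropping whole earlier turns before applying the chat template is usually the better choice.

```python
TRAINED_WINDOW = 8_192  # max sequence length used in Phase-2 training


def clamp_prompt(token_ids, max_new_tokens, window=TRAINED_WINDOW):
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    budget = window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    return token_ids[-budget:]
```

With the `max_new_tokens=2048` used in the example above, this leaves at most 6,144 prompt tokens.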

## 📜 License

Apache 2.0, inherited from the [LiquidAI LFM2.5-8B-A1B](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B) base.

## 🌳 Model tree

This is a fine-tune (continuation SFT). Direct parent: [`LLM-OS-Models/Fabliq-8B-Agent`](https://huggingface.co/LLM-OS-Models/Fabliq-8B-Agent).

```
LiquidAI/LFM2.5-8B-A1B (LiquidAI base)
└─ LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch (ToolBench foundation)
   └─ LLM-OS-Models/Fabliq-8B-Agent (Phase-1: Fable-5 agentic SFT)
      └─ LLM-OS-Models/Fabliq-8B-Agent-Reasoning ← this model (Phase-2: + WithinUs + Helio)
```

## 🙏 Acknowledgements

- Base: [LiquidAI/LFM2.5-8B-A1B](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B)
- Phase-1 parent: [Fabliq-8B-Agent](https://huggingface.co/LLM-OS-Models/Fabliq-8B-Agent)
- Phase-2 data: WithinUs (claude_mythos_distilled_25k), Helio (Fable-5-Distill-Reasoning-462x)
- Reference: [empero-ai/Qwythos-9B-Claude-Mythos-5-1M](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M) (2-phase curriculum pattern)