Instructions to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LLM-OS-Models/Fabliq-8B-Agent-Reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LLM-OS-Models/Fabliq-8B-Agent-Reasoning")
model = AutoModelForCausalLM.from_pretrained("LLM-OS-Models/Fabliq-8B-Agent-Reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LLM-OS-Models/Fabliq-8B-Agent-Reasoning"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LLM-OS-Models/Fabliq-8B-Agent-Reasoning",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```bash
docker model run hf.co/LLM-OS-Models/Fabliq-8B-Agent-Reasoning
```
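
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the response layout (`choices[0].message.content`) is assumed to follow the OpenAI chat-completions format that the server advertises.

```python
import json
import urllib.request

MODEL_ID = "LLM-OS-Models/Fabliq-8B-Agent-Reasoning"


def build_chat_request(prompt, base_url="http://localhost:8000/v1", model=MODEL_ID):
    """Build (but do not send) a POST request for the /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` returns the reply text. Pass `base_url="http://localhost:30000/v1"` to target a server listening on port 30000 instead.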
- SGLang
How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "LLM-OS-Models/Fabliq-8B-Agent-Reasoning" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LLM-OS-Models/Fabliq-8B-Agent-Reasoning",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "LLM-OS-Models/Fabliq-8B-Agent-Reasoning" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LLM-OS-Models/Fabliq-8B-Agent-Reasoning",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner

How to use LLM-OS-Models/Fabliq-8B-Agent-Reasoning with Docker Model Runner:

```bash
docker model run hf.co/LLM-OS-Models/Fabliq-8B-Agent-Reasoning
```

library_name: transformers
license: apache-2.0
base_model:
- LLM-OS-Models/Fabliq-8B-Agent
- LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch
- LiquidAI/LFM2.5-8B-A1B
base_model_relation: finetune
tags:
- lfm
- liquid-ai
- moe
- agentic
- tool-use
- terminal
- reasoning
- fable-5
- withinus
- helio
- distillation
- sft
language:
- en
pipeline_tag: text-generation

# Fabliq-8B-Agent-Reasoning

> The **reasoning-expanded sibling** of [Fabliq-8B-Agent](https://huggingface.co/LLM-OS-Models/Fabliq-8B-Agent). It adds general and deep reasoning on top of the agentic foundation, broadening the model beyond pure terminal tool-use into multi-domain expert Q&A, mathematical reasoning, scientific analysis, and cybersecurity. The two-phase curriculum is inspired by [Qwythos-9B](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M).

## Why Fabliq-Reasoning?

- **Same tiny footprint, broader reach.** Inherits LFM2.5-8B-A1B's MoE efficiency (~1B active params) and now also handles expert Q&A, math, and science, not just terminal work.
- **Still agentic.** Phase-1 tool-use capability is preserved: the model still reasons in `<think>` and emits native LFM tool calls when needed.
- **Multi-domain reasoning.** Trained on WithinUs (6 categories: advanced coding, agentic planning, general QA, math reasoning, scientific analysis, cybersecurity) plus Helio (Opus 4.8 deep-reasoning distillation).
- **2-phase curriculum.** Phase-1 broad agentic distillation → Phase-2 focused reasoning expansion (Qwythos pattern).

## Model details

| Field | Value |
| --- | --- |
| **Architecture** | Lfm2MoeForCausalLM (24 layers, 32 experts, 4 experts/token) |
| **Parameters** | ~8B total / ~1B active (MoE) |
| **Context** | 8,192 trained · 128K native (`rope_theta=5e6`) |
| **Precision** | bfloat16 |
| **Fine-tune type** | Full-parameter SFT, continuation from Fabliq-8B-Agent |
| **License** | Apache 2.0 |
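
To illustrate why only a fraction of the parameters is active per token, here is a generic top-k routing sketch for the "32 experts, 4 experts/token" configuration. It is not taken from the LFM2 MoE implementation: the softmax-then-renormalize gating, the function name `route_token`, and the toy logits are all assumptions made for illustration.

```python
import math

NUM_EXPERTS = 32  # from the model details table
TOP_K = 4         # experts activated per token


def route_token(router_logits, top_k=TOP_K):
    """Return [(expert_index, weight), ...] for the top_k experts of one token."""
    # Softmax over all experts (subtract the max for numerical stability).
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the top_k most probable experts.
    chosen = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:top_k]
    # Renormalize so the kept weights sum to 1 (an assumption of this sketch).
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]


# Toy router logits for a single token (hypothetical values).
logits = [0.1 * ((7 * i) % NUM_EXPERTS) for i in range(NUM_EXPERTS)]
selected = route_token(logits)
print(selected)
```

Only the four selected experts run for that token, so the per-token compute scales with the active parameters rather than the total.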

## Training data (Phase-2 only)

| Source | Rows | Description |
| --- | --- | --- |
| WithinUs (from `claude_mythos_distilled_25k`) | 135 | 6-category expert Q&A: coding, planning, general QA, math, science, cybersecurity. SHA-256 dedup (25k → 135 unique). |
| Helio (`Fable-5-Distill-Reasoning-462x`) | 146 | Opus 4.8 deep-reasoning traces. Russian-language filter (keeps rows with a Cyrillic ratio <30%). |
| **Total Phase-2** | **281** | |

Preprocessing:

- **WithinUs:** category-balanced (max 350/cat), SHA-256 dedup, removal of the templated first sentence ("Drawing from the autonomous..."). Script: [`build_withinus_lfm_sft.py`](https://github.com/LLM-OS-Models/Terminal/blob/main/fable_distillation/build_withinus_lfm_sft.py)
- **Helio:** Cyrillic ratio filter (<30%), `<think>` wrapping for reasoning, skip of the corrupted line 192. Script: [`build_helio_lfm_sft.py`](https://github.com/LLM-OS-Models/Terminal/blob/main/fable_distillation/build_helio_lfm_sft.py)
- **Combined:** [`build_phase2_reasoning`](https://github.com/LLM-OS-Models/Terminal/blob/main/fable_distillation/datasets/) (concat)
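
The two filters named above can be sketched as follows. This is an illustration rather than the code in the linked scripts: the field names (`prompt`, `response`), hashing the prompt text, and measuring the Cyrillic ratio over alphabetic characters are all assumptions.

```python
import hashlib


def sha256_dedup(rows, key="prompt"):
    """Keep the first row for each distinct SHA-256 digest of row[key]."""
    seen, unique = set(), []
    for row in rows:
        digest = hashlib.sha256(row[key].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(row)
    return unique


def cyrillic_ratio(text):
    """Fraction of alphabetic characters that fall in the basic Cyrillic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04FF")
    return cyrillic / len(letters)


def keep_mostly_non_cyrillic(rows, key="response", threshold=0.30):
    """Keep rows whose Cyrillic ratio is strictly below the threshold."""
    return [row for row in rows if cyrillic_ratio(row[key]) < threshold]
```

Exact-hash dedup only removes byte-identical text, which is why heavily templated source data can collapse so sharply.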

## Training procedure (Phase-2)

| Hyperparameter | Value |
| --- | --- |
| Base | `LLM-OS-Models/Fabliq-8B-Agent` (Phase-1 final) |
| Schedule | 4 epochs, constant LR |
| Max sequence length | 8,192 |
| Per-device batch size | 2 |
| Gradient accumulation | 4 |
| GPUs | 8× H200 (effective batch 64) |
| Learning rate | 3e-7 (lower than Phase-1: the model is already agentic-tuned, so a small LR helps avoid forgetting) |
| Precision | bf16 |
| FSDP | `full_shard`, activation checkpointing, `Lfm2MoeDecoderLayer` auto-wrap |
| Final train_loss | ~1.6 |
| Train runtime | ~6 minutes (281 rows × 4 epochs) |
| Global steps | 20 |
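
The effective batch size and step count in the table are consistent with each other, as the short check below shows. It assumes that the last, partially filled batch of each epoch still counts as one optimizer step; that rounding rule is an assumption, not something the table states.

```python
import math

per_device_batch = 2
grad_accum = 4
num_gpus = 8
rows = 281
epochs = 4

# Samples consumed per optimizer step across all GPUs.
effective_batch = per_device_batch * grad_accum * num_gpus

# Assumption: a partial final batch still produces a step, hence the ceiling.
steps_per_epoch = math.ceil(rows / effective_batch)
global_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, global_steps)  # 64 5 20
```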

## System prompts (per data source)

**WithinUs (broad reasoning):**

```
You are a knowledgeable assistant. Provide rigorous, well-structured answers
across coding, cybersecurity, mathematics, scientific analysis, agentic planning,
and general expert topics. Be precise and thorough.
```

**Helio (deep reasoning):**

```
You are a deep-reasoning assistant. Think step by step inside <think>...</think>,
then provide a clear, structured answer.
```
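
As a rough sketch of how these prompts could be attached to training rows, the helper below builds one chat-format record per example and wraps optional reasoning in `<think>` tags. The record layout, the `build_messages` name, and the exact whitespace around the tags are assumptions; the build scripts may differ.

```python
SYSTEM_PROMPTS = {
    "withinus": (
        "You are a knowledgeable assistant. Provide rigorous, well-structured answers "
        "across coding, cybersecurity, mathematics, scientific analysis, agentic planning, "
        "and general expert topics. Be precise and thorough."
    ),
    "helio": (
        "You are a deep-reasoning assistant. Think step by step inside <think>...</think>, "
        "then provide a clear, structured answer."
    ),
}


def build_messages(source, question, answer, reasoning=None):
    """Return a system/user/assistant message list for one SFT example."""
    assistant = answer
    if reasoning:
        # Hypothetical layout: reasoning first, then the final answer.
        assistant = f"<think>\n{reasoning}\n</think>\n{answer}"
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[source]},
        {"role": "user", "content": question},
        {"role": "assistant", "content": assistant},
    ]
```

For example, `build_messages("helio", "What is 2 + 2?", "4", reasoning="Add the numbers.")` yields a record whose assistant turn starts with the `<think>` block.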

## How to use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM-OS-Models/Fabliq-8B-Agent-Reasoning"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"
)

# Deep-reasoning system prompt (the Helio prompt from Phase-2 training).
SYSTEM = (
    "You are a deep-reasoning assistant. Think step by step inside <think>...</think>, "
    "then provide a clear, structured answer."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Derive the time complexity of merge sort and explain when it beats quicksort."},
]

# Render the chat template to text, then tokenize it.
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

# Greedy decoding with a mild repetition penalty.
out = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=False,
    repetition_penalty=1.05,
)

# Decode only the newly generated tokens; special tokens are left visible.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```
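
Because the model reasons inside `<think>` before answering, callers often want the two parts separately. The helper below is a minimal sketch that assumes the decoded text contains literal `<think>` and `</think>` tags; the name `split_reasoning` is hypothetical.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is '' without a complete <think> block."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

If the text was decoded with `skip_special_tokens=False`, any special tokens in the output stay in the answer part and may need stripping separately.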

## When to use which Fabliq?

| Use case | Model |
| --- | --- |
| Pure terminal / coding agent (read, edit, run, verify) | [Fabliq-8B-Agent](https://huggingface.co/LLM-OS-Models/Fabliq-8B-Agent) |
| Multi-domain expert Q&A + reasoning + still agentic | **Fabliq-8B-Agent-Reasoning** (this model) |
| Local 16GB VRAM deployment with tool-use | Either; both fit comfortably |

## Limitations

- **Phase-2 dataset is small (281 rows).** Reasoning expansion is real but bounded: this is a delta on top of Phase-1, not a from-scratch reasoning model.
- **WithinUs dedup surprise.** The source dataset claims 25k rows, but after SHA-256 dedup of templated prompts only 135 unique rows remain. Template repetition in the source data was severe.
- **Helio Russian filter.** The original 462 rows were filtered to 146 after removing Cyrillic-heavy (Russian) traces, i.e. those with a Cyrillic ratio of 30% or more. Non-English coverage is limited.
- **No safety alignment.** Trained on raw reasoning traces; add your own guardrails for production.
- **Max seq 8,192 at training.** Behavior beyond 8K context is unverified.
- **English-centric.**

## License

Apache 2.0, inherited from the [LiquidAI LFM2.5-8B-A1B](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B) base.

## Model tree

This is a **fine-tune** (continuation SFT). Direct parent: [`LLM-OS-Models/Fabliq-8B-Agent`](https://huggingface.co/LLM-OS-Models/Fabliq-8B-Agent).

```
LiquidAI/LFM2.5-8B-A1B  (LiquidAI base)
└─ LLM-OS-Models/LFM2.5-8B-A1B-Terminal-ToolBench-Full-SFT-1Epoch  (ToolBench foundation)
   └─ LLM-OS-Models/Fabliq-8B-Agent  (Phase-1: Fable-5 agentic SFT)
      └─ LLM-OS-Models/Fabliq-8B-Agent-Reasoning  ← this model (Phase-2: + WithinUs + Helio)
```

## Acknowledgements

- **Base:** [LiquidAI/LFM2.5-8B-A1B](https://huggingface.co/LiquidAI/LFM2.5-8B-A1B)
- **Phase-1 parent:** [Fabliq-8B-Agent](https://huggingface.co/LLM-OS-Models/Fabliq-8B-Agent)
- **Phase-2 data:** WithinUs (claude_mythos_distilled_25k), Helio (Fable-5-Distill-Reasoning-462x)
- **Reference:** [empero-ai/Qwythos-9B-Claude-Mythos-5-1M](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M) (2-phase curriculum pattern)