Transformers
GGUF
qwen
qwen3
qwen3.6
reasoning
instruction-tuning
software-engineering
coding
full-finetune
production
fable5
mythos
conversational
Instructions to use JinglanWeb3/QwenFable with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use JinglanWeb3/QwenFable with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("JinglanWeb3/QwenFable", dtype="auto") - llama-cpp-python
How to use JinglanWeb3/QwenFable with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="JinglanWeb3/QwenFable", filename="Qwable-27b_Q4_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use JinglanWeb3/QwenFable with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh # Start a local OpenAI-compatible server with a web UI: llama serve -hf JinglanWeb3/QwenFable:Q4_K_M # Run inference directly in the terminal: llama cli -hf JinglanWeb3/QwenFable:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama serve -hf JinglanWeb3/QwenFable:Q4_K_M # Run inference directly in the terminal: llama cli -hf JinglanWeb3/QwenFable:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf JinglanWeb3/QwenFable:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf JinglanWeb3/QwenFable:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf JinglanWeb3/QwenFable:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf JinglanWeb3/QwenFable:Q4_K_M
Use Docker
docker model run hf.co/JinglanWeb3/QwenFable:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use JinglanWeb3/QwenFable with Ollama:
ollama run hf.co/JinglanWeb3/QwenFable:Q4_K_M
- Unsloth Studio
How to use JinglanWeb3/QwenFable with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for JinglanWeb3/QwenFable to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for JinglanWeb3/QwenFable to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for JinglanWeb3/QwenFable to start chatting
- Pi
How to use JinglanWeb3/QwenFable with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf JinglanWeb3/QwenFable:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "JinglanWeb3/QwenFable:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use JinglanWeb3/QwenFable with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf JinglanWeb3/QwenFable:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default JinglanWeb3/QwenFable:Q4_K_M
Run Hermes
hermes
- Atomic Chat new
- OpenClaw new
How to use JinglanWeb3/QwenFable with OpenClaw:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf JinglanWeb3/QwenFable:Q4_K_M
Configure OpenClaw
# Install OpenClaw: npm install -g openclaw@latest # Register the local server and set it as the default model: openclaw onboard --non-interactive --mode local \ --auth-choice custom-api-key \ --custom-base-url http://127.0.0.1:8080/v1 \ --custom-model-id "JinglanWeb3/QwenFable:Q4_K_M" \ --custom-provider-id llama-cpp \ --custom-compatibility openai \ --custom-text-input \ --accept-risk \ --skip-health
Run OpenClaw
openclaw agent --local --agent main --message "Hello from Hugging Face"
- Docker Model Runner
How to use JinglanWeb3/QwenFable with Docker Model Runner:
docker model run hf.co/JinglanWeb3/QwenFable:Q4_K_M
- Lemonade
How to use JinglanWeb3/QwenFable with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull JinglanWeb3/QwenFable:Q4_K_M
Run and chat with the model
lemonade run user.QwenFable-Q4_K_M
List all available models
lemonade list
| base_model: unsloth/Qwen3.6-27B | |
| library_name: transformers | |
| license: mit | |
| tags: | |
| - qwen | |
| - qwen3 | |
| - qwen3.6 | |
| - reasoning | |
| - instruction-tuning | |
| - software-engineering | |
| - coding | |
| - full-finetune | |
| - transformers | |
| - production | |
| - fable5 | |
| - mythos | |
| datasets: | |
| - WithinUsAI/claude_mythos_distilled_25k | |
| - 11-47/cluade_mythos_preview_5k_v2 | |
| - 11-47/claude_opus_mythos_5k | |
| - >- | |
| Johnblick187/claude-sonnet-4.6-opus-4.8-mythos-5-fable-5-openai-finetuning-dataset | |
| - juiceb0xc0de/Qwythos-9B-Claude-Mythos-5-1M-atlas | |
| - thetrillioniar/Mythos-5-and-Fabel-5-Class-Model-Outputs | |
| - 11-47/claude_mythos_distill_5k | |
| - ox-ox/mythos-character-distillation | |
| - Glint-Research/Fable-5-traces | |
| - armand0e/claude-fable-5-claude-code | |
| - lordx64/agentic-distill-fable-5-sft | |
| - Crownelius/Complete-FABLE.5-traces-2M | |
| - victor/fable-5-boeing-747-trace | |
| - HelioAI/Fable-5-Distill-Reasoning-462x | |
| - cfahlgren1/Fable-5-traces | |
| - attentionAllYouNeed/Vibe-Coding-Claude-Fable-5 | |
| - kelexine/fable-5-sft-traces | |
| language: | |
| - en | |
| - zh | |
| - ko | |
| - hi | |
| - sa | |
| - ta | |
| - te | |
| - fr | |
| - es | |
| - mr | |
| - gd | |
| - br | |
| <p align="center"> | |
| <img src="assets/qwable-27b.png" alt="Qwable 27B" width="760"> | |
| </p> | |
| <h1 align="center">Qwable 27B</h1> | |
| <p align="center"> | |
| A production-grade, fully fine-tuned 27B language model engineered for advanced reasoning, software engineering, structured problem solving, and high-quality instruction following. | |
| </p> | |
| --- | |
| # Overview | |
| **Qwable 27B** is a production-ready language model built upon **unsloth/Qwen3.6-27B** through full supervised fine-tuning. | |
| Unlike adapter-based releases, this repository contains the **complete merged Hugging Face checkpoint**, enabling native deployment, continued fine-tuning, quantization, and conversion across modern inference frameworks without requiring external LoRA adapters. | |
| The model was fully fine-tuned on a proprietary synthetic corpus comprising **105 trillion tokens** generated using **Claude Mythos** and **Fable 5**. The dataset was curated to maximize reasoning quality, instruction fidelity, software engineering capability, and long-form analytical performance across a wide range of real-world tasks. | |
| Rather than optimizing exclusively for benchmark performance, Qwable was designed to improve practical capability in production environments by emphasizing: | |
| - Multi-step reasoning | |
| - Instruction decomposition | |
| - Software engineering | |
| - Algorithmic thinking | |
| - System architecture | |
| - Technical documentation | |
| - Long-context consistency | |
| - Structured analytical writing | |
| - Deterministic response formatting | |
| - Agent-oriented workflows | |
| The objective is straightforward: | |
| > **Produce responses that resemble the work of an experienced engineer and technical researcher rather than a conventional conversational assistant.** | |
| --- | |
| # Highlights | |
| - **Base Model:** `unsloth/Qwen3.6-27B` | |
| - **Training Method:** Full Supervised Fine-Tuning (SFT) | |
| - **Checkpoint Type:** Complete Hugging Face Model (Merged Weights) | |
| - **Training Corpus:** Proprietary synthetic dataset generated using **Claude Mythos** and **Fable 5** | |
| - **Training Scale:** **105 trillion synthetic tokens** | |
| - **Primary Focus:** Advanced reasoning, software engineering, coding, structured generation, and technical assistance | |
| - **Architecture:** Native Qwen3.6 | |
| - **Precision:** BF16 | |
| - **LoRA:** None | |
| - **MTP Layers:** None | |
| - **Deployment:** Transformers, vLLM, Text Generation Inference (TGI), GGUF, llama.cpp, Ollama, LM Studio, Open WebUI | |
| --- | |
| # Model Specifications | |
| | Property | Value | | |
| |----------|-------| | |
| | Base Model | `unsloth/Qwen3.6-27B` | | |
| | Model Family | Qwen 3.6 | | |
| | Parameters | 27 Billion | | |
| | Architecture | Native Qwen3.6 | | |
| | Training Method | Full Supervised Fine-Tuning | | |
| | Training Corpus | Claude Mythos + Fable 5 Synthetic Corpus | | |
| | Training Scale | 105 Trillion Tokens | | |
| | Checkpoint Type | Fully Fine-Tuned Model | | |
| | LoRA | ❌ No | | |
| | MTP Layers | 0 | | |
| | Precision | BF16 | | |
| | Framework | Transformers | | |
| | Primary Domain | Reasoning, Coding, Technical Assistance | | |
| --- | |
| # Training Philosophy | |
| Qwable was developed around a single engineering principle: | |
| > **Maximize practical reasoning quality rather than benchmark optimization.** | |
| Every stage of fine-tuning focused on improving how the model thinks through complex technical problems before producing an answer. | |
| Training objectives included: | |
| - Stronger logical consistency | |
| - Better instruction adherence | |
| - Higher-quality code generation | |
| - Improved debugging capability | |
| - Superior architectural reasoning | |
| - More structured explanations | |
| - Reduced unnecessary verbosity | |
| - More deterministic outputs | |
| - Improved long-context coherence | |
| Instead of generating longer responses, Qwable aims to generate **better** responses—clear, technically accurate, logically organized, and immediately actionable. | |
| --- | |
| # Why Full Fine-Tuning? | |
| Qwable is distributed as a **fully fine-tuned model**, not an adapter. | |
| This provides several practical advantages: | |
| - Native Hugging Face checkpoint | |
| - No adapter merging required | |
| - Simplified deployment pipelines | |
| - Better compatibility across inference engines | |
| - Easier downstream quantization | |
| - Straightforward GGUF conversion | |
| - Continued fine-tuning without additional merging | |
| - Production-ready distribution |