Qwable-v2
Qwen + Fable, second iteration · An open-weights agentic coding model. 35B Mixture-of-Experts (3B active), built by layering Claude Fable-5 agentic tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B. Trained with 4× the LoRA capacity and 2× the SFT data of Qwable-v1.
TL;DR
Qwable-v2 is the second iteration of the Qwable lineage — same chained-distill structure (vanilla Qwen3.6 → Opus 4.7 reasoning → Fable-5 agentic), but with a beefier training recipe targeting the dominant v1 failure mode (early-stop-mid-tool_use, ~48% of completed SWE-bench Lite runs in v1).
- Thinks in explicit <think>…</think> chains-of-thought (inherited from the Opus 4.7 prior).
- Acts like a Claude-Code-style agent when prompted as one: emits <tool_use> XML for file edits, shell, reads. v2 uses real Claude Code tool names (Read, Edit, Bash) with correct field signatures (file_path, old_string, new_string) where v1 was inventing variants (read_file, Replace, regex/replace).
- The XML format is still system-prompt-conditional: it appears reliably with an agent-style system prompt or a preceding <tool_result> turn. Bare prompts fall back to text-style explanations.
- Runs on a single H200 / 2× A100-80GB at bf16, or any 24+ GB consumer GPU at IQ4_XS quantization.
What changed vs v1
Single-purpose comparison — same lineage, same base, same training infrastructure. Only these knobs moved:
| Setting | v1 (shipped 2026-06-15) | v2 (shipped 2026-06-27) | Why |
|---|---|---|---|
| SFT dataset | agentic-distill-fable-5-sft (4,659 rows) | fable-sft-combined-v2 (9,842 rows) = v1 ∪ fable-tool-use-sft (5,183 rows), 0% SHA-256 overlap on user content | 2× the unique training rows; mixes with-<think> (46%) and without-<think> (54%) for conditional reasoning |
| LoRA rank | 16 | 64 (4× v1) | Target the closing-tag pattern that v1 LoRA capacity couldn't fully learn — 145/300 SWE-bench Lite empties were unclosed <tool_use> blocks |
| Sequence length | 4,096 | 8,192 (2× v1) | Full agentic conversations fit; closing tags can't get clipped by seq budget |
| Epochs | 2 | 3 | More reps over the larger corpus |
| Learning rate | 2e-5 | 1.5e-5 | Gentler updates over the longer run; protects the Opus 4.7 reasoning prior from catastrophic forgetting |
| Target modules | attention-only (q/k/v/o) | same | Adding MLP touches MoE expert weights, risks re-triggering the unsloth-zoo shape bug we fixed for v1 |
| Effective batch | 16 (1 × grad-accum 16) | same | Stable from v1 |
| Base (warm-start) | lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled | same | Same A/B is cleaner |
| Wall-clock | 14.1h on Seoul H200 | 37.5h on us-east-2 H200 (1 first attempt lost to an HF payment hold mid-run; 2nd run completed clean) | r=64 + seq=8192 + 2× steps = ~5× the per-run compute |
| Cost | ~$70 | ~$190 for the successful 2nd run (+$200 for the lost first run = $390 total v2 spend) | Same $5/hr H200 rate |
| Final loss | 0.7956 (last-20 avg) | TBD (run completed cleanly; computing final-window avg) | — |
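
The "0% SHA-256 overlap on user content" check in the SFT dataset row can be illustrated with a minimal sketch. It assumes each row is a dict with a `messages` list of `{"role", "content"}` entries; that schema, the whitespace stripping, and the function names are assumptions for illustration, not the project's actual prep code.

```python
import hashlib


def user_fingerprint(row):
    """SHA-256 over the concatenated user turns of one conversation row."""
    user_text = "\n".join(
        m["content"].strip() for m in row["messages"] if m["role"] == "user"
    )
    return hashlib.sha256(user_text.encode("utf-8")).hexdigest()


def overlap_fraction(rows_a, rows_b):
    """Fraction of rows in rows_b whose user content also appears in rows_a."""
    seen = {user_fingerprint(r) for r in rows_a}
    if not rows_b:
        return 0.0
    hits = sum(user_fingerprint(r) in seen for r in rows_b)
    return hits / len(rows_b)


# Toy rows standing in for the two corpora (hypothetical data).
v1_rows = [{"messages": [{"role": "user", "content": "Fix the port in server.py"},
                         {"role": "assistant", "content": "..."}]}]
new_rows = [{"messages": [{"role": "user", "content": "Add a retry to fetch()"},
                          {"role": "assistant", "content": "..."}]}]

overlap = overlap_fraction(v1_rows, new_rows)
combined = v1_rows + new_rows  # the union is only duplicate-free if overlap is 0
print(f"overlap: {overlap:.1%}, combined rows: {len(combined)}")
```

Hashing only the user turns means two rows count as duplicates even if the assistant responses differ, which is the stricter reading of "overlap on user content".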
Spot-check (post-training, 2026-06-27)
The same three probe variants were run against v1 and v2 (port-change task, real Fable-5-style prompt):
| Variant | v1 output | v2 output |
|---|---|---|
| Bare prompt, generic system | Markdown bash/python explanation (Opus prior dominates) | <tool_code>...</tool_code> Gemini-style block (still no <tool_use> XML, but at least a tool-call-shaped envelope) |
| Agent system prompt | ✅ <tool_use name="read_file" id="toolu_01abc"> (invented tool name, made-up id) | ✅ <tool_use name="Read" id="toolu_01V21JqR711Vf45JL6o2Y351"> (real Claude Code tool name, real Anthropic-format id) |
| Multi-turn with prior <tool_result> | ✅ <tool_use name="Replace">{"regex":..., "replace":...} (invented Replace tool with synthesized fields) | ✅ <tool_use name="Edit">{"file_path":..., "old_string":..., "new_string":...} (real Claude Code Edit signature with correct field names) |
The structural improvement is real and measurable. v2 has internalized the actual Claude Code tool surface (names + field schemas), where v1 was inventing plausible-sounding variants. The system-prompt-conditionality wasn't fully fixed — bare prompts still fall back — but the fallback is now a different XML envelope (<tool_code>) rather than plain markdown.
Definitive proof of the early-stop-mid-tool_use fix (the dominant v1 ceiling on SWE-bench Lite) still needs the full SWE-bench Lite re-run — that's pending and the number will land in the Evaluation section when it does.
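
Until that re-run lands, the early-stop-mid-tool_use failure mode can at least be flagged mechanically. The sketch below counts opening and closing <tool_use> tags in a generation; the function name and the sample strings are hypothetical, and this is not the harness's actual classifier.

```python
import re

# Matches an opening tag such as <tool_use name="Read" id="...">; never the closing tag.
OPEN_TAG = re.compile(r"<tool_use\b[^>]*>")
CLOSE_TAG = "</tool_use>"


def has_unclosed_tool_use(text):
    """True if the generation opens more <tool_use> blocks than it closes."""
    opens = len(OPEN_TAG.findall(text))
    closes = text.count(CLOSE_TAG)
    return opens > closes


complete = (
    '<tool_use name="Read" id="toolu_01abc">\n'
    '{"file_path": "/tmp/server.py"}\n'
    "</tool_use>"
)
truncated = (
    '<tool_use name="Edit" id="toolu_01abc">\n'
    '{"file_path": "/tmp/server.py", "old_str'
)

print(has_unclosed_tool_use(complete))   # False
print(has_unclosed_tool_use(truncated))  # True
```

Running a check like this over every empty-patch transcript is one way to arrive at a count comparable to the 145/300 figure quoted for v1.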
Honest scope
Same fundamentals as v1, just stronger on the things that already worked:
Qwen3.6-35B-A3B (vanilla, Apache 2.0)
  └─SFT─▶ Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
            └─SFT─▶ Qwable-v1
            ─── (same warm-start, not chained)
            └─SFT─▶ Qwable-v2 ← you are here
The new SFT corpus is the union of v1's data and a tool-use-only re-pack of Glint-Research/Complete-FABLE.5-traces-2M (our fable-tool-use-sft). Both upstream sources are signature-verified real Fable-5 captures, with the SFT-side text re-rendered for the Qwen chat template and tool-envelope serialization.
For pure reasoning (math, science, general Q&A): omit the agent system prompt. The Opus 4.7 distill underneath is what's doing the work. v2 should match v1 here (the lower LR was designed to protect the prior from forgetting).
For agentic coding (edit-this-file, run-this-test, scroll-this-codebase): supply an agent system prompt naming <tool_use> XML. v2 produces structurally cleaner tool calls than v1 (real Claude Code names + fields).
For chat / general assistant: works; persona may drift toward Claude voice (double Anthropic SFT stacking, same as v1).
Training recipe (concrete reference)
| Setting | Value |
|---|---|
| Base (warm-start) | lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled |
| SFT dataset | lordx64/fable-sft-combined-v2 — 9,842 unique rows, ~24.5M Qwen tokens |
| Library | Unsloth FastLanguageModel + TRL SFTTrainer |
| LoRA | r=64, alpha=64, attention-only (q_proj, k_proj, v_proj, o_proj), dropout 0.0 |
| Loss masking | train_on_responses_only (gradients only flow through assistant turns, including <think> block) |
| Sequence length | 8,192 |
| Epochs | 3 |
| Effective batch size | 16 (per-device 1 × grad-accum 16) |
| Optimizer | AdamW 8-bit, cosine LR, 3% warmup, weight decay 0.01 |
| Learning rate | 1.5e-5 |
| Precision | bf16 forward + LoRA params |
| Random seed | 3407 |
| Hardware | 1× H200 (141 GB; nvidia-h200 x1 instance) on AWS us-east-2 via HF Inference Endpoints |
| Total optimizer steps | 1,839 (9,842 rows × 3 epochs ÷ effective batch 16; small drop from prep for label-all-masked rows) |
| Wall-clock | 37.5h (one prior 40h attempt lost to a billing hold ~98% through training) |
| Cost | ~$190 (successful run only) + ~$200 (lost attempt) = ~$390 total v2 spend |
| Final save | merged_16bit via Unsloth |
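
The loss-masking row can be illustrated with a small, library-free sketch of the idea behind train_on_responses_only: labels for non-assistant tokens are set to -100 so they contribute nothing to the loss. The segment representation and helper names here are assumptions for illustration; the real helper works on tokenized chat-template text.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_non_assistant(segments):
    """segments: list of (role, token_ids) pairs. Returns (input_ids, labels)."""
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # loss is computed on these tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # no gradient signal
    return input_ids, labels


def is_all_masked(labels):
    """A row with no trainable tokens contributes nothing and can be dropped."""
    return all(label == IGNORE_INDEX for label in labels)


# Toy token ids (hypothetical): system + user are masked, assistant is trained on.
row = [("system", [1, 2]), ("user", [3, 4, 5]), ("assistant", [6, 7, 8, 9])]
input_ids, labels = mask_non_assistant(row)
print(labels)  # [-100, -100, -100, -100, -100, 6, 7, 8, 9]

# One way a row can end up all-masked: no assistant tokens survive in the window.
_, empty_labels = mask_non_assistant([("system", [1, 2]), ("user", [3, 4, 5])])
print(is_all_masked(empty_labels))  # True
```

Rows like the second example are the "label-all-masked rows" mentioned in the optimizer-steps entry of the table.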
The training script is training/train.py in the source repo; the submitter is training/endpoint/deploy_fable.py --v2.
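
As a sanity check on the table above, the step count and cost figures can be re-derived from the other rows. This is back-of-envelope arithmetic only; how the trainer rounds a partial final accumulation window is not stated in the source, so the dropped-row figure is approximate.

```python
rows, epochs, effective_batch = 9_842, 3, 16
reported_steps = 1_839

# Step count if every row had survived data prep.
nominal_steps = rows * epochs / effective_batch

# Work backwards from the reported step count.
steps_per_epoch = reported_steps / epochs
approx_rows_kept = steps_per_epoch * effective_batch
approx_rows_dropped = rows - approx_rows_kept

print(f"nominal steps:   {nominal_steps:.1f}")        # 1845.4
print(f"steps per epoch: {steps_per_epoch:.0f}")      # 613
print(f"rows dropped:    ~{approx_rows_dropped:.0f}")  # ~34

# Cost at the quoted $5/hr H200 rate.
hourly_rate = 5.0
successful_cost = 37.5 * hourly_rate  # 187.5, reported as ~$190
lost_cost = 40.0 * hourly_rate        # 200.0
print(f"total spend:     ${successful_cost + lost_cost:.1f}")  # $387.5, reported as ~$390
```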
Evaluation
🚧 Evals are in progress. This table will fill in as each suite completes; nothing here is published until verified.
| Benchmark | Setup | v1 score | v2 score | Status |
|---|---|---|---|---|
| GSM8K-CoT | 8-shot multi-turn, limit 300 | pending | pending | 🚧 in progress |
| MMLU-Pro | 5-shot multi-turn, limit 500 | pending | pending | 🚧 in progress |
| GPQA Diamond | 0-shot CoT | pending | pending | 🚧 in progress |
| MATH-500 | 0-shot, math_verify metric | pending | pending | 🚧 in progress |
| AIME 2024 / 2025 | 0-shot CoT | pending | pending | 🚧 in progress |
| HumanEval / MBPP | pass@1 / pass@10 | pending | pending | 🚧 in progress |
| IFEval | 0-shot | pending | pending | 🚧 in progress |
| SWE-bench Lite (hand-rolled harness, non-empty patches) | 300 instances, no Docker test execution | 109/254 = 42.9% of valid runs | pending | 🚧 v2 re-run pending |
| SWE-bench Lite — Resolved % | Official Docker eval on generated patches | pending | pending | 🚧 Docker harness setup pending |
Standing rule on this project: numbers stay blank until verified. If a benchmark hits a known extraction bug, we omit it rather than publish a misleading score.
Usage
Transformers (full bf16, ~70 GB)
Important: Qwable-v2 emits <tool_use> XML reliably only when prompted as an agent. Same recipe as v1 — use a system prompt that explicitly requests the XML format:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
tok = AutoTokenizer.from_pretrained("lordx64/Qwable-v2")
model = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwable-v2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Agent-style system prompt: this is what switches the model into <tool_use> XML mode.
SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '<tool_use name="X" id="toolu_01abc">\n{"...": "..."}\n</tool_use>\n'
    "Do NOT respond with markdown code blocks. Always use <tool_use> XML."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Read /tmp/server.py and tell me what port it listens on."},
]
# return_dict=True yields input_ids plus attention_mask, which generate() takes as kwargs.
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
# do_sample=True is required for temperature and top_p to take effect.
out = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.9)
# Decode only the newly generated tokens; special tokens are kept so the tags stay visible.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
For pure reasoning use (math, science, general Q&A), omit the system prompt or use the generic "You are a helpful AI assistant." — the model will produce reasoning + a text answer like the underlying Opus 4.7 distill.
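
In reasoning mode the decoded text contains a <think> block followed by the answer, so downstream code usually wants the two separated. A minimal sketch, assuming both the opening and closing tags appear in the decoded output (some chat templates inject the opening tag into the prompt instead, in which case this would need adjusting); the function name is hypothetical.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is '' when no <think> block exists."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is treated as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "<think>\n2 + 2 is 4.\n</think>\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
print(repr(reasoning))  # '2 + 2 is 4.'
print(repr(answer))     # 'The answer is 4.'
```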
vLLM serving
vllm serve lordx64/Qwable-v2 \
--max-model-len 16384 \
--tensor-parallel-size 2 \
--trust-remote-code
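
Once the server is up, it can be queried through vLLM's OpenAI-compatible chat endpoint. The sketch below uses only the standard library and assumes the server is listening on vLLM's default localhost:8000; the helper names are hypothetical, and the sampling values simply mirror the Transformers example. Pass an agent system prompt (such as the SYSTEM string above) to get <tool_use> XML back.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM's default host and port


def build_chat_request(system_prompt, user_prompt, max_tokens=2048):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": "lordx64/Qwable-v2",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.9,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(system_prompt, user_prompt):
    """Send the request and return the assistant text (needs a running server)."""
    request = build_chat_request(system_prompt, user_prompt)
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request does not touch the network; ask() does.
req = build_chat_request("You are a coding agent.", "Read /tmp/server.py.")
print(req.full_url)
```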
llama.cpp / LM Studio (GGUF)
# Quants build pending; same flow as v1's lordx64/Qwable-v1-GGUF (IQ4_XS / Q4_K_M / Q5_K_M / Q8_0)
llama-cli -m Qwable-v2-IQ4_XS.gguf -p "..."
Tool-use format
Same as v1 — custom <tool_use> XML envelope. v2's key improvement is fidelity to the real Claude Code tool inventory:
<think>
The user wants to change the port. I should Read the file first, then Edit it.
</think>
<tool_use name="Read" id="toolu_01ABC...">
{
"file_path": "/tmp/server.py"
}
</tool_use>
Then on the next turn the user supplies a <tool_result>:
<tool_result id="toolu_01ABC..." is_error="false">
from flask import Flask
...
app.run(port=8000)
</tool_result>
And v2 produces:
<tool_use name="Edit" id="toolu_01XYZ...">
{
"file_path": "/tmp/server.py",
"old_string": " app.run(port=8000)",
"new_string": " app.run(port=8080)"
}
</tool_use>
v1, by contrast, invented <tool_use name="Replace">{"regex": ..., "replace": ...}. The v2 names and field signatures match Anthropic's published Claude Code tool definitions.
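
A harness has to turn this envelope into structured calls and feed results back. The following is a minimal sketch of both directions, assuming the attribute order shown in the examples above (name, then id) and a JSON body; the function names are hypothetical and this is not the project's own harness code.

```python
import json
import re

TOOL_USE = re.compile(
    r'<tool_use\s+name="(?P<name>[^"]+)"\s+id="(?P<id>[^"]+)"\s*>'
    r"(?P<body>.*?)</tool_use>",
    re.DOTALL,
)


def parse_tool_calls(text):
    """Extract every well-formed <tool_use> block as a dict."""
    calls = []
    for m in TOOL_USE.finditer(text):
        try:
            args = json.loads(m.group("body"))
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than guess at a repair
        calls.append({"name": m.group("name"), "id": m.group("id"), "input": args})
    return calls


def render_tool_result(call_id, output, is_error=False):
    """Format a tool's output as the <tool_result> turn the model expects next."""
    flag = "true" if is_error else "false"
    return f'<tool_result id="{call_id}" is_error="{flag}">\n{output}\n</tool_result>'


generation = (
    "<think>\nI should Read the file first.\n</think>\n"
    '<tool_use name="Read" id="toolu_01ABC">\n'
    '{\n  "file_path": "/tmp/server.py"\n}\n'
    "</tool_use>"
)
calls = parse_tool_calls(generation)
print(calls[0]["name"], calls[0]["input"]["file_path"])  # Read /tmp/server.py
print(render_tool_result(calls[0]["id"], "app.run(port=8000)"))
```

Unclosed blocks never match the pattern, so a generation that stops mid-call yields an empty list and the harness can treat it as a failed turn.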
Limitations
- Tool-use format is still system-prompt-conditional. Improved over v1 but not fully fixed. Bare prompts produce a <tool_code> Gemini-style envelope instead of <tool_use> XML; agent system prompts produce the right format reliably. Same operational guidance as v1: run inside a harness that supplies a tool-use system prompt + tool registry.
- Narrow training distribution: even with the 2× corpus, both source datasets ultimately derive from the same hundreds of Glint-captured Claude Code sessions. Out-of-distribution agent tasks (DevOps, data science, security workflows) may still be hit-or-miss.
- Custom tool envelope. <tool_use> XML doesn't slot into vLLM's tool-calling API automatically; it needs a parser wrapper.
- Persona drift: two SFT rounds against Anthropic-style outputs, now reinforced over more steps. The model may occasionally self-identify as Claude in chat.
- Reasoning is still from Opus 4.7, not Fable-5. Don't expect v2 to outperform the underlying Opus 4.7 distill on pure-reasoning benchmarks. The lower LR (1.5e-5 vs v1's 2e-5) was deliberately chosen to protect the reasoning prior; verifying that worked is pending the lm-eval suite.
- No formal evals at v2 ship time. Same standing rule as v1 — pending.
- Two paid runs to ship. First v2 attempt completed ~98% of training before an HF billing hold paused the endpoint; the second clean run cost ~$190 on top of the ~$200 lost first attempt. v3 will add checkpoint+resume to mitigate this.
License & terms
Inherits AGPL-3.0 from the upstream Glint-Research/Fable-5-traces + Glint-Research/Complete-FABLE.5-traces-2M datasets. Downstream users running Qwable-v2 in a network-accessible service must comply with AGPL §13 (source disclosure for network use).
The underlying Fable-5 thinking traces are derivative content from Anthropic's claude-fable-5 preview model (suspended globally 2026-06-22 under U.S. export-control directives). Downstream users should verify compliance with Anthropic's usage policies for their specific use case.
The Qwen3.6-35B-A3B base is Apache 2.0; the Opus 4.7 distill (intermediate base) is Apache 2.0. Qwable-v2's AGPL designation supersedes those due to the Fable-5 data's AGPL upstream.
Citation
@misc{lordx64_qwable_v2_2026,
title = {Qwable-v2: Agentic coding distillation from Claude Fable-5 onto Qwen3.6-35B-A3B with LoRA r=64 + combined corpus},
author = {lordx64},
year = {2026},
howpublished = {\url{https://huggingface.co/lordx64/Qwable-v2}},
}
Acknowledgements
Same lineage as v1, with the added v2 contributions:
- Glint-Research for both the original Fable-5-traces (v1's source) and Complete-FABLE.5-traces-2M (the upstream for fable-tool-use-sft, v2's added corpus).
- 1EYE4ALL and Crownelius for the intermediate hosts in the Complete-FABLE.5-traces-2M provenance chain.
- TeichAI for the upstream collection that all of this ultimately traces to.
- Anthropic for the Claude Fable-5 preview model (briefly available 2026-06-10 to 2026-06-22) and the prior Opus 4.7 / Opus 4.6 work this lineage is built on.
- Qwen team for releasing Qwen3.6-35B-A3B under Apache 2.0.
- Unsloth for 2× faster LoRA training and the MoE+LoRA shape fix in unsloth-zoo PR #601.
- HuggingFace for the Inference Endpoint H200 fleet where the training actually ran.