
Qwable-v2

Qwen + Fable, second iteration · An open-weights agentic coding model. 35B Mixture-of-Experts (3B active), built by layering Claude Fable-5 agentic tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B. Trained with 4× the LoRA capacity and 2× the SFT data of Qwable-v1.

TL;DR

Qwable-v2 is the second iteration of the Qwable lineage — same chained-distill structure (vanilla Qwen3.6 → Opus 4.7 reasoning → Fable-5 agentic), but with a beefier training recipe targeting the dominant v1 failure mode (early-stop-mid-tool_use, ~48% of completed SWE-bench Lite runs in v1).

Thinks in explicit <think>…</think> chains-of-thought (inherited from the Opus 4.7 prior)
Acts like a Claude-Code-style agent when prompted as one — emits <tool_use> XML for file edits, shell, reads. v2 uses real Claude Code tool names (Read, Edit, Bash) with correct field signatures (file_path, old_string, new_string) where v1 was inventing variants (read_file, Replace, regex/replace).
The XML format is still system-prompt-conditional: it appears reliably with an agent-style system prompt or a preceding <tool_result> turn. Bare prompts fall back to text-style explanations.
Runs on a single H200 / 2× A100-80GB at bf16, or any 24+ GB consumer GPU at IQ4_XS quantization.
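
The hardware guidance above can be sanity-checked with a weights-only estimate. This is a rough sketch: it uses the card's nominal 35B parameter count, assumes roughly 4.25 bits per weight for IQ4_XS (an approximation, not a figure from this card), and ignores KV cache and activation memory.

```python
# Weights-only memory estimate; KV cache and activations come on top of this.
PARAMS = 35e9          # nominal parameter count from the card
BF16_BITS = 16
IQ4_XS_BITS = 4.25     # assumption: approximate bits per weight for IQ4_XS

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

bf16_gb = weight_gb(PARAMS, BF16_BITS)
iq4_gb = weight_gb(PARAMS, IQ4_XS_BITS)
print(f"bf16:   {bf16_gb:.1f} GB")
print(f"IQ4_XS: {iq4_gb:.1f} GB")
```

Under these assumptions the bf16 weights land near the ~70 GB quoted in the Usage section, and the IQ4_XS weights leave some headroom on a 24 GB card for context.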

What changed vs v1

Single-purpose comparison — same lineage, same base, same training infrastructure. Only these knobs moved:

Setting	v1 (shipped 2026-06-15)	v2 (shipped 2026-06-27)	Why
SFT dataset	`agentic-distill-fable-5-sft` (4,659 rows)	`fable-sft-combined-v2` (9,842 rows) = v1 ∪ `fable-tool-use-sft` (5,183 rows), 0% SHA-256 overlap on user content	2× the unique training rows; mixes with-`<think>` (46%) and without-`<think>` (54%) for conditional reasoning
LoRA rank	16	64 (4× v1)	Target the closing-tag pattern that v1 LoRA capacity couldn't fully learn — 145/300 SWE-bench Lite empties were unclosed `<tool_use>` blocks
Sequence length	4,096	8,192 (2× v1)	Full agentic conversations fit; closing tags can't get clipped by seq budget
Epochs	2	3	More reps over the larger corpus
Learning rate	2e-5	1.5e-5	Gentler updates over the longer run; protects the Opus 4.7 reasoning prior from catastrophic forgetting
Target modules	attention-only (`q/k/v/o`)	same	Adding MLP touches MoE expert weights, risks re-triggering the unsloth-zoo shape bug we fixed for v1
Effective batch	16 (1 × grad-accum 16)	same	Stable from v1
Base (warm-start)	`lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled`	same	Same A/B is cleaner
Wall-clock	14.1h on Seoul H200	37.5h on us-east-2 H200 (a first attempt was lost to an HF billing hold ~98% through training; the second run completed cleanly)	r=64 + seq=8192 + ~3.2× steps (2.1× rows × 1.5× epochs) = ~5× the per-run compute
Cost	~$70	~$190 for the successful 2nd run (+$200 for the lost first run = $390 total v2 spend)	Same $5/hr H200 rate
Final loss	0.7956 (last-20 avg)	TBD (run completed cleanly; computing final-window avg)	—
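
The "4× the LoRA capacity" claim follows from how LoRA parameter counts scale with rank. A minimal sketch, using a hypothetical projection shape (the real Qwen3.6 attention dimensions are not stated in this card):

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer:
    matrix A (rank x d_in) plus matrix B (d_out x rank)."""
    return rank * (d_in + d_out)

# Hypothetical projection shape, for illustration only.
D_IN, D_OUT = 2048, 2048

v1_params = lora_params(D_IN, D_OUT, rank=16)
v2_params = lora_params(D_IN, D_OUT, rank=64)
print(v1_params, v2_params, v2_params / v1_params)
```

Because the count is linear in rank, the ratio is 4 regardless of the layer shape chosen, as long as the same modules are targeted in both runs.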

Spot-check (post-training, 2026-06-27)

Three probe variants, run identically against v1 and v2 (port-change task, real Fable-5-style prompt):

Variant	v1 output	v2 output
Bare prompt, generic system	Markdown bash/python explanation (Opus prior dominates)	`<tool_code>...</tool_code>` Gemini-style block (still no `<tool_use>` XML, but at least a tool-call-shaped envelope)
Agent system prompt	✅ `<tool_use name="read_file" id="toolu_01abc">` — invented tool name, made-up id	✅ `<tool_use name="Read" id="toolu_01V21JqR711Vf45JL6o2Y351">` — real Claude Code tool name, real Anthropic-format id
Multi-turn with prior `<tool_result>`	✅ `<tool_use name="Replace">{"regex":..., "replace":...}` — invented Replace tool with synthesized fields	✅ `<tool_use name="Edit">{"file_path":..., "old_string":..., "new_string":...}` — real Claude Code Edit signature with correct field names

The structural improvement is real and measurable. v2 has internalized the actual Claude Code tool surface (names + field schemas), where v1 was inventing plausible-sounding variants. The system-prompt-conditionality wasn't fully fixed — bare prompts still fall back — but the fallback is now a different XML envelope (<tool_code>) rather than plain markdown.
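
For anyone repeating this kind of spot-check, the three outcomes in the table can be told apart mechanically. The sketch below is an assumption about how one might label outputs, not the harness used for the results above; the label names are made up for illustration.

```python
import re

FENCE = "`" * 3  # markdown code fence marker

def classify_envelope(output: str) -> str:
    """Label a model response by the tool-call envelope it uses, if any."""
    if re.search(r"<tool_use\b", output):
        return "tool_use"      # the Claude-Code-style XML envelope
    if re.search(r"<tool_code\b", output):
        return "tool_code"     # the fallback envelope seen on bare prompts
    if FENCE in output:
        return "markdown"      # plain fenced code explanation
    return "text"

print(classify_envelope('<tool_use name="Read" id="toolu_01">{}</tool_use>'))
```

The check order matters: a response that contains both a `<tool_use>` block and a fenced snippet is counted as a tool call.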

Definitive proof of the early-stop-mid-tool_use fix (the dominant v1 ceiling on SWE-bench Lite) still needs the full SWE-bench Lite re-run — that's pending and the number will land in the Evaluation section when it does.

Honest scope

Same fundamentals as v1, just stronger on the things that already worked:

Qwen3.6-35B-A3B (vanilla, Apache 2.0)
  └─SFT─▶ Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
           └─SFT─▶ Qwable-v1
                     ─── (same warm-start, not chained)
           └─SFT─▶ Qwable-v2  ← you are here

The new SFT corpus is the union of v1's data + a tool-use-only re-pack of Glint-Research/Complete-FABLE.5-traces-2M (=our fable-tool-use-sft). Both upstream sources are signature-verified real Fable-5 captures, with the SFT-side text re-rendered for Qwen chat template + tool envelope serialization.
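
The "0% SHA-256 overlap on user content" figure in the comparison table implies a hash-based dedup check between the two source datasets. A minimal sketch of such a check, assuming each row is a dict with a `messages` list of `{"role", "content"}` turns (the actual row schema is not given in this card):

```python
import hashlib

def user_hashes(rows):
    """SHA-256 digests of the concatenated user-turn text of each row."""
    digests = set()
    for row in rows:
        user_text = "\n".join(
            m["content"] for m in row["messages"] if m["role"] == "user"
        )
        digests.add(hashlib.sha256(user_text.encode("utf-8")).hexdigest())
    return digests

def overlap_fraction(rows_a, rows_b):
    """Fraction of distinct user-content hashes in rows_b that also occur in rows_a."""
    a, b = user_hashes(rows_a), user_hashes(rows_b)
    return len(a & b) / len(b) if b else 0.0

# Toy rows, not real training data.
v1_rows = [{"messages": [{"role": "user", "content": "fix the bug"},
                         {"role": "assistant", "content": "ok"}]}]
new_rows = [{"messages": [{"role": "user", "content": "add a test"}]}]
print(overlap_fraction(v1_rows, new_rows))
```

Hashing only the user turns means two rows with the same prompt but different assistant traces would still count as overlapping.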

For pure reasoning (math, science, general Q&A): omit the agent system prompt. The Opus 4.7 distill underneath is what's doing the work. v2 should match v1 here (the lower LR was designed to protect the prior from forgetting).
For agentic coding (edit-this-file, run-this-test, scroll-this-codebase): supply an agent system prompt naming <tool_use> XML. v2 produces structurally cleaner tool calls than v1 (real Claude Code names + fields).
For chat / general assistant: works; persona may drift toward Claude voice (double Anthropic SFT stacking, same as v1).

Training recipe (concrete reference)


Setting	Value
Base (warm-start)	`lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled`
SFT dataset	`lordx64/fable-sft-combined-v2` — 9,842 unique rows, ~24.5M Qwen tokens
Library	Unsloth `FastLanguageModel` + TRL `SFTTrainer`
LoRA	r=64, alpha=64, attention-only (`q_proj, k_proj, v_proj, o_proj`), dropout 0.0
Loss masking	`train_on_responses_only` (gradients only flow through assistant turns, including `<think>` block)
Sequence length	8,192
Epochs	3
Effective batch size	16 (per-device 1 × grad-accum 16)
Optimizer	AdamW 8-bit, cosine LR, 3% warmup, weight decay 0.01
Learning rate	1.5e-5
Precision	bf16 forward + LoRA params
Random seed	3407
Hardware	1× NVIDIA H200 (141 GB; `nvidia-h200` x1 instance) on AWS us-east-2 via HF Inference Endpoints
Total optimizer steps	1,839 (9,842 rows × 3 epochs ÷ effective batch 16; small drop from prep for label-all-masked rows)
Wall-clock	37.5h (one prior 40h attempt lost to a billing hold ~98% through training)
Cost	~$190 (successful run only) + ~$200 (lost attempt) = ~$390 total v2 spend
Final save	`merged_16bit` via Unsloth

The training script is training/train.py in the source repo; the submitter is training/endpoint/deploy_fable.py --v2.
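
The step count and cost figures in the recipe table can be reproduced with simple arithmetic. This sketch only uses numbers stated in this card (including the $5/hr H200 rate from the comparison table); the "rows implied" value is approximate because the trainer's handling of the last partial batch is not specified here.

```python
ROWS, EPOCHS, EFFECTIVE_BATCH = 9_842, 3, 16
REPORTED_STEPS = 1_839
H200_RATE_USD_PER_HOUR = 5.0

# Step count if no rows had been dropped during preprocessing.
nominal_steps = ROWS * EPOCHS / EFFECTIVE_BATCH

# Working backwards from the reported step count.
steps_per_epoch = REPORTED_STEPS // EPOCHS
rows_implied = steps_per_epoch * EFFECTIVE_BATCH

successful_run_cost = 37.5 * H200_RATE_USD_PER_HOUR
lost_run_cost = 40 * H200_RATE_USD_PER_HOUR

print(f"nominal steps: {nominal_steps:.1f}")
print(f"steps per epoch: {steps_per_epoch}, rows implied: ~{rows_implied}")
print(f"cost: ${successful_run_cost:.2f} + ${lost_run_cost:.2f}")
```

The gap between the nominal and reported step counts is consistent with a few dozen rows being dropped as fully masked, which matches the "small drop" noted in the table.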

Evaluation

🚧 Evals are in progress. This table will fill in as each suite completes; nothing here is published until verified.

Benchmark	Setup	v1 score	v2 score	Status
GSM8K-CoT	8-shot multi-turn, limit 300	pending	pending	🚧 in progress
MMLU-Pro	5-shot multi-turn, limit 500	pending	pending	🚧 in progress
GPQA Diamond	0-shot CoT	pending	pending	🚧 in progress
MATH-500	0-shot, `math_verify` metric	pending	pending	🚧 in progress
AIME 2024 / 2025	0-shot CoT	pending	pending	🚧 in progress
HumanEval / MBPP	pass@1 / pass@10	pending	pending	🚧 in progress
IFEval	0-shot	pending	pending	🚧 in progress
SWE-bench Lite (hand-rolled harness, non-empty patches)	300 instances, no Docker test execution	109/254 = 42.9% of valid runs	pending	🚧 v2 re-run pending
SWE-bench Lite — Resolved %	Official Docker eval on generated patches	pending	pending	🚧 Docker harness setup pending

Standing rule on this project: numbers stay blank until verified. If a benchmark hits a known extraction bug, we omit it rather than publish a misleading score.

Usage

Transformers (full bf16, ~70 GB)

Important: Qwable-v2 emits <tool_use> XML reliably only when prompted as an agent. Same recipe as v1 — use a system prompt that explicitly requests the XML format:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("lordx64/Qwable-v2")
model = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwable-v2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '<tool_use name="X" id="toolu_01abc">\n{"...": "..."}\n</tool_use>\n'
    "Do NOT respond with markdown code blocks. Always use <tool_use> XML."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Read /tmp/server.py and tell me what port it listens on."},
]
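inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,   # returns input_ids and attention_mask together
    return_tensors="pt",
).to(model.device)
# do_sample=True is required for temperature/top_p to take effect
out = model.generate(**inputs, max_new_tokens=2048, do_sample=True,
                     temperature=0.6, top_p=0.9)
# Decode only the newly generated tokens, keeping <think> / <tool_use> markup visible
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))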

For pure reasoning use (math, science, general Q&A), omit the system prompt or use the generic "You are a helpful AI assistant." — the model will produce reasoning + a text answer like the underlying Opus 4.7 distill.

vLLM serving

vllm serve lordx64/Qwable-v2 \
    --max-model-len 16384 \
    --tensor-parallel-size 2 \
    --trust-remote-code
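
Once the server is running, it can be queried through vLLM's OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below is one possible client, not part of the original card: it assumes vLLM's default port 8000, reuses the agent system prompt and sampling settings from the Transformers example, and the helper names `build_payload` and `chat` are made up for illustration.

```python
import json
import urllib.request

AGENT_SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '<tool_use name="X" id="toolu_01abc">\n{"...": "..."}\n</tool_use>\n'
    "Do NOT respond with markdown code blocks. Always use <tool_use> XML."
)

def build_payload(user_text: str, agentic: bool = True) -> dict:
    """Build a chat-completions request body; omit the system turn for pure reasoning."""
    messages = [{"role": "user", "content": user_text}]
    if agentic:
        messages.insert(0, {"role": "system", "content": AGENT_SYSTEM})
    return {
        "model": "lordx64/Qwable-v2",
        "messages": messages,
        "temperature": 0.6,
        "top_p": 0.9,
        "max_tokens": 2048,
    }

def chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to the server and return the assistant text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

payload = build_payload("Read /tmp/server.py and tell me what port it listens on.")
# print(chat(payload))  # requires the vLLM server to be running
```

The returned text still contains raw `<tool_use>` XML, so it has to be parsed by the caller rather than read from the response's structured tool-call fields.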

llama.cpp / LM Studio (GGUF)

# Quants build pending; same flow as v1's lordx64/Qwable-v1-GGUF (IQ4_XS / Q4_K_M / Q5_K_M / Q8_0)
llama-cli -m Qwable-v2-IQ4_XS.gguf -p "..."

Tool-use format

Same as v1 — custom <tool_use> XML envelope. v2's key improvement is fidelity to the real Claude Code tool inventory:

<think>
The user wants to change the port. I should Read the file first, then Edit it.
</think>

<tool_use name="Read" id="toolu_01ABC...">
{
  "file_path": "/tmp/server.py"
}
</tool_use>

Then on the next turn the user supplies a <tool_result>:

<tool_result id="toolu_01ABC..." is_error="false">
from flask import Flask
...
app.run(port=8000)
</tool_result>

And v2 produces:

<tool_use name="Edit" id="toolu_01XYZ...">
{
  "file_path": "/tmp/server.py",
  "old_string": "    app.run(port=8000)",
  "new_string": "    app.run(port=8080)"
}
</tool_use>

By contrast, v1 invented <tool_use name="Replace">{"regex": ..., "replace": ...}. The v2 names + field signatures match Anthropic's published Claude Code tool definitions.
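
Because the envelope is custom, a harness has to extract tool calls from the raw completion text itself. The following is a minimal sketch of such a parser, assuming the attribute order (`name`, then `id`) and JSON body shown in the examples above; the `ToolCall` class and function names are hypothetical, and a production harness would likely need more tolerant parsing.

```python
import json
import re
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    id: str
    args: dict

# Matches one complete <tool_use name="..." id="...">JSON</tool_use> block.
_TOOL_USE = re.compile(
    r'<tool_use\s+name="(?P<name>[^"]+)"\s+id="(?P<id>[^"]+)"\s*>'
    r"(?P<body>.*?)</tool_use>",
    re.DOTALL,
)

def parse_tool_calls(text: str) -> list[ToolCall]:
    """Extract every closed <tool_use> block whose body is a JSON object."""
    calls = []
    for m in _TOOL_USE.finditer(text):
        try:
            args = json.loads(m.group("body"))
        except json.JSONDecodeError:
            continue  # skip malformed bodies instead of raising
        if isinstance(args, dict):
            calls.append(ToolCall(m.group("name"), m.group("id"), args))
    return calls

def has_unclosed_tool_use(text: str) -> bool:
    """True when there are more <tool_use ...> openers than </tool_use> closers."""
    return len(re.findall(r"<tool_use\b", text)) > text.count("</tool_use>")

def render_tool_result(call_id: str, output: str, is_error: bool = False) -> str:
    """Format a tool result in the <tool_result> envelope for the next turn."""
    flag = "true" if is_error else "false"
    return f'<tool_result id="{call_id}" is_error="{flag}">\n{output}\n</tool_result>'
```

`has_unclosed_tool_use` gives a cheap way to flag the early-stop-mid-tool_use failure mode described earlier, so a harness can retry or continue generation instead of recording an empty patch.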

Limitations

Tool-use format is still system-prompt-conditional. Improved over v1 but not fully fixed. Bare prompts produce a <tool_code> Gemini-style envelope instead of <tool_use> XML; agent system prompts produce the right format reliably. Same operational guidance as v1: run inside a harness that supplies a tool-use system prompt + tool registry.
Narrow training distribution — even with the 2× corpus, both source datasets ultimately derive from the same hundreds of Glint-captured Claude Code sessions. Out-of-distribution agent tasks (DevOps, data science, security workflows) may still be hit-or-miss.
Custom tool envelope. <tool_use> XML doesn't slot into vLLM's tool-calling API automatically; a parser wrapper is needed.
Persona drift: two SFT rounds against Anthropic-style outputs, now reinforced over more steps. The model may occasionally self-identify as Claude in chat.
Reasoning is still from Opus 4.7, not Fable-5. Don't expect v2 to outperform the underlying Opus 4.7 distill on pure-reasoning benchmarks. The lower LR (1.5e-5 vs v1's 2e-5) was deliberately chosen to protect the reasoning prior; verifying that worked is pending the lm-eval suite.
No formal evals at v2 ship time. Same standing rule as v1 — pending.
Two paid runs to ship. First v2 attempt completed ~98% of training before an HF billing hold paused the endpoint; the second clean run cost ~$190 on top of the ~$200 lost first attempt. v3 will add checkpoint+resume to mitigate this.
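
One possible shape for the checkpoint+resume mitigation mentioned in the last item is periodic checkpointing through the standard `transformers.TrainingArguments` options, which TRL's `SFTConfig` also accepts. This is a sketch of an approach, not the project's actual v3 plan; the output path, Hub repo name, and save interval are hypothetical placeholders.

```python
# Keyword arguments meant for transformers.TrainingArguments / trl.SFTConfig.
# Paths, repo name, and the save interval are hypothetical placeholders.
checkpoint_kwargs = dict(
    output_dir="outputs/qwable-v3",
    save_strategy="steps",
    save_steps=100,
    save_total_limit=2,          # keep only the two newest checkpoints on disk
    push_to_hub=True,            # copy checkpoints off the endpoint's local disk
    hub_model_id="your-org/qwable-v3-checkpoints",
    hub_strategy="checkpoint",   # also pushes the newest checkpoint to a last-checkpoint folder
)

def last_checkpoint_step(interrupted_at: int, save_steps: int) -> int:
    """Step of the newest checkpoint written before an interruption."""
    return (interrupted_at // save_steps) * save_steps

# Roughly where the lost attempt stopped, assuming 1,839 total steps.
interrupted_at = int(0.98 * 1839)
resume_from = last_checkpoint_step(interrupted_at, checkpoint_kwargs["save_steps"])
print(f"interrupted at step {interrupted_at}, resume from step {resume_from}")
```

With settings like these, calling `trainer.train(resume_from_checkpoint=True)` would pick up from the newest checkpoint in `output_dir`, so an interruption late in training would cost at most one save interval of work rather than the whole run.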

License & terms

Inherits AGPL-3.0 from the upstream Glint-Research/Fable-5-traces + Glint-Research/Complete-FABLE.5-traces-2M datasets. Downstream users running Qwable-v2 in a network-accessible service must comply with AGPL §13 (source disclosure for network use).

The underlying Fable-5 thinking traces are derivative content from Anthropic's claude-fable-5 preview model (suspended globally 2026-06-22 under U.S. export-control directives). Downstream users should verify compliance with Anthropic's usage policies for their specific use case.

The Qwen3.6-35B-A3B base is Apache 2.0; the Opus 4.7 distill (intermediate base) is Apache 2.0. Qwable-v2's AGPL designation supersedes those due to the Fable-5 data's AGPL upstream.

Citation

@misc{lordx64_qwable_v2_2026,
  title  = {Qwable-v2: Agentic coding distillation from Claude Fable-5 onto Qwen3.6-35B-A3B with LoRA r=64 + combined corpus},
  author = {lordx64},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/lordx64/Qwable-v2}},
}

Acknowledgements

Same lineage as v1, with the added v2 contributions:

Glint-Research for both the original Fable-5-traces (v1's source) and Complete-FABLE.5-traces-2M (the upstream for fable-tool-use-sft, v2's added corpus).
1EYE4ALL and Crownelius for the intermediate hosts in the Complete-FABLE.5-traces-2M provenance chain.
TeichAI for the upstream collection that all of this ultimately traces to.
Anthropic for the Claude Fable-5 preview model (briefly available 2026-06-10 to 2026-06-22) and the prior Opus 4.7 / Opus 4.6 work this lineage is built on.
Qwen team for releasing Qwen3.6-35B-A3B under Apache 2.0.
Unsloth for 2× faster LoRA training and the MoE+LoRA shape fix in unsloth-zoo PR #601.
Hugging Face for the Inference Endpoint H200 fleet where the training actually ran.

Model size: 36B params (Safetensors, tensor type BF16)
Model tree for lordx64/Qwable-v2: Base model Qwen/Qwen3.6-35B-A3B · Adapter lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled · Finetuned (5) · this model
Quantizations: 3 models