Use from the llama-cpp-python library:
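# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF from the Hub (cached after the first call) and load it.
llm = Llama.from_pretrained(
    repo_id="sterlixlol/kazi",
    filename="kazi-final-q8_0.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=8192,       # llama-cpp-python defaults to 512 tokens, too short for multi-turn use
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ]
)
print(response["choices"][0]["message"]["content"])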
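For interactive use, `create_chat_completion` also accepts `stream=True`, and it then yields OpenAI-style chunks instead of a single response. Below is a minimal sketch of how to consume those chunks. The helper names `iter_stream_text` and `collect_stream` are hypothetical. The usage lines are commented out because they need the `llm` object and a loaded model.

```python
from typing import Iterable, Iterator


def iter_stream_text(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield the text pieces from OpenAI-style streaming chat chunks."""
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        # Some chunks carry only {"role": "assistant"} or an empty delta; skip those.
        text = delta.get("content")
        if text:
            yield text


def collect_stream(chunks: Iterable[dict]) -> str:
    """Concatenate a streamed reply into one string."""
    return "".join(iter_stream_text(chunks))


# Usage with the `llm` object created above:
# stream = llm.create_chat_completion(
#     messages=[{"role": "user", "content": "List the files in src/"}],
#     stream=True,
# )
# for piece in iter_stream_text(stream):
#     print(piece, end="", flush=True)
```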

Kazi — Qwen3.6-27B Agentic SWE Fine-Tune (Q8_0 GGUF)

This is Kazi v1, a QLoRA fine-tune of Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking, distilled from 19,000 judge-filtered multi-turn agentic SWE traces in sterlixlol/kazi-agentic-traces-19k.

What it does differently

Measured shifts (50-prompt eval vs base)

Metric                          Base          Kazi (final)  Δ (relative)
Eval loss                       0.4762        0.4359        -8.5%
Text-only responses             22/50 (44%)   5/50 (10%)    -77%
Avg completion tokens per turn  569           131           -77%
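The Δ figures are relative changes, (Kazi - Base) / Base. They are not differences in percentage points. For text-only responses, 44% to 10% is a drop of 34 points, which is about 77% in relative terms. The following quick check recomputes the column from the values in the table:

```python
def relative_change(base: float, new: float) -> float:
    """Relative change from base to new, as a percentage."""
    return (new - base) / base * 100.0


rows = {
    # metric: (base, kazi_final), values copied from the table
    "Eval loss": (0.4762, 0.4359),
    "Text-only responses (of 50)": (22, 5),
    "Avg completion tokens per turn": (569, 131),
}

for metric, (base, kazi) in rows.items():
    print(f"{metric}: {relative_change(base, kazi):+.1f}%")
```

This prints -8.5%, -77.3% and -77.0%, which round to the values shown in the table.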

Behavioral upgrades distilled from the dataset

The training corpus is 19K judge-filtered traces from frontier models doing real SWE work. Anything that survived the judge had to be: (1) correct, (2) terse, (3) tool-grounded, (4) coherent across turns. The LoRA pulls the base model toward that joint distribution. In practice:

  • Calls the tool instead of describing it — biggest single shift (the 77% drop in text-only replies)
  • Reads before writing — looks at files / runs grep before producing edits, instead of guessing at API shapes
  • Diff/patch-shaped edits — produces minimal scoped changes rather than full-file rewrites
  • Multi-turn plan coherence — keeps plan state across 5–15 turn conversations without drifting
  • Recovers from tool errors — when a command fails, debugs the actual error rather than retrying blind
  • Convention-matching code — picks up existing style (naming, indent, framework idioms) from files it just read
  • Shell + git fluency: find, grep, rg, git log, git diff, conditional pipelines, used the way a human dev would
  • Anti-yap discipline — no apology preambles, no "Certainly! I'd be happy to…", no recap-the-task openings, no recap-what-I-just-did closings
  • Honest uncertainty — admits when it doesn't know rather than hallucinating function names or flags
  • Refusal-free — base is uncensored; fine-tune doesn't add safety filters back

The first three are directly measured. The rest are properties of the source distribution that the LoRA was strong enough to absorb at r=32 / α=64 over 2 epochs.
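The easiest way to exercise the tool-calling behaviour is through llama-server. When started with `--jinja`, it accepts OpenAI-style `tools` on `/v1/chat/completions` and returns any calls in `message.tool_calls`. Below is a minimal sketch of the request body and of reading the calls back out. The `run_shell` tool and its schema are hypothetical examples; the card does not define any tools. The sketch also assumes that "text-only response" means a reply with no tool call.

```python
import json

# Hypothetical tool definition in the OpenAI "tools" format.
RUN_SHELL_TOOL = {
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command in the repository and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}


def build_tool_request(user_prompt: str) -> dict:
    """Chat request body that offers the model a single tool."""
    return {
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [RUN_SHELL_TOOL],
    }


def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (name, arguments) pairs from an OpenAI-style chat response.

    An empty list means the model answered in plain text instead of calling a tool.
    """
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    # Arguments arrive as a JSON-encoded string, so decode them.
    return [(c["function"]["name"], json.loads(c["function"]["arguments"])) for c in calls]
```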

Use it

Easiest path — llama.cpp server:

hf download sterlixlol/kazi --local-dir .

llama-server \
    -m kazi-final-q8_0.gguf \
    -ngl -1 \
    -c 262144 \
    --jinja \
    --temp 1.0 \
    --parallel 1 \
    --host 0.0.0.0 \
    --port 8080
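Once the server is running, any OpenAI-compatible client can talk to it. Below is a dependency-free sketch that uses only the standard library. It assumes the port from the command above and llama-server's `/v1/chat/completions` route. The helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # matches --port 8080


def build_chat_request(prompt: str, temperature: float = 1.0) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # per-request value; 1.0 mirrors --temp 1.0
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send one user turn and return the assistant's text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=600) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]


# print(chat("Explain what `git log --oneline -5` shows."))
```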

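Hardware needed: ~32 GB VRAM at the full 262K context. A single A100 or L40S fits it. A single 24 GB 3090/4090 cannot hold the 27 GB weights entirely, so use a lower -ngl (partial CPU offload) or a layer split across GPUs; 4×3090 layer-split works fine.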

Training recipe

Setting          Value
Base             DavidAU/Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking
Method           QLoRA 4-bit (Unsloth + bitsandbytes)
Rank (r)         32
Alpha (α)        64
Target modules   All 7 (q, k, v, o + gate, up, down)
LR               2e-4, cosine schedule
Epochs           2
Hardware         1× A100 40GB SXM4 (Vast.ai)
Total cost       $63
Best checkpoint  step 2176 (final), eval loss 0.4359
Kernels          flash-linear-attention + causal-conv1d (3.7× speedup on Gated DeltaNet)
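With r = 32 and α = 64, standard LoRA scaling multiplies the low-rank update by α/r = 2. Merging folds that scaled product into the frozen weights once: W' = W + (α/r)·B·A. The sketch below shows this on a toy 2×3 weight with rank 1; the real run used rank 32 on much larger matrices. It also includes the trainable-parameter count per adapted matrix and a cosine learning-rate curve. Three points are assumptions: standard (non-rsLoRA) scaling, no warmup, and step 2176 as the total step count. The recipe does not state warmup or a minimum LR.

```python
import math

# Values from the recipe table; standard LoRA scaling (alpha / r) is assumed.
RANK, ALPHA = 32, 64
SCALE = ALPHA / RANK  # = 2.0


def lora_params(d_in: int, d_out: int, r: int = RANK) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)


def merge_lora(W: list[list[float]], A: list[list[float]], B: list[list[float]],
               scale: float = SCALE) -> list[list[float]]:
    """Return W + scale * (B @ A) using nested lists."""
    d_out, d_in, r = len(W), len(W[0]), len(A)
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(d_in)]
        for i in range(d_out)
    ]


def cosine_lr(step: int, total_steps: int = 2176, peak_lr: float = 2e-4) -> float:
    """Cosine decay from peak_lr to 0, assuming no warmup."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

For example, a hypothetical 4096×4096 projection gets 32·(4096 + 4096) = 262,144 LoRA parameters. Under these assumptions, the learning rate falls to half its peak (1e-4) at the midpoint of training.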

Eval loss progression across saved checkpoints:

0.4762 → 0.4576 → 0.4461 → 0.4524 → 0.4445 → 0.4383 → 0.4364 → 0.4359
 base    step290  step580  step1160  ...                          final
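Selecting the best checkpoint means taking the minimum of this series. The same series also shows one regression, from 0.4461 to 0.4524, before the loss resumes falling. The small sketch below works on the values above. It indexes checkpoints by position because only some step labels are given.

```python
EVAL_LOSS = [0.4762, 0.4576, 0.4461, 0.4524, 0.4445, 0.4383, 0.4364, 0.4359]  # index 0 = base


def best_checkpoint(losses: list[float]) -> int:
    """Index of the lowest eval loss."""
    return min(range(len(losses)), key=losses.__getitem__)


def regressions(losses: list[float]) -> list[int]:
    """Indices where eval loss went up relative to the previous checkpoint."""
    return [i for i in range(1, len(losses)) if losses[i] > losses[i - 1]]
```

Index 7 is the final checkpoint, which matches the "Best checkpoint" row of the recipe table.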

Dataset

sterlixlol/kazi-agentic-traces-19k — 19K multi-turn agentic SWE traces, judge-filtered from a larger pool of completions across multiple frontier LLMs.
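The card does not describe the judge itself. The sketch below is a hypothetical illustration of the stated rule that a trace survives only if it is correct, terse, tool-grounded and coherent. The record layout and criterion keys are assumptions, not the dataset's actual schema, and this is not the author's pipeline.

```python
from dataclasses import dataclass, field

# The four criteria named in the card; the key names are hypothetical.
CRITERIA = ("correct", "terse", "tool_grounded", "coherent")


@dataclass
class Trace:
    trace_id: str
    verdicts: dict[str, bool] = field(default_factory=dict)  # judge verdict per criterion


def passes_judge(trace: Trace) -> bool:
    """Keep a trace only if every criterion is judged True; a missing verdict counts as a fail."""
    return all(trace.verdicts.get(c, False) for c in CRITERIA)


def judge_filter(traces: list[Trace]) -> list[Trace]:
    """Return the traces that pass all four criteria, preserving order."""
    return [t for t in traces if passes_judge(t)]
```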

Files

  • kazi-final-q8_0.gguf — 27 GB, Q8_0 quantization (8.50 BPW). The merged weights, what you actually want.
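The 8.50 BPW figure follows from ggml's Q8_0 block layout. Each block of 32 weights stores one fp16 scale plus 32 int8 values: 2 + 32 = 34 bytes per 32 weights, or 8.5 bits per weight. The sketch below is a pure-Python version of that per-block scheme. It is modelled on ggml's reference quantizer: scale d = max|x| / 127 and q = round(x / d). It is not the optimized code path. The sketch also gives a rough size estimate for 27B parameters that ignores metadata and any tensors kept at higher precision.

```python
import math
import struct

QK8_0 = 32  # weights per Q8_0 block


def _round_half_away(v: float) -> int:
    """Round half away from zero, like C's roundf()."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize_block_q8_0(xs: list[float]) -> bytes:
    """Quantize 32 floats into one Q8_0 block: fp16 scale + 32 int8 values (34 bytes)."""
    assert len(xs) == QK8_0
    amax = max(abs(x) for x in xs)
    d = amax / 127.0
    inv_d = 1.0 / d if d else 0.0
    qs = [_round_half_away(x * inv_d) for x in xs]
    return struct.pack("<e", d) + struct.pack(f"<{QK8_0}b", *qs)


def dequantize_block_q8_0(block: bytes) -> list[float]:
    """Recover approximate floats as x ≈ d * q, using the stored fp16 scale."""
    (d,) = struct.unpack("<e", block[:2])
    qs = struct.unpack(f"<{QK8_0}b", block[2:])
    return [d * q for q in qs]


BITS_PER_WEIGHT = 34 * 8 / QK8_0  # = 8.5


def q8_0_size_gib(n_params: float) -> float:
    """Approximate file size in GiB, ignoring metadata and higher-precision tensors."""
    return n_params * BITS_PER_WEIGHT / 8 / 2**30
```

For exactly 27e9 parameters this gives about 26.7 GiB. That is near the listed file size, though exact agreement isn't expected because "27B" is itself a rounded count.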

Limitations

  • Base is Heretic2-Uncensored — refusal rate is ~0%. Treat it like an unrestricted dev tool, not a content filter.
  • Tuned for English. Multilingual ability is inherited from the base and untested after fine-tuning.
  • No vision tower — text only.
  • Not a chatbot for casual chitchat. It will try to call tools or write code.

License

Apache 2.0, inherited from Qwen3 base. Use freely.

Format: GGUF, 8-bit
Model size: 27B params
Architecture: qwen35

