Qwen3.5-DeltaCoder-9B-GGUF

v1.1-DPO — Now with DPO alignment for improved code correctness and self-verification. If you downloaded before March 28, 2026, please re-pull to get v1.1-DPO.

GGUF quantizations of Qwen3.5-DeltaCoder-9B for use with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines.

What's New in v1.1-DPO

  • DPO alignment on 4,519 preference pairs from AceCode-V2-122K
  • Self-correcting behavior — model now detects and fixes its own bugs rather than submitting incorrect code
  • Improved code correctness — trained to prefer passing solutions over failing ones
  • Same tool-call reliability as v1 — SFT improvements preserved through two-stage merge

Available Quantizations

| File | Quant | Size | Notes |
|------|-------|------|-------|
| DeltaCoder-9B-v1.1-DPO-Q2_K.gguf | Q2_K | ~3.6 GB | Smallest, lowest quality |
| DeltaCoder-9B-v1.1-DPO-Q3_K_S.gguf | Q3_K_S | ~4.0 GB | |
| DeltaCoder-9B-v1.1-DPO-Q3_K_M.gguf | Q3_K_M | ~4.4 GB | |
| DeltaCoder-9B-v1.1-DPO-Q3_K_L.gguf | Q3_K_L | ~4.6 GB | |
| DeltaCoder-9B-v1.1-DPO-Q4_0.gguf | Q4_0 | ~3.2 GB | |
| DeltaCoder-9B-v1.1-DPO-Q4_K_S.gguf | Q4_K_S | ~5.0 GB | |
| DeltaCoder-9B-v1.1-DPO-Q4_K_M.gguf | Q4_K_M | ~5.5 GB | Recommended |
| DeltaCoder-9B-v1.1-DPO-Q5_K_S.gguf | Q5_K_S | ~6.1 GB | |
| DeltaCoder-9B-v1.1-DPO-Q5_0.gguf | Q5_0 | ~6.1 GB | |
| DeltaCoder-9B-v1.1-DPO-Q5_K_M.gguf | Q5_K_M | ~6.5 GB | |
| DeltaCoder-9B-v1.1-DPO-Q6_K.gguf | Q6_K | ~7.3 GB | |
| DeltaCoder-9B-v1.1-DPO-Q8_0.gguf | Q8_0 | ~9.4 GB | Near-lossless |
| DeltaCoder-9B-v1.1-DPO-BF16.gguf | BF16 | ~17.9 GB | Full precision |

Recommended Quant

  • Low VRAM (8GB): Q4_K_M
  • Mid VRAM (12GB): Q5_K_M or Q6_K
  • High VRAM (16GB+): Q8_0
  • Full precision: BF16
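The tiers above can be expressed as a tiny helper. This is only an illustration of the guidance in this card, not an official API; the VRAM thresholds (including the assumption that BF16 wants a 24 GB card, since the file alone is ~17.9 GB) are taken or inferred from the list above.

```python
def choose_quant(vram_gb: float) -> str:
    """Map available VRAM (GiB) to the quant suggested in this card.

    The tiers mirror the bullet list above; they are a rule of thumb,
    not a hard requirement. The 24 GB cutoff for BF16 is an assumption
    based on the ~17.9 GB file size.
    """
    if vram_gb >= 24:
        return "BF16"    # full precision, ~17.9 GB file
    if vram_gb >= 16:
        return "Q8_0"    # near-lossless, ~9.4 GB
    if vram_gb >= 12:
        return "Q5_K_M"  # good quality/size trade-off, ~6.5 GB
    return "Q4_K_M"      # recommended default for 8 GB cards, ~5.5 GB
```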

Training Lineage

Qwen/Qwen3.5-9B-Base
 └─ Qwen/Qwen3.5-9B  (instruction tuned)
     └─ Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2
         (SFT on Claude 4.6 Opus reasoning traces)
         └─ danielcherubini/Qwen3.5-DeltaCoder-9B  (v1 SFT — tool-call reliability)
             (LoRA SFT on CoderForge-Preview)
             └─ danielcherubini/Qwen3.5-DeltaCoder-9B v1.1-DPO  ← this model
                 (DPO on AceCode-V2-122K preference pairs)

Recommended Sampling Settings

| Parameter | Value |
|-----------|-------|
| temperature | 0.6 |
| top_k | 20 |
| top_p | 0.95 |
| min_p | 0.0 |
| presence_penalty | 0.0 |
| repeat_penalty | 1.0 |

Do not use temperature below 0.5 — low temperatures cause deterministic looping in multi-turn agentic use.
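The table above maps directly onto a request body for llama-server's OpenAI-compatible `/v1/chat/completions` endpoint. A minimal sketch; the model name and prompt are placeholders, and `top_k`, `min_p`, and `repeat_penalty` are llama-server extensions to the OpenAI schema, so verify that your build honors them rather than silently dropping them.

```python
import json

# Recommended sampling settings from this card, shaped as a request body
# for llama-server's OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "deltacoder",  # placeholder name
    "messages": [
        {"role": "user", "content": "Write a binary search in Python."}
    ],
    "temperature": 0.6,      # keep >= 0.5 per the note above
    "top_k": 20,
    "top_p": 0.95,
    "min_p": 0.0,
    "presence_penalty": 0.0,
    "repeat_penalty": 1.0,
}
body = json.dumps(payload)
```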

KV Cache Quantization

| Context Length | KV Cache (K / V) | VRAM (Q4_K_M) | Generation Speed |
|----------------|------------------|---------------|------------------|
| 102,400 | f16 / q4_0 | ~8.5 GB | ~111 tok/s |
| 131,072 | f16 / q4_0 | ~9.1 GB | ~110 tok/s |
# llama.cpp / ik_llama.cpp flags
-ctk f16 -ctv q4_0
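For sizing other context lengths, the usual KV-cache formula applies: one K and one V tensor per layer, each ctx × n_kv_heads × head_dim elements, at the cache type's bytes per element (f16 = 2.0; q4_0 ≈ 0.5625 including block scales). The sketch below uses placeholder architecture numbers, not the real qwen35 config, so the result is illustrative only; note also that the VRAM column in the table above includes the model weights, not just the cache.

```python
def kv_cache_bytes(ctx: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, k_bytes: float, v_bytes: float) -> float:
    """Approximate KV-cache size in bytes: one K and one V tensor per
    layer, each of shape [ctx, n_kv_heads * head_dim], stored at the
    given bytes per element (f16 = 2.0, q4_0 ~= 0.5625 incl. scales)."""
    per_token_per_tensor = n_layers * n_kv_heads * head_dim
    return ctx * per_token_per_tensor * (k_bytes + v_bytes)

# Placeholder architecture values, NOT the real qwen35 config.
gib = kv_cache_bytes(ctx=131_072, n_layers=32, n_kv_heads=4,
                     head_dim=128, k_bytes=2.0, v_bytes=0.5625) / 2**30
```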

Usage

Ollama

ollama create deltacoder -f Modelfile

Example Modelfile:

FROM ./DeltaCoder-9B-v1.1-DPO-Q5_K_M.gguf
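The Modelfile can also pin the recommended sampling settings from this card, so `ollama run deltacoder` picks them up by default. A sketch; all parameter names below are standard Ollama Modelfile parameters:

```
FROM ./DeltaCoder-9B-v1.1-DPO-Q5_K_M.gguf

# Recommended sampling settings (see the table above)
PARAMETER temperature 0.6
PARAMETER top_k 20
PARAMETER top_p 0.95
PARAMETER min_p 0.0
PARAMETER repeat_penalty 1.0
```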

llama.cpp

./llama-server -m DeltaCoder-9B-v1.1-DPO-Q5_K_M.gguf -ngl 999 -c 131072 -ctk f16 -ctv q4_0 -fa 1 --jinja

LM Studio

Download any GGUF file and load it directly in LM Studio.

Benchmarks

| Model | HumanEval | HumanEval+ | Terminal-Bench Easy |
|-------|-----------|------------|---------------------|
| Jackrong Qwen3.5-9B-v2 (base) | 53.7% | | |
| DeltaCoder-9B v1 (temp=0.6) | 50.6% | 49.4% | 2/4 (50%) |
| DeltaCoder-9B v1.1-DPO (temp=0.6) | TBD | TBD | 2/4 (50%)* |

*v1.1-DPO timed out on the two tasks that v1 answered incorrectly; the intended behavioral change (attempting to fix its own bugs rather than submitting wrong code) is confirmed, and those tasks are being re-run with an extended timeout.

Acknowledgements
