Qwen2.5-Coder-3B-SFT-StructuredOutput – GGUF

GGUF quantizations of DuoNeural/Qwen2.5-Coder-3B-SFT-StructuredOutput.

Multi-task SFT on SQL (7,560), JSON (3,568), and WebCode (1,107) examples, 12,235 in total. GSM8K flexible improves +20.4% relative over the base Qwen2.5-Coder-3B (0.5823 → 0.7013); ARC stays stable.

Eval vs Baseline

| Metric         | Baseline | Multitask SFT | Delta  |
|----------------|----------|---------------|--------|
| GSM8K flexible | 0.5823   | 0.7013        | +20.4% |
| GSM8K strict   | 0.6937   | 0.6907        | -0.4%  |
| ARC-acc        | 0.4556   | 0.4522        | -0.7%  |
| ARC-norm       | 0.4898   | 0.4949        | +1.0%  |
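The deltas above can be reproduced from the table values (assuming the relative-change formula `(sft - base) / base * 100`, which matches the reported signs):

```python
# Recompute the relative deltas from the table above.
# Assumption: Delta = (sft - base) / base * 100, rounded to one decimal.
rows = {
    "GSM8K flexible": (0.5823, 0.7013),
    "GSM8K strict":   (0.6937, 0.6907),
    "ARC-acc":        (0.4556, 0.4522),
    "ARC-norm":       (0.4898, 0.4949),
}

for name, (base, sft) in rows.items():
    delta = (sft - base) / base * 100
    print(f"{name}: {delta:+.1f}%")
```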

Available Quants

| File           | Size    | Use case                        |
|----------------|---------|---------------------------------|
| *-Q2_K.gguf    | ~1.5 GB | Minimum size, CPU inference     |
| *-Q3_K_M.gguf  | ~1.9 GB | Small with decent quality       |
| *-Q4_K_M.gguf  | ~2.2 GB | Recommended: best size/quality  |
| *-Q5_K_M.gguf  | ~2.5 GB | High quality                    |
| *-Q6_K.gguf    | ~2.9 GB | Very high quality               |
| *-Q8_0.gguf    | ~3.7 GB | Near-lossless                   |
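A simple way to choose from the table: take the largest quant that fits your memory budget. The helper below is a hypothetical sketch (not part of this repo or llama.cpp), using the approximate sizes listed above:

```python
# Hypothetical helper: pick the highest-quality quant whose file size
# fits a given memory budget. Sizes are the approximate values from
# the table above (GB); actual runtime memory use will be higher.
from typing import Optional

QUANT_SIZES_GB = {
    "Q2_K": 1.5,
    "Q3_K_M": 1.9,
    "Q4_K_M": 2.2,
    "Q5_K_M": 2.5,
    "Q6_K": 2.9,
    "Q8_0": 3.7,
}

def pick_quant(budget_gb: float) -> Optional[str]:
    """Return the largest quant that fits, or None if none fits."""
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items()
               if size <= budget_gb]
    return max(fitting)[1] if fitting else None
```

For example, a 3 GB budget selects Q6_K, while anything under 1.5 GB returns None.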

Usage (llama.cpp)

```shell
llama-cli -m Qwen2.5-Coder-3B-SFT-StructuredOutput-Q4_K_M.gguf \
  -p "Write a SQL query to find all users who signed up in the last 30 days" \
  -n 256
```

DuoNeural

DuoNeural is an open AI research lab – human + AI in collaboration.

| Platform    | Link                       |
|-------------|----------------------------|
| HuggingFace | huggingface.co/DuoNeural   |
| Website     | duoneural.com              |
| GitHub      | github.com/DuoNeural       |
| X / Twitter | @DuoNeural                 |

Subscribe: duoneural.beehiiv.com


