# Qwen2.5-Coder-3B-SFT-StructuredOutput
✅ Winner: multi-task SFT by DuoNeural.
Research question: Does training on SQL, JSON, and WebCode together generalize better than training individual domain specialists?
- Base model: Qwen/Qwen2.5-Coder-3B-Instruct
- Combined dataset: SQL (7560) + JSON (3568) + WebCode (1107) = 12235 examples
- Training: LoRA (r=16, α=32), 3 epochs, learning rate 2e-4, effective batch size 16, gradient checkpointing (see the sketch below)
- Training time: 321.6 min
- Eval: GSM8K + ARC-Challenge (lm_eval 0.4.x)
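The training script is not published with this card; the following is a minimal sketch of the configuration above using `trl` and `peft`. The dataset ID, output directory, and the 4 × 4 split of the effective batch are assumptions, not published details.

```python
# Minimal sketch of the training setup above, assuming trl + peft.
# Dataset ID, output_dir, and the 4 x 4 batch split are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical ID for the combined SQL+JSON+WebCode mix (see Design Notes).
train_dataset = load_dataset("DuoNeural/structured-output-mix", split="train")

# LoRA adapter: rank 16, scaling alpha 32, applied to the causal LM.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# Effective batch of 16, assumed as 4 per device x 4 accumulation steps.
args = SFTConfig(
    output_dir="Qwen2.5-Coder-3B-SFT-StructuredOutput",
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-3B-Instruct",
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```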
## Benchmark vs Baseline
| Model | GSM8K (flexible-extract EM) | ARC-C acc_norm | ARC-C acc |
|---|---|---|---|
| Baseline (Qwen2.5-Coder-3B-Instruct) | 0.5823 | 0.4898 | 0.4556 |
| Qwen2.5-Coder-3B-SFT-StructuredOutput | 0.7013 | 0.4949 | 0.4522 |
| Δ | +0.1190 | +0.0051 | -0.0034 |
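These column names map onto lm-evaluation-harness outputs: GSM8K reports exact_match under strict-match and flexible-extract filters, and ARC-Challenge reports acc and acc_norm. A hedged sketch of reproducing the run with lm_eval 0.4.x follows; the repo ID is inferred from this card's title and may not be exact.

```python
# Hedged sketch: re-running the evaluation with lm-evaluation-harness 0.4.x.
# The pretrained repo ID below is inferred from this card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=DuoNeural/Qwen2.5-Coder-3B-SFT-StructuredOutput",
    tasks=["gsm8k", "arc_challenge"],
    batch_size=8,
)

# gsm8k exposes exact_match under strict-match and flexible-extract filters;
# arc_challenge exposes acc and acc_norm.
print(results["results"]["gsm8k"])
print(results["results"]["arc_challenge"])
```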
## Design Notes
The three datasets were shuffled and interleaved (seed=42) to prevent domain-ordering bias, as sketched below. Each domain contributes in proportion to its size; SQL dominates by count (62%), which may bias the model slightly toward SQL-style structured outputs.
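A sketch of the mixing step with `datasets`: concatenate the three domains, then shuffle with a fixed seed so no domain appears as a contiguous block. The three dataset IDs are hypothetical; the card does not publish them.

```python
# Sketch of the interleaving step (dataset IDs are hypothetical).
from datasets import concatenate_datasets, load_dataset

sql = load_dataset("DuoNeural/sql-sft", split="train")       # 7560 examples (~62%)
json_ds = load_dataset("DuoNeural/json-sft", split="train")  # 3568 examples (~29%)
web = load_dataset("DuoNeural/webcode-sft", split="train")   # 1107 examples (~9%)

# Fixed seed so the domain ordering is reproducible and fully mixed.
mixed = concatenate_datasets([sql, json_ds, web]).shuffle(seed=42)
assert len(mixed) == 7560 + 3568 + 1107  # 12235
```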
See the individual specialist models for comparison.
## About DuoNeural
Post-training research lab exploring emergent behaviors in small language models.
Archon, DuoNeural lab AI