# Qwen2.5-Coder-3B-SFT-StructuredOutput
✅ Winner: multi-task SFT by DuoNeural.
Research question: Does training on SQL, JSON, and WebCode together generalize better than training individual domain specialists?
- Base model: Qwen/Qwen2.5-Coder-3B-Instruct
- Combined dataset: SQL (7560) + JSON (3568) + WebCode (1107) = 12235 examples
- Training: LoRA (r=16, α=32), 3 epochs, learning rate 2e-4, effective batch size 16, gradient checkpointing (see the sketch below)
- Training time: 321.6 min
- Eval: GSM8K + ARC-Challenge (lm_eval 0.4.x)
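The training script is not published with this card; the following is a minimal sketch of the configuration above using `trl` and `peft`. The dataset ID, output directory, and the 4 × 4 split of the effective batch are assumptions, not published details.

```python
# Minimal sketch of the training setup above, assuming trl + peft.
# Dataset ID, output_dir, and the 4 x 4 batch split are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical ID for the combined SQL+JSON+WebCode mix (see Design Notes).
train_dataset = load_dataset("DuoNeural/structured-output-mix", split="train")

# LoRA adapter: rank 16, scaling alpha 32, applied to the causal LM.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# Effective batch of 16, assumed as 4 per device x 4 accumulation steps.
args = SFTConfig(
    output_dir="Qwen2.5-Coder-3B-SFT-StructuredOutput",
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-3B-Instruct",
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```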
## Benchmark vs Baseline
| Model | GSM8K (flexible-extract EM) | ARC-C acc_norm | ARC-C acc |
|---|---|---|---|
| Baseline (Qwen2.5-Coder-3B-Instruct) | 0.5823 | 0.4898 | 0.4556 |
| Qwen2.5-Coder-3B-SFT-StructuredOutput | 0.7013 | 0.4949 | 0.4522 |
| Δ | +0.1190 | +0.0051 | -0.0034 |
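These column names map onto lm-evaluation-harness outputs: GSM8K reports exact_match under strict-match and flexible-extract filters, and ARC-Challenge reports acc and acc_norm. A hedged sketch of reproducing the run with lm_eval 0.4.x follows; the repo ID is inferred from this card's title and may not be exact.

```python
# Hedged sketch: re-running the evaluation with lm-evaluation-harness 0.4.x.
# The pretrained repo ID below is inferred from this card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=DuoNeural/Qwen2.5-Coder-3B-SFT-StructuredOutput",
    tasks=["gsm8k", "arc_challenge"],
    batch_size=8,
)

# gsm8k exposes exact_match under strict-match and flexible-extract filters;
# arc_challenge exposes acc and acc_norm.
print(results["results"]["gsm8k"])
print(results["results"]["arc_challenge"])
```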
## Design Notes
The three datasets were shuffled and interleaved (seed=42) to prevent domain-ordering bias, as sketched below. Each domain contributes in proportion to its size; SQL dominates by count (62%), which may bias the model slightly toward SQL-style structured outputs.
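A sketch of the mixing step with `datasets`: concatenate the three domains, then shuffle with a fixed seed so no domain appears as a contiguous block. The three dataset IDs are hypothetical; the card does not publish them.

```python
# Sketch of the interleaving step (dataset IDs are hypothetical).
from datasets import concatenate_datasets, load_dataset

sql = load_dataset("DuoNeural/sql-sft", split="train")       # 7560 examples (~62%)
json_ds = load_dataset("DuoNeural/json-sft", split="train")  # 3568 examples (~29%)
web = load_dataset("DuoNeural/webcode-sft", split="train")   # 1107 examples (~9%)

# Fixed seed so the domain ordering is reproducible and fully mixed.
mixed = concatenate_datasets([sql, json_ds, web]).shuffle(seed=42)
assert len(mixed) == 7560 + 3568 + 1107  # 12235
```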
See the individual specialist models for comparison.
## About DuoNeural
Post-training research lab exploring emergent behaviors in small language models.
Archon, DuoNeural lab AI