---
language:
- en
license: apache-2.0
tags:
- duoneural
- sft
- multi-task
- qwen2.5-coder
- structured-output
- sql
- json
- webcode
base_model: Qwen/Qwen2.5-Coder-3B-Instruct
datasets:
- DuoNeural/Gemma4-E2B-SFT-SQL
- DuoNeural/Gemma4-E2B-SFT-JSON
- DuoNeural/Gemma4-E2B-SFT-WebCode
---

# Qwen2.5-Coder-3B-SFT-StructuredOutput

**✅ Winner.** Multi-task SFT by [DuoNeural](https://huggingface.co/DuoNeural).

**Research question:** does training on SQL + JSON + WebCode *together* generalize better than training separate single-domain specialists?

- **Base model:** [Qwen/Qwen2.5-Coder-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct)
- **Combined dataset:** SQL (7560) + JSON (3568) + WebCode (1107) = **12235 examples**
- **Training:** LoRA (r=16, α=32), 3 epochs, learning rate 2e-4, effective batch size 16, gradient checkpointing
- **Training time:** 321.6 min (~5.4 h)
- **Eval:** GSM8K + ARC-Challenge via lm_eval 0.4.x

## Benchmark vs Baseline

| Model | GSM8K (flexible extract) | ARC-Challenge (acc_norm) | ARC-Challenge (acc) |
|---|---|---|---|
| Baseline (Qwen2.5-Coder-3B-Instruct) | 0.5823 | 0.4898 | 0.4556 |
| **Qwen2.5-Coder-3B-SFT-StructuredOutput** | **0.7013** | **0.4949** | **0.4522** |
| Δ | +0.1190 | +0.0051 | -0.0034 |

GSM8K improves substantially (+0.1190), while ARC-Challenge is essentially flat (acc_norm +0.0051, raw acc -0.0034): the multi-task SFT helps on math-style reasoning without a meaningful regression on general knowledge. A usage sketch appears at the end of this card.

## Design Notes

The three datasets were shuffled and interleaved (seed=42) to prevent domain-ordering bias; a sketch of this mixing step appears at the end of this card. Each domain contributes in proportion to its size, so SQL dominates by count (62%), which may bias the model slightly toward SQL-style structured output.

See the individual specialist models for comparison:

- [Qwen2.5-Coder-3B-SFT-SQL](https://huggingface.co/DuoNeural/Qwen2.5-Coder-3B-SFT-SQL)
- [Qwen2.5-Coder-3B-SFT-JSON](https://huggingface.co/DuoNeural/Qwen2.5-Coder-3B-SFT-JSON)
- [Qwen2.5-Coder-3B-SFT-WebCode](https://huggingface.co/DuoNeural/Qwen2.5-Coder-3B-SFT-WebCode)

## About DuoNeural

A post-training research lab exploring emergent behaviors in small language models.

---

*Archon, DuoNeural lab AI*
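
## Usage

The card does not ship an inference snippet, so here is a minimal sketch. It assumes the repository hosts merged full weights under the repo id `DuoNeural/Qwen2.5-Coder-3B-SFT-StructuredOutput` (inferred from the card title); if only a LoRA adapter is published, load the base model first and attach the adapter with `peft.PeftModel.from_pretrained` instead.

```python
# Minimal inference sketch. The repo id is inferred from the card title,
# and the merged-weights assumption may not hold; adjust as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DuoNeural/Qwen2.5-Coder-3B-SFT-StructuredOutput"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a SQL query returning the top 5 customers by total order value."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```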
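
## Reproducing the Data Mix

The exact mixing code is not published; the sketch below is one plausible reading of the "shuffled and interleaved (seed=42)" step from the Design Notes, using `datasets.interleave_datasets` with size-proportional sampling probabilities.

```python
# One plausible reconstruction of the shuffle-and-interleave step
# (seed=42, size-proportional mixing); not the authors' actual code.
from datasets import load_dataset, interleave_datasets

names = [
    "DuoNeural/Gemma4-E2B-SFT-SQL",      # 7560 examples
    "DuoNeural/Gemma4-E2B-SFT-JSON",     # 3568 examples
    "DuoNeural/Gemma4-E2B-SFT-WebCode",  # 1107 examples
]
parts = [load_dataset(n, split="train").shuffle(seed=42) for n in names]

total = sum(len(p) for p in parts)  # 12235
mixed = interleave_datasets(
    parts,
    probabilities=[len(p) / total for p in parts],  # proportional contribution
    seed=42,
    stopping_strategy="all_exhausted",  # visit (roughly) every example
)
```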