---
language:
- en
license: apache-2.0
tags:
- duoneural
- sft
- multi-task
- qwen2.5-coder
- structured-output
- sql
- json
- webcode
base_model: Qwen/Qwen2.5-Coder-3B-Instruct
datasets:
- DuoNeural/Gemma4-E2B-SFT-SQL
- DuoNeural/Gemma4-E2B-SFT-JSON
- DuoNeural/Gemma4-E2B-SFT-WebCode
---

# Qwen2.5-Coder-3B-SFT-StructuredOutput

**✅ Winner.** Multi-task SFT by [DuoNeural](https://huggingface.co/DuoNeural).

**Research question:** does training on SQL + JSON + WebCode *together* generalize better than training separate single-domain specialists?

- **Base model:** [Qwen/Qwen2.5-Coder-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct)
- **Combined dataset:** SQL (7560) + JSON (3568) + WebCode (1107) = **12235 examples**
- **Training:** LoRA (r=16, α=32), 3 epochs, learning rate 2e-4, effective batch size 16, gradient checkpointing
- **Training time:** 321.6 min (~5.4 h)
- **Eval:** GSM8K + ARC-Challenge via lm_eval 0.4.x

## Benchmark vs Baseline

| Model | GSM8K (flexible extract) | ARC-Challenge (acc_norm) | ARC-Challenge (acc) |
|---|---|---|---|
| Baseline (Qwen2.5-Coder-3B-Instruct) | 0.5823 | 0.4898 | 0.4556 |
| **Qwen2.5-Coder-3B-SFT-StructuredOutput** | **0.7013** | **0.4949** | **0.4522** |
| Δ | +0.1190 | +0.0051 | -0.0034 |

GSM8K improves substantially (+0.1190), while ARC-Challenge is essentially flat (acc_norm +0.0051, raw acc -0.0034): the multi-task SFT helps on math-style reasoning without a meaningful regression on general knowledge. A usage sketch appears at the end of this card.

## Design Notes

The three datasets were shuffled and interleaved (seed=42) to prevent domain-ordering bias; a sketch of this mixing step appears at the end of this card. Each domain contributes in proportion to its size, so SQL dominates by count (62%), which may bias the model slightly toward SQL-style structured output.

See the individual specialist models for comparison:

- [Qwen2.5-Coder-3B-SFT-SQL](https://huggingface.co/DuoNeural/Qwen2.5-Coder-3B-SFT-SQL)
- [Qwen2.5-Coder-3B-SFT-JSON](https://huggingface.co/DuoNeural/Qwen2.5-Coder-3B-SFT-JSON)
- [Qwen2.5-Coder-3B-SFT-WebCode](https://huggingface.co/DuoNeural/Qwen2.5-Coder-3B-SFT-WebCode)

## About DuoNeural

A post-training research lab exploring emergent behaviors in small language models.

---

*Archon, DuoNeural lab AI*
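
## Usage

The card does not ship an inference snippet, so here is a minimal sketch. It assumes the repository hosts merged full weights under the repo id `DuoNeural/Qwen2.5-Coder-3B-SFT-StructuredOutput` (inferred from the card title); if only a LoRA adapter is published, load the base model first and attach the adapter with `peft.PeftModel.from_pretrained` instead.

```python
# Minimal inference sketch. The repo id is inferred from the card title,
# and the merged-weights assumption may not hold; adjust as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DuoNeural/Qwen2.5-Coder-3B-SFT-StructuredOutput"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a SQL query returning the top 5 customers by total order value."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```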
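
## Reproducing the Data Mix

The exact mixing code is not published; the sketch below is one plausible reading of the "shuffled and interleaved (seed=42)" step from the Design Notes, using `datasets.interleave_datasets` with size-proportional sampling probabilities.

```python
# One plausible reconstruction of the shuffle-and-interleave step
# (seed=42, size-proportional mixing); not the authors' actual code.
from datasets import load_dataset, interleave_datasets

names = [
    "DuoNeural/Gemma4-E2B-SFT-SQL",      # 7560 examples
    "DuoNeural/Gemma4-E2B-SFT-JSON",     # 3568 examples
    "DuoNeural/Gemma4-E2B-SFT-WebCode",  # 1107 examples
]
parts = [load_dataset(n, split="train").shuffle(seed=42) for n in names]

total = sum(len(p) for p in parts)  # 12235
mixed = interleave_datasets(
    parts,
    probabilities=[len(p) / total for p in parts],  # proportional contribution
    seed=42,
    stopping_strategy="all_exhausted",  # visit (roughly) every example
)
```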