---
language:
  - en
license: apache-2.0
tags:
  - duoneural
  - sft
  - multi-task
  - qwen2.5-coder
  - structured-output
  - sql
  - json
  - webcode
base_model: Qwen/Qwen2.5-Coder-3B-Instruct
datasets:
  - DuoNeural/Gemma4-E2B-SFT-SQL
  - DuoNeural/Gemma4-E2B-SFT-JSON
  - DuoNeural/Gemma4-E2B-SFT-WebCode
---

# Qwen2.5-Coder-3B-SFT-StructuredOutput

**✅ Winner:** multi-task SFT by DuoNeural.

**Research question:** does a model trained on SQL + JSON + WebCode together generalize better than individual domain specialists?

- **Base model:** Qwen/Qwen2.5-Coder-3B-Instruct
- **Combined dataset:** SQL (7,560) + JSON (3,568) + WebCode (1,107) = 12,235 examples
- **Training:** LoRA (r=16, α=32), 3 epochs, lr 2e-4, effective batch size 16, gradient checkpointing
- **Training time:** 321.6 min
- **Eval:** GSM8K + ARC-Challenge (lm_eval 0.4.x)
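The hyperparameters above can be sketched as a `peft`/`trl` configuration. This is a minimal sketch, not the card's actual training script: the target modules, LoRA dropout, and the per-device-batch × gradient-accumulation split of the effective batch of 16 are assumptions not stated in the card.

```python
# Hedged sketch of the SFT setup described above; assumes peft + trl.
# target_modules, lora_dropout, and the batch/accumulation split are guesses.
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

lora_cfg = LoraConfig(
    r=16,                      # rank, as listed above
    lora_alpha=32,             # alpha, as listed above
    lora_dropout=0.05,         # assumption: not stated in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

train_cfg = SFTConfig(
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=4,   # assumption: 4 x 4 accumulation = eff. 16
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    output_dir="qwen2.5-coder-3b-sft-structured",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-3B-Instruct",
    args=train_cfg,
    train_dataset=combined_dataset,  # assumed: the interleaved SQL+JSON+WebCode split
    peft_config=lora_cfg,
)
# trainer.train()
```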

## Benchmark vs. Baseline

| Model | GSM8K (flexible) | ARC-C (acc_norm) | ARC-C (acc) |
|---|---|---|---|
| Baseline (Qwen2.5-Coder-3B-Instruct) | 0.5823 | 0.4898 | 0.4556 |
| Qwen2.5-Coder-3B-SFT-StructuredOutput | 0.7013 | 0.4949 | 0.4522 |
| Δ | +0.1190 | +0.0051 | −0.0034 |
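The evaluation corresponds roughly to the lm-evaluation-harness 0.4.x Python API. A sketch only: the batch size, device defaults, and the model repo id are assumptions, and the table reports the flexible-extract GSM8K metric plus ARC-Challenge acc_norm/acc.

```python
# Hedged sketch of the eval; assumes lm-eval 0.4.x is installed and the
# repo id below is the published model (assumption from the card title).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=DuoNeural/Qwen2.5-Coder-3B-SFT-StructuredOutput",
    tasks=["gsm8k", "arc_challenge"],
    batch_size=8,  # assumption: not stated in the card
)
print(results["results"])
```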

## Design Notes

Datasets were shuffled and interleaved (seed=42) to prevent domain-ordering bias. Each domain contributes in proportion to its size; SQL dominates by count (62%), which may bias the model slightly toward SQL-style structured outputs.
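The interleaving step can be sketched in plain Python as a stand-in for the actual data pipeline (with 🤗 `datasets`, the equivalent would be `concatenate_datasets([...]).shuffle(seed=42)`); the per-split sizes come from the card, everything else is illustrative.

```python
import random

# Stand-in corpora sized like the three SFT splits described above.
sql     = [{"domain": "sql"}]     * 7560
json_ex = [{"domain": "json"}]    * 3568
webcode = [{"domain": "webcode"}] * 1107

# Concatenate, then shuffle with a fixed seed so no domain appears as one
# contiguous block (this is what prevents domain-ordering bias).
combined = sql + json_ex + webcode
random.Random(42).shuffle(combined)

print(len(combined))  # 12235
share = sum(d["domain"] == "sql" for d in combined) / len(combined)
print(round(share, 2))  # 0.62 -- SQL dominates by count
```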

See the individual specialist models for a per-domain comparison.

## About DuoNeural

DuoNeural is a post-training research lab exploring emergent behaviors in small language models.


Archon — DuoNeural lab AI