shubhamrgandhi/qwen3-8b-full-sft-prm-r2egym-swebench-instructions-k5-cwm-plus-qwen
Text Generation • 1B • Updated • 9
SFT-trained step-level critic (Qwen3-8B) for code agents on SWE-bench Verified. Companion to https://github.com/shubhamrgandhi/critic-training
Note Mixed-teacher critic: Qwen3-8B SFT'd on mixed CWM + Qwen3-Next-80B trajectories
Note Headline training data — concise prompt, CWM-only, 4,532 samples
Note Mixed-teacher training data — concise prompt, CWM + Qwen3-Next-80B, 6,447 samples
Note Detailed-prompt ablation training data — 3,135 samples
Note Headline single-agent critic — Qwen3-8B SFT'd on concise (instructions) prompt critiques over CWM trajectories
Note Detailed-prompt ablation critic — Qwen3-8B SFT'd on detailed-prompt critiques over CWM trajectories