fix: HF Space port + TRL install order in training Dockerfile 4201933 natnael kahssay Claude Sonnet 4.6 committed on 13 days ago
fix: correct model name to unsloth/gpt-oss-20b (no -instruct suffix) 7ba71bb natnael kahssay Claude Sonnet 4.6 committed on 13 days ago
fix: move unsloth import first, make wandb optional via WANDB_API_KEY bd0f19a natnael kahssay Claude Sonnet 4.6 committed on 13 days ago
fix: install trl>=0.16 last with --upgrade to beat unsloth dep pins 71a483e natnael kahssay Claude Sonnet 4.6 committed on 13 days ago
fix: use CUDA devel image and pin vLLM to 0.12.0 ada7c70 natnael kahssay Claude Sonnet 4.6 committed on 13 days ago
fix: strip ANSI codes in _run_tests() so ✓/✗ count correctly 6b28995 natnael kahssay committed on 13 days ago
feat: add W&B reward logging to both training scripts fe33a21 natnael kahssay Claude Sonnet 4.6 committed on 13 days ago
feat: RFC 005 interactive rollout wrapper + multi-turn GRPO training ded7690 natnael kahssay Claude Sonnet 4.6 committed on 13 days ago
feat: replace handcrafted user_messages with real MOA session traces bb5a5ec natnael kahssay Claude Sonnet 4.6 committed on 13 days ago
feat: multi-turn tool-using GRPO training 5e044f0 natnael kahssay Claude Sonnet 4.6 committed on 13 days ago
feat: multi-turn tool-using RL environment (RFC 005 pattern) 5d3d3ff natnael kahssay Claude Sonnet 4.6 committed on 13 days ago
feat: use real moav2 source as RL env, symlinked sandbox, demo.py 0590e15 natnael kahssay committed on 13 days ago
feat: use real moav2 source as RL task suite - symlinked sandbox, 3 real service tasks ce25387 natnael kahssay committed on 13 days ago
fix: embed task content directly, self-contained vitest sandbox 38cd72d natnael kahssay committed on 13 days ago
upgrade: gpt-oss-20b BF16, vLLM GRPO, Northflank env URL, max_steps=300 c844c8c natnael kahssay committed on 13 days ago
add training/ as real directory (Dockerfile + train.py) aae5554 natnael kahssay committed on 13 days ago
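
Note on 6b28995: the log only says that _run_tests() now strips ANSI codes before counting pass/fail marks; the actual implementation is not shown here. A minimal sketch of the idea, assuming Python, and assuming the test runner prefixes each result line with a pass or fail mark wrapped in colour escapes. The names strip_ansi and count_marks and the default mark characters are hypothetical, chosen for illustration only.

```python
import re

# Matches CSI escape sequences such as "\x1b[32m" (colour) or "\x1b[0m" (reset).
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences from text."""
    return ANSI_RE.sub("", text)


def count_marks(output: str, pass_mark: str = "✓", fail_mark: str = "✗") -> tuple[int, int]:
    """Count lines that start with the pass or fail mark once colour codes are gone.

    The mark characters are assumptions; pass whatever the runner actually prints.
    """
    passed = failed = 0
    for line in strip_ansi(output).splitlines():
        stripped = line.lstrip()
        if stripped.startswith(pass_mark):
            passed += 1
        elif stripped.startswith(fail_mark):
            failed += 1
    return passed, failed


# Hypothetical coloured runner output: one passing and one failing test.
sample = (
    "\x1b[32m ✓\x1b[39m adds numbers\n"
    "\x1b[31m ✗\x1b[39m handles empty input\n"
)
print(count_marks(sample))
```

Without the strip_ansi() call, each line would begin with an escape byte rather than the mark, so a prefix match would miss every result and the reward computed from the counts would be wrong.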