finalization: blog + README + execution rewrite, drop 3B + openclaw shim 0058c94 Madhav189 committed on about 1 month ago
notebook: broaden Cell 10 SFT-only load except clause c0ea16e Madhav189 Claude Opus 4.7 (1M context) committed on about 1 month ago
notebook: fix GRPO prompts - apply chat template before passing to TRL 215c8ad Madhav189 Claude Opus 4.7 (1M context) committed on about 1 month ago
notebook: install openenv-core (and other repo deps) in Cell 0 091927e Madhav189 Claude Opus 4.7 (1M context) committed on about 1 month ago
notebook: switch Cell 0 to Unsloth's official uv pattern + harden Cells 10-12 b2632ec Madhav189 Claude Opus 4.7 (1M context) committed on about 1 month ago
training: precision rewrite to prevent SFT collapse + GRPO variance starvation 42ab8f0 Madhav189 Claude Opus 4.7 (1M context) committed on about 1 month ago
notebook: use Qwen2.5-3B-Instruct (has chat template) not the base model 2214e76 Madhav189 Claude Opus 4.7 (1M context) committed on about 1 month ago
notebook: align with Unsloth-recommended TRL 0.22.2 65d2643 Madhav189 Claude Opus 4.7 (1M context) committed on about 1 month ago
notebook: show pip install progress (drop -q flag) 93e560c Madhav189 Claude Opus 4.7 (1M context) committed on about 1 month ago
notebook: rewrite for robustness and resumability 17dba36 Madhav189 Claude Opus 4.7 (1M context) committed on Apr 25
notebook: fix TRL >=1.0 API breakage 32e423b Madhav189 Claude Opus 4.7 (1M context) committed on Apr 25
hackathon sprint: grader collapse + coliseum rename + training pipeline c9baa73 Madhav189 Claude Opus 4.7 (1M context) committed on Apr 25
Tier-escalating sre-gym v3.0 - compute → horizon → realism 2733f3f Madhav189 Claude Opus 4.7 (1M context) committed on Apr 25