docs: strengthen Chess960 thesis — why it's the right self-improvement benchmark 7b15ef1 qtzx06 committed 30 days ago
docs: expand architecture doc with full search stack and training pipeline details 7109aa9 qtzx06 committed 30 days ago
docs: rewrite demo script with concrete before/after metrics and full results 5a8e942 qtzx06 committed 30 days ago
feat: add thesis section + Codex agent swarm narrative + 9B scaling probe + rewrite process log 4ed9a84 qtzx06 committed 30 days ago
docs: sharpen demo script with concrete Elo gains and before/after metrics 219232e qtzx06 committed 30 days ago
docs: add GRPO deep-dive — environment-grounded RL over bounded tool use 55b59f4 qtzx06 committed 30 days ago
feat: rewrite training to use TRL rollout_func + OpenEnv multi-turn pattern 93f58fd qtzx06 committed about 1 month ago
docs: log Qwen 3.5 9B inference test on H100 (reward=0.25) 8da9024 qtzx06 committed about 1 month ago
feat: fix openenv 0.2.1 API, add deployment files and GRPO training ea3bbb3 qtzx06 committed about 1 month ago