feat: add thesis section + Codex agent swarm narrative + 9B scaling probe + rewrite process log 4ed9a84 qtzx06 committed 30 days ago
feat: rewrite training to use TRL rollout_func + OpenEnv multi-turn pattern 93f58fd qtzx06 committed about 1 month ago
docs: log Qwen 3.5 9B inference test on H100 (reward=0.25) 8da9024 qtzx06 committed about 1 month ago
feat: fix openenv 0.2.1 API, add deployment files and GRPO training ea3bbb3 qtzx06 committed about 1 month ago