v3: benchmark results, final report, agent/eval improvements, smoke test fixes b3fc5ee israaaML Claude Sonnet 4.6 committed 1 day ago
v2: curriculum scheduling, SFT pipeline, reward redesign, agent guide 16038fc israaaML Claude Sonnet 4.6 committed 2 days ago