chore: refresh training artifacts and rename consume_reward_components to private 7909885 Mohammed-Altaf committed 18 days ago
feat: add episode trace, refresh training dataset, and update eval metrics a422c8d Mohammed-Altaf committed 18 days ago
refactor: move training code to scripts/, add train/eval split, tune GRPO hyperparams fad16c9 Mohammed-Altaf committed 18 days ago