Polish for hackathon submission: training evidence, two pipelines, UI, docs e81353d K446 committed about 1 month ago
Fix GRPO training: reward variance, batch/gen alignment, generation config e1ab78c K446 committed on Apr 25
fix: notebook uses compute_grpo_reward_env, updated hyperparams, no emojis 69bab30 K446 committed on Apr 25