MMR-GRPO-lambda-0.6 / reward_plots
7.83 MB
Training in progress, step 500
4f585b0