MMR-DR_GRPO-lambda-0.7 / reward_plots
7.86 MB
kangdawei
Training in progress, step 500
77d5590 verified