MMR-DR_GRPO-lambda-0.5 / reward_plots
7.94 MB
kangdawei
Training in progress, step 500
1fe3afb verified