MMR-DR_GRPO-lambda-0.6 / reward_data
377 MB
kangdawei
Training in progress, step 500
8cd5121 verified