MMR-GRPO-lambda-0.8 / reward_data
284 MB
kangdawei
Training in progress, step 500
38bcf97 verified