MansiJerry/Qwen3-8B-GRPO-learned-base-score-ng-dfq_no_claim_bs_gpt_args_v2 Text Generation • Updated 18 days ago • 158
MansiJerry/Qwen3-8B-GRPO-learned-base-score_arg_rank_con_dfq_no_claim_bs_qwen_arg Text Generation • Updated 18 days ago • 169
MansiJerry/Qwen3-8B-GRPO-learned-mlp-relu-new-base-score Text Generation • Updated 28 days ago • 14