shubhamprshr/Qwen2.5-1.5B-Instruct_gsm8k_grpo_gaussian_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 20 • 9
shubhamprshr/Qwen2.5-1.5B-Instruct_gsm8k_grpo_gaussian_0.25_0.75_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 20 • 6
shubhamprshr/Qwen2.5-1.5B-Instruct_countdown2345_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18 • 6
shubhamprshr/Qwen2.5-1.5B-Instruct_countdown2345_grpo_gaussian_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18 • 6
shubhamprshr/Qwen2.5-1.5B-Instruct_math_grpo_gaussian_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18 • 10
shubhamprshr/Qwen2.5-1.5B-Instruct_math_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18 • 11
shubhamprshr/Qwen2.5-1.5B-Instruct_math_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18 • 8
shubhamprshr/Qwen2.5-1.5B-Instruct_countdown2345_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18 • 6
shubhamprshr/Qwen2.5-1.5B-Instruct_blocksworld1246_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Text Generation • 2B • Updated Nov 17 • 9
shubhamprshr/Qwen2.5-1.5B-Instruct_blocksworld1246_grpo_gaussian_0.25_0.75_SEC0.3DRO1.0G0.0_minpTrue_1200 Text Generation • 2B • Updated Nov 17 • 10
shubhamprshr/Qwen2.5-1.5B-Instruct_blocksworld1246_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Text Generation • 2B • Updated Nov 17 • 14
shubhamprshr/Qwen2.5-7B-Instruct_blocksworld1246_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Text Generation • 333k • Updated Oct 4 • 7
shubhamprshr/Qwen2.5-7B-Instruct_blocksworld1246_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Text Generation • 333k • Updated Oct 3 • 15
shubhamprshr/Qwen2.5-7B-Instruct_blocksworld1246_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_600 Text Generation • 333k • Updated Oct 3 • 13
shubhamprshr/Qwen2.5-3B-Instruct_blocksworld1246_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Updated Sep 27
shubhamprshr/Llama-3.1-8B-Instruct_blocksworld1246_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Updated Sep 26
shubhamprshr/Qwen2.5-3B-Instruct_math_grpo_vrex_0.5_0.5_SEC1.0DRO0.0G0.0_minp0.0_1200 Text Generation • 242k • Updated Sep 24 • 5
shubhamprshr/Llama-3.2-3B-Instruct_math_grpo_vrex_0.5_0.5_SEC1.0DRO0.0G0.0_minp0.0_1200 Text Generation • 175k • Updated Sep 24 • 25
shubhamprshr/Qwen2.5-3B-Instruct_blocksworld1246_grpo_vrex_0.5_0.5_SEC1.0DRO0.0G0.0_minp0.0_1200 Text Generation • 242k • Updated Sep 23 • 10
shubhamprshr/Llama-3.2-3B-Instruct_blocksworld1246_grpo_vrex_0.5_0.5_SEC1.0DRO0.0G0.0_minp0.0_1200 Text Generation • 175k • Updated Sep 23 • 10
shubhamprshr/Qwen2.5-3B-Instruct_math_sgrpo_balanced_0.5_0.5_True_1200 Text Generation • 3B • Updated May 12 • 9
shubhamprshr/Llama-3.2-3B-Instruct_blocksworld1246_sgrpo_cosine_0.5_0.5_True_1200 Text Generation • 3B • Updated May 8 • 8
shubhamprshr/Qwen2.5-3B-Instruct_math_sgrpo_cosine_0.5_0.5_True_1200 Text Generation • 3B • Updated May 8 • 8
shubhamprshr/Qwen2.5-3B-Instruct_math_sgrpo_gaussian_0.25_0.75_True_1200 Text Generation • 3B • Updated May 7 • 6
shubhamprshr/Qwen2.5-3B-Instruct_math_sgrpo_classic_0.5_0.5_True_1200 Text Generation • 3B • Updated May 6 • 9
shubhamprshr/Qwen2.5-3B-Instruct_math_sgrpo_gaussian_0.5_0.5_True_1200 Text Generation • 3B • Updated May 6 • 7
shubhamprshr/Qwen2.5-3B-Instruct_blocksworld8_sgrpo_balanced_0.5_0.5_True_1200 Text Generation • 3B • Updated May 5 • 7
shubhamprshr/Qwen2.5-3B-Instruct_blocksworld6_sgrpo_balanced_0.5_0.5_True_1200 Text Generation • 3B • Updated May 4 • 8
shubhamprshr/Qwen2.5-1.5B-Instruct_blocksworld246_sgrpo_balanced_0.5_0.5_True_1200 Text Generation • 2B • Updated May 3 • 7