shubhamprshr/Qwen2.5-1.5B-Instruct_gsm8k_grpo_gaussian_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 20, 2025 • 8
shubhamprshr/Qwen2.5-1.5B-Instruct_gsm8k_grpo_gaussian_0.25_0.75_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 20, 2025 • 5
shubhamprshr/Qwen2.5-1.5B-Instruct_countdown2345_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18, 2025 • 5
shubhamprshr/Qwen2.5-1.5B-Instruct_countdown2345_grpo_gaussian_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18, 2025 • 5
shubhamprshr/Qwen2.5-1.5B-Instruct_math_grpo_gaussian_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18, 2025 • 9
shubhamprshr/Qwen2.5-1.5B-Instruct_math_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18, 2025 • 9
shubhamprshr/Qwen2.5-1.5B-Instruct_math_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18, 2025 • 6
shubhamprshr/Qwen2.5-1.5B-Instruct_countdown2345_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1600 Text Generation • 2B • Updated Nov 18, 2025 • 5
shubhamprshr/Qwen2.5-1.5B-Instruct_blocksworld1246_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Text Generation • 2B • Updated Nov 17, 2025 • 7
shubhamprshr/Qwen2.5-1.5B-Instruct_blocksworld1246_grpo_gaussian_0.25_0.75_SEC0.3DRO1.0G0.0_minpTrue_1200 Text Generation • 2B • Updated Nov 17, 2025 • 9
shubhamprshr/Qwen2.5-1.5B-Instruct_blocksworld1246_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Text Generation • 2B • Updated Nov 17, 2025 • 12
shubhamprshr/Qwen2.5-7B-Instruct_blocksworld1246_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Text Generation • 333k • Updated Oct 4, 2025 • 5
shubhamprshr/Qwen2.5-7B-Instruct_blocksworld1246_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Text Generation • 333k • Updated Oct 3, 2025 • 13
shubhamprshr/Qwen2.5-7B-Instruct_blocksworld1246_grpo_cosine_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_600 Text Generation • 333k • Updated Oct 3, 2025 • 11
shubhamprshr/Qwen2.5-3B-Instruct_blocksworld1246_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Updated Sep 27, 2025
shubhamprshr/Llama-3.1-8B-Instruct_blocksworld1246_grpo_balanced_0.5_0.5_SEC0.3DRO1.0G0.0_minpTrue_1200 Updated Sep 26, 2025
shubhamprshr/Qwen2.5-3B-Instruct_math_grpo_vrex_0.5_0.5_SEC1.0DRO0.0G0.0_minp0.0_1200 Text Generation • 242k • Updated Sep 24, 2025 • 4
shubhamprshr/Llama-3.2-3B-Instruct_math_grpo_vrex_0.5_0.5_SEC1.0DRO0.0G0.0_minp0.0_1200 Text Generation • 175k • Updated Sep 24, 2025 • 5
shubhamprshr/Qwen2.5-3B-Instruct_blocksworld1246_grpo_vrex_0.5_0.5_SEC1.0DRO0.0G0.0_minp0.0_1200 Text Generation • 242k • Updated Sep 23, 2025 • 4
shubhamprshr/Llama-3.2-3B-Instruct_blocksworld1246_grpo_vrex_0.5_0.5_SEC1.0DRO0.0G0.0_minp0.0_1200 Text Generation • 175k • Updated Sep 23, 2025 • 8
shubhamprshr/Qwen2.5-3B-Instruct_math_sgrpo_balanced_0.5_0.5_True_1200 Text Generation • 3B • Updated May 12, 2025 • 5
shubhamprshr/Llama-3.2-3B-Instruct_blocksworld1246_sgrpo_cosine_0.5_0.5_True_1200 Text Generation • 3B • Updated May 8, 2025 • 3
shubhamprshr/Qwen2.5-3B-Instruct_math_sgrpo_cosine_0.5_0.5_True_1200 Text Generation • 3B • Updated May 8, 2025 • 4
shubhamprshr/Qwen2.5-3B-Instruct_math_sgrpo_gaussian_0.25_0.75_True_1200 Text Generation • 3B • Updated May 7, 2025 • 4
shubhamprshr/Qwen2.5-3B-Instruct_math_sgrpo_classic_0.5_0.5_True_1200 Text Generation • 3B • Updated May 6, 2025 • 6
shubhamprshr/Qwen2.5-3B-Instruct_math_sgrpo_gaussian_0.5_0.5_True_1200 Text Generation • 3B • Updated May 6, 2025 • 4
shubhamprshr/Qwen2.5-3B-Instruct_blocksworld8_sgrpo_balanced_0.5_0.5_True_1200 Text Generation • 3B • Updated May 5, 2025 • 4
shubhamprshr/Qwen2.5-3B-Instruct_blocksworld6_sgrpo_balanced_0.5_0.5_True_1200 Text Generation • 3B • Updated May 4, 2025 • 5
shubhamprshr/Qwen2.5-1.5B-Instruct_blocksworld246_sgrpo_balanced_0.5_0.5_True_1200 Text Generation • 2B • Updated May 3, 2025 • 5