citrinegui/Qwen2.5-1.5B-Instruct_countdown2345_grpo_variance_regularized_0.5_0.5_0.1_1600 Updated Jun 1
citrinegui/Qwen2.5-1.5B-Instruct_countdown45_grpo_balanced_0.5_0.5_True_1600 Text Generation • 2B • Updated May 10 • 4
citrinegui/Qwen2.5-1.5B-Instruct_countdown345_grpo_balanced_0.5_0.5_True_1600 Text Generation • 2B • Updated May 10 • 7
citrinegui/Llama-3.2-3B-Instruct_countdown6_grpo_balanced_0.5_0.5_True_1600 Text Generation • 3B • Updated May 9 • 5
citrinegui/Llama-3.2-3B-Instruct_countdown5_grpo_balanced_0.5_0.5_True_1600 Text Generation • 3B • Updated May 8 • 7
citrinegui/Qwen2.5-3B-Instruct_countdown6_grpo_balanced_0.5_0.5_True_1600 Text Generation • 3B • Updated May 8 • 8
citrinegui/Qwen2.5-3B-Instruct_countdown5_grpo_balanced_0.5_0.5_True_1600 Text Generation • 3B • Updated May 8 • 8
citrinegui/Llama-3.2-3B-Instruct_countdown2345_grpo_gaussian_0.25_0.75_True_1600 Text Generation • 3B • Updated May 7 • 6
citrinegui/Llama-3.2-3B-Instruct_countdown2345_grpo_gaussian_0.75_0.25_True_1600 Text Generation • 3B • Updated May 7 • 7
citrinegui/Llama-3.2-3B-Instruct_countdown2345_grpo_balanced_0.5_0.5_True_1600 Text Generation • 3B • Updated May 6 • 13
citrinegui/Llama-3.2-3B-Instruct_countdown2345_grpo_cosine_0.5_0.5_True_1600 Text Generation • 3B • Updated May 6 • 7
citrinegui/Llama-3.2-3B-Instruct_countdown2345_grpo_classic_0.5_0.5_True_1600 Text Generation • 3B • Updated May 5 • 8
citrinegui/Llama-3.2-3B-Instruct_countdown2345_grpo_gaussian_0.5_0.5_True_1600 Text Generation • 3B • Updated May 4 • 9
citrinegui/Qwen2.5-3B-Instruct_countdown2345_grpo_gaussian_0.75_0.25_True_1600 Text Generation • 3B • Updated Apr 30 • 6
citrinegui/Qwen2.5-3B-Instruct_countdown2345_grpo_gaussian_0_25_0_75_True_1600 Text Generation • 3B • Updated Apr 29 • 7
citrinegui/Qwen2.5-3B-Instruct_countdown2345_grpo_gaussian_0_5_0_5_True_1600 Text Generation • 3B • Updated Apr 26 • 7
citrinegui/Qwen2.5-3B-Instruct_countdown2345_grpo_classic_0_5_0_5_True_1600 Text Generation • 3B • Updated Apr 24 • 8
citrinegui/Qwen2.5-1.5B-Instruct_countdown2345_grpo_gaussian_0_75_0_25_True_1600 Text Generation • 2B • Updated Apr 24 • 6
citrinegui/Qwen2.5-3B-Instruct_countdown2345_grpo_balanced_0_5_0_5_True_1600 Text Generation • 3B • Updated Apr 24 • 7
citrinegui/Qwen2.5-1.5B-Instruct_countdown2345_grpo_classic_0_5_0_5_True_1600 Text Generation • 2B • Updated Apr 24 • 8
citrinegui/Qwen2.5-3B-Instruct_countdown2345_grpo_cosine_0_5_0_5_True_1600 Text Generation • 3B • Updated Apr 24 • 7
citrinegui/Qwen2.5-1.5B-Instruct_countdown2345_grpo_gaussian_0_25_0_75_True_1600 Text Generation • 2B • Updated Apr 24 • 7
citrinegui/Qwen2.5-1.5B-Instruct_countdown2345_grpo_balanced_0.5_0.5_True_1600 Text Generation • 2B • Updated Apr 16 • 8
citrinegui/Qwen2.5-1.5B-Instruct_countdown2345_grpo_cosine_0.5_0.5_True_1600 Text Generation • 2B • Updated Apr 16 • 8
citrinegui/Qwen2.5-1.5B-Instruct_countdown6_grpo_balanced_0.5_0.5_True_1600 Text Generation • 2B • Updated Apr 16 • 8
citrinegui/Qwen2.5-1.5B-Instruct_countdown5_grpo_balanced_0.5_0.5_True_1600 Text Generation • 2B • Updated Apr 16 • 8
citrinegui/Qwen2.5-1.5B-Instruct_countdown2345_grpo_gaussian_0.5_0.5_True_1600 Text Generation • 2B • Updated Apr 15 • 10