pe-nlp/Qwen2.5-3b-grpo-orz-cl-step200
3B • Updated • 1
pe-nlp/Qwen2.5-7b-grpo-orz-cl-step80
pe-nlp/Qwen2.5-7b-orz-step375
8B • Updated • 1
pe-nlp/Qwen2.5-3b-grpo-orz-cl-step130
pe-nlp/Qwen2.5-3b-grpo-cl3-step133
3B • Updated • 3
pe-nlp/Qwen2.5-3b-grpo-orz-cl-step100
3B • Updated • 1
pe-nlp/Qwen2.5-7b-grpo-orz-cl-step50
8B • Updated • 1
pe-nlp/Qwen2.5-7b-grpo-bigmath-cl-step184
8B • Updated • 1
pe-nlp/Qwen2.5-7b-grpo-bigmath-cl-step150
pe-nlp/Qwen2.5-7b-grpo-bigmath-cl
pe-nlp/Qwen2.5-3b-grpo-cl2-step130
pe-nlp/Qwen2.5-3b-grpo-cl-step140
pe-nlp/Qwen2.5-7b-orz-step275
8B • Updated • 1
pe-nlp/Qwen2.5-3b-grpo-cl-step80
pe-nlp/Qwen2.5-7b-orz-step230
8B • Updated • 1
pe-nlp/Qwen2.5-7b-grpo-cl-step480
8B • Updated • 1
pe-nlp/Qwen2.5-7b-grpo-cl-step400
8B • Updated • 1
pe-nlp/Qwen2.5-7b-orz-step170
8B • Updated
pe-nlp/Qwen2.5-7b-orz-step145
8B • Updated • 1
pe-nlp/Qwen2.5-7b-grpo-cl-step320
8B • Updated
pe-nlp/Qwen2.5-7b-orz-step115
8B • Updated • 1
pe-nlp/Qwen2.5-7b-grpo-cl-step190
8B • Updated • 1
pe-nlp/Qwen2.5-3B-RLOO-Reward2
3B • Updated
pe-nlp/R1-Qwen2.5-32B-Instruct-10k
Text Generation • 33B • Updated • 2
pe-nlp/R1-Qwen2.5-32B-Instruct-5k
Text Generation • 33B • Updated • 2
pe-nlp/R1-Qwen2.5-14B-Instruct-math
Text Generation • 15B • Updated • 4 • 1
pe-nlp/R1-Qwen2.5-14B-Instruct-code
Text Generation • 15B • Updated
pe-nlp/R1-Qwen2.5-14B-Instruct-8k-hard
Text Generation • 15B • Updated
pe-nlp/R1-Qwen2.5-7B-Instruct-math
Text Generation • 8B • Updated
pe-nlp/R1-Qwen2.5-7B-Instruct-code
Text Generation • 8B • Updated • 2