q1716523669/cogrpo-homo-qwen25-3b-math345-groupA • Reinforcement Learning • 242k • Updated 6 days ago • 58
q1716523669/cogrpo-homo-qwen25-3b-math345-groupB • Reinforcement Learning • 242k • Updated 6 days ago • 52
q1716523669/cogrpo-heter-qwen25-3b-x-llama32-3b-math345-groupA-qwen • Reinforcement Learning • 242k • Updated 6 days ago • 13
q1716523669/cogrpo-heter-qwen25-3b-x-llama32-3b-math345-groupB-llama • Reinforcement Learning • 175k • Updated 6 days ago • 22
q1716523669/cogrpo-disagree-heter-qwen25-3b-x-llama32-3b-math345-groupA-qwen • Reinforcement Learning • 242k • Updated 6 days ago • 21
q1716523669/cogrpo-disagree-heter-qwen25-3b-x-llama32-3b-math345-groupB-llama • Reinforcement Learning • 175k • Updated 6 days ago • 20
q1716523669/corewardI-qwen25-3b-math12345-groupA • Reinforcement Learning • 242k • Updated 6 days ago • 21
q1716523669/corewardI-qwen25-3b-math12345-groupB • Reinforcement Learning • 242k • Updated 6 days ago • 18
q1716523669/cogrpo-heter-qwen25-3b-x-llama32-3b-math345-bs2-groupA-qwen • 242k • Updated 4 days ago • 48
q1716523669/cogrpo-heter-qwen25-3b-x-llama32-3b-math345-bs2-groupB-llama • 175k • Updated 4 days ago • 44