ES-GRPO
  alphaXiv/es-grpo-results Updated 23 days ago • 78
  alphaXiv/countdown-full Updated 24 days ago • 2.2k • 27
spurious-rewards
  alphaXiv/spurious-rewards-rlvr-training-qwen-2.5-1.5b-math-ckpt-400 2B • Updated Jan 1
  alphaXiv/spurious-rewards-rlvr-training-qwen-2.5-1.5b-math-ckpt-1000 2B • Updated Jan 1
  alphaXiv/spurious-rewards-rlvr-training-qwen-2.5-1.5b-math-ckpt-200 2B • Updated Jan 1
  alphaXiv/spurious-rewards-rlvr-training-qwen-2.5-1.5b-math-ckpt-50 2B • Updated Jan 1
Reproducing-TRM
  alphaXiv/trm-model-arc-agi-1 Updated Oct 22, 2025 • 4
  alphaXiv/trm-model-maze Updated Oct 22, 2025 • 5
  alphaXiv/trm-model-sudoku Updated Oct 22, 2025 • 3
attention-is-not-all-you-need
  alphaXiv/attention-is-not-all-you-need-models Updated Jan 12
  Salesforce/wikitext Updated Jan 4, 2024 • 3.71M • 1.05M • 650
  stanfordnlp/snli Updated Mar 6, 2024 • 570k • 18.5k • 89
Agent-R1
  alphaXiv/Qwen-2.5-1.5b-instruct-ppo 2B • Updated Dec 26, 2025
  alphaXiv/Qwen-2.5-1.5b-instruct-grpo 2B • Updated Dec 26, 2025 • 2