Semi-Supervised Reward Modeling via Iterative Self-Training Paper • 2409.06903 • Published Sep 10, 2024
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks Paper • 2410.18210 • Published Oct 23, 2024
MergeBench: A Benchmark for Merging Domain-Specialized LLMs Paper • 2505.10833 • Published May 16, 2025 • 1
Scalable Data Synthesis for Computer Use Agents with Step-Level Filtering Paper • 2512.10962 • Published Nov 22, 2025 • 3
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic Paper • 2408.13656 • Published Aug 24, 2024
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models Paper • 2505.10554 • Published May 15, 2025 • 120
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL Paper • 2505.02391 • Published May 5, 2025 • 25
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation Paper • 2502.03860 • Published Feb 6, 2025 • 25
Reward-Guided Speculative Decoding for Efficient LLM Reasoning Paper • 2501.19324 • Published Jan 31, 2025 • 39
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs Paper • 2410.04698 • Published Oct 7, 2024 • 13
cornfieldrm/pair-preference-dataset-700K_subset-4-of-4_gemma-2b_1of4_iter1_bs128_lr1e-5_conf-0.7_slic • Updated Jul 25, 2024 • 72.1k • 5
cornfieldrm/pair-preference-dataset-700K_subset-3-of-4_gemma-2b_1of4_iter1_bs128_lr1e-5_conf-0.7_slic • Updated Jul 25, 2024 • 71.9k • 5
cornfieldrm/pair-preference-dataset-700K_subset-2-of-4_gemma-2b_1of4_iter1_bs128_lr1e-5_conf-0.7_slic • Updated Jul 25, 2024 • 72.4k • 8
cornfieldrm/pair-preference-dataset-700K_subset-4-of-4_gemma-2b_1of4_iter1_bs128_lr1e-5_conf-0.7 • Updated Jul 25, 2024 • 72.1k • 6
cornfieldrm/pair-preference-dataset-700K_subset-3-of-4_gemma-2b_1of4_iter1_bs128_lr1e-5_conf-0.7 • Updated Jul 25, 2024 • 71.9k • 6