FlowRL: Matching Reward Distributions for LLM Reasoning Paper • 2509.15207 • Published Sep 18, 2025 • 116 • 8
Reasoning with Exploration: An Entropy Perspective Paper • 2506.14758 • Published Jun 17, 2025 • 31 • 10
How to Synthesize Text Data without Model Collapse? Paper • 2412.14689 • Published Dec 19, 2024 • 53 • 4
On Domain-Specific Post-Training for Multimodal Large Language Models Paper • 2411.19930 • Published Nov 29, 2024 • 31 • 3
Adapting Large Language Models via Reading Comprehension Paper • 2309.09530 • Published Sep 18, 2023 • 81 • 3
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20, 2024 • 96 • 25