DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published Dec 18, 2025 • 221
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30, 2025 • 279
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025 • 252
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16, 2025 • 274
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published May 17, 2025 • 58
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models Paper • 2505.02686 • Published May 5, 2025 • 16
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published May 14, 2025 • 76
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 444
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14, 2025 • 302
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8, 2025 • 187
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6, 2025 • 191
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters Paper • 2504.08791 • Published Apr 7, 2025 • 140