Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning Paper • 2602.01745 • Published Feb 2 • 7
view article Article OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve May 20, 2025 • 65
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published Dec 3, 2025 • 159
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism Paper • 2511.11373 • Published Nov 14, 2025 • 14
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models Paper • 2510.06014 • Published Oct 7, 2025 • 10
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs Paper • 2510.24514 • Published Oct 28, 2025 • 22
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration Paper • 2508.13755 • Published Aug 19, 2025 • 14
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 238
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics Paper • 2508.18124 • Published Aug 25, 2025 • 49
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19, 2025 • 119
Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published Jul 21, 2025 • 69
Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published Jul 21, 2025 • 69 • 6
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17, 2025 • 263
LongDocURL: a Comprehensive Multimodal Long Document Benchmark Integrating Understanding, Reasoning, and Locating Paper • 2412.18424 • Published Dec 24, 2024 • 1
From System 1 to System 2: A Survey of Reasoning Large Language Models Paper • 2502.17419 • Published Feb 24, 2025 • 3
SOLIDGEO: Measuring Multimodal Spatial Math Reasoning in Solid Geometry Paper • 2505.21177 • Published May 27, 2025 • 1
TL;DR: Too Long, Do Re-weighting for Effcient LLM Reasoning Compression Paper • 2506.02678 • Published Jun 3, 2025 • 5
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning Paper • 2506.08989 • Published Jun 10, 2025 • 14