Reasoning over mathematical objects: on-policy reward modeling and test time aggregation Paper • 2603.18886 • Published Mar 19
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published 5 days ago
IDIOLEX: Unified and Continuous Representations for Idiolectal and Stylistic Variation Paper • 2604.04704 • Published Apr 6
Gained in Translation: Privileged Pairwise Judges Enhance Multilingual Reasoning Paper • 2601.18722 • Published Jan 26