Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2606.15007 • Published 13 days ago • 15
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling Paper • 2606.13473 • Published 14 days ago • 90
ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research Paper • 2606.07591 • Published 28 days ago • 95
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement Paper • 2606.11926 • Published 15 days ago • 117
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 20 days ago • 119
Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models Paper • 2606.03988 • Published 22 days ago • 124
VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models Paper • 2606.16140 • Published 10 days ago • 113
GRAM-R^2: Self-Training Generative Foundation Reward Models for Reward Reasoning Paper • 2509.02492 • Published Sep 2, 2025 • 2
vectara/hallucination_evaluation_model Text Classification • 0.1B • Updated Oct 20, 2025 • 117k • 355
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR Paper • 2605.15726 • Published May 15 • 34