Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models Paper • 2601.14004 • Published 6 days ago • 44
ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents Paper • 2601.12294 • Published 8 days ago • 16
RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation Paper • 2601.08430 • Published 13 days ago • 57
MMFormalizer: Multimodal Autoformalization in the Wild Paper • 2601.03017 • Published 20 days ago • 104
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis Paper • 2601.05808 • Published 17 days ago • 36
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published Dec 8, 2025 • 38
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning Paper • 2512.07461 • Published Dec 8, 2025 • 77
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published Dec 4, 2025 • 80
VisPlay: Self-Evolving Vision-Language Models from Images Paper • 2511.15661 • Published Nov 19, 2025 • 43
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph Paper • 2511.00086 • Published Oct 29, 2025 • 42
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators Paper • 2510.00406 • Published Oct 1, 2025 • 66
Who's Your Judge? On the Detectability of LLM-Generated Judgments Paper • 2509.25154 • Published Sep 29, 2025 • 30
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27, 2025 • 84
MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs Paper • 2508.18264 • Published Aug 25, 2025 • 25
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences Paper • 2508.03542 • Published Aug 5, 2025 • 5
When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs Paper • 2508.03365 • Published Aug 5, 2025 • 5
TextQuests: How Good are LLMs at Text-Based Video Games? Paper • 2507.23701 • Published Jul 31, 2025 • 3
Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System Paper • 2508.06059 • Published Aug 8, 2025 • 4
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs Paper • 2508.06601 • Published Aug 8, 2025 • 6